A proposal to add to CitEc
Background
This document relates to CitEc.
CitEc's current system for automatic identification of references and
citations has at least two limitations:
- We can only process documents that are available in electronic
format and that have no access restrictions. That limits our
stock of processed documents to less than half of the available
documents in RePEc.
- Even when a document can be processed, there are often serious
problems with extracting the text from it. The software we use for
this purpose (from Vividata Inc) does a good job overall, but it
cannot deal with all types of PDF files. For this reason, about 100k
papers are currently not processed. The quality of the text
extraction is very important for the identification of citations,
since that identification is based mainly on string pattern matching.
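To illustrate why extraction quality matters for string pattern
matching, here is a minimal Python sketch. It assumes a naive
normalized-substring matcher; this is not CitEc's actual algorithm,
and the function names, title, and reference strings are all
hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def cites(reference_text, title):
    """True if the normalized title occurs in the normalized reference."""
    return normalize(title) in normalize(reference_text)


# Hypothetical title and reference strings, for illustration only.
title = "Money, Credit and Prices"
clean = "Smith, J. (1999). Money, credit and prices. Journal of Examples 12."
# The same reference after a faulty extraction that splits words.
garbled = "Smith, J. (1999). Mon ey, cred it and pri ces. Journal of Examples 12."

found_in_clean = cites(clean, title)
found_in_garbled = cites(garbled, title)
```

In this sketch the cleanly extracted reference matches, while the
same reference with stray spaces inside words does not, so the
citation would be silently missed.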
These limitations are the cause of frequent complaints from authors. A
typical complaint is "Why has this citation to one of my papers not
been included in IDEAS/EconPapers?" The answer can be either that the
citing paper has not been processed, due to the problems described
earlier, or that the parsing of its references failed.
At the technical level we are limited by external applications, and
little can be done to improve the process. For this reason, one way
forward would be to develop procedures that allow input from
registered authors.
Proposal
In this document we present a proposal to extend the RePEc Author
Service in order to allow input of references and citations from
documents which have not been automatically processed. Such
contributions from authors raise two issues to be resolved: the input
of the data, and its validation in order to prevent fraud.
Typically, we have to deal with two people. The "requester" is
the author of a cited paper. Requesters want us to know that
their papers have been cited. However, they cannot credibly
demonstrate it to us. Only the second person, the "author" of
the citing paper, can do so. In the following, we assume that
requester and author are two different people.
The input process is described later. For the validation of data we
will rely on the collaboration of registered authors in RAS. We will
develop a communication system between requesters, authors, and CitEc
management, using email or web forms. The aim is that each input
action from a requester should be validated by the corresponding
author before the citation is included in CitEc. That is, the request
to add a citation to a paper should be validated by one of the authors
of the citing paper.
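The validation rule described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
class, field, and function names, the status values, and the
identifiers in the usage lines are hypothetical and are not part of
RAS or CitEc.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # submitted by the requester, awaiting an author
    VALIDATED = "validated"  # confirmed by an author of the citing paper
    REJECTED = "rejected"    # declined by an author of the citing paper


@dataclass
class CitationRequest:
    requester: str             # hypothetical RAS id of the cited paper's author
    cited_paper: str           # hypothetical handle of the cited paper
    citing_paper: str          # hypothetical handle of the citing paper
    citing_authors: frozenset  # RAS ids of the citing paper's registered authors
    status: Status = Status.PENDING


def validate(request, author, accept):
    """Record the decision of a citing-paper author on a pending request."""
    if request.status is not Status.PENDING:
        raise ValueError("request has already been decided")
    if author not in request.citing_authors:
        raise PermissionError("only an author of the citing paper may validate")
    if author == request.requester:
        # Assumption taken from the text: requester and author differ.
        raise PermissionError("a requester cannot validate their own request")
    request.status = Status.VALIDATED if accept else Status.REJECTED
    return request.status


def include_in_citec(request):
    """A citation is added to CitEc only once it has been validated."""
    return request.status is Status.VALIDATED


# Usage with made-up identifiers.
req = CitationRequest("requester1", "cited:1", "citing:2", frozenset({"author2"}))
before = include_in_citec(req)
validate(req, "author2", accept=True)
after = include_in_citec(req)
```

Keeping the request in a pending state until an author of the citing
paper responds means that nothing the requester submits reaches CitEc
on the requester's word alone.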
The input data could be managed as follows: