|
| INCLUDEFILE(jagt_mac.yo)
|
| notableofcontents()
|
|
| article(Cataloging Economics preprints:
| an introduction to the RePEc project
| +whenlatex(+latexcommand(\footnote{)The work
| discussed here has received financial support by the Joint
| Information Systems Committee of the UK Higher Education Funding
| Councils through its
| url(Electronic Library Programme)(http://www.ukoln.ac.uk/services/elib).
| We are grateful to
| url(Christopher F. Baum)(http://fmwww.bc.edu/EC-V/Baum.fac.html),
| url(Robert P. Parks)(http://wueconb.wustl.edu/~bob/),
| url(Thorsten Wichmann)(http://www.berlecon.de) and
| url(Christian Zimmermann)(http://www.er.uqam.ca/nobel/r14160/index.html)
| for comments on the questionnaires.
| url(William L. Goffe)(http://wuecon.wustl.edu/~goffe/)
| and Christian Zimmermann
| made many helpful suggestions.
| The topic of link(Subsection ref(sec:upta))(sec:upta)
| was suggested by url(Jane Greenberg)(http://ils.unc.edu/~janeg).
| Sophie C. Rigny kindly pointed
| out many stylistic and grammatical errors in an earlier version.
| +latexcommand(})))
| (Jos\'e Manuel Barrueco Cruz and Thomas Krichel)
| ()
|
| latexcommand(\thispagestyle{empty})
|
|
| latexcommand(\vfill\begin{center}) table(2)(ll)(row(cell(url(Jos\'e Manuel
| Barrueco Cruz)(http://www.uv.es/~barrueco )) cell(url(Thomas
| Krichel)(http://gretel.econ.surrey.ac.uk))) row(cell(url(Biblioteca de
| Ci\`encies Socials dq(Gregori Maians))(http://www.uv.es/econweb/))
| cell(url(Department of Economics)(http://www.econ.surrey.ac.uk)))
| row(cell(url(Universitat de Val\`encia)(http://www.uv.es))
| cell(url(University of Surrey)(http://www.surrey.ac.uk))) row(cell(Campus
| dels Tarongers s/n) cell(Stag Hill)) row(cell( 46071 Val\`encia ) cell(
| Guildford GU2 5XH )) row(cell( Spain ) cell( United Kingdom))
| row(cell(url(jose.barrueco@uv.es)(mailto:jose.barrueco@uv.es))
| cell(url(T.Krichel@surrey.ac.uk)(mailto:T.Krichel@surrey.ac.uk)))
| whenlatex(row(cell(tturl(http://www.uv.es/
| +latexcommand(~)barrueco))
| cell(tturl(http://gretel.econ.surrey.ac.uk))))
| row(cell( )
| cell( RePEc:per:1965+endash()06+endash()05:thomas_krichel)))
| +latexcommand(\end{center})
|
| abs(Cataloging scientific papers creates a new educational resource.
| Collecting that data is a costly process to achieve and manage. In
| particular the level of granularity that is required is finer than say for
| a collection of web sites. One possible approach towards cataloging these
| resources is to get a community of providers involved in cataloging the
| materials that they provide. This paper introduces RePEc, at
| http://netec.wust.edu/RePEc, as an example of such an approach. RePEc is
| mainly a catalog of research papers in Economics. It is based on a set of
| over 80 archives which all work independently but yet are interoperable.
| They together provide data about almost 60,000 preprints and over 10,000
| published articles. )
|
| latexcommand(\vfill)
|
| Jos\'e Manuel Barrueco Cruz is a librarian at the Universitat de
| Val\`encia. Thomas Krichel is a lecturer in Economics at the University of
| Surrey. Both welcome comments on this paper, write to
| wopec@netec.mcc.ac.uk. whenlatex(This paper is available online at
| tturl(http://openlib.org/home/krichel/papers/shankari.html).)
| whenhtml(This paper is url(available in
| PDF)(http://openlib.org/home/krichel/papers/shankari.pdf).)
|
| latexcommand(\vfill)
|
| sect(Introduction)
|
| Some scientific disciplines have a preprint tradition. Essentially these
| are Mathematics, Physics, Computer Science and Economics. Preprints are not
| issued in the same ways across those disciplines. In Mathematics and
| Physics preprints are essentially issued by individual academics. In
| Computer Science and Economics, it is more the department that distributes
| the preprints.
|
| In this paper we deal with Economics preprints, usually called working
| papers. Economics is the dismal science. Its bad reputation is founded on
| two conceptions. The first is that economists never agree on anything.
| Winston Churchill claimed dq(If you put two economists in a room, you get
| two opinions, unless one of them is Lord Keynes, in which case you get
| three opinions). And on the other side of the pond, President Truman
| sought to hire a one-armed economist because he would not be able to say
| dq(on the other hand). The other conception is that Economics is very
| theoretical to the point of being totally useless. A popular tale is that
| of the two economists who sit down to play chess. They study the board for
| 24 hours and eventually declare a stale-mate.
|
| Fortunately both of these conceptions do not fully apply to all sections of
| Economics. There is a large mainstream literature that is based on a common
| set of principles. It is true that this literature is heavily mathematical
| but it does not follow that it is completely useless. There are
| counterexamples. For example the calculation of option values is important
| for anybody who is dealing with financial options. Trade in such options
| has only taken off since a pricing formula has been found. COMMENT(
| Derivatives research that is carried out by financial engineers within
| banks, investment houses etc therefore requires the occasional use of
| economic research work.) Further examples are studies relating to
| competition. These are used by government organizations that work on
| regulating industries and on anti-trust measures.
|
| Economics research documents are therefore useful to a wide variety of
| people, not only to students. In the past years more and more Economics
| departments and research institutions have made their working papers
| available on the Internet. However in that form the papers can only be
| found by specialists who know who has been working in a certain area, where
| that researcher is based and whether there are any papers of that
| researcher available on the web pages. This is the kind of knowledge that
| is circulated at scientific conferences+emdash()usually on the back of
| business cards and on napkins+emdash()and therefore this data is not
| available to the people outside the research community. The normal mortals
| will only be able to benefit if a catalog of these papers is available.
|
| In this paper, we describe attempts to build a catalog of online and
| offline working papers in Economics called RePEc. In link(Section
| ref(sec:repec))(sec:repec) we introduce the concepts behind it. RePEc
| is spread over many archives and these are described in link(Section
| ref(sec:indi))(sec:indi). In link(Section ref(sec:data))(sec:data) we
| review the contents of the RePEc dataset. In link(Section
| ref(sec:user))(sec:user) we consider user interfaces to RePEc.
| link(Section ref(sec:conc))(sec:conc) concludes.
|
|
| sect(RePEc)label(sec:repec)
|
| The Electronic dissemination of Economics working papers can be traced back
| to the start of the Working Papers in Economics
| (url(WoPEc)(http://netec.mcc.ac.uk/WoPEc.html)) project in April 1993. By
| May 1999 this single archive had grown into an interconnected network of
| over 80 archives holding over 14,000 downloadable working papers and over
| 50,000 descriptions of offline papers from close to 1,000 series. The
| network of archives is called url(RePEc)(http://netec.wust.edu/RePEc).
| This term was initially conceived to stand for dq(Research Papers in
| Economics). Nowadays it is best understood as a literal, because the
| objectives of RePEc go way beyond a database of scientific papers.
|
| RePEc data is freely available, in the sense that the provider pays for the
| provision of the data, not the user. In order to make such a system viable
| without public subsidy, the cost of providing the data must be spread among
| many agents+footnote(understood here and in the rest of the paper as a
| person or institution). This requirement has been a feature of RePEc right
| from the start of the collection in May 1997. Each participating provider
| sets up an archive on a http or ftp server. The archive supports the
| storage of structural data about objects relevant to Economics, and
| possibly the storage of some of the objects themselves. All objects in
| RePEc are uniquely identified by handles.
|
| RePEc data can be accessed through a plethora of user services. Some
| are heavily used, for example the dq(url(IDEAS)(http://ideas.uqam.ca))
| user service had one million hits in just over 2 months in 1999.
| The main interest of this paper is to examine the collection aspect of the
| data. The idea that a coherent literature catalog can be put together by a
| large group of people who are physically dispersed and have very little
| personal communication without the need for extensive training or intensive
| coordination remains to be demonstrated. At the time of writing this paper
| RePEc is two years old. We feel that this is a good time to review the
| operations of RePEc and the data that it has collected. Clearly the RePEc
| data is in a constant state of flux. To keep matters simple we took a dump
| of the data on 1 May 1999. In this paper we are only referring to the state
| of the data on that date.
|
| There are some aspects of RePEc that this paper does not discuss. We
| eschew any mentioning of the data on software, books, etc to concentrate on
| the collection of traditional academic papers be they preprints or
| published articles. This data forms the bulk of the present collection. We
| also leave out the personal and institutional data which are not included in
| the paper and article templates. We aim to use such data to build a fully
| relational database system that describes Economics as a discipline. We
| will report on such efforts in future papers.
|
| The nature of RePEc is not precisely defined. Most people
| think about it as a collection
| of archives and services that provide data about Economics.
| More precisely, RePEc is most commonly understood as referring to three
| things. First it is a collection of archives that provide data about
| Economics. Second it is the data that is found on these archives. Third, it
| is often also understood to represent the set of agents who build archives
| and channel the data from the archives to the users. In that latter sense
| RePEc has no formal management structure.
|
| RePEc has two aims. The dq(cataloging aim) is to provide a complete
| description of the Economics discipline that is available on the Internet.
| The dq(publishing aim) is to provide em(free)+latexcommand() access
| to Economics resources on the Internet.
|
| COMMENT(--- I think this example is not clear. I would remove it since it adds
| nothing new; it only qualifies the preceding text at a level that may confuse
| the reader
|
| These aims are sometimes conflicting.
| For example, let us assume that a certain amount of money is available for
| cataloging purposes. Then the library objective might be best served by
| using these funds to gather information about a high-quality toll-gated
| journal resource, whereas the publishing objective would be better served
| by considering a collection that is on the Internet and may not be of the
| same quality since it has not yet been extensively peer-reviewed. RePEc
| has no ambition to become involved in
| peer review; however it can be used to support peer
| review. An initial move in that direction is the NEP project that we will
| mention again in link(Section ref(sec:user))(sec:user).
|
| --- )
|
| The basic principle of RePEc can be summarized as follows center( Many archives
| +latexcommand( $\Longrightarrow$ ) +htmlcommand( ---> )One
| dataset+latexcommand( $\Longrightarrow$ ) +htmlcommand( ---> ) Many services
| )
|
| Basic RePEc concepts are: archive, site and service.
|
| itemize(
| it()
| An dq(archive) is a space on a public access computer system which makes data
| available. It is a place where original data enters the system. There is no need
| to run any software other than an ftp or http daemon that makes the files in
| the archive available upon request. Each archive is identified by a
| three-letter code. Some elementary metadata about the archive like its name,
| its url and some basic contents information are polled by a special central
| archive with the handle
| RePEc:all, where dq(RePEc) is the naming authority and dq(all) is the archive
| code.
| it()A dq(site) is a collection of archives
| on the same computer system. It usually
| consists of a local archive augmented by frequently updated (dq(mirrored))
| copies of remote archives.
| it()A dq(service) is a rendering of RePEc data in a form that is available to
| the end user.
| )
|
|
| All archives hold papers and metadata about papers, as well as software
| that is useful to maintain archives. Everything contained in an archive may
| be mirrored. For example, if the full text of a paper is in the archive, it
| may be mirrored. If the archive does not wish the full text to be mirrored,
| it can store the papers outside the archive. The advantage of this
| dq(remote storage) is that the archive maintainer will get a complete set
| of access logs to the file. The disadvantage is that every request for the
| file will have to be served from the local archive rather than from the
| RePEc site that the user is accessing. Of course an archive may also
| contain data about documents that are exclusively available in print.
|
| There is no need for every site to mirror the complete contents of every
| archive in the system. To conserve disk space and bandwidth some sites only
| mirror bibliographic information rather than the documents that an archive
| may contain. Others mirror all the files of an archive. Others may mirror
| only parts of a few archives. The software that is used to mirror the
| archive is provided at RePEc:all. It first mirrors the central archive.
| This software then reads a configuration file and then writes batch calls
| to the popular dq(url(mirror)(http://sunsite.ic.ac.uk/mirror)) program for
| ftp and the dq(url(w3mir)(http://www.math.uio.no/~janl/w3mir/)) script for
| http archives.
|
| An obvious way to organize the mirroring process would be to mirror the
| data of all archives to a central location. This central location would in
| turn be mirrored to the other RePEc sites. The founders of RePEc did not
| adopt that solution, because it would be quite vulnerable to mistakes at
| the central site. Instead each site installs the mirroring software and
| mirrors dq(on its own), so to speak. Not all of them adopt the same
| frequency of updating. Many update every night, but a minority only updates
| every week. It is therefore not known how long it takes for a new item to
| be propagated through the system.
|
| Each service has its own name. A service that is based on mirrored scripts
| may run on many locations. Within reason, all services are free to use any
| part of the RePEc data as they see fit. For example a service may only
| show papers that are available electronically, others may restrict the
| choice further to act as quality filters. In this way services implement
| constraints on the data, whether they be availability constraints or
| quality constraints. The user service infrastructure is quite well
| developed; we list the most important services in link(Section
| ref(sec:user))(sec:user). This distribution via the several user services
| is an undisputedly successful feature of RePEc. It is therefore not given
| further attention here.
|
|
| sect(The structure of an archive)label(sec:indi)
|
| RePEc stands on two pillars. First, an em(attribute):em(value) template
| metadata format called ReDIF. This acronym stands for em(Re)search
| em(D)ocumentation em(I)nformation em(F)ormat but it is best understood as a
| literal. ReDIF defines a number of templates. Each template describes an
| object in RePEc. It has a set of allowable fields, some mandatory and
| some repeatable. The second pillar is the Guildford protocol. It fixes
| rules on how to store ReDIF in an archive. It basically indicates which files may
| contain which templates. It is possible to deploy ReDIF without using the
| Guildford protocol. But in the following we will ignore this conceptual
| distinction, because it is easiest to understand the structure and contents
| of an archive through an example. Therefore we will list files in the way required
| by the protocol as well as the contents of the files, which are written
| in ReDIF. This is done in link(Subsection ref(sec:guil))(sec:guil). We
| return to technical aspects of ReDIF in link(Subsection
| ref(sec:redi))(sec:redi).
|
| subsect(The Guildford Protocol)label(sec:guil)
|
| RePEc identifies each archive by a simple identifier or handle. Here we look at
| the archive RePEc:sur which lives at
| tturl(ftp://www.econ.surrey.ac.uk/pub/RePEc/sur). On the root directory of the
| archive, there are two mandatory files. The file em(surarch.rdf) contains a single
| ReDIF archive template.
| verb(Template-type: ReDIF-Archive 1.0
| Name: University of Surrey Economics Department
| Maintainer-Email: T.Krichel@surrey.ac.uk
| Description: This archive provides research papers from the
| Department of Economics of the University of Surrey,
| in the U.K.
| URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur
| Homepage: http://www.econ.surrey.ac.uk
| Handle: RePEc:sur)
| In this file we find basic information about the archive. The other
| mandatory file is em(surseri.rdf). It must contain one
| or more series templates.
| verb(Template-Type: ReDIF-Series 1.0
| Name: Surrey Economics Online Papers
| Publisher-Name: University of Surrey, Department of Economics
| Publisher-Homepage: http://www.econ.surrey.ac.uk
| Maintainer-Name: Thomas Krichel
| Maintainer-Email: T.Krichel@surrey.ac.uk
| Handle: RePEc:sur:surrec)
| These two files are the only mandatory files in the Guildford
| protocol. If these are the only files present in the archive then all the archive
| is doing is to reserve the archive and the series codes. All documents have
| to be in a series. The papers for the series RePEc:sur:surrec are confined
| to a directory called em(surrec). It may contain files of any type. Any file
| ending in dq(.rdf) is considered to contain ReDIF templates. Let us
| consider one of them, em(surrec)/em(surrec9601.rdf)+footnote(We suppress
| the Abstract: field to conserve space.)+verb(Template-Type: ReDIF-Paper 1.0
| Title: Dynamic Aspect of Growth and Fiscal Policy
| Author-Name: Thomas Krichel
| Author-Email: T.Krichel@surrey.ac.uk
| Author-Name: Paul Levine
| Author-Email: P.Levine@surrey.ac.uk
| Author-WorkPlace-Name: University of Surrey
| Classification-JEL: C61; E21; E23; E62; O41
| File-URL: ftp://www.econ.surrey.ac.uk/pub/
| RePEc/sur/surrec/surrec9601.pdf
| File-Format: application/pdf
| Creation-Date: 199603
| Revision-Date: 199711
| Handle: RePEc:sur:surrec:9601)
| COMMENT(
| Note that in this example the full text of the paper is located within the
| archive directory structure itself. Therefore the full text is mirrored
| together with the bibliographic data. Services can then link to a mirrored
| copy of the full text. If the URL of the paper would point to a place outside
| the archive structure, the link to the full text would always point to that
| location.
| )
| Note that we have two authors here. The
| dq(Author-WorkPlace-Name) attribute only applies to the second author.
| We discuss this point in the next subsection.
|
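| The file layout described in this subsection can be illustrated with a small
| Python sketch. It is only an illustration, not software that is part of RePEc.
| The function names are hypothetical. The rule that the two mandatory files are
| named after the archive code is an assumption generalized from the RePEc:sur
| example above, as is the reading of a document handle as four colon-separated
| parts.

```python
def mandatory_files(archive_code):
    """Names of the two mandatory files in the root directory of an archive.

    Assumption: the pattern <code>arch.rdf / <code>seri.rdf, as seen for "sur".
    """
    return [archive_code + "arch.rdf", archive_code + "seri.rdf"]


def split_handle(handle):
    """Split a handle such as RePEc:sur:surrec:9601 into named parts.

    Shorter handles (archive or series handles) yield fewer parts.
    """
    names = ("authority", "archive", "series", "number")
    return dict(zip(names, handle.split(":")))


def redif_files(filenames):
    """Keep only the files that are considered to contain ReDIF templates."""
    return [name for name in filenames if name.endswith(".rdf")]


print(mandatory_files("sur"))
print(split_handle("RePEc:sur:surrec:9601"))
print(redif_files(["surrec9601.rdf", "surrec9601.pdf"]))
```

| A mirroring site or a user service could use helpers of this kind to decide
| which files in a series directory to read as metadata.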
| subsect(The ReDIF metadata)label(sec:redi)
|
| COMMENT(jmbc(---Typical question that we may be asked: why ReDIF and not another
| more widely accepted format such as.... DC? ---))
|
| The ReDIF metadata is mainly an extension of the
| latexcommand(\citeN{petdeu94publishing})
| whenhtml(dq(Publishing Information on the Internet with Anonymous
| FTP), an Internet draft that expired March 1, 1995), commonly known
| as the IAFA templates. In particular it borrows the idea of clusters from
| the draft quote( There are certain classes of data elements, such as
| contact information, which occur every time an individual, group or
| organization needs to be described. Such data as names, telephone numbers,
| postal and email addresses etc. fall into this category. To avoid repeating
| these common elements explicitly in every template below, we define
| dq(clusters) which can then be referred to in a shorthand manner in the
| actual template definitions. ) ReDIF takes a slightly different approach
| to clusters. A cluster is a group of fields that jointly describe a
| repeatable attribute of the resource. This is best understood by an
| example. A paper may have several authors. For each author we may have
| several fields that we are interested in, the name, email address, homepage
| etc. If we have several authors then we have several such groups of
| attributes. In addition each author may be affiliated with several
| institutions. Here each institution may be described by several attributes
| for its name, homepage etc. Thus a nested data structure is required. It
| is evident that this requirement is best served in a syntax that
| explicitly allows for
| it such as XML. However in 1997+emdash()when ReDIF was
| designed+emdash()XML was not available. We are still convinced that the
| template syntax is more human readable and easier to understand. However the
| computer cannot find which attributes correspond to the same cluster
| unless some ordering is introduced. We proceed as follows.
| For each group of attributes that make
| up a cluster we specify one attribute as the dq(key) attribute. Whenever
| the key attribute appears a new cluster is supposed to begin. For example
| if the cluster describes a person then the name is the key. If an
| dq(author-email) appears without an dq(author-name) preceding it
| the parsing software aborts the processing of the template.
|
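| The key attribute rule can be sketched in a few lines of Python. This is a
| minimal sketch and not the actual ReDIF parsing software. It assumes one
| attribute per line, it assumes that attribute names are compared without
| regard to case, and it assumes that every attribute sharing the prefix of a
| key attribute, here author-, belongs to the most recently opened cluster.
| The function name parse_template is hypothetical.

```python
def parse_template(text, key_attributes=("author-name",)):
    """Group the attributes of one template into top-level fields and clusters."""
    keys = {k.lower() for k in key_attributes}
    # "author-name" is the key of all attributes starting with "author-"
    prefixes = {k.rsplit("-", 1)[0] + "-": k for k in keys}
    fields = []        # top-level (attribute, value) pairs
    clusters = []      # one dict per cluster, in order of appearance
    open_cluster = {}  # key attribute -> cluster currently being filled
    for line in text.splitlines():
        if ":" not in line:
            continue  # sketch only: blank and continuation lines are skipped
        attr, value = line.split(":", 1)
        attr, value = attr.strip().lower(), value.strip()
        if attr in keys:
            # the key attribute always begins a new cluster
            cluster = {attr: value}
            clusters.append(cluster)
            open_cluster[attr] = cluster
            continue
        owner = next((k for p, k in prefixes.items() if attr.startswith(p)), None)
        if owner is None:
            fields.append((attr, value))
        elif owner not in open_cluster:
            # a cluster attribute without a preceding key: abort the template
            raise ValueError(attr + " appears before " + owner)
        else:
            open_cluster[owner].setdefault(attr, []).append(value)
    return fields, clusters


sample = """Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Email: T.Krichel@surrey.ac.uk
Author-Name: Paul Levine
Author-Email: P.Levine@surrey.ac.uk
Author-WorkPlace-Name: University of Surrey
Handle: RePEc:sur:surrec:9601"""

fields, clusters = parse_template(sample)
for cluster in clusters:
    print(cluster)
```

| In this sketch the workplace ends up in the cluster of the second author only,
| because that cluster was the last one opened when the attribute appeared.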
| Note that the designation of key attributes is not a feature of ReDIF. It
| is a feature of the template syntax of ReDIF. It is only the syntax that
| makes nesting more involved. We do not think that this is an important
| shortcoming. In fact we believe that the nested structure involving the
| persons and organizations should not be included in the
| document templates. What should be done instead is to
| separate the personal information out of the document templates into
| separate person templates
| verb(Template-Type: ReDIF-Person 1.0
| Name: Thomas Krichel
| Email: T.Krichel@surrey.ac.uk
| Author-Paper: RePEc:sur:surrec:9404
| Author-Paper: RePEc:sur:surrec:9601
| Homepage: http://gretel.econ.surrey.ac.uk
| Handle: RePEc:per:1965-06-05:thomas_krichel)
| We can then replace the author information for the first author in the
| paper template for +bendtt(RePEc:sur:surrec:9601) by verb(Author-Name: Thomas Krichel
| Author-Person: RePEc:per:1965-06-05:thomas_krichel) The
| benefits of such a relational structure are clear. There is a much reduced
| load on administration of the system. When one element of author
| data+emdash()+eg()her phone number+emdash()changes, this change has to be
| registered at only one point in the system. A pervasive use of these
| relational features will allow the resolution of current author information
| through the current person template of the author. The user of a RePEc
| service would therefore find the author of the paper even though the
| contact information on the paper's title page may no longer be current. We
| leave the implementation of such systems for future work.
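| To illustrate the idea, the following Python sketch resolves an author cluster
| through a person template. It is a sketch under stated assumptions and not an
| implementation that exists in RePEc. The function name resolve_author, the
| dictionary layout and the outdated address old-address@example.org are all
| hypothetical.

```python
# Person templates indexed by handle (data taken from the example above).
persons = {
    "RePEc:per:1965-06-05:thomas_krichel": {
        "name": "Thomas Krichel",
        "email": "T.Krichel@surrey.ac.uk",
        "homepage": "http://gretel.econ.surrey.ac.uk",
    },
}


def resolve_author(author, persons):
    """Return the author data, updated from the person template if one exists."""
    resolved = dict(author)
    person = persons.get(author.get("author-person"))
    if person is not None:
        # current person data overrides what was recorded with the paper
        for attr, value in person.items():
            resolved["author-" + attr] = value
    return resolved


author = {
    "author-name": "Thomas Krichel",
    "author-email": "old-address@example.org",  # hypothetical outdated value
    "author-person": "RePEc:per:1965-06-05:thomas_krichel",
}
print(resolve_author(author, persons)["author-email"])
```

| An author without an Author-Person attribute is returned unchanged, so the
| data recorded with the paper remains the fallback.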
|
|
|
| sect(The total dataset)label(sec:data)
|
|
| +latexcommand(\begin{table}\begin{center}) table(5)(lrlrl)(
| row(cell()cells(2)(ReDIF-paper) cells(2)(ReDIF-article) )
| row(cell(em(field)) cell(em(all))cell(em(max)) cell(em(all)) cell(em(max)) )
| row(cell(template-type)cell(58254)celll(1) cell(10112) celll(1) )
| row(cell(handle) cell(58251) celll(2)cell(10110) celll(1) )
| row(cell(title) cell(58235) celll(2) cell(10110)celll(1) )
| row(cell(author-name) cell(98321) celll(14) cell(13855) celll(6) )
| row(cell(creation-date) cell(52730) celll(1) cell(8819)celll(1) )
| row(cell(revision-date) cell(536) celll(8) cell() celll() )
| row(cell(publication-date) cell()celll() cell(510) celll(1) )
| row(cell(abstract) cell(22984) celll(3) cell(1896) celll(1) )
| row(cell(classification-jel) cell(20194) celll(2) cell(436) celll(1) )
| row(cell(keywords) cell(39219)celll(3) cell(9084) celll(1) )
| row(cell(keywords-attent) cell(457) celll(1) cell() celll() )
| row(cell(publication-status) cell(6227) celll(3) cell(1568) celll(1) )
| row(cell(note)cell(9011) celll(1) cell(1479) celll(2) )
| row(cell(series) cell(4124) celll(2) cell()celll() )
| row(cell(number) cell(16021) celll(2) cell(1501) celll(1) )
| row(cell(price) cell(4175) celll(3) cell() celll() )
| row(cell(file-url) cell(17259) celll(22) cell(1853) celll(2) )
| row(cell(order-url) cell(2417) celll(1) cell() celll() )
| row(cell(contact-email) cell(1141) celll(1) cell() celll() )
| row(cell(availability) cell(7169) celll(2) cell() celll() )
| row(cell(length)cell(33342) celll(12) cell() celll() )
| row(cell(pages) cell() celll() cell(7920) celll(1) )
| row(cell(month) cell() celll() cell(489)celll(1) )
| row(cell(issue) cell() celll() cell(8705) celll(1) )
| row(cell(volume) cell() celll() cell(1293) celll(1))
| row(cell(year) cell() celll()cell(1293) celll(1) )
| row(cell(journal) cell() celll() cell(488) celll(1) )
| row(cell(paper-handle)cell() celll() cell(19) celll(1) )
| )
| +latexcommand(\caption{The data in article and paper templates}
| \label{tab:b}\end{center}\end{table})
|
| subsect(Aggregate Contents)
|
|
|
| In Table 1, we examine the document data in RePEc. For each field we give
| the total occurrences of the field in the dq(all) column and the maximum of
| occurrences that the field has within a single template in the dq(max)
| column. The document data appear in the ReDIF-paper and the ReDIF-article
| templates. There are two characteristics that potentially set articles
| apart from papers. First the paper can be understood as a preprint. From
| that point of view the article is a paper that has gone through some sort
| of peer review. In that case the distinction between paper and article has
| to do with the contents only. Secondly the distinction between paper and
| articles could be through their physical manifestation. From that point of
| view the article would be a document that is bound with others in a journal
| issue and it would therefore carry page numbers, issue numbers etc. This is
| the official criterion according to the ReDIF documentation. But it is not
| neat since the pagination may become redundant if the journal becomes
| electronic. In the following we will use the term dq(document) when we wish
| to refer to papers and articles simultaneously.
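| The two columns of the table can be made precise with a short Python sketch.
| It assumes that each template has already been parsed into a list of
| attribute and value pairs. The function name field_statistics and the small
| sample are hypothetical and are not taken from the RePEc data.

```python
from collections import Counter


def field_statistics(templates):
    """Return (all, max): total occurrences of each field over all templates,
    and the largest number of occurrences within a single template."""
    total = Counter()
    maximum = Counter()
    for fields in templates:
        per_template = Counter(attr.lower() for attr, _ in fields)
        total.update(per_template)
        for attr, count in per_template.items():
            maximum[attr] = max(maximum[attr], count)
    return total, maximum


templates = [
    [("Title", "A"), ("Author-Name", "X"), ("Author-Name", "Y")],
    [("Title", "B"), ("Author-Name", "Z")],
]
total, maximum = field_statistics(templates)
print(total["author-name"], maximum["author-name"])
```

| In the sample the author-name field occurs three times in total and at most
| twice within one template.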
|
| Total numbers for documents are given by the dq(template-type) and
| dq(handle) fields. Since each template should have exactly one type and
| exactly one handle the tiny difference between the two numbers is made up
| of mistakes in the dataset. The title field is also required. It is
| encouraging to see that most documents have a creation date attached to
| them,
| because as the dataset grows it will become increasingly important to
| distinguish between recent and dated documents; only the former are likely
| to be of much interest. By contrast dq(revision-date) information is rare.
| Articles may also have a dq(publication-date). The difference between this field
| and the dq(creation-date) field is not clear. We consider this to be a
| design error in the template structure.
|
| COMMENT(--- I would remove this. publication-date is not in the ReDIF
| documentation. Could it be an error in redif.spec??)
|
| Let us consider the elements that refine the contents description. We
| encourage contributors to provide abstracts. The presence of abstracts for
| about one in three papers is very positive. The abstract field can be
| repeated. This is desirable when there are abstracts in different
| languages. A large number of the papers have a url(Journal of Economic
| Literature)(http://www.jel.org) (JEL) classification code attached to
| them. However almost all papers in the offline-only archive
| RePEc:fth have the codes and that explains a very large
| proportion of the
| classified material. Note that this data has been compiled by a librarian.
| For the electronic papers there are only two in five papers that have a
| classification field. We agree that this is a serious limitation to the
| quality of the data. It would have been possible to require a
| classification number for each paper right from the start. This would have
| hampered the collection effort. In particular it would have made it
| impossible for the WoPEc team to dq(snarf) bibliographic data from sites
| where this JEL data was not available. There is also some concern among
| economists that their areas of work do not match with these codes. The use
| of more complete and sophisticated classification schemes would not be
| possible. The main argument against requiring JEL classification codes was,
| however, that there is considerable opposition to the scheme in the
| heterodox Economics community. They feel that the JEL classification
| scheme reflects the view of the orthodoxy. Requiring JEL classification
| codes would have meant excluding these contributors. Then and now only a tiny part of
| the collection could be grouped as heterodox. However our aim is that RePEc
| be a broad church. This was the decisive argument against requiring the use
| of JEL codes.
|
| There is a large number of templates that have keywords. About 50% of these
| templates come from RePEc:fip where each paper has a keyword. ReDIF allows
| for both free and qualified controlled vocabulary. This facility is
| used for the internal keyword scheme of the url(Attent: Research
| Memoranda)(http://cwis.kub.nl/~dbi/english/info/attent.htm) database. These
| keywords are only used by the RePEc:dgr archive.
|
|
|
| The dq(publication-status) field can be used to indicate where the paper
| has been submitted and where it has been formally published.
| This field appears in the data from large research bodies that have been
| issuing a series of papers for many years and that have data about the
| formal publication of the paper. The fields dq(series) and dq(number) are
| somewhat redundant since this information should also be available from the
| handle. The dq(price) field normally refers to the delivery of a printed
| copy. The mode of delivery is often just expressed in the dq(price) field.
| The dq(file-url) field gives the location of the full text or of a part of the
| full text. Usually it is the complete full text.
|
| The document may have several
| components in addition to the full text. These can be listed as several
| dq(file) clusters. Each may carry an uncontrolled field about its function
| within the paper. For example the author may wish to supply a computer
| program that was used to produce the paper. In that case a whole series of
| files may be made available. However that is not the way the option of
| having many files is actually exercised. Most of the time it is used to
| include elements like graphics or tables that the author did not manage to
| include into the main document file.
|
| The dq(order-url) field is used to point to an intermediate page that sits
| between our description and the files of the document. In that case we do
| not know whether the resource actually exists online. dq(order-url) may be
| used in conjunction with the dq(file-url) attribute. Note that there is no
| dq(order-email) field in the document templates. Such a field figures in
| the series template, because the ordering of a paper should be the same for
| all papers in the series. The dq(contact-email) may otherwise be used to
| contact somebody who has some connection with the paper. This field is
| only used by the contributors to the RePEc:wpa archive. The
| dq(availability) field is used most of the time to signal that the paper is
| no longer in print.
|
| Finally a
| dq(length) attribute can be used to indicate how many pages the reader has
| to go through to read the paper. This field is present in all templates
| provided by RePEc:fth and it seems to appear in a surprisingly large number
| of other templates.
|
| Articles have a number of specific attributes that are listed at the bottom
| of the table. Strictly speaking these are not descriptive elements of the
| articles themselves; rather, they relate to the position of the article
| within the journal. Finally the dq(paper-handle) allows one to point from
| the preprint version to the article template.
|
| +latexcommand(\begin{table}\begin{center}) table(9)(lrllrllrl)(
| row(cells(3)(file)cells(3)(person)cells(3)(organization))
| row(celll(em(name))cellr(em(all))celll(em(max))
| celll(em(name))cellr(em(all))celll(em(max))celll(em(name))
| cellr(em(all))celll(em(max))
| )
| row(celll(url)cellr(19112)celll(1)
| celll(name)cellr(112176)celll(1)
| celll(name)cellr(8598)celll(1)
| )
| row(celll(format)cellr(19024)celll(1)
| celll(postal)cellr(8)celll(1)
| celll(postal)cellr(2118)celll(2)
| )
| row(celll(size)cellr(2630)celll(1)
| celll(homepage)cellr(1557)celll(2)
| celll(homepage)cellr(596)celll(2)
| )
| row(celll(function)cellr(1661)celll(1)
| celll(email)cellr(3166)celll(2)
| celll(email)cellr(1451)celll(3)
| )
| row(celll(restriction)cellr(2548)celll(1)
| celll(phone)cellr(282)celll(1)
| celll(phone)cellr(164)celll(1)
| )
| row(celll()cellr()celll()
| celll(fax)cellr(259)celll(1)
| celll(fax)cellr(197)celll(1)
| )
| row(celll()cellr()celll()
| celll(workplace-name)cellr(8598)celll(4)
| celll()cellr()celll()
| ))+latexcommand(\caption{The data in clusters}
| \label{tab:g}\end{center}\end{table})
|
| subsect(The clustered data)
|
| The data available in Table 1 is not the complete set of information
| available in the dataset. It only lists the individual attributes and the
| key attributes of clusters in the paper and article templates. In Table 2 we
| have the data that is contained in the clusters in this subset of the RePEc
| data. This data is therefore consistent with the data in Table 1.
|
| There are three types of clusters, dq(file), dq(organization) and
| dq(person). The numbers that are present suggest that there are significant
| possibilities for a relational structure in the dataset between persons and
| their organizations. An interesting consideration in the person cluster is
| the high number of workplace templates. Providers of the data seem to
| attribute more importance to the workplace of a person than to her
| strictly personal data, eg()her homepage. The only explanation that we can
| offer here is that most likely the data is provided by an agent of the
| workplace. The low number of homepages is an indicator which also suggests
| that in most cases the provider is not the author herself. Note also that
| the workplace information+emdash()when it is present+emdash()is much more
| complete than the corresponding data for the individuals.
|
| sect(User services)+label(sec:user)
|
| There would be little point in collecting all that data if there were no
| users to use it. Note that there is no official user service for RePEc.
| The implicit ability and explicit intention to allow for many user services
| at one time is a key feature of RePEc. This provides an important selling
| point once a potential provider understands that submitting data to RePEc
| means submitting the data to all the user services at once. Here we list
| the most important user services in
| link(Subsection ref(sec:main))(sec:main), before we critically
| discuss them in link(Subsection
| ref(sec:upta))(sec:upta).
|
|
| subsect(The main user services)label(sec:main)
|
| By order of historical appearance, they are
|
| furl(BibEc)(http://netec.mcc.ac.uk/BibEc.html)+amp() +latexcommand(\par
| )+furl(WoPEc)(http://netec.mcc.ac.uk/WoPEc.html)+nl() +noindent() provide
| static html pages for all working papers that are only available in print
| (BibEc) and all papers that are available electronically (WoPEc). Both
| datasets use the same search engines. There are three search engines, a
| full text WAIS engine, a fielded search engine based on the mySQL
| relational database and a ROADS fielded search engine. Note that the mySQL
| database is also used for the control of the relational components in the
| RePEc dataset. BibEc and WoPEc are mirrored in the United States and Japan
| as part of the NetEc project.
|
| furl(IDEAS)(http://ideas.uqam.ca)+nl()
| +noindent() provides an Excite index of static html pages that represent all
| Paper, Article and Software templates. This is by far the most popular RePEc
| user interface.
|
| furl(NEP: New Economics Papers)(http://netec.wustl.edu/NEP)+nl()
| +noindent() is a set of reports on new additions of papers to RePEc.
| Each report is edited by subject specialists who receive information on all
| new additions and then filter out the papers that are relevant to the
| subject of the report. These subject specialists are PhD students and young
| researchers. They work as volunteers. On 27 June 1999 there were 1766
| different email addresses that subscribed to at least one list.
|
| furl(Tilburg University Working papers and research memoranda)(http://www.kub.nl/~dbi/demomate/repref.htm)+nl()
| This site also operates a Z39.50 server for all downloadable papers in
| RePEc, available at dbiref.kub.nl:9997. The database name is
| dq(repref). The attribute set is Bib-1, and the record syntaxes supported
| are USmarc, SUTRS and GRS-1 (only string tags, tag type 3).
|
| furl(RuPEc)(http://www.ieie.nsc.ru/RuPEc)+nl()
| is a server in Russian. It provides not only search facilities
| for Russian users but also archival facilities for Russian contributors.
|
| furl(INOMICS)(http://www.inomics.com/query/search)+nl()
| not only
| provides an index of RePEc data but also allows simultaneous searches
| in indexes of other web pages related to Economics.
|
| The dq(Tilburg University Working papers and research memoranda) service is
| operated by a library-based group that has received funding from the
| European Union. INOMICS is operated by the Economics consultancy
| url(Berlecon)(http://www.berlecon.de). All the other user services are
| operated by junior academics.
|
| subsect(The usage of user services)label(sec:upta)
|
| Thomas Krichel founded both the WoPEc user service in 1993 and NEP in
| 1998. Jos\'e Manuel Barrueco has been intensively involved in WoPEc
| user education. Our experience suggests that the average users from
| developed countries are at the postgraduate and doctoral level. There are
| many users in developing countries. In these countries the user community
| includes more senior levels, ie()junior academics and professional
| researchers rather than students. For them the RePEc user services are one
| of the very few means to get hold of research papers. We think that this is
| the most rewarding aspect of our work. The free provision of RePEc helps to
| reduce the gap between the informationally rich and the informationally
| poor.
|
| The use of RePEc services among senior academics in the developed countries
| seems to be low. Is this because these people are too set in their
| ways to use these modern facilities? Some people think so. We do not.
| We believe that
| the current user services do not meet the information needs of these
| people. Academics do not need large-scale information services that they
| can search. The larger the scale the more likely they are to find
| information they did not seek and the less likely they are to find
| information that they want. Since they are working within a very narrow
| field and have only a little time to read a small amount of literature,
| small-scale information services are more tailored to their needs. In
| addition the contents of the service should be highly selective. Among the
| current user services that are built on the RePEc data, NEP comes closest
| to such services. Our anecdotal evidence suggests that this is the service
| that has the largest proportion of tenured academics.
|
| RePEc as such cannot provide small-scale user services. It can only
| provide the basis for such user services to exist. We are aware of two
| approaches to building such services. Section 4 of
| latexcommand(\citeN{serpar99online}) whenhtml(url(Krichel, Lyapunov and
| Parinov (1999))(http://gretel.econ.surrey.ac.uk/papers/zhenya.pdf))
| describes design features for a current awareness portal system where each
| researcher could register the subject and type of records that she is
| interested in. The portal would then be able to inform the researcher about
| new resources in her field. A second approach is outlined in Section 6 of
| latexcommand(\citeN{kribau99edel}) whenhtml(url(Baum and Krichel
| (1999))(http://netec.mcc.ac.uk/AcMeS/edel.html)). Here the idea is to build
| peer review web (dq(SurWeb)) services. These are supposed to extend NEP to
| full peer review. It is too early to speculate if such a system can be put
| into place.
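|
| The matching step of the first approach can be sketched in Python as
| follows. The class names, the subject labels, the handles and the email
| addresses are hypothetical. The sketch only illustrates the idea of
| registering a subject and a type of record; it is not the design described
| in the cited paper.
|
```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Record:
    """A newly added record (hypothetical structure)."""
    handle: str
    record_type: str            # e.g. "paper", "article" or "software"
    subjects: FrozenSet[str]    # assumed subject labels attached to the record


@dataclass
class Profile:
    """The interests that a researcher has registered with the portal."""
    email: str
    record_types: Set[str]
    subjects: Set[str]

    def matches(self, record: Record) -> bool:
        # A record is of interest if its type was registered and it shares
        # at least one subject with the profile.
        return (record.record_type in self.record_types
                and bool(self.subjects & record.subjects))


def notifications(profiles: List[Profile],
                  new_records: List[Record]) -> Dict[str, List[str]]:
    """Map each email address to the handles of the matching new records."""
    result: Dict[str, List[str]] = {}
    for profile in profiles:
        hits = [r.handle for r in new_records if profile.matches(r)]
        if hits:
            result[profile.email] = hits
    return result


# Invented sample data.
PROFILES = [
    Profile("reader-a@example.org", {"paper"}, {"labour"}),
    Profile("reader-b@example.org", {"software"}, {"econometrics"}),
]
NEW_RECORDS = [
    Record("RePEc:aaa:wpaper:0001", "paper", frozenset({"labour", "macro"})),
    Record("RePEc:aaa:wpaper:0002", "paper", frozenset({"trade"})),
    Record("RePEc:bbb:softwa:0001", "software", frozenset({"econometrics"})),
]

result = notifications(PROFILES, NEW_RECORDS)
for email, handles in result.items():
    print(email, handles)
```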
|
|
| sect(Conclusions)label(sec:conc)
|
| The free provision of educational material can be implemented through a
| central institution. Such an institution needs to be subsidized by central
| funds. The alternative is to provide the resources by a large number of
| agents. Then the cost of providing access can be absorbed within each
| institution. In that case the question of a comprehensive catalog
| arises. Such a catalog is needed to provide access to the collection in a
| unified way.
|
| In this paper we have dealt with the provision of a key resource
| ie()academic papers. We have presented a collection of metadata that is
| provided by decentralized archives. We have found that it is possible to
| build such a collection to a reasonable degree of accuracy if some archives
| where mistakes occur are aided by others. There needs to be a small group
| of people who actively support the collection. However this support can be
| given in a decentralized fashion without the need for much coordination
| between supporters.
|
| The academic library community in the United Kingdom as a whole has made an
| important contribution to RePEc by donating funds to the work of the WoPEc
| project. This has allowed the WoPEc project to collect metadata about
| papers that are published by institutions that are not yet contributing to
| RePEc. This was a vital aspect of the WoPEc project. The data collected by
| WoPEc constituted 90% of the RePEc data when RePEc was founded. However
| nowadays that proportion is falling. The funding for WoPEc has run out but
| the WoPEc web site continues to expand because of the contributions made
| by RePEc archives. The software is maintained by volunteers.
|
| Librarians should carefully consider the vision of the project. This is a
| kind of academic self-organization where academics publish and catalog their
| own work. RePEc benefits from network externalities. The more academics join,
| the more those who have not joined will feel pressure to join. If the data is
| freely available then authors can communicate with their peers without the need
| for intermediaries. The providers of intermediation services have every reason
| to be worried. They include publishers em(and)+latexcommand(\/) librarians. If
| librarians do not play a more active part by supporting developments like RePEc
| there will be no more r\^ole for them in the future. Write to
| url(RePEc@netec.mcc.ac.uk)(mailto:repec@netec.mcc.ac.uk).
|
|
| latexcommand(\bibliography{bib})
|
| whenhtml(htmlcommand( )
| The work
| discussed here has received financial support by the Joint
| Information Systems Committee of the UK Higher Education Funding
| Councils through its
| url(Electronic Library Programme)(http://www.ukoln.ac.uk/services/elib).
| We are grateful to
| url(Christopher F. Baum)(http://fmwww.bc.edu/EC-V/Baum.fac.html),
| url(Robert P. Parks)(http://wueconb.wustl.edu/~bob/),
| url(Thorsten Wichmann)(http://www.berlecon.de) and
| url(Christian Zimmermann)(http://www.er.uqam.ca/nobel/r14160/index.html)
| for comments on the questionnaires. The topic of
| link(Subsection ref(sec:upta))(sec:upta)
| was suggested by url(Jane Greenberg)(http://ils.unc.edu/~janeg).
| url(William L. Goffe)(http://wuecon.wustl.edu/~goffe/) and
| Christian Zimmermann
| made many helpful suggestions.
| Sophie C. Rigny kindly pointed
| out many stylistic and grammatical errors in an earlier version.
| )
|
|
| COMMENT(
|
| htmlcommand(
Appendix:
| The composition of the dataset by archive
|
| )
| latexcommand(\appendix%\setcounter{table}{4})
|
| There is a central archive RePEc:all, that mirrors all the
| em(???arch.rdf) and em(???seri.rdf) files from all archives. It
| also contains the software that allows sites to mirror archives. RePEc:all
| also provides reading and checking software for templates as well as
| general RePEc documentation. This archive lives at
| tturl(ftp://netec.mcc.ac.uk/pub/RePEc/all).
| )
|
|
| COMMENT(
| LocalWords: notableofcontents whenlatex latexcommand otnote Thorsten Wichmann
| LocalWords: Zimmermann thispagestyle vfill ll Tarongers tturl barrueco endash
| LocalWords: abs interoperable dq ref indi soci conc Longrightarrow gt guil lr
| LocalWords: htmlcommand redi subsect citeN petdeu homepage XML emdash WUStL
| LocalWords: allowbreak Shuetrim Parinov cont loos celll cellr wpa wuk hhb fth
| LocalWords: fip bbk cpr nbr ecm val apr boc nos cre bru tex tcd subsubsect ie
| LocalWords: WoBa sunkar whenhtml Fethy's suppo Kurmanov farty bendtt rech jel
| LocalWords: rela URLs admin webmaster lrlrl attent dgr lrllrllrl attr templ
| LocalWords: redif eval fileformat regex RFC yyyy dd longtable lrlrlrlrlrl aah
| LocalWords: cellsl anu bca bej Berlecon ber boe bon bro caf ccd Centre cep Co
| LocalWords: CEPII cii cje Levine's cla CEPREMAP cpm cty dal UPV EHU ehu Eni
| LocalWords: Fondazione Mattei fem fmg Universitaet Wirtschaftswissenschaften
| LocalWords: fra gla hwe IGIER Innocenzo Gasparini igi Jena jen CoFE knz lec
| LocalWords: Leicester Disccussion Ingber lei Universite Laval Departement lvl
| LocalWords: d'economique NUIM mce mcl McMaster mcm Universidad Publica UPNA
| LocalWords: Navarra nav Nir Dagan Volij nid Netnomics nnm nsr SUNY Oswego nyo
| LocalWords: osu Cramton pcc qeh sbu CSEF sef Li ge Econonomics SEII sei sie
| LocalWords: Siegen sus tor UPO Scienze Politiche uca uia ukc umd Pompeu Fabra
| LocalWords: upf vic Globalisation Regionalisation CSGR wck SFB xrs yor Brunel
| LocalWords: Birkbeck Universtiy CREFE dur ESRC rba Facultad Ciencias noindent
| LocalWords: Economicas Tilburg RuPEc upta serpar Lyapunov kribau edel SurWeb
| LocalWords: coar Haworth mySQL consultancy informationally Programme hr appe
| LocalWords: tcounter tlength tabcolsep eal txt MSdos heterodox co Este creo
| LocalWords: ejemplo que es claro lo quitaria pues aporta nada nuevo solamente
| LocalWords: matiza un nivel puede confundir lector Por adir aqui grafico mas
| LocalWords: Quedaria mucho Si quieres tenemos tiempo puedo practicar poco Una
| LocalWords: vez hemos dise uno podemos aplicarlo toda documentacion jmbc por
| LocalWords: Tipica pregunta pueden hacer otro formato aceptado como DC eg si
| LocalWords: necesaria lista involveh Qu ebec esta frase da impresion archivos
| LocalWords: obra una institucion debe garantizar su permanencia pero verdad
| LocalWords: indicador seria cuando dejado actualizarse desde hace Quizas otra
| LocalWords: habria notar falta bibliotecarios enfatizar catalogacion ahora yo
| LocalWords: manos los autores distribuidores anecdota Aunque muy para unos UK
| LocalWords: significativa entenderla necesitan conocimientos seguramente aun
| LocalWords: tiene encuantra ninguna siga leyendo articulo ser Rigny Mili's de
| LocalWords: ole INCLUDEFILE shankari mac url RePEc Baum sec Goffe Jos Krichel Ci
| LocalWords: Biblioteca encies Socials Gregori Maians Universitat encia dels
| LocalWords: Guildford GU XH thomas krichel PDF WoPEc downloadable offline ftp
| LocalWords: repec http al NEP metadata mir undisputedly ReDIF ocumentation se
| LocalWords: nformation ormat sur surarch rdf surseri surrec WorkPlace pdf hhs
| LocalWords: Karlsson IAFA Valencia Sune Montr EconWPA xxx Fethy Mili perl nl
| LocalWords: Universit homepages seri BibEc html NetEc repref USmarc SUTRS GRS
| LocalWords: INOMICS WoBA WAIS Heriot
| ) |