INCLUDEFILE(jagt_mac.yo) notableofcontents() article(Cataloging Economics preprints: an introduction to the RePEc project +whenlatex(+latexcommand(\footnote{)The work discussed here has received financial support from the Joint Information Systems Committee of the UK Higher Education Funding Councils through its url(Electronic Library Programme)(http://www.ukoln.ac.uk/services/elib). We are grateful to url(Christopher F. Baum)(http://fmwww.bc.edu/EC-V/Baum.fac.html), url(Robert P. Parks)(http://wueconb.wustl.edu/~bob/), url(Thorsten Wichmann)(http://www.berlecon.de) and url(Christian Zimmermann)(http://www.er.uqam.ca/nobel/r14160/index.html) for comments on the questionnaires. url(William L. Goffe)(http://wuecon.wustl.edu/~goffe/) and Christian Zimmermann made many helpful suggestions. The topic of link(Subsection ref(sec:upta))(sec:upta) was suggested by url(Jane Greenberg)(http://ils.unc.edu/~janeg). Sophie C. Rigny kindly pointed out many stylistic and grammatical errors in an earlier version. 
+latexcommand(}))) (Jos\'e Manuel Barrueco Cruz and Thomas Krichel) () latexcommand(\thispagestyle{empty}) latexcommand(\vfill\begin{center}) table(2)(ll)(row(cell(url(Jos\'e Manuel Barrueco Cruz)(http://www.uv.es/~barrueco )) cell(url(Thomas Krichel)(http://gretel.econ.surrey.ac.uk))) row(cell(url(Biblioteca de Ci\`encies Socials dq(Gregori Maians))(http://www.uv.es/econweb/)) cell(url(Department of Economics)(http://www.econ.surrey.ac.uk))) row(cell(url(Universitat de Val\`encia)(http://www.uv.es)) cell(url(University of Surrey)(http://www.surrey.ac.uk))) row(cell(Campus dels Tarongers s/n) cell(Stag Hill)) row(cell( 46071 Val\`encia ) cell( Guildford GU2 5XH )) row(cell( Spain ) cell( United Kingdom)) row(cell(url(jose.barrueco@uv.es)(mailto:jose.barrueco@uv.es)) cell(url(T.Krichel@surrey.ac.uk)(mailto:T.Krichel@surrey.ac.uk))) whenlatex(row(cell(tturl(http://www.uv.es/ +latexcommand(~)barrueco)) cell(tturl(http://gretel.econ.surrey.ac.uk)))) row(cell( ) cell( RePEc:per:1965+endash()06+endash()05:thomas_krichel))) +latexcommand(\end{center}) abs(Cataloging scientific papers creates a new educational resource. Collecting and managing the required data is costly. In particular, the required level of granularity is finer than, say, for a collection of web sites. One possible approach to cataloging these resources is to involve a community of providers in cataloging the materials that they provide. This paper introduces RePEc, at http://netec.wustl.edu/RePEc, as an example of such an approach. RePEc is mainly a catalog of research papers in Economics. It is based on a set of over 80 archives that work independently yet are interoperable. Together they provide data about almost 60,000 preprints and over 10,000 published articles. ) latexcommand(\vfill) Jos\'e Manuel Barrueco Cruz is a librarian at the Universitat de Val\`encia. Thomas Krichel is a lecturer in Economics at the University of Surrey. 
Both welcome comments on this paper; please write to wopec@netec.mcc.ac.uk. whenlatex(This paper is available online at tturl(http://openlib.org/home/krichel/papers/shankari.html).) whenhtml(This paper is url(available in PDF)(http://openlib.org/home/krichel/papers/shankari.pdf).) latexcommand(\vfill) sect(Introduction) Some scientific disciplines have a preprint tradition. Essentially, these are Mathematics, Physics, Computer Science and Economics. Preprints are not issued in the same way across these disciplines. In Mathematics and Physics, preprints are essentially issued by individual academics. In Computer Science and Economics, it is more often the department that distributes them. In this paper we deal with Economics preprints, usually called working papers. Economics is the dismal science. Its bad reputation is founded on two popular conceptions. The first is that economists never agree on anything. Winston Churchill claimed dq(If you put two economists in a room, you get two opinions, unless one of them is Lord Keynes, in which case you get three opinions). And on the other side of the pond, President Truman sought to hire a one-armed economist, because such an economist would not be able to say dq(on the other hand). The other conception is that Economics is theoretical to the point of being totally useless. A popular tale is that of the two economists who sit down to play chess. They study the board for 24 hours and eventually declare a stalemate. Fortunately, neither of these conceptions applies to all areas of Economics. There is a large mainstream literature that is based on a common set of principles. It is true that this literature is heavily mathematical, but it does not follow that it is completely useless. There are counterexamples. For example, the calculation of option values is important for anybody who deals with financial options: trade in such options only took off once a pricing formula had been found. 
COMMENT( Derivatives research that is carried out by financial engineers within banks, investment houses etc therefore requires the occasional use of economic research work.) Another example is the study of competition, which is used by government organizations that regulate industries and enforce anti-trust measures. Economics research documents are therefore useful to a wide variety of people, not only to students. In recent years, more and more Economics departments and research institutions have made their working papers available on the Internet. However, in that form the papers can only be found by specialists who know who has been working in a certain area, where that researcher is based and whether any of that researcher's papers are available on the web. This is the kind of knowledge that circulates at scientific conferences+emdash()usually on the backs of business cards and on napkins+emdash()and it is therefore not available to people outside the research community. Ordinary mortals will only be able to benefit if a catalog of these papers is available. In this paper, we describe attempts to build a catalog of online and offline working papers in Economics called RePEc. In link(Section ref(sec:repec))(sec:repec) we introduce the concepts behind it. RePEc is spread over many archives, and link(Section ref(sec:indi))(sec:indi) describes the structure of an archive. In link(Section ref(sec:data))(sec:data) we review the contents of the RePEc dataset. In link(Section ref(sec:user))(sec:user) we consider user interfaces to RePEc. link(Section ref(sec:conc))(sec:conc) concludes. sect(RePEc)label(sec:repec) The electronic dissemination of Economics working papers can be traced back to the start of the Working Papers in Economics (url(WoPEc)(http://netec.mcc.ac.uk/WoPEc.html)) project in April 1993. 
By May 1999 this single archive had grown into an interconnected network of over 80 archives holding over 14,000 downloadable working papers and over 50,000 descriptions of offline papers from close to 1,000 series. The network of archives is called url(RePEc)(http://netec.wustl.edu/RePEc). The term was initially conceived to stand for dq(Research Papers in Economics). Nowadays it is best understood as a literal, because the objectives of RePEc go well beyond a database of scientific papers. RePEc data is freely available, in the sense that the provider pays for the provision of the data, not the user. In order to make such a system viable without public subsidy, the cost of providing the data must be spread among many agents+footnote(understood here and in the rest of the paper as a person or institution). This requirement has been a feature of RePEc right from the start of the collection in May 1997. Each participating provider sets up an archive on an HTTP or FTP server. The archive supports the storage of structured data about objects relevant to Economics, and possibly the storage of some of the objects themselves. All objects in RePEc are uniquely identified by handles. RePEc data can be accessed through a plethora of user services. Some are heavily used; for example, the dq(url(IDEAS)(http://ideas.uqam.ca)) user service had one million hits in just over two months in 1999. The main interest of this paper is to examine the collection aspect of the data. The idea that a coherent literature catalog can be put together by a large group of physically dispersed people who have very little personal communication, without extensive training or intensive coordination, remains to be demonstrated. At the time of writing, RePEc is two years old. We feel that this is a good time to review the operations of RePEc and the data that it has collected. Clearly, the RePEc data is in a constant state of flux. 
To keep matters simple, we took a dump of the data on 1 May 1999. All figures in this paper refer to the state of the data on that date. There are some aspects of RePEc that this paper does not discuss. We leave aside the data on software, books, etc., to concentrate on the collection of traditional academic papers, be they preprints or published articles. This data forms the bulk of the present collection. We also leave out the personal and institutional data that is not included in the paper and article templates. We aim to use such data to build a fully relational database system that describes Economics as a discipline. We will report on such efforts in future papers. The nature of RePEc is not precisely defined. Most people think of it as a collection of archives and services that provide data about Economics. More precisely, RePEc is commonly understood to refer to three things. First, it is a collection of archives that provide data about Economics. Second, it is the data that is found in these archives. Third, it is often understood to represent the set of agents who build archives and channel the data from the archives to the users. In that latter sense, RePEc has no formal management structure. RePEc has two aims. The dq(cataloging aim) is to provide, on the Internet, a complete description of the Economics discipline. The dq(publishing aim) is to provide em(free)+latexcommand() access to Economics resources on the Internet. COMMENT(--- I do not think this example is clear. I would remove it, since it adds nothing new and only qualifies the preceding point to a degree that may confuse the reader. These aims are sometimes conflicting. For example, let us assume that a certain amount of money is available for cataloging purposes. 
Then the library objective might be best served by using these funds to gather information about a high-quality toll-gated journal resource, whereas the publishing objective would be better served by considering a collection that is on the Internet and may not be of the same quality since it has not yet been extensively peer-reviewed. RePEc has no ambition to become involved in peer review; however, it can be used to support peer review. An initial move in that direction is the NEP project that we will mention again in link(Section ref(sec:user))(sec:user). --- ) The basic principle of RePEc can be summarized as follows: center( Many archives +latexcommand( $\Longrightarrow$ ) +htmlcommand( ---> )One dataset+latexcommand( $\Longrightarrow$ ) +htmlcommand( ---> ) Many services ) The basic RePEc concepts are archive, site and service. itemize( it() An dq(archive) is a space on a public-access computer system that makes data available. It is a place where original data enters the system. There is no need to run any software other than an FTP or HTTP daemon that makes the files in the archive available upon request. Each archive is identified by a three-letter code. Some elementary metadata about the archive, such as its name, its URL and some basic information about its contents, is polled by a special central archive with the handle RePEc:all, where dq(RePEc) is the naming authority and dq(all) is the archive code. it()A dq(site) is a collection of archives on the same computer system. It usually consists of a local archive augmented by frequently updated (dq(mirrored)) copies of remote archives. it()A dq(service) is a rendering of RePEc data in a form that is available to the end user. ) Archives may hold papers, metadata about papers, and software that is useful for maintaining archives. Everything contained in an archive may be mirrored. For example, if the full text of a paper is in the archive, it may be mirrored. 
If the archive does not wish the full text to be mirrored, it can store the papers outside the archive. The advantage of this dq(remote storage) is that the archive maintainer gets a complete set of access logs for the file. The disadvantage is that every request for the file has to be served from the location where the provider stores it, rather than from the RePEc site that the user is accessing. Of course, an archive may also contain data about documents that are exclusively available in print. There is no need for every site to mirror the complete contents of every archive in the system. To conserve disk space and bandwidth, some sites only mirror bibliographic information rather than the documents that an archive may contain. Others mirror all the files of an archive. Others may mirror only parts of a few archives. The software that is used to mirror the archives is provided at RePEc:all. It first mirrors the central archive. It then reads a configuration file and writes batch calls to the popular dq(url(mirror)(http://sunsite.ic.ac.uk/mirror)) program for FTP archives and to the dq(url(w3mir)(http://www.math.uio.no/~janl/w3mir/)) script for HTTP archives. An obvious way to organize the mirroring process would be to mirror the data of all archives to a central location, which would in turn be mirrored to the other RePEc sites. The founders of RePEc did not adopt that solution, because it would be quite vulnerable to mistakes at the central site. Instead, each site installs the mirroring software and mirrors dq(on its own), so to speak. Not all sites update with the same frequency. Many update every night, but a minority only update every week. It is therefore not known how long it takes for a new item to propagate through the system. Each service has its own name. A service that is based on mirrored scripts may run at many locations. Within reason, all services are free to use any part of the RePEc data as they see fit. 
For example, a service may only show papers that are available electronically; others may restrict the choice further to act as quality filters. In this way, services implement constraints on the data, whether they be availability constraints or quality constraints. The user service infrastructure is quite well developed; we list the most important services in link(Section ref(sec:user))(sec:user). This distribution via several user services is an undisputedly successful feature of RePEc. It is therefore not given further attention here. sect(The structure of an archive)label(sec:indi) RePEc stands on two pillars. The first is an em(attribute):em(value) template metadata format called ReDIF. This acronym stands for em(Re)search em(D)ocumentation em(I)nformation em(F)ormat, but it is best understood as a literal. ReDIF defines a number of templates. Each template describes an object in RePEc. It has a set of allowable fields, some of which are mandatory and some repeatable. The second pillar is the Guildford protocol. It fixes rules for storing ReDIF in an archive; basically, it indicates which files may contain which templates. It is possible to deploy ReDIF without using the Guildford protocol. In the following, however, we ignore this conceptual distinction, because the structure and contents of an archive are easiest to understand through an example. We therefore list files in the way required by the protocol, together with the contents of each file, which is in fact written in ReDIF. This is done in link(Subsection ref(sec:guil))(sec:guil). We return to technical aspects of ReDIF in link(Subsection ref(sec:redi))(sec:redi). subsect(The Guildford Protocol)label(sec:guil) RePEc identifies each archive by a simple identifier or handle. Here we look at the archive RePEc:sur, which lives at tturl(ftp://www.econ.surrey.ac.uk/pub/RePEc/sur). In the root directory of the archive, there are two mandatory files. 
The file em(surarch.rdf) contains a single ReDIF archive template. verb(Template-Type: ReDIF-Archive 1.0
Name: University of Surrey Economics Department
Maintainer-Email: T.Krichel@surrey.ac.uk
Description: This archive provides research papers from the Department of Economics of the University of Surrey, in the U.K.
URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur
Homepage: http://www.econ.surrey.ac.uk
Handle: RePEc:sur) This file gives basic information about the archive. The other mandatory file is em(surseri.rdf), which must contain one or more series templates. verb(Template-Type: ReDIF-Series 1.0
Name: Surrey Economics Online Papers
Publisher-Name: University of Surrey, Department of Economics
Publisher-Homepage: http://www.econ.surrey.ac.uk
Maintainer-Name: Thomas Krichel
Maintainer-Email: T.Krichel@surrey.ac.uk
Handle: RePEc:sur:surrec) These two files are the only mandatory files in the Guildford protocol. If they are the only files present in the archive, then all the archive does is reserve the archive and series codes. All documents have to be in a series. The papers for the series RePEc:sur:surrec are confined to a directory called em(surrec). It may contain files of any type. Any file ending in dq(.rdf) is considered to contain ReDIF templates. 
Let us consider one of them, em(surrec)/em(surrec9601.rdf)+footnote(We suppress the Abstract: field to conserve space.)+verb(Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Email: T.Krichel@surrey.ac.uk
Author-Name: Paul Levine
Author-Email: P.Levine@surrey.ac.uk
Author-WorkPlace-Name: University of Surrey
Classification-JEL: C61; E21; E23; E62; O41
File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
File-Format: application/pdf
Creation-Date: 199603
Revision-Date: 199711
Handle: RePEc:sur:surrec:9601) COMMENT( Note that in this example the full text of the paper is located within the archive directory structure itself. Therefore the full text is mirrored together with the bibliographic data. Services can then link to a mirrored copy of the full text. If the URL of the paper would point to a place outside the archive structure, the link to the full text would always point to that location. ) Note that there are two authors here. The dq(Author-WorkPlace-Name) attribute only applies to the second author. We discuss this point next. subsect(The ReDIF metadata)label(sec:redi) COMMENT(jmbc(---A typical question we may be asked: why ReDIF and not another, more widely accepted format, such as.... DC? ---)) The ReDIF metadata format is mainly an extension of the template format of latexcommand(\citeN{petdeu94publishing}) whenhtml(dq(Publishing Information on the Internet with Anonymous FTP), an Internet draft that expired on 1 March 1995), commonly known as the IAFA templates. In particular, it borrows the idea of clusters from that draft: quote( There are certain classes of data elements, such as contact information, which occur every time an individual, group or organization needs to be described. Such data as names, telephone numbers, postal and email addresses etc. fall into this category. 
To avoid repeating these common elements explicitly in every template below, we define dq(clusters) which can then be referred to in a shorthand manner in the actual template definitions. ) ReDIF takes a slightly different approach to clusters. A cluster is a group of fields that jointly describe a repeatable attribute of the resource. This is best understood through an example. A paper may have several authors. For each author, we may be interested in several fields: the name, email address, homepage, etc. If there are several authors, then there are several such groups of attributes. In addition, each author may be affiliated with several institutions, and each institution may be described by several attributes for its name, homepage, etc. Thus a nested data structure is required. This requirement is evidently best served by a syntax that explicitly allows for it, such as XML. However, in 1997+emdash()when ReDIF was designed+emdash()XML was not yet available. We are still convinced that the template syntax is more human-readable and more easily understood. However, a computer cannot determine which attributes belong to the same cluster unless some ordering is introduced. We proceed as follows. For each group of attributes that make up a cluster, we specify one attribute as the dq(key) attribute. Whenever the key attribute appears, a new cluster begins. For example, if the cluster describes a person, then the name is the key. If an dq(author-email) appears without an dq(author-name) preceding it, the parsing software aborts the processing of the template. Note that the designation of key attributes is not a feature of ReDIF as such, but of its template syntax. It is only the syntax that makes nesting more involved. We do not think that this is an important shortcoming. In fact, we believe that the nested structure involving persons and organizations should not be included in the document templates at all. 
What should be done instead is to move the personal information out of the document templates into separate person templates: verb(Template-Type: ReDIF-Person 1.0
Name: Thomas Krichel
Email: T.Krichel@surrey.ac.uk
Author-Paper: RePEc:sur:surrec:9404
Author-Paper: RePEc:sur:surrec:9601
Homepage: http://gretel.econ.surrey.ac.uk
Handle: RePEc:per:1965-06-05:thomas_krichel) We can then replace the author information for the first author in the paper template for +bendtt(RePEc:sur:surrec:9601) by: verb(Author-Name: Thomas Krichel
Author-Person: RePEc:per:1965-06-05:thomas_krichel) The benefits of such a relational structure are clear. The administrative load of the system is much reduced: when one element of the author data+emdash()+eg()her phone number+emdash()changes, the change has to be registered at only one point in the system. A pervasive use of these relational features would allow current author information to be resolved through the author's current person template. The user of a RePEc service would therefore find the author of a paper even though the contact information on the paper's title page may no longer be current. We leave the implementation of such systems for future work. 
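
The key-attribute rule can be illustrated with a short Python sketch. It is not the parsing software used in RePEc: it assumes one attribute per line, no continuation lines and case-insensitive attribute names, and the names AUTHOR_KEY, ClusterError, parse_template and author_clusters are hypothetical. Each Author-Name opens a new cluster, every other author attribute is attached to the most recent cluster, and an author attribute that precedes any Author-Name aborts processing.

```python
# Sketch of the key-attribute rule for author clusters in a ReDIF template.
# Assumptions (hypothetical, not the RePEc parsing software): one
# "Attribute: value" pair per line, no continuation lines, and attribute
# names compared case-insensitively.

AUTHOR_KEY = "author-name"  # key attribute: its appearance opens a new cluster


class ClusterError(ValueError):
    """Raised when an author attribute appears before any Author-Name."""


def parse_template(text):
    """Return the (attribute, value) pairs of a template, in their order."""
    pairs = []
    for line in text.strip().splitlines():
        attribute, _, value = line.partition(":")  # split at the first colon only
        pairs.append((attribute.strip().lower(), value.strip()))
    return pairs


def author_clusters(pairs):
    """Group author-* attributes into one dict per author.

    The key attribute starts a new cluster; every other author-* attribute
    (including nested ones such as Author-WorkPlace-Name) is appended to the
    most recent cluster. Values other than the name are kept in lists so
    that repeated attributes are not lost.
    """
    clusters = []
    for attribute, value in pairs:
        if attribute == AUTHOR_KEY:
            clusters.append({"name": value})
        elif attribute.startswith("author-"):
            if not clusters:
                # Mirrors the rule in the text: abort processing of the template.
                raise ClusterError(f"{attribute} appears before {AUTHOR_KEY}")
            subfield = attribute[len("author-"):]
            clusters[-1].setdefault(subfield, []).append(value)
    return clusters


# An abridged copy of the surrec9601 template shown earlier.
paper = """
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Email: T.Krichel@surrey.ac.uk
Author-Name: Paul Levine
Author-Email: P.Levine@surrey.ac.uk
Author-WorkPlace-Name: University of Surrey
Handle: RePEc:sur:surrec:9601
"""

for author in author_clusters(parse_template(paper)):
    print(author)
```

Applied to the abridged template, the sketch yields two author clusters, and only the second one carries a workplace-name entry. Nested organization attributes are simply flattened into the author cluster here; a complete parser would need a second level of key attributes.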
sect(The total dataset)label(sec:data) +latexcommand(\begin{table}\begin{center}) table(5)(lrlrl)( row(cell()cells(2)(ReDIF-paper) cells(2)(ReDIF-article) ) row(cell(em(field)) cell(em(all))cell(em(max)) cell(em(all)) cell(em(max)) ) row(cell(template-type)cell(58254)celll(1) cell(10112) celll(1) ) row(cell(handle) cell(58251) celll(2)cell(10110) celll(1) ) row(cell(title) cell(58235) celll(2) cell(10110)celll(1) ) row(cell(author-name) cell(98321) celll(14) cell(13855) celll(6) ) row(cell(creation-date) cell(52730) celll(1) cell(8819)celll(1) ) row(cell(revision-date) cell(536) celll(8) cell() celll() ) row(cell(publication-date) cell()celll() cell(510) celll(1) ) row(cell(abstract) cell(22984) celll(3) cell(1896) celll(1) ) row(cell(classification-jel) cell(20194) celll(2) cell(436) celll(1) ) row(cell(keywords) cell(39219)celll(3) cell(9084) celll(1) ) row(cell(keywords-attent) cell(457) celll(1) cell() celll() ) row(cell(publication-status) cell(6227) celll(3) cell(1568) celll(1) ) row(cell(note)cell(9011) celll(1) cell(1479) celll(2) ) row(cell(series) cell(4124) celll(2) cell()celll() ) row(cell(number) cell(16021) celll(2) cell(1501) celll(1) ) row(cell(price) cell(4175) celll(3) cell() celll() ) row(cell(file-url) cell(17259) celll(22) cell(1853) celll(2) ) row(cell(order-url) cell(2417) celll(1) cell() celll() ) row(cell(contact-email) cell(1141) celll(1) cell() celll() ) row(cell(availability) cell(7169) celll(2) cell() celll() ) row(cell(length)cell(33342) celll(12) cell() celll() ) row(cell(pages) cell() celll() cell(7920) celll(1) ) row(cell(month) cell() celll() cell(489)celll(1) ) row(cell(issue) cell() celll() cell(8705) celll(1) ) row(cell(volume) cell() celll() cell(1293) celll(1)) row(cell(year) cell() celll()cell(1293) celll(1) ) row(cell(journal) cell() celll() cell(488) celll(1) ) row(cell(paper-handle)cell() celll() cell(19) celll(1) ) ) +latexcommand(\caption{The data in article and paper templates} \label{tab:b}\end{center}\end{table}) 
subsect(Aggregate Contents) In Table 1, we examine the document data in RePEc. For each field, the dq(all) column gives the total number of occurrences of the field, and the dq(max) column gives the maximum number of occurrences of the field within a single template. The document data appear in the ReDIF-paper and the ReDIF-article templates. There are two characteristics that potentially set articles apart from papers. First, a paper can be understood as a preprint. From that point of view, an article is a paper that has gone through some sort of peer review, and the distinction between paper and article has to do with the contents only. Second, the distinction between papers and articles could rest on their physical manifestation. From that point of view, an article would be a document that is bound with others in a journal issue, and it would therefore carry page numbers, issue numbers, etc. This is the official criterion according to the ReDIF documentation. But it is not neat, since the pagination may become redundant if the journal becomes electronic. In the following, we use the term dq(document) when we wish to refer to papers and articles simultaneously. Total numbers for documents are given by the dq(template-type) and dq(handle) fields. Since each template should have exactly one type and exactly one handle, the tiny difference between the two numbers is due to mistakes in the dataset. The title field is also required. It is encouraging to see that most documents have a creation date attached to them, because as the dataset grows it will become increasingly important to distinguish between recent and older documents; only the former are likely to be of much interest. By contrast, dq(revision-date) information is rare. Articles may also have a dq(publication-date). The difference between this field and the dq(creation-date) field is not clear. We consider this to be a design error in the template structure. COMMENT(--- I would remove this. 
publication-date is not in the ReDIF documentation. It may be an error in redif.spec??) Let us consider the elements that refine the description of the contents. We encourage contributors to provide abstracts, so the presence of abstracts for about one in three papers is very positive. The abstract field can be repeated, which is desirable when there are abstracts in different languages. A large number of papers have a url(Journal of Economic Literature)(http://www.jel.org) (JEL) classification code attached to them. However, almost all papers in RePEc:fth, an archive that only describes offline papers, have the codes, and this explains a very large proportion of the classified material. Note that this data has been compiled by a librarian. Among the electronic papers, only two in five have a classification field. We agree that this is a serious limitation of the quality of the data. It would have been possible to require a classification code for each paper right from the start, but this would have hampered the collection effort. In particular, it would have made it impossible for the WoPEc team to dq(snarf) bibliographic data from sites where JEL data was not available. There is also some concern among economists that their areas of work do not match these codes. The use of more complete and sophisticated classification schemes would not have been feasible. The main argument against requiring JEL classification codes, however, was the considerable opposition to the scheme in the heterodox Economics community, whose members feel that it reflects the view of the orthodoxy. Requiring JEL classification codes would have meant excluding these contributors. Then as now, only a tiny part of the collection could be classed as heterodox. However, our aim is that RePEc be a broad church, and this was the decisive argument against requiring JEL codes. A large number of templates have keywords. 
About half of these templates come from RePEc:fip, where each paper has a keyword. ReDIF allows for both free keywords and keywords from a qualified controlled vocabulary. The latter facility is used for the internal keyword scheme of the url(Attent: Research Memoranda)(http://cwis.kub.nl/~dbi/english/info/attent.htm) database, recorded in the dq(keywords-attent) field, which is only used by the RePEc:dgr archive. The dq(publication-status) field can be used to indicate where the paper has been submitted and where it has been formally published. This field appears in the data from large research bodies that have been issuing series of papers for many years and that have data about the formal publication of their papers. The fields dq(series) and dq(number) are somewhat redundant, since this information should also be available from the handle. The dq(price) field normally refers to the delivery of a printed copy, and the mode of delivery is often simply expressed in that field. The dq(file-url) field gives the location of the full text or of a part of it; usually it is the complete full text. The document may have several components in addition to the full text. These can be listed as several dq(file) clusters, each of which may carry an uncontrolled field describing its function within the paper. For example, the author may wish to supply a computer program that was used to produce the paper. In that case a whole series of files may be made available. However, that is not how the option of having many files is actually used. Most of the time it is used to include elements, such as graphics or tables, that the author did not manage to include in the main document file. The dq(order-url) field is used to point to an intermediate page that sits between our description and the files of the document. In that case, we do not know whether the resource actually exists online. 
The dq(order-url) field may be used in conjunction with the dq(file-url) attribute. Note that there is no dq(order-email) field in the document templates. Such a field figures in the series template, because the ordering procedure should be the same for all papers in a series. Otherwise, the dq(contact-email) field may be used to reach somebody who has some connection with the paper. This field is only used by the contributors to the RePEc:wpa archive. The dq(availability) field is used most of the time to signal that the paper is no longer in print. A dq(length) attribute can be used to indicate how many pages the reader has to go through to read the paper. This field is present in all templates provided by RePEc:fth, and it seems to appear in a surprisingly large number of other templates as well. Articles have a number of specific attributes that are listed at the bottom of Table 1. Strictly speaking, these are not descriptive elements of the articles themselves; rather, they relate to the position of the article within the journal. Finally, the dq(paper-handle) field allows an article template to point to the template of its preprint version. 
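
The two statistics reported in Table 1 can be made precise with a small Python sketch. The templates below are hypothetical toy data, not RePEc records, and the archive code xyz and the function names field_statistics and templates_without_single are illustrative only. For each field, all is the total number of occurrences and max is the largest number of occurrences within one template; the second function flags templates that do not carry exactly one handle, the kind of mistake behind the small gap between the template-type and handle counts.

```python
from collections import Counter

# Hypothetical toy input: each template is a list of (field, value) pairs,
# with field names in lower case as in Table 1. This is not RePEc data.
templates = [
    [("template-type", "ReDIF-Paper 1.0"), ("title", "Paper A"),
     ("author-name", "Author One"), ("author-name", "Author Two"),
     ("handle", "RePEc:xyz:wpaper:1")],
    [("template-type", "ReDIF-Paper 1.0"), ("title", "Paper B"),
     ("author-name", "Author Three")],  # handle missing: a data error
    [("template-type", "ReDIF-Paper 1.0"), ("title", "Paper C"),
     ("author-name", "Author Four"),
     ("handle", "RePEc:xyz:wpaper:3"), ("handle", "RePEc:xyz:wpaper:4")],
]


def field_statistics(templates):
    """Return {field: (all, max)} as defined for Table 1."""
    total = Counter()    # occurrences of each field over all templates
    maximum = Counter()  # largest count of each field within one template
    for pairs in templates:
        per_template = Counter(field for field, _ in pairs)
        total.update(per_template)
        for field, count in per_template.items():
            maximum[field] = max(maximum[field], count)
    return {field: (total[field], maximum[field]) for field in total}


def templates_without_single(templates, field="handle"):
    """Indices of templates that do not contain exactly one `field`."""
    return [index for index, pairs in enumerate(templates)
            if sum(1 for name, _ in pairs if name == field) != 1]


stats = field_statistics(templates)
print(stats["author-name"])                  # (4, 2)
print(stats["handle"])                       # (3, 2)
print(templates_without_single(templates))   # [1, 2]
```

In the toy data, author-name occurs four times in total and at most twice in one template, while the second and third templates are flagged for carrying zero and two handles respectively.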
+latexcommand(\begin{table}\begin{center}) table(9)(lrllrllrl)( row(cells(3)(file)cells(3)(person)cells(3)(organization)) row(celll(em(name))cellr(em(all))celll(em(max)) celll(em(name))cellr(em(all))celll(em(max))celll(em(name)) cellr(em(all))celll(em(max)) ) row(celll(url)cellr(19112)celll(1) celll(name)cellr(112176)celll(1) celll(name)cellr(8598)celll(1) ) row(celll(format)cellr(19024)celll(1) celll(postal)cellr(8)celll(1) celll(postal)cellr(2118)celll(2) ) row(celll(size)cellr(2630)celll(1) celll(homepage)cellr(1557)celll(2) celll(homepage)cellr(596)celll(2) ) row(celll(function)cellr(1661)celll(1) celll(email)cellr(3166)celll(2) celll(email)cellr(1451)celll(3) ) row(celll(restriction)cellr(2548)celll(1) celll(phone)cellr(282)celll(1) celll(phone)cellr(164)celll(1) ) row(celll()cellr()celll() celll(fax)cellr(259)celll(1) celll(fax)cellr(197)celll(1) ) row(celll()cellr()celll() celll(workplace-name)cellr(8598)celll(4) celll()cellr()celll() ))+latexcommand(\caption{The data in clusters} \label{tab:g}\end{center}\end{table}) subsect(The clustered data) Table 1 does not contain the complete set of information in the dataset. It only lists the individual attributes and the key attributes of clusters in the paper and article templates. Table 2 gives the data contained in the clusters for the same subset of the RePEc data, so it is consistent with Table 1. There are three types of clusters: dq(file), dq(organization) and dq(person). The numbers suggest that there are significant possibilities for a relational structure in the dataset between persons and their organizations. An interesting feature of the person cluster is the high number of workplace entries. Providers of the data seem to attach more importance to the workplace of a person than to her strictly personal data, eg()her homepage. 
The most likely explanation is that the data is provided by an agent of the workplace. The low number of homepages also suggests that in most cases the provider is not the author herself. Note also that the workplace information, when it is present, is much more complete than the corresponding data for individuals. sect(User services)+label(sec:user) There would be little point in collecting all this data if nobody used it. Note that there is no official user service for RePEc. The implicit ability and explicit intention to allow for many user services at the same time is a key feature of RePEc. This is an important selling point once a potential provider understands that submitting data to RePEc means submitting the data to all user services at once. We list the most important user services in link(Subsection ref(sec:main))(sec:main), before discussing them critically in link(Subsection ref(sec:upta))(sec:upta). subsect(The main user services)label(sec:main) In order of historical appearance, they are: furl(BibEc)(http://netec.mcc.ac.uk/BibEc.html)+amp() +latexcommand(\par )+furl(WoPEc)(http://netec.mcc.ac.uk/WoPEc.html)+nl() +noindent() provide static HTML pages for all working papers that are only available in print (BibEc) and for all papers that are available electronically (WoPEc). Both datasets share three search engines: a full-text WAIS engine, a fielded search engine based on the mySQL relational database, and a ROADS fielded search engine. The mySQL database is also used to control the relational components of the RePEc dataset. BibEc and WoPEc are mirrored in the United States and Japan as part of the NetEc project. furl(IDEAS)(http://ideas.uqam.ca)+nl() +noindent() provides an Excite index of static HTML pages that represent all Paper, Article and Software templates. 
This is by far the most popular RePEc user interface. furl(NEP: New Economics Papers)(http://netec.wustl.edu/NEP)+nl() +noindent() is a set of reports on papers newly added to RePEc. The reports are edited by subject specialists who receive information on all new additions and select the papers that are relevant to the subject of their report. These subject specialists are PhD students and young researchers who work as volunteers. On 27 June 1999, 1766 different email addresses were subscribed to at least one list. furl(Tilburg University Working papers and research memoranda)(http://www.kub.nl/~dbi/demomate/repref.htm)+nl() This site also operates a Z39.50 server for all downloadable papers in RePEc, available at dbiref.kub.nl:9997. The database name is dq(repref). The attribute set is Bib-1, and the supported record syntaxes are USmarc, SUTRS and GRS-1 (string tags only, tag type 3). furl(RuPEc)(http://www.ieie.nsc.ru/RuPEc)+nl() is a Russian-language server. It provides not only search facilities for Russian users but also archival facilities for Russian contributors. furl(INOMICS)(http://www.inomics.com/query/search)+nl() not only provides an index of RePEc data but also allows simultaneous searches in indexes of other Economics-related web pages. The dq(Tilburg University Working papers and research memoranda) service is operated by a library-based group that has received funding from the European Union. INOMICS is operated by the Economics consultancy url(Berlecon)(http://www.berlecon.de). All other user services are operated by junior academics. subsect(The usage of user services)label(sec:upta) Thomas Krichel founded the WoPEc user service in 1993 and NEP in 1998. Jos\'e Manuel Barrueco has been intensively involved in WoPEc user education. Our experience suggests that the typical users in developed countries are postgraduate and doctoral students. There are also many users in developing countries. 
In these countries the user community reaches more senior levels, ie()it includes more junior academics and professional researchers and fewer students. For them the RePEc user services are one of the very few means of obtaining research papers. We think that this is the most rewarding aspect of our work. The free provision of RePEc helps to reduce the gap between the informationally rich and the informationally poor. The use of RePEc services among senior academics in developed countries seems to be low. Is this because these people are too set in their ways to use these modern facilities? We do not think so. We believe instead that the current user services do not meet the information needs of these people. Academics do not need large-scale information services that they can search. The larger the scale, the more likely they are to find information they did not seek and the less likely they are to find the information they want. Since they work within a very narrow field and have time to read only a small amount of literature, small-scale information services are better tailored to their needs. In addition, the contents of such a service should be highly selective. Among the current user services built on RePEc data, NEP comes closest to such a service. Our anecdotal evidence suggests that NEP is the service with the largest proportion of tenured academics among its users. RePEc as such cannot provide small-scale user services. It can only provide the basis on which such services can be built. We are aware of two approaches to building them. Section 4 of latexcommand(\citeN{serpar99online}) whenhtml(url(Krichel, Lyapunov and Parinov (1999))(http://gretel.econ.surrey.ac.uk/papers/zhenya.pdf)) describes design features for a current awareness portal system where each researcher could register the subjects and types of records that she is interested in. 
The portal would then be able to inform the researcher about new resources in her field. A second approach is outlined in Section 6 of latexcommand(\citeN{kribau99edel}) whenhtml(url(Baum and Krichel (1999))(http://netec.mcc.ac.uk/AcMeS/edel.html)). Here the idea is to build peer review web (dq(SurWeb)) services, which would extend NEP to full peer review. It is too early to say whether such a system can be put into place. sect(Conclusions)label(sec:conc) The free provision of educational material can be implemented through a central institution. Such an institution needs to be subsidized by central funds. The alternative is to have the resources provided by a large number of agents. The cost of providing access can then be absorbed within each institution. In that case, a comprehensive catalog is needed to provide unified access to the collection. In this paper we have dealt with the provision of a key resource, ie()academic papers. We have presented a collection of metadata that is provided by decentralized archives. We have found that it is possible to build such a collection to a reasonable degree of accuracy if archives where mistakes occur are aided by others. There needs to be a small group of people who actively support the collection. However, this support can be given in a decentralized fashion, without much coordination between supporters. The academic library community in the United Kingdom as a whole has made an important contribution to RePEc by funding the work of the WoPEc project. This has allowed the WoPEc project to collect metadata about papers published by institutions that do not yet contribute to RePEc. This was a vital aspect of the WoPEc project. The data collected by WoPEc constituted 90% of the RePEc data when RePEc was founded, but that proportion is now falling. 
The funding for WoPEc has run out, but the WoPEc web site continues to expand thanks to the contributions made by RePEc archives. The software is maintained by volunteers. Librarians should carefully consider the vision of the project. This is a kind of academic self-organization in which academics publish and catalog their own work. RePEc benefits from network externalities: the more academics join, the more pressure those who have not joined will feel to do so. If the data is freely available, then authors can communicate with their peers without the need for intermediaries. The providers of intermediation services have every reason to be worried. They include publishers em(and)+latexcommand(\/) librarians. If librarians do not play a more active part by supporting developments like RePEc, there will be no r\^ole left for them in the future. For further information, write to url(RePEc@netec.mcc.ac.uk)(mailto:repec@netec.mcc.ac.uk). latexcommand(\bibliography{bib}) whenhtml(htmlcommand(

) The work discussed here has received financial support from the Joint Information Systems Committee of the UK Higher Education Funding Councils through its url(Electronic Library Programme)(http://www.ukoln.ac.uk/services/elib). We are grateful to url(Christopher F. Baum)(http://fmwww.bc.edu/EC-V/Baum.fac.html), url(Robert P. Parks)(http://wueconb.wustl.edu/~bob/), url(Thorsten Wichmann)(http://www.berlecon.de) and url(Christian Zimmermann)(http://www.er.uqam.ca/nobel/r14160/index.html) for comments on the questionnaires. The topic of link(Subsection ref(sec:upta))(sec:upta) was suggested by url(Jane Greenberg)(http://ils.unc.edu/~janeg). url(William L. Goffe)(http://wuecon.wustl.edu/~goffe/) and Christian Zimmermann made many helpful suggestions. Sophie C. Rigny kindly pointed out many stylistic and grammatical errors in an earlier version. ) COMMENT( htmlcommand(

Appendix: The composition of the dataset by archive

) latexcommand(\appendix%\setcounter{table}{4}) There is a central archive RePEc:all, that mirrors all the em(???arch.rdf) and em(???seri.rdf) files from all archives. It also contains the software that allows sites to mirror archives. RePEc:all also provides reading and checking software for templates as well as general RePEc documentation. This archive lives at tturl(ftp://netec.mcc.ac.uk/pub/RePEc/all). )