Working towards an Open Library for Economics: The RePEc project

1: The main user services

I list them by order of historical appearance.

BibEc & WoPEc
provide static html pages for all working papers that are only available in print (BibEc) and all papers that are available electronically (WoPEc). Both datasets use the same search engines. There are three of them: a full-text WAIS engine, a fielded search engine based on the mySQL relational database, and a ROADS fielded search engine. The mySQL database is also used for the control of the relational components in the RePEc dataset. BibEc and WoPEc are based at Manchester Computing, with mirror sites in Japan and the United States.

EDIRC
provides Web pages that represent the complete institutional information in RePEc.

IDEAS
provides an Excite index of static html pages that represent all Paper, Article and Software templates. This is by far the most popular RePEc user interface.

NEP: New Economics Papers
is a set of reports on new additions of papers to RePEc. Each report is edited by a subject specialist who receives information on all new additions and then filters out the papers that are relevant to the subject of the report. These subject specialists are PhD students and junior researchers. They work as volunteers. As of 14 March 2000, 2753 different email addresses subscribe to at least one list.

Tilburg University working papers & research memoranda
This site also operates a Z39.50 server for all downloadable papers in RePEc. It is available at dbiref.kub.nl:9997. The name of the database is "repref". The attribute set is Bib-1, and the record syntaxes supported are USMARC, SUTRS and GRS-1 (only string tags, tag type 3).

RuPEc
is a server in Russian. It offers search facilities to Russian users. Its maintainers also provide archival facilities for Russian contributors.

INOMICS
not only provides an index of RePEc data but also allows simultaneous searches in indexes of other Web pages related to Economics.

HoPEc
provides a personal registration service for authors and allows users to search for personal data.

The "Tilburg University working papers & research memoranda" service is operated by a library-based group that has received funding from the European Union. INOMICS is operated by the Economics consultancy Berlecon Research. All the other user services are operated by junior academics.

2: The ReDIF metadata format

The ReDIF metadata format is inspired by the draft of Deutsch et alii, commonly known as the IAFA templates. In particular it borrows the idea of clusters, which the draft introduces as follows:

There are certain classes of data elements, such as contact information, which occur every time an individual, group or organization needs to be described. Such data as names, telephone numbers, postal and email addresses etc. fall into this category. To avoid repeating these common elements explicitly in every template below, we define "clusters" which can then be referred to in a shorthand manner in the actual template definitions.

ReDIF takes a slightly different approach to clusters. A cluster is a group of fields that jointly describe a repeatable attribute of the resource. This is best understood by an example. A paper may have several authors. For each author we may have several fields that we are interested in: the name, email address, homepage etc. If we have several authors then we have several such groups of attributes. In addition each author may be affiliated with several institutions. Here each institution may be described by several attributes for its name, homepage etc. Thus a nested data structure is required. It is evident that this requirement is best served by a syntax that explicitly allows for it, such as XML. However in 1997, when ReDIF was designed, XML was not available. We are still convinced that the template syntax is more human readable and easier to understand. However the computer cannot find which attributes belong to the same cluster unless some ordering is introduced. Therefore we proceed as follows. For each group of attributes that make up a cluster we specify one attribute as the "key" attribute. Whenever the key attribute appears a new cluster is supposed to begin. For example if the cluster describes a person then the name is the key. If an "author-email" appears without an "author-name" preceding it, the parsing software aborts the processing of the template.
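The key-attribute rule can be illustrated with a minimal Python sketch. It is not the actual ReDIF parsing software: the function name parse_authors, the dictionary layout, and the field names other than "author-name" and "author-email" are assumptions made for illustration only. The sketch starts a new author cluster at each "author-name", starts a nested workplace cluster at each (assumed) "author-workplace-name", and aborts when an author field appears before any key.

```python
AUTHOR_KEY = "author-name"                # key attribute of the person cluster
WORKPLACE_KEY = "author-workplace-name"   # assumed key of the nested cluster


def parse_authors(template_text):
    """Group author-* fields into clusters; hypothetical helper, not rr.pm."""
    authors = []
    for raw in template_text.splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        field, sep, value = raw.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or not field.startswith("author-"):
            continue  # blank lines and non-author fields are ignored here
        if field == AUTHOR_KEY:
            # The key attribute opens a new author cluster.
            authors.append({field: value, "workplaces": []})
            continue
        if not authors:
            raise ValueError(f"{field!r} appears before any {AUTHOR_KEY!r}")
        author = authors[-1]
        if field == WORKPLACE_KEY:
            # The nested key opens a new workplace cluster for this author.
            author["workplaces"].append({field: value})
        elif field.startswith("author-workplace-"):
            if not author["workplaces"]:
                raise ValueError(f"{field!r} appears before any {WORKPLACE_KEY!r}")
            author["workplaces"][-1][field] = value
        else:
            author[field] = value
    return authors


sample = """Template-Type: ReDIF-Paper 1.0
Author-Name: Jane Doe
Author-Email: jane@example.org
Author-Workplace-Name: Example University
Author-Workplace-Homepage: http://www.example.org/
Author-Name: John Roe
Title: An example paper
"""
clusters = parse_authors(sample)
```

Here the order of lines carries the structure: the email address and the workplace attach to Jane Doe only because they follow her name and precede the next "Author-Name" line.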

Note that the designation of key attributes is not a feature of ReDIF. It is a feature of the template syntax of ReDIF. It is only the syntax that makes nesting more involved. I do not think that this is an important shortcoming. Instead I believe that the nested structure involving the persons and organizations should not be included in the document templates. What should be done instead is to separate the personal information out of the document templates into separate person templates. This approach is discussed extensively in the main body of the paper.

The examples of Subsection 3.1 emphasize that ReDIF is not just a format to encode preprints or other online publications. Of course it is suitable for that, but its range of objects of description is far broader. The ReDIF vocabulary as currently defined aims at the output aspects of an academic discipline. These include the resources (digital or digitisable objects) that it creates, the creators of these resources, as well as the institutions that support the creation process. ReDIF allows the description of things in the world that are important to the work of an academic discipline. We will use the word "item" to refer to any such thing. ReDIF describes three classes of items: resources, bodies and collections. A "resource" is something that is digital or can be digitised. For example, a book is a resource. Resource templates include ReDIF-paper and ReDIF-article. A "body" is something that is neither digital nor can be digitised. Body templates include ReDIF-person and ReDIF-institution. A "collection" is something that has a manifold nature that makes it difficult to say if it is digitisable or not. ReDIF-archive and ReDIF-series belong to the collection category.

As far as version 1 of ReDIF is concerned, the grouping of record types is not important. The record type structure is in fact a flat finite list of template types. This is the main limitation of the current version of ReDIF. It is hoped that version 2 of ReDIF will be object oriented. Record types will be classes. For example a preprint record type will be defined as a subclass of a document record type. Thus communities using ReDIF will be able to refine existing types as they wish. ReDIF records will be syntax independent. Version 2 of ReDIF is an important area of further research.
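To make the idea of record types as classes concrete, here is a purely hypothetical Python sketch. Version 2 of ReDIF is not specified here, so the class names Document and Preprint, their fields and the required_fields method are all assumptions chosen for illustration. The sketch only shows how a community could refine an existing type by subclassing it.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical general record type."""
    title: str
    creators: list = field(default_factory=list)

    def required_fields(self):
        # Fields every document record must carry (assumed for illustration).
        return ["title"]


@dataclass
class Preprint(Document):
    """Hypothetical refinement of Document for preprints."""
    series: str = ""

    def required_fields(self):
        # A subclass extends, rather than replaces, the parent's requirements.
        return super().required_fields() + ["series"]


paper = Preprint(title="An example paper", series="Example working papers")
```

Software written for the general Document type would still accept the Preprint record, while preprint-aware software could use the extra field.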

ReDIF is a metadata format that comes with tools to make it easy to use in a framework where the metadata is harvested. A file that is simply harvested from a computer system could contain any type of digital content. Therefore the harvested data must be parsed by special software that filters the data. This task is accomplished by the rr.pm module written by Ivan V. Kurmanov. It parses ReDIF data and validates its syntax. For example, any date within ReDIF has to be of the ISO 8601 form yyyy-mm-dd. A date like "14 Juillet 1789" would not be recognized by the ReDIF reading software and would not be passed on to the application software that a service provider would use.
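The kind of check described for dates can be sketched in a few lines of Python. This is not the rr.pm implementation, which is a Perl module; the function name is_redif_date is an assumption, and the sketch accepts only the full yyyy-mm-dd form mentioned above.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
_DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_redif_date(value):
    """Return True if value is a valid calendar date written as yyyy-mm-dd."""
    match = _DATE_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 1789-02-30
    except ValueError:
        return False
    return True


accepted = is_redif_date("1789-07-14")
rejected = is_redif_date("14 Juillet 1789")
```

A harvesting service would drop or flag a record whose date fails such a check instead of passing it on to the application software.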

The rr.pm software uses a formal syntax specification, redif.spec. This formal specification is itself encoded in a purpose-built format code-named spefor. Therefore it is possible for ReDIF-using communities to change the syntax restrictions or even design a whole new metadata vocabulary from scratch.