This is the version of 2016-02-25. We still have the year 2000 version available, as well as the 2016-02-22 version and the 2016-02-21 version.
This document is the Guildford protocol. It is named after the town where it was first written. The protocol provides a set of rules for the publication and exchange of documents on the Internet. It could be implemented in any group that wishes to distribute documents on the Internet.
The idea behind the protocol goes back to a statement by William L. Goffe. On 15 July 1995, he wrote on the (now defunct) NetEc-admin list:
What I would suggest is this: a distributed system with any number of sites, each mirroring each other. It would have extensive bibliographic functions (cross-referencing, etc.), and my favorite, digital timestamps for when the papers were put up. For archives outside it, papers could be listed, but no cross-referencing. But, such archives could “join” the system (say it was written in perl so could run on NT as well as Unix). Then you'd have the best of both worlds: distributed, anybody could join, extensive cross-referencing, the whole works. Such a system could easily grow with the profession's use of the net. Such a system would GREATLY benefit the profession.
The way to achieve this “global and local” archive is through a comprehensive distribution process that is based on a set of archives. An archive is based on a machine that makes data available. It is a place where original data enters the system. The data are then distributed to any number of sites. A site is a collection of archives on the same computer system. It usually consists of a local archive augmented by frequently updated copies of remote archives. The local archive is maintained on the local computer, whereas the remote archives are maintained on other computers. We call a frequently updated copy of one archive on a remote site a “mirror”. There is no need for every site to mirror every archive in the system. Some may mirror only bibliographic information rather than the papers, to conserve disk space. Others may mirror all the files of an archive. Others will mirror only parts of a few archives.
All archives hold papers and metadata about papers, as well as software that is useful to maintain archives. Everything contained in an archive may be mirrored. For example, if the full text of a paper is in the archive, it may be mirrored. If the archive does not wish the full text to be mirrored, it can store the papers outside the archive, for example on a directory that does not belong to the archive.
The Guildford protocol aims to find a set of minimal restrictions on archives such that a global and local system will work. A second key aim of this document is to provide a set of rules that, if followed locally, require almost no central effort. However, a small amount of work has to be provided by a central archive. This archive is called the core archive. It contains ReDIF templates that describe all known archives in the system.
A limitation of this document is that it will not deal with charging money for metadata. We assume that the description of documents is free. Limiting the access to the documents themselves is possible but remains outside of the scope of the document.
A second limitation is that the protocol does not deal with archiving and preservation issues. A key feature of the protocol is that each document has exactly one home site. If the home site withdraws the document, it is withdrawn (after a short delay) on all sites that maintain copies of the document.
A third limitation is that it applies only to archives that contain series of documents. It is not intended to apply to homepage style publications.
The last limitation is that the protocol is not concerned with providing end user services. For example, the protocol does not provide any ideas on how to present documents on a web server, on how documents should be indexed, etc. However, one of the key features of the protocol is that software used to perform these tasks can be written by a community of contributors and distributed among sites for the benefit of everybody.
The “authority” is a group of people that have come together to implement the Guildford protocol on a set of documents and metadata. A list of authorities, and some information about them, is in section 4.
“ReDIF (Research Document Information Format)” is a set of rules to encode information about papers, series, and copying rights. It is discipline independent and independent of an organisational structure that supports its creation and deployment.
A “series” is a collection of documents that are kept together.
An “archive” is a directory on a computer that is open to access by ftp, http, or https. It holds a collection of series of papers or a collection of data about papers held elsewhere.
A “RePEc archive” carries data formatted in ReDIF that pertain to Economics, and it uses RePEc as administrator. It may also carry the full text of the documents.
A “site” is a collection of (normally one) local archive plus any number of mirrored archives. For the purpose of identification, sites and archives are treated identically. Normally each site runs one archive and mirrors several others.
The “core archive” is a single machine on which a limited number of important files are kept. These include the documentation of the protocol, including the regulations of the decision-making process, brief descriptions of all the software contributed to the organisation and the core templates (see the ReDIF documentation) of all participating archives. The contents of the core site are made available on all other sites.
The “administrator” is the person who keeps the core files on the core site.
“Mirroring” is a process by which copies of series of documents are made from one site to another, such that the contents of the archive are the same on all sites except for a short, inevitable delay.
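As a rough illustration of what a mirroring pass has to decide, the sketch below compares a listing of the home archive with a listing of the local copy. Files that are new or changed at home are copied, and files that the home archive has withdrawn are deleted from the mirror. The function name and the use of modification times to detect changes are assumptions made for this example; the protocol does not prescribe how a mirror detects changes.

```python
def plan_mirror_update(
    home: dict[str, float], mirror: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Return (files to copy, files to delete) so the mirror matches home.

    Both arguments map a relative file path to a modification time.
    Comparing modification times is an assumption of this sketch.
    """
    # Copy anything the mirror lacks or holds with a different timestamp.
    to_copy = sorted(p for p, mtime in home.items() if mirror.get(p) != mtime)
    # Delete anything the home archive no longer offers (a withdrawal).
    to_delete = sorted(p for p in mirror if p not in home)
    return to_copy, to_delete
```

Running such a plan at regular intervals would keep the delay between a change at the home archive and its appearance on the mirror short.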
Every archive has an identifier. The convention is that all archive identifiers have three letters. Archives store document metadata and, optionally, documents. Each document belongs to a series. Series identifiers must have 6 letters or numbers. The protocol uses some reserved words. They have four letters. Archive identifiers are awarded by the administrator. The series identifiers are fixed by the archives in consultation with the administrator. The reserved words are those mentioned in the protocol.
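The length rules for identifiers can be checked mechanically. The following is a minimal sketch, assuming that “letters” means ASCII letters and that “6 letters or numbers” means exactly six characters; both function names are made up for this example. Reserved words are not checked, because this sketch does not assume a complete list of them.

```python
import re

# Assumption: "letters" means ASCII letters, "numbers" means ASCII digits.
ARCHIVE_ID = re.compile(r"[A-Za-z]{3}")
SERIES_ID = re.compile(r"[A-Za-z0-9]{6}")


def is_archive_identifier(name: str) -> bool:
    """True if name has the shape of an archive identifier (three letters)."""
    return ARCHIVE_ID.fullmatch(name) is not None


def is_series_identifier(name: str) -> bool:
    """True if name has the shape of a series identifier (six letters or digits)."""
    return SERIES_ID.fullmatch(name) is not None
```

For instance, the hypothetical names `aaa` and `wpaper` pass the archive and series checks respectively, while a four-letter name passes neither.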
An archive contains a set of files. All file names are case-insensitive. All files ending with the extension .redif are files that contain ReDIF templates. These files are assumed to be in the UTF-8 encoding of Unicode, unless they contain a byte order mark indicating a UTF-16 encoding. ReDIF templates may also be stored in files with the extension .rdf. Such files are also assumed to contain ReDIF templates, but they are read in a different way: they are assumed to be in the Windows-1252 encoding unless they contain a byte order mark indicating either a UTF-8 or UTF-16 encoding. Usage of such .rdf files became obsolete on 1 March 2016.
To improve the readability of this document, we only use the .redif extension.
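The encoding rules can be summarised as a small decision function. This is only a sketch: the function name is hypothetical, and returning Python codec names is a choice made for the example. It uses `utf-8-sig` for UTF-8 so that an optional byte order mark is skipped on reading, and `utf-16` so that the codec itself reads the byte order mark.

```python
import codecs

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def guess_encoding(filename: str, head: bytes) -> str:
    """Pick a Python codec name from a ReDIF file's name and first bytes."""
    name = filename.lower()  # file names are case-insensitive
    if not name.endswith((".redif", ".rdf")):
        raise ValueError(f"not a ReDIF file name: {filename}")
    if head.startswith(UTF16_BOMS):
        return "utf-16"  # a UTF-16 byte order mark overrides both defaults
    if name.endswith(".redif"):
        return "utf-8-sig"  # UTF-8 default for .redif files
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # .rdf file that announces UTF-8
    return "cp1252"  # Windows-1252 default for .rdf files
```

A reader would open the file in binary mode, pass the first few bytes to this function, and then decode the whole file with the codec it returns.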
./archive_identifier/archive_identifierarch.redif
a file describing the archive using a single ReDIF archive template. This is mandatory.
For any archive, the contents of ./archive_identifier/archive_identifierarch.redif and ./archive_identifier/archive_identifierseri.redif are together called its core templates. They are mirrored on the core site.
./archive_identifier/archive_identifierseri.redif
a file describing the series in the archive using a sequence of ReDIF series templates. All series in the archive are described in this file, one template for each series. This is mandatory.
./archive_identifier/series_identifier/
a directory where papers and metadata for the series identified by series_identifier are stored. All files that pertain to the series series_identifier must be stored in that directory. Files that contain ReDIF information are called ReDIF files. Their names must end in .redif or, though this is deprecated, .rdf. Subdirectories of the series directory are not supported. Otherwise the structure of the directory is free. You may put the templates for all papers in the series in one file, or you may put each paper template in a different file, as you please. It is good practice to start the name of each ReDIF file in the directory with the series_identifier.
./archive_identifier/inst/
a place to store ReDIF files that contain institution templates. See the ReDIF draft for further information about that template.
./archive_identifier/soft/
a directory for software that is written locally. For example, an archive may wish to write a specific procedure by which its ReDIF-Paper data is translated into html.
./archive_identifier/pers/
a directory for person templates.
./archive_identifier/conf/
the location of configuration files for software. It does not matter whether the software is supplied by the local archive or a remote archive.
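The layout rules lend themselves to an automated check. The sketch below looks for the two mandatory core template files and flags subdirectories inside series directories. Its name and its messages are made up for this example. It only looks for the .redif extension, as this document does, and it treats every directory with a six-character name as a series directory, which is an assumption based on the identifier lengths.

```python
from pathlib import Path


def check_archive(root: Path, archive_id: str) -> list[str]:
    """Return a list of layout problems found under root/archive_id."""
    problems = []
    archive_dir = root / archive_id
    entries = {}
    if archive_dir.is_dir():
        # Compare names in lower case because file names are case-insensitive.
        entries = {p.name.lower(): p for p in archive_dir.iterdir()}
    for suffix in ("arch", "seri"):
        wanted = f"{archive_id}{suffix}.redif".lower()
        if wanted not in entries:
            problems.append(f"missing mandatory file {wanted}")
    for name, path in entries.items():
        # Assumption: six-character directory names are series directories.
        if path.is_dir() and len(name) == 6:
            for child in path.iterdir():
                if child.is_dir():
                    problems.append(f"subdirectory not supported: {name}/{child.name}")
    return problems
```

An empty result means that the archive passed these particular checks; it does not validate the ReDIF templates themselves.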
For an http- or https-based archive to be mirrored, all directories used by the archive must be indexable.
When you change the location of an archive, place the new location of the archive in the copy of the archive template in both the old and the new location. If you do not have write access to the old location, you have to notify the administrator.
Each site may mirror a number of series from any number of archives. If a site mirrors a series of an archive, it may mirror either the complete subdirectory of the series or only the ReDIF files of the series (those ending with .redif or, now obsolete, with .rdf). If an archive does not wish the papers to be mirrored, then it will store them outside of this hierarchy.
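The two mirroring choices for a series can be expressed as a filter over the file names in the series directory. This is a minimal sketch with a hypothetical function name; how the listing is obtained (ftp, http, or https) is left out.

```python
def select_for_mirroring(names: list[str], redif_only: bool) -> list[str]:
    """Choose which files of a series directory a site copies."""
    if not redif_only:
        return list(names)  # mirror the complete series directory
    # Keep only ReDIF files; the comparison ignores case because
    # file names are case-insensitive.
    return [n for n in names if n.lower().endswith((".redif", ".rdf"))]
```

A site that wants to save disk space would call this with `redif_only=True` and fetch only the metadata.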
“RePEc” is an authority that supports the creation and deployment of ReDIF data in economics according to the Guildford protocol. For the rest of the document, we will use the name “RePEc” to refer to the authority.
The RePEc core archive has the handle “all”. It operates at ftp://all.repec.org/RePEc/. RePEc archive maintainers are free to choose the series identifiers that they want.