This is the Brisbane Protocol proposal. Thomas Krichel started to work on it on 2007-07-03.
Other, obsolete versions are
This proposal aims to regulate the integration of documents furnished by institutional repositories (IRs) into RePEc.
So far, this process has been discussed with confused terminology and logic. This document seeks to clarify both.
The terms of discussion here abstract from RePEc. The protocol aims to be generic to all sorts of subject-based collections that would like to include contents from IRs, because a standard procedure for incorporating IR contents into subject-based aggregators appears to be useful. Thomas Krichel is not aware of a standard that the RePEc community could readily adopt.
A subject-based collection (henceforth: subac) holds a collection of metadata of interest to some group.
The provenance of documents is a key issue in a subac. A subac must have a list of sources, from which it compiles its own divisions. Here a source is something like an IR, which supports its own metadata rules, collection principles and metadata exposure protocols. A division is something that is in the format expected by the subac and that supports an identification scheme that the subac can do something with.
In addition, any subac known to man makes a logical partitioning of documents into series. Usually series are partitions of divisions.
An IR makes documents available. Any such document is called an ird in the following.
Every ird has a mode. There are two modes: “internal” and “external”.
An IR may group its documents into different series. Then, a subac will group the IR into a division, and series of the IR into series of the division that represents the IR.
An ird is internal if it is the definitive version of a document in its series. An ird is external otherwise.
A couple of examples may help to illustrate this. Assume an author publishes an article in a journal. She uploads a copy of this article into the IR. This is an external ird. On the other hand, a department in an institution may publish a series of reports. It may delegate the technical infrastructure of this process to the IR. In that case all irds that belong to the published series are internal irds.
An external ird can have two presentations. These presentations are called freestanding and boundstanding.
In a freestanding presentation, as made in a user interface, an external ird can be presented on its own. On its page, the external ird is shown as part of the papers uploaded to the IR, under the series that the subac considers appropriate for it. The ird may be related to a document in the subac.
In a boundstanding representation, the ird is part of another subac record. As far as the subac is concerned, it has no representation of its own. The metadata from the boundstanding external ird has to be related to a record in the subac to be meaningful.
We assume that the IR implements version 2.0 of the Open Archives Initiative Protocol for Metadata Harvesting, henceforth OAI-PMH.
If a source needs to make records available to a subac, it places them in an OAI-PMH set with the same name as the identifier of a division or series within the subac. If that identifier cannot be used, it is URI encoded. Therefore, when a subac harvesting agent accesses a collection from the IR, it will first use the id of the collection as the subac knows it. If this fails to retrieve any record, it will use the URI encoding of the identifier.
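The lookup order described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `fetch` stands for whatever harvesting call an agent actually uses, and "URI encoded" is read here as percent-encoding of every reserved character.

```python
from urllib.parse import quote, urlencode


def candidate_set_names(identifier: str) -> list[str]:
    """Return the set names to try, in order: the plain id, then its URI encoding."""
    encoded = quote(identifier, safe="")  # assumption: percent-encode all reserved characters
    # If encoding changes nothing, there is only one candidate.
    return [identifier] if encoded == identifier else [identifier, encoded]


def list_records_url(base_url: str, set_name: str) -> str:
    """Build a ListRecords request URL for one set, asking for AMF metadata."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "amf", "set": set_name})
    return f"{base_url}?{query}"


def first_nonempty_set(identifier: str, fetch):
    """Try each candidate set name; return the first one that yields records.

    `fetch` is a hypothetical callable that takes a set name and returns a list of records.
    """
    for name in candidate_set_names(identifier):
        records = fetch(name)
        if records:
            return name, records
    return None, []
```

An agent would call `first_nonempty_set("RePEc:ner:tilbir", fetch)` and fall back to the encoded name only when the plain identifier returns nothing.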
All metadata uses the AMF format.
OAI identifiers, as used in OAI-PMH requests and responses, must be identical to the AMF record identifier, if such an identifier is provided.
Internal irds are made available using conventional AMF notation.
Every record in a set must have an identifier that starts with the set identifier. Thus, a record for an internal ird can only be in one set.
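A harvesting agent could check this prefix rule with a few lines of Python. The sketch below is hypothetical and assumes that the set identifier is followed by the ":" separator seen in the handles of this document, which the rule itself does not spell out.

```python
def in_set(record_id: str, set_id: str) -> bool:
    """True if record_id starts with set_id followed by the ':' separator (an assumption)."""
    return record_id.startswith(set_id + ":")


def misplaced_records(set_id: str, record_ids: list[str]) -> list[str]:
    """Return the record identifiers that do not carry the set identifier as prefix."""
    return [rid for rid in record_ids if not in_set(rid, set_id)]
```

Requiring the separator prevents a set such as `RePEc:ner:irsenr` from accidentally matching records of a different set whose identifier merely begins with the same characters.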
A boundstanding external ird is simply represented by its external handle. In this case, the metadata supplied concerns solely the files provided by the IR.
<amf xmlns="http://amf.openlib.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://amf.openlib.org
http://amf.openlib.org/2001/amf.xsd">
<text id="RePEc:ner:irsenr">
<file>
<!-- full text data -->
<url>
...</url>
<function>
...</function>
<format>
...</format>
<restriction>
...</restriction>
</file>
<isversionof>
<text ref="RePEc:sur:surrec:9801"/>
</isversionof>
</text>
</amf>
Here, the target document is represented by its handle in the subac only. Note the usage of ref= rather than id=.
However, given the potential volatility of handles within the subac's handle structure, it is better to populate the record with metadata that is already in the subac at the time the item is captured. Such data can be gathered through the subac's OAI interface, if such an interface exists. In RePEc's case, records would be found with
http://oai.repec.openlib.org?verb=GetRecord&metadataPrefix=amf&identifier=handle
where handle is the handle of the item described. From that response, /OAI-PMH/metadata/amf/text has to be found, and /OAI-PMH/metadata/amf/text/@id has to be replaced with /OAI-PMH/metadata/amf/text/@ref.
<amf xmlns="http://amf.openlib.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://amf.openlib.org
http://amf.openlib.org/2001/amf.xsd">
<text id="RePEc:ner:irsenr">
<file>
<!-- full text data -->
<url>
...</url>
<function>
...</function>
<format>
...</format>
<restriction>
...</restriction>
</file>
<isversionof>
<text ref="RePEc:sur:surrec:9801">
<type>preprint</type>
<title>Growing at Different Rates</title>
<abstract>We examine a two country world.
....</abstract>
<date event="created">1998-04</date>
<classification xsi:type="jel1991">E62 H54 F43</classification>
<file>
<url>http://www.econ.surrey.ac.uk/discussion_papers/RePEc/sur/surrec/surrec9801.pdf</url>
<format>application/pdf</format>
</file>
<hasauthor>
<person ref="RePEc:per:1965-06-05:THOMAS_KRICHEL">
<name>Thomas Krichel</name>
<email>krichel@openlib.org</email>
</person>
</hasauthor>
</text>
</isversionof>
</text>
</amf>
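The step of taking the text noun from a GetRecord response and turning its id into a ref can be sketched in Python with the standard library. This is a minimal illustration, not a reference implementation: the function name is hypothetical, and the search deliberately looks for the AMF text noun anywhere in the response rather than assuming an exact path.

```python
import xml.etree.ElementTree as ET

AMF_NS = "http://amf.openlib.org"


def text_as_reference(oai_response: str) -> ET.Element:
    """Find the AMF text noun in an OAI-PMH response and rename its id attribute to ref."""
    root = ET.fromstring(oai_response)
    # Search at any depth, so the sketch does not depend on the envelope layout.
    text = root.find(f".//{{{AMF_NS}}}amf/{{{AMF_NS}}}text")
    if text is None:
        raise ValueError("no AMF text noun found in the response")
    handle = text.attrib.pop("id", None)
    if handle is not None:
        text.set("ref", handle)
    return text
```

The returned element can then be nested inside the isversionof adjective of the record for the external ird.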
If there is no subac record known for the item, the data can be added in an anonymous AMF noun, which does not carry an id attribute.
<amf xmlns="http://amf.openlib.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://amf.openlib.org
http://amf.openlib.org/2001/amf.xsd">
<text id="RePEc:ner:irsenr:atehuxaoeu">
<file>
<!-- full text data -->
<url>
...</url>
<function>
...</function>
<format>
...</format>
<restriction>
...</restriction>
</file>
<isversionof>
<text>
<type>article</type>
<title>On Doctors, Mechanics, and Computer Specialists: The Economics of Credence Goods</title>
<abstract>Most of us need
...</abstract>
<serial>
<issue>1</issue>
<issuedate>2006</issuedate>
<volume>44</volume>
<issue>March</issue>
<journaltitle>Journal of Economic Literature</journaltitle>
</serial>
<hasauthor>
<person>
<name>Uwe Dulleck</name>
</person>
</hasauthor>
<hasauthor>
<person>
<name>Rudolf Kerschbamer</name>
</person>
</hasauthor>
</text>
</isversionof>
</text>
</amf>
Elements that are not in AMF are presumably of negligible value.
An OAI-based RePEc archive (henceforth: obra) is a set of data that is available to RePEc as an archive.
Registration of IRs uses the special address
ftp://all.repec.org/oai
There RePEc makes available a set of files. All names of files that contain IR access data have the ending .amf.xml. Each file describes an obra. The name of the file starts with the handle of the obra, without the leading RePEc:.
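The naming rule can be illustrated with a short Python sketch. It assumes, beyond what the text states, that nothing sits between the stripped handle and the .amf.xml ending; the function names are hypothetical.

```python
PREFIX = "RePEc:"
SUFFIX = ".amf.xml"


def obra_file_name(handle: str) -> str:
    """Derive a registration file name from an obra handle, e.g. for the handle RePEc:ner."""
    if not handle.startswith(PREFIX):
        raise ValueError(f"not a RePEc handle: {handle!r}")
    # Assumption: the name is exactly the stripped handle plus the ending.
    return handle[len(PREFIX):] + SUFFIX


def is_access_file(name: str) -> bool:
    """True if a file name carries the ending used for IR access data."""
    return name.endswith(SUFFIX)
```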
Each file contains one AMF collection noun describing the archive, which nests other collection nouns describing the series. To fix ideas, let the AMF data provide information about archive RePEc:ner, belonging to NEREUS.
<amf xmlns="http://amf.openlib.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://amf.openlib.org
http://amf.openlib.org/2001/amf.xsd">
<collection id="RePEc:ner">
<title>NEREUS</title>
<homepage>http://www.nereus.info</homepage>
<url>http://oai.nereus.info?verb=GetRecord&amp;identifier=RePEc:ner&amp;metadataPrefix=amf</url>
<hasmaintainer>
<person>
<email>info@nereus.org</email>
</person>
</hasmaintainer>
<haspart>
<collection id="RePEc:ner:tilbir">
<title>Tilburg RePEc IR papers</title>
<url>http://oai.nereus.info?verb=ListRecords&amp;metadataPrefix=amf&amp;set=RePEc:ner:tilbir</url>
</collection>
</haspart>
</collection>
</amf>
The URL adjective (in AMF terms) of the main collection noun must be an OAI-PMH URL which, in its OAI payload, delivers the entire AMF record for the archive, including all the series.
The URL adjectives in the series point to ListRecords requests that deliver, in their OAI payloads, the records for all ids in the series.
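A harvesting agent has to read such an archive description before it can visit the series. The following Python sketch shows one way to extract the series handles and their URLs with the standard library; the function name is hypothetical, and the input is assumed to be well-formed XML, with ampersands in URLs written as &amp;amp;.

```python
import xml.etree.ElementTree as ET

AMF = "{http://amf.openlib.org}"


def series_urls(archive_xml: str) -> dict[str, str]:
    """Map each series handle to its URL adjective in an obra description."""
    root = ET.fromstring(archive_xml)
    archive = root.find(f"{AMF}collection")
    if archive is None:
        raise ValueError("no archive collection noun found")
    result = {}
    # Series are collection nouns nested in haspart adjectives of the archive.
    for series in archive.findall(f"{AMF}haspart/{AMF}collection"):
        handle = series.get("id")
        if handle is None:
            continue  # skip anonymous series, which cannot be addressed
        url = series.findtext(f"{AMF}url")
        result[handle] = url.strip() if url else ""
    return result
```

Applied to the NEREUS description, the result would hold one entry, keyed by RePEc:ner:tilbir.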
In this allocation of series, the series do not need to be catered for by one OAI interface. In fact, series can be contained in a variety of OAI interfaces. The key is that the collection handles have to be prefix