| Personal data in a large digital
| library
|
| 1 May 2000
|
|
|
|
|
|
| Abstract
|
| The RePEc Economics library offers the largest distributed source of
| freely downloadable scientific research reports in the world. RePEc
| also contains details about Economics institutions, publication
| outlets and people working in the field. All this data forms a large
| relational dataset.
| In this paper we describe HoPEc, a system that implements
| access control records for personal data within RePEc. The bulk of
| these data describe the authors of documents. These data are
| maintained by the authors themselves. We discuss the technical and
| social aspects of this system.
|
|
|
| This paper is available in PDF for A4 paper and for letter-size paper.
|
|
|
| 1: Introduction
| The identification of authors is a problem that has a long and
| distinguished history in Library and Information
| Science. There are two basic problems
| with using author names to identify authors. The first is that the
| same physical person may be referred to by different names.
| The same person's name may appear in several variants, for example:
| Clinton, William Jefferson Blythe III
| Clinton, William J.
| Bill Clinton
|
| Some people may even change their name during their lifetime. For
| non-English names, transliteration into a suitable character set adds a
| further layer of difficulty. Thus even if the name is the same, it may
| have been transcribed into the Latin alphabet in different ways.
| A second problem is that two physically different people may have the same
| name, or at least that some spelling variants of their names are the same. In the
| Economics research area, there are two Michael Devereux: one is Michael
| B. Devereux, the other is Michael P. Devereux. Any
| occurrence of the term "Michael Devereux" or "Devereux, M." would be
| ambiguous between the two persons. This is not a difficult problem if we
| are aware that there are these homonyms in the library. However, it is very
| difficult to establish that all occurrences of a fixed name string point to
| the same physical person, i.e. to show the absence of a homonym. Even if
| there is only a small number of person names in the library, an appearance of a
| name string does not identify a person.
| To summarize, two problems stand in the way of a one-to-one mapping
| between names and physical persons. This paper is about addressing both
| problems simultaneously within the context of a large digital library of
| academic documents. Before we discuss our work, it is useful to have a
| brief look at traditional approaches to the problem and set out our
| approach in Section 2. The collection that we are
| using is the largest distributed source of freely downloadable research
| papers in the world, the RePEc project. It indexes 80,000 documents; for
| comparison, the NCSTRL collection in computer science indexes about 30,000.
| RePEc builds a public-access documentation of Economics research. We
| describe this project in Section 3. General
| questions for the management of personal data are addressed in
| Section 4. In Section 5, we describe the HoPEc
| project that implements the registration and search service. Our
| experience with running this service is the subject of Section 6.
| Section 7 concludes this paper.
|
| 2: Access control in a digital library
| In conventional catalogues, attention is paid to harmonising the name
| across all spelling variants that--to the best knowledge of the
| cataloguer--refer to the same person. This process is one of the prime
| functions of authority control.
| In the library tradition, authority control for author names is the
| process of choosing, among the different variants of an author name
| that exist or can be used, a single one that will be used by the
| library. An elaborate set of rules ensures that the official name is
| the same for any library. Links from the unofficial variants to the
| official variant are then created.
| This tradition comes from running a card catalogue. Here users looking
| for "Devereux, Michael" would be advised to consult the entries for
| "Michael B. Devereux" and "Michael P. Devereux".
| Hoffman and Hatch (2000) is a good source of recent
| contributions to traditional authority control.
| With the advent of electronic catalogues, the approach to author
| identification has shifted away from authority control towards access
| control. With access control, there is no official version of the author
| name that all holdings in the collection would use. Instead, all variants
| of a person's name are linked to one author record. In that author record
| all relevant data about the author is collected. A recent implementation of
| access control is described by Snyman and Jansen van Rensburg
| (1999). They propose an
| "International Standard Author Number" (ISAN) that would identify each
| author. The ISAN would be awarded and maintained by national bibliographic
| authorities. Each national authority would share the data with the others
| within an international cooperative framework.
| In computer science terms, the access control model implements a relation
| between documents and their authors. From a technical point of view, this
| is a rather trivial innovation over the authority control model. The
| problem with access control is the organisational structure that supports
| the creation and maintenance of the author database. It appears difficult
| to imagine that this problem could be completely
| solved without an international database of all living people.
| There are political, legal, and
| economic obstacles that imply that such a database is not
| forthcoming in the foreseeable future. Even a collaborative
| international database about important authors would require
| resources that are beyond what national bibliographic
| authorities--who are considerably stretched by the appearance
| of digital resources--could afford.
| The technological change from physical libraries to digital libraries is
| the main force driving the change from authority control to access control.
| In a digital library, the implementation of access control demands
| attention similar to that which authority control demands in a library
| of physical objects. If the digital library is large, then it is very
| costly to implement access control since this process can not be fully
| automated. We are not aware of a digital library that has complete access
| control for its author data. Thus the digital library revolution does not
| help to solve the author identification problem. If anything, the
| availability of resources on the Internet makes the author identification
| problem more pressing. It is part of the mismatch between the richness
| of resources on the Internet and the paucity of metadata about these resources.
| This mismatch is a problem that has received a lot of attention.
| While a global solution of author identification is not forthcoming, there
| is nothing to prevent a group of people from implementing a local author
| identification within a geographical or subject domain. The smaller the
| local author base, the less likely is the appearance of the name-identity
| problem. Thus resolving the problem is a less pressing issue in a local
| setting. However the collected personal information may be used to other
| ends. For example the access control records may contain contact
| information like email addresses, homepage URLs and telephone
| numbers. These additional data are typically relevant when authors and
| readers form a community where the same people have both author and reader
| roles and where an exchange between both groups is an ongoing concern. In
| such a setting it is useful to involve non-librarians in the access control
| process. This paper describes an attempt--the first to our
| knowledge--to involve the authors directly in the creation of access
| control data. We take a set of digital library data, and we ask authors to
| tell us which papers they have written.
| Naturally, this strategy to get the authors involved in producing
| their own access control data will only work if the authors have
| incentives to supply such data. Since academic authors are interested
| in the visibility of their work, they will have good incentives to
| supply data to a database that is frequently consulted by potential
| readers. For any database to achieve such a status, it must be
| relatively large and available at low cost. The RePEc dataset of
| Economics research is a good candidate.
|
| 3: The RePEc system
|
| 3.1: A scholarly dissemination framework
| RePEc is a system to improve scholarly communication in Economics using the
| Internet. Scholarly communication has two functions. It disseminates
| scholarly output and it adds a quality certificate through peer review. In
| Economics, peer-review is particularly severe. As a result, the delay of
| formal publication--the attribution of the quality
| certificate--is very long. According to Trivedi (1993), it is common
| that a paper takes over
| three years from submission to publication in the same academic journal,
| not counting rejections. Thus researchers can not rely on the formally
| approved work alone because this material is out of date. As a consequence
| there is a flourishing culture of informal publication. Clearly the
| exchange of such publications can take advantage of the Internet. This
| is the initial motivation of the RePEc project. It is a digital library
| that disseminates Economics research.
| A scholarly dissemination system on the Internet should start by enhancing
| the pre-Internet practice rather than attempting to replace it. The
| distribution of informal research papers in the past has been based on
| institutions issuing working papers. These are circulated through exchange
| arrangements. RePEc is a way to organise this process on the Internet.
| Its business model can be summarized as follows:
|
| Many archives ===> One dataset ===> Many services
|
| RePEc allows Economics departments and research institutes to participate
| in a decentralized archival scheme which makes information about the
| documents that they publish accessible via the Internet. A contributor
| places metadata about its documents on a public access computer
| system. This is usually an anonymous ftp server or a web server. Each
| participating institution has total control over the contents of its
| archive. The archive management retains the liberty to post revisions or
| to withdraw a document. There is no need to transmit documents elsewhere.
| Participation does not imply that the documents are freely available. Thus
| commercial publishers can contribute. If a document is available online, a
| link may be provided to the place where the paper may be downloaded. Note
| that the document may not only be the full text of an academic paper, but
| it may also be an ancillary file, e.g. a dataset or a computer program.
| In April 2000, about 130 archives in 21 countries participate in
| RePEc, some of them representing several institutions. Over 80 universities
| contribute their working papers. Some important non-academic
| institutions like the US Federal Reserve Bank, the IMF, the World Bank and
| the OECD are also present. A number of scholarly journals also contribute data
| to RePEc. There are over 80,000 resources described. About 20,000 are
| accessible on the Internet without access restrictions.
| Users access RePEc data through user services. Appendix A of
| Krichel (2000) lists a range of user services.
| Note that the RePEc data may not be sold or incorporated into a product
| that is sold. User services compete through quality rather than price.
| All RePEc archives benefit from simultaneous inclusion in all
| services. This leads to an efficient dissemination that a proprietary
| system can not afford.
|
| 3.2: A relational metadata set
| The contributors of bibliographical data supply it in
| a special format called ReDIF. ReDIF is a template format
| inspired by the format of Deutsch et al. (1994), also known as the
| IAFA template.
| To understand the basics of ReDIF it is best to start with an
| example. Here is a--carefully selected--piece
| of ReDIF from ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.rdf. (We suppress the abstract to conserve space.)
|
|
| Template-Type: ReDIF-Paper 1.0
| Title: Dynamic Aspect of Growth and Fiscal Policy
| Author-Name: Thomas Krichel
| Author-Person: RePEc:per:1965-06-05:thomas_krichel
| Author-Email: T.Krichel@surrey.ac.uk
| Author-Name: Paul Levine
| Author-Email: P.Levine@surrey.ac.uk
| Author-WorkPlace-Name: University of Surrey
| Classification-JEL: C61; E21; E23; E62; O41
| File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
| File-Format: application/pdf
| Creation-Date: 199603
| Revision-Date: 199711
| Handle: RePEc:sur:surrec:9601
|
|
| When we look at this record, the ReDIF data looks like a
| standard bibliographical format, with authors, title etc. The only thing
| that appears a bit mysterious here is the Author-Person field. This
| is a legal field but it is as yet very sparingly used. The field quotes
| a handle that is known to RePEc. This handle leads to a record at
| ftp://netec.mcc.ac.uk/pub/RePEc/per/pers/RePEc_per_1965-06-05_THOMAS_KRICHEL.rdf.
| (We leave out a few fields to conserve space.)
|
| Template-Type: ReDIF-Person 1.0
| Name-Full: KRICHEL, THOMAS
| Name-First: THOMAS
| Name-Last: KRICHEL
| Postal: 1 Martyr Court
| 10 Martyr Road
| Guildford GU1 4LF
| England
| Email: t.krichel@surrey.ac.uk
| Homepage: http://openlib.org/home/krichel
| Workplace-Institution: RePEc:edi:desuruk
| Author-Paper: RePEc:sur:surrec:9801
| Author-Paper: RePEc:sur:surrec:9702
| Author-Paper: RePEc:sur:surrec:9601
| Author-Paper: RePEc:rpc:rdfdoc:concepts
| Author-Paper: RePEc:rpc:rdfdoc:ReDIF
| Handle: RePEc:per:1965-06-05:THOMAS_KRICHEL
|
|
| This record is the access control record for the author "Thomas
| Krichel". We will discuss this record in detail in the next Section. For
| now, note that in the person template, we find another RePEc identifier in
| the "Workplace-Institution" field. This points to another record at
| ftp://crefe.dse.uqam.ca/pub/RePEc/edi/inst/desuruk.rdf that
| describes the institution. The maintenance of these records is the work of
| the EDIRC project. The acronym stands for
| "Economics Departments, Institutions and Research Centers". This dataset
| has been compiled by Christian Zimmermann, an Associate
| Professor of Economics at Université du Québec à
| Montréal, on his own account, as a public service to
| the Economics profession. The initial intention was to compile a directory
| with all Economics departments that have a web presence. Since there are
| many departments that have a web presence now, a large number are now
| registered, about 5,000 of them at the time of writing. All these records
| are included in RePEc.
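The templates shown in this section are simple enough to read with a few lines of code. The following is a minimal sketch, not the RePEc template analysis software: it assumes that a field opens a line as `Name:` followed by whitespace or the end of the line, and that any other non-blank line continues the previous value. The function names `parse_redif` and `field_values` are hypothetical.

```python
import re

# Assumption: a field starts a line as "Name:" followed by whitespace or end of
# line; any other non-blank line continues the previous field's value.
FIELD = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):(?:\s+(.*))?$")


def parse_redif(text):
    """Return the fields of one ReDIF template as a list of (name, value) pairs."""
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            fields.append([match.group(1), match.group(2) or ""])
        elif fields:
            # Continuation line, e.g. a wrapped title or a multi-line postal address.
            fields[-1][1] = (fields[-1][1] + " " + line).strip()
    return [(name, value) for name, value in fields]


def field_values(fields, name):
    """All values of a repeatable field, compared case-insensitively."""
    return [value for key, value in fields if key.lower() == name.lower()]


person = parse_redif("""
Template-Type: ReDIF-Person 1.0
Name-Full: KRICHEL, THOMAS
Workplace-Institution: RePEc:edi:desuruk
Author-Paper: RePEc:sur:surrec:9801
Author-Paper: RePEc:sur:surrec:9601
Handle: RePEc:per:1965-06-05:THOMAS_KRICHEL
""")
papers = field_values(person, "Author-Paper")
```

Here `papers` holds the two paper handles quoted in the person record. Following a handle such as `RePEc:edi:desuruk` to the institution record would be a further lookup that this sketch leaves out.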
|
| It is clear that when it comes to the collection of personal data we
| can not follow the approach of EDIRC, i.e. a single person collecting
| data "off the web".
|
| 4: Organising personal data
|
| 4.1: The relational information structure
| The RePEc model implies that the flow of information always travels bottom
| up. That means that the contributing archive is the authoritative source of
| information. User services have read-only access to the data as provided by
| the archive. Within that framework, one possible approach would be to ask
| archives to register people who work at their institution. This will make
| archive maintainers' work more onerous initially, but the overall maintenance
| effort will be smaller once all authors are registered. However, authors
| move between archives, and many have work that appears in different
| archives. To date there is no satisfactory way to deal with moving authors.
| Therefore the author registration is carried out using a centralized
| system. The first step to provide personal data is to open a RePEc
| archive that houses such data.
| Before the creation of that central archive, all personal information
| within RePEc was composed into the paper templates. This
| compositional model implies that the author data is part of the resource
| data. It does not form an independent entity. If the paper disappears, the
| author data disappears as well. If the author writes two papers, the author
| data is keyed in twice and the data attached to the author may not be
| the same for both papers. This compositional logic comes from the
| traditional library catalogue model. We refer to this data as the composed
| personal data. What we wish to achieve is a movement from a compositional
| representation of personal data to a relational one. To create a fully relational
| model we need to collect some basic personal data (name, email address
| etc.) in a personal record and then we refer in the document templates to
| that personal data. This is done in the record for the paper
| RePEc:sur:surrec:9601, but to
| date this is one of very few records that contain such an entry.
| Organisational features are to blame for the lack of usage of person
| identifiers in document records. Recall that we have rejected the model
| where archive maintainers maintain personal data for "local"
| authors. Having made that choice, we rely on archive maintainers to
| convince local authors to register with the centralized personal
| registration service. It is only when this registration is achieved that
| the archive maintainer may quote the resulting handle in the paper
| templates. This change of working practice will take time to implement.
| At the time of writing, we have not even suggested to archive maintainers that
| they implement this change.
| The transition from the compositional to the relational model does not rely
| on the "Author-Person" tag in the document template. Instead, we use the
| "Author-Paper" field as demonstrated in the
| RePEc:per:1965-06-05:THOMAS_KRICHEL
| record. An important advantage of this approach is that the registration
| service can ensure that only valid document handles are used. The
| disadvantage is that the relational structure is not as complete as one
| would like it to be. The "Author-Paper" entry tells us that the
| registered person is an author of the paper, but--in the case where
| the paper has multiple authors--we do not know which of the
| co-authors of the paper is described in the person template. This is an
| important conceptual limitation of the proposed model. However it has only
| limited practical implications since a comparison of name strings should
| allow us to find which of the co-authors is the one the personal record refers
| to. Heaven forbid the two Michael Devereux co-authoring a paper.
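The name string comparison mentioned above can be sketched as follows. This is an illustration under simple assumptions, not the comparison HoPEc performs: names are taken to be either "Last, First" or "First Last", and two names are treated as compatible when the last names agree and the first names share an initial. The function names are hypothetical.

```python
def split_name(name):
    """Split a name string into (first, last), upper-cased and without periods."""
    name = " ".join(name.replace(".", " ").split()).upper()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return first, last
    tokens = name.split()
    if not tokens:
        return "", ""
    return " ".join(tokens[:-1]), tokens[-1]


def candidate_coauthors(person_name, author_names):
    """Return the author strings of a paper that could denote the registered person."""
    p_first, p_last = split_name(person_name)
    matches = []
    for author in author_names:
        a_first, a_last = split_name(author)
        same_initial = bool(a_first) and bool(p_first) and a_first[0] == p_first[0]
        if a_last == p_last and same_initial:
            matches.append(author)
    return matches


unique = candidate_coauthors("KRICHEL, THOMAS", ["Thomas Krichel", "Paul Levine"])
ambiguous = candidate_coauthors(
    "Devereux, Michael B.", ["Michael B. Devereux", "Michael P. Devereux"]
)
```

With this rule `unique` contains a single candidate, while `ambiguous` contains both co-authors: the hypothetical joint Devereux paper is exactly the case that a comparison on last name and first initial cannot resolve.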
|
| 4.2: Handle structure
| The registration process associates a unique handle with each registered
| person. Conformity with RePEc tradition and its template analysis
| software requires that the first two components--delimited by
| colons--should have constant length. These first two components
| are respectively "RePEc" and "per". Thus "per" is the code for
| the RePEc archive that was opened to store the personal data collected
| by the registration service. The personal data is available through
| the conventional file structure set out in the Guildford protocol.
| This is the
| convention on how RePEc archives store files.
| The handles of the personal data records that are created start with
| the archive handle where the record is maintained. There are many
| options for the contents of the remainder of the identifier. In
| general, to build a unique identifier for persons is a problem that
| has never been completely solved. In our case, we feel that a
| pragmatic solution is needed that is not too mnemonic and not too
| cryptic. The combination of the name and date of
| birth--already widely used in the library world--appears
| to be a good starting point. However, it should be noted that some
| registrants may not wish their birth date to be known publicly. Thus
| requiring the date of birth would have reduced the acceptance of the
| service. What we require instead is a date in the lifetime of the
| registrant that the registrant would be able to remember. We
| will refer to that date as the "significant date" of the
| registrant.
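As an illustration of the handle structure, the sketch below builds and splits a handle of the form seen in the example record, `RePEc:per:1965-06-05:THOMAS_KRICHEL`. The exact construction rules are an assumption inferred from that single example (ISO date, upper-case name, blanks replaced by underscores); the function names are hypothetical.

```python
import re
from datetime import date


def person_handle(first_name, last_name, significant_date):
    """Build a person handle from a name and a significant date (assumed rules)."""
    name = re.sub(r"\s+", "_", f"{first_name} {last_name}".strip().upper())
    return f"RePEc:per:{significant_date.isoformat()}:{name}"


def split_person_handle(handle):
    """Split a person handle into its four colon-delimited components."""
    authority, archive, significant_date, name = handle.split(":", 3)
    return authority, archive, date.fromisoformat(significant_date), name


handle = person_handle("Thomas", "Krichel", date(1965, 6, 5))
parts = split_person_handle(handle)
```

Because the first two components have constant length, software that reads handles can rely on the date starting at a fixed offset.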
|
| 4.3: Dealing with the compositional personal data
| The registration service also attaches an internal handle to all available
| personal data that is composed into the existing resource metadata as
| collected by RePEc. These handles can not have the same structure as the
| handles for registered persons, because the significant date is not known,
| and neither is the real identity of the person. The internal handle has
| three requirements. It should be unique, it should be possible to build it
| only from the resource metadata, and finally it should be stable. These
| requirements are satisfied as long as the internal handle combines data
| from both the handle of the resource and from the names of the authors. For
| the example of the template RePEc:sur:surrec:9601 the internal
| author handle RePEc:sur:surrec:9601:Thomas_Krichel is derived.
| This method will fail to produce unique handles only if there is a document
| authored by two or more persons with the same name, which is most likely the
| result of a data input error.
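A minimal sketch of how such internal handles could be derived is given below. It assumes, from the one example above, that the author name is appended to the resource handle with blanks replaced by underscores; the function name is hypothetical, and the duplicate check flags the failure case just described.

```python
def internal_author_handles(resource_handle, author_names):
    """Derive one internal handle per author name and report any collisions."""
    handles = [
        resource_handle + ":" + "_".join(name.split()) for name in author_names
    ]
    # A collision means two authors of the same document share a name string.
    duplicates = {h for h in handles if handles.count(h) > 1}
    return handles, duplicates


handles, duplicates = internal_author_handles(
    "RePEc:sur:surrec:9601", ["Thomas Krichel", "Paul Levine"]
)
```

The handle depends only on the resource metadata, so it can be rebuilt at any time and stays stable as long as the resource handle and the name string do not change.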
| A further concern with the composed personal information is that author
| names are not normalised. Searching for authors is more efficient if names
| are split into first names and last names. An important task for the code
| that implements HoPEc is to try to normalise person names that are found
| in composite author name data. Any name field may--in contravention
| of the principles of ReDIF--contain several author names, say
| "Markus Klink and Thomas Krichel". First, potential titles like
| "Prof.", "PhD" etc. are eliminated. The resulting string is
| recursively decomposed until there are no further name separators. These
| separators are colons, semicolons, repeated blanks and words that
| indicate a separation like "and". Each of these components is then
| examined to find out where the first name and last name are located. First,
| all first names are normalized (e.g. Klink M |--> Klink M.), then
| elements that are in brackets are removed (e.g. José Manuel Barrueco Cruz
| (editor) |--> José Manuel Barrueco Cruz). If there is a comma, it is
| assumed that the construct is lastname, firstname. Otherwise a
| structure firstname lastname is assumed. However, if the last name is
| much shorter than the first name, a structure lastname initial is
| assumed, and first and last name are assumed to be in the opposite
| order. The complete algorithm was developed under pragmatic assumptions to
| deal with the realities found in the dataset.
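The steps described above can be sketched roughly as follows. This is not the HoPEc code. The title list is illustrative, a single regular-expression split stands in for the recursive decomposition, brackets are removed before initials are normalised, and "much shorter" is approximated by treating a final token of one letter as an initial. All names in the sketch are hypothetical.

```python
import re

# Illustrative title list; the text only names "Prof." and "PhD" as examples.
TITLES = re.compile(r"\s*\b(?:Prof|PhD)\b\.?", re.IGNORECASE)
# Separators between names: the word "and", colons, semicolons, repeated blanks.
SEPARATORS = re.compile(r"\s+and\s+|[:;]|\s{2,}", re.IGNORECASE)
BRACKETS = re.compile(r"\([^)]*\)")


def split_names(field):
    """Split one author field into the individual name strings it contains."""
    cleaned = TITLES.sub("", field)
    parts = (part.strip(" ,") for part in SEPARATORS.split(cleaned))
    return [part for part in parts if part]


def normalise_name(name):
    """Return (last, first) for a single name string."""
    name = BRACKETS.sub("", name).strip()
    if "," in name:
        # "lastname, firstname"
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        tokens = name.split()
        if not tokens:
            return ("", "")
        # Default reading: "firstname lastname"
        first, last = " ".join(tokens[:-1]), tokens[-1]
        if first and len(last.rstrip(".")) == 1:
            # "Klink M": the final token is an initial, so swap the two parts.
            first, last = last, first
    # Give bare single-letter initials a period: "M" becomes "M."
    first = " ".join(
        t + "." if len(t) == 1 and t.isalpha() else t for t in first.split()
    )
    return (last, first)


names = split_names("Prof. Markus Klink and Thomas Krichel")
normalised = [normalise_name(n) for n in names]
```

The single-letter rule is deliberately conservative: a looser threshold would also swap genuine short last names, which is one of the realities such a heuristic has to weigh.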
|
| 5: The HoPEc service
| The HoPEc service allows registrants to maintain their personal RePEc
| data. It is more fully described in Klink (1999). It has two basic functions. The
| "search" function allows users to search for personal data in the RePEc
| dataset. The "registration" function allows persons to register.
|
| 5.1: The search function
| The search function provides convenient access to personal information when
| only the name of the person is known. As a result of a search all the data
| relating to what the system presumes is the same person appears at the same
| place. This is very easy when a registered person is found. At the moment
| however, the search function mainly deals with the composed personal
| data. For these composed data, the task of the search is to
| take a set of name strings and group together those that
| it believes refer to the same person. Different aggregations of name
| strings will lead to different numbers of presumed persons.
|
|                                       number
| total number of persons              142,848
| registered                               544
| with personal data                     7,416
| with workplace data                   14,567
| numbers of aggregations with common
|   last name                           22,608
|   first & last name                   44,292
|   first & last name & archive         54,092
|
| Within the composed personal data, there are very few persons that have
| any additional data either about themselves or the institution that they
| work for. By "additional" we mean any non-blank data field that is not
| the name, for example the email address or homepage URL. It is therefore
| clear that these additional data can not be used to identify people within
| the dataset, because of the limited amount of data available. In the
| best possible case, where all the persons with additional data would be
| identical, this would reduce the total number of persons only by about
| 15,000. Therefore we need to analyse the identity of persons with the
| same name structure. The following strategies can be envisaged:
|
| 1. When we look for a person's name, we are only concerned
|    with the person's last name. This leads to a great
|    aggregation of data.
| 2. If we differentiate further
|    by using the first name--considering that
|    an initial is different from the full name--a great
|    aggregation of names is still possible.
| 3. If the provision of the data through archives
|    is also taken into account, the
|    number of aggregates does not increase by much. Thus we
|    can conclude that there are only few persons who have
|    publications in different archives.
|
| Because we would like to use all information that we have available, we
| decide to use the third method. That implies that we consider that a person
| is identical to another if (s)he has the same name (first and last name)
| and if (s)he publishes in the same archive. This assumption would lead to
| an error if there are two persons with the same name who publish material
| in the same archive and who are physically different. We believe that if
| such a case were to occur, then the maintainer of the archive, who has
| supplied RePEc with the data, would make sure to distinguish the two
| persons by giving them different spelling forms of the name.
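The three aggregation strategies differ only in the key used to group name strings. The sketch below shows this on a handful of normalised name strings for the Devereux example; the record layout (a dict with `last`, `first` and `archive`) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate(records, key_fields):
    """Group records that agree on all of the given key fields."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[field] for field in key_fields)].append(record)
    return groups


rows = [
    ("DEVEREUX", "M.", "fth"),
    ("DEVEREUX", "M.", "wop"),
    ("DEVEREUX", "MICHAEL", "ifs"),
    ("DEVEREUX", "MICHAEL", "nbr"),
    ("DEVEREUX", "MICHAEL P.", "ifs"),
    ("DEVEREAUX", "MICHAEL", "boc"),
]
records = [dict(zip(("last", "first", "archive"), row)) for row in rows]

by_last = aggregate(records, ["last"])
by_name = aggregate(records, ["last", "first"])
by_name_and_archive = aggregate(records, ["last", "first", "archive"])
```

Each finer key can only split groups, never merge them, so the number of presumed persons grows from the first strategy to the third, as in the counts reported above.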
| The following table displays all the responses to an approximate query for
| the last name "Devereux" and first name "M".
|
| last name   first name   archive
| DABROWSKI   M.           fth
| DABROWSKI   MAREK        wop
| DE BROECK   MARK         imf
| DEVEREAUX   MICHAEL      boc
| DEVEREUX    M.           fth
| DEVEREUX    M.           wop
| DEVEREUX    M. B.        fth
| DEVEREUX    M. P.        fth
| DEVEREUX    MICHAEL      ifs
| DEVEREUX    MICHAEL      nbr
| DEVEREUX    MICHAEL      wuk
| DEVEREUX    MICHAEL B.   nbr
| DEVEREUX    MICHAEL P.   ifs
| DEVEREUX    MICHAEL P.   wuk
| DEVEREUX    MP.          wop
| DVORAK      MICHAEL W.   fip
|
| Here the search finds some names that are similar to "Devereux". For a
| subject specialist--who knows both authors and the areas they
| usually work in--an inspection of the record for Devereaux reveals
| that this paper is in fact by Michael P. Devereux. Note that he is also
| the "Michael Devereux" who publishes in the "wuk" and "ifs" archives,
| but not the one in the "nbr" archive. For a person from outside the
| discipline it would be difficult to aggregate these authors correctly.
|
| 5.2: The registration function
| The registration process consists of two stages. In the first stage, the
| registrant is asked to supply personal information. In the second
| stage the registrant may create associations between the personal
| record and the resource records in the RePEc database. A person may
| only create associations to resources that are currently described in
| the RePEc dataset. It is not possible to create associations with
| potential future resources.
| The current implementation of HoPEc considers associations with resources
| that appear in four different template types. These are "ReDIF-paper",
| "ReDIF-article", "ReDIF-software" and "ReDIF-series". Persons can
| have different associations with these templates. Persons can be authors of
| papers, articles or software, and they can be the editors of a series. In
| the current model, the association type is a function of the resource
| type: when the resource type is known, the association type
| can be deduced, but the converse does not hold.
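The relationship between template type and association type can be written as a small lookup table. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the template type strings are those quoted above.

```python
# Each template type determines exactly one association type.
ASSOCIATION_BY_TEMPLATE = {
    "ReDIF-paper": "author",
    "ReDIF-article": "author",
    "ReDIF-software": "author",
    "ReDIF-series": "editor",
}


def association_type(template_type):
    """Deduce the association type from the template type of the resource."""
    return ASSOCIATION_BY_TEMPLATE[template_type]


def template_types(association):
    """The reverse lookup may return several template types."""
    return sorted(t for t, a in ASSOCIATION_BY_TEMPLATE.items() if a == association)
```

The forward lookup always has one answer, whereas `template_types("author")` returns three template types, which is why the association type alone does not identify the kind of resource.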
| A person is identified by the combination of a name and a significant
| date. The email address of the registrant must also be known. These three
| elements are the minimum data elements that are required for
| registration. If two persons that are physically different were to enter
| the same data for all three fields--name, date and email--the
| system would consider that they are the same.
| To make the association with the resources easier, the system will suggest
| a number of resources for the registrant to associate with. These are the
| documents that a person has associated with before--if
| any--as well as any other documents with the same or a similar last
| name and the same first name or corresponding initial. The selection of
| associations uses checkboxes. It is also possible to enter the handle of
| a resource directly, in case the desired association
| is not proposed.
| Any change of the data has to be confirmed. The registrant is sent a
| four-digit confirmation number by email. To complete the registration
| process, the registrant must confirm the registration using the
| confirmation number. We do not
| use passwords, in order to avoid the administrative burden that users who have
| forgotten their password place on the maintenance of the system. If a
| person has forgotten a password, she might wish to register anew. In that
| case we would have two records for the same person. This is a situation
| that we wanted to avoid at all cost. We think that the combination of
| name, significant date and email address offers sufficient security.
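The confirmation step can be pictured with the following sketch. It is an assumption about how such a flow might be organised, not a description of the HoPEc implementation: the text does not say how the number is generated or stored, and the class and method names are hypothetical. Sending the email is left out; the code is simply returned to the caller.

```python
import secrets


class PendingChanges:
    """Hold changes until the registrant confirms them with a four-digit number."""

    def __init__(self):
        self._pending = {}

    def request(self, handle, new_data):
        # Draw a four-digit confirmation number, keeping leading zeros.
        code = f"{secrets.randbelow(10000):04d}"
        self._pending[handle] = (code, new_data)
        return code  # a real service would email this instead of returning it

    def confirm(self, handle, code):
        entry = self._pending.get(handle)
        if entry is None or not secrets.compare_digest(entry[0], code):
            return None
        # The change is applied once; the confirmation number cannot be reused.
        del self._pending[handle]
        return entry[1]


pending = PendingChanges()
code = pending.request(
    "RePEc:per:1965-06-05:THOMAS_KRICHEL", {"Email": "t.krichel@surrey.ac.uk"}
)
applied = pending.confirm("RePEc:per:1965-06-05:THOMAS_KRICHEL", code)
```

Since nothing has to be remembered between sessions, a registrant who returns later simply triggers a new confirmation number rather than a second registration.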
|
| 5.3: Metadata
| In order to be able to share personal data between the different services
| which make use of the RePEc dataset, the personal data must be stored in
| the form of ReDIF templates. The format of these templates has already been
| discussed above. Additionally the registration service generates data
| output in two different formats. These are a native XML format and an
| RDF format (see Lassila and Swick (1999)),
| which is currently embedded as metadata in the generated HTML files.
|
| RDF itself does not define a metadata standard to describe
| resources. Its aim is rather to allow a multitude of standards to be used
| simultaneously within a uniform structure. We employ a subset of
| the metadata tags suggested by the
| Dublin Core Metadata Initiative (1999). Where necessary we add our own metadata
| tags. The abbreviated example below describes the resource at the URL
| http://netec.mcc.ac.uk/WoPEc/data/Papers/sursurrec9601.html.
|
|
| <rdf:Description
| about="http://netec.mcc.ac.uk/WoPEc/data/
| Papers/sursurrec9601.html" bagID="bag_0">
| <DC:Title="Dynamic Aspect of Growth and
| Fiscal Policy">
| <DC:Creator> <rdf:Bag
| rdf:_0="LEVINE, PAUL"
| rdf:_1 resource="./gemini.cgi?submit=id&
| HANDLE=RePEc:per:1965-06-05:THOMAS_KRICHEL"/>
| </DC:Creator></rdf:Description>
| <rdf:Description aboutEach="#bag_0"
| HOPEC:attributedto="Hopec Person
| Registration Project">
| <rdf:type resource="http://www.w3.org/1999/
| 02/22-rdf-syntax-ns#Statement" />
|
|
| At the moment only title and author information about the paper is
| given. The paper has one title and two authors. Author information is given
| in the rdf:Bag construct. One author is unregistered (the literal "LEVINE,
| PAUL") and one author is the registered author Thomas Krichel. The latter
| is not represented by a string literal, but by his own resource.
| The second description, which refers to the bag, serves an interesting
| purpose. We refine the statement of the authorship of this paper by giving
| responsibility information about the first statement (the author bag). In
| words: the HoPEc Registration Project states that Paul Levine and Thomas
| Krichel are the authors of the above-mentioned resource. This process is
| known as reification.
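|
| The structure of the example can also be modelled outside RDF. The
| sketch below is an assumption-laden illustration, not the data model of
| the registration service: the class names Literal, Resource, Statement
| and ReifiedStatement and the helper registered_handles are hypothetical.
| It shows one unregistered author as a literal, one registered author as
| a resource, and a second statement that attributes the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """An unregistered author, known only by a name string."""
    value: str


@dataclass(frozen=True)
class Resource:
    """A registered author, identified by a personal handle."""
    handle: str


@dataclass(frozen=True)
class Statement:
    """A (subject, predicate, object) assertion about a resource."""
    subject: str
    predicate: str
    obj: tuple


@dataclass(frozen=True)
class ReifiedStatement:
    """A statement about a statement: it records who asserts it."""
    statement: Statement
    attributed_to: str


# The author bag from the example: one literal and one resource.
authors = (
    Literal("LEVINE, PAUL"),
    Resource("RePEc:per:1965-06-05:THOMAS_KRICHEL"),
)
authorship = Statement(
    subject="http://netec.mcc.ac.uk/WoPEc/data/Papers/sursurrec9601.html",
    predicate="DC:Creator",
    obj=authors,
)
claim = ReifiedStatement(
    authorship, attributed_to="Hopec Person Registration Project"
)


def registered_handles(statement: Statement) -> list:
    """Return the handles of those authors that are resources."""
    return [a.handle for a in statement.obj if isinstance(a, Resource)]
```

| A service that only wants reliable author links would call
| registered_handles(claim.statement) and ignore the literals.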
|
| 6: Preliminary results on the usage of HoPEc
| The HoPEc service opened for public registration in October 1999. Therefore
| at the time of writing, we can look over a seven-month period of work for
| the service. Clearly this service is only valuable if it is maintained
| indefinitely. Our main concern here is therefore to look at issues that
| affect the sustainability of the service.
| There are several significant problems that a service like HoPEc
| faces. First, since there is no historical precedent for such a service, it
| is not easy to communicate the raison d'être of the service to a
| potential registrant. Some people think that they need to register in
| order to use RePEc services. While this delivers valuable information about
| who is interested in using RePEc services, or more precisely about who
| has not understood that these services do not require
| registration, it clutters the database with records of limited
| usefulness. Here is a rather striking example of a record that
| has been removed:
|
|
| Template-Type: ReDIF-Person 1.0
| Name-Full: MARLEY, BOB
| Name-First: BOB
| Name-Last: MARLEY
| Email: johniblaze@excite.com
| Workplace-Name: mcdonalds
| Workplace-Postal: 101 webb blvd
| new bern nc 28532
| Workplace-Email: hotboy@hotmail.com
| Workplace-Fax: 3344447554
| Handle: RePEc:per:1980-10-16:BOB_MARLEY
|
|
| We have taken steps to warn registrants that, if they
| do not belong to our target group, registration is
| a waste of their time. In particular, we tell them
| that registration does not give them any better access
| to RePEc services.
| Having taken these steps, the problem of spurious registration does not
| appear to be important. From manual inspection, we find that for about one
| in eight registered persons there is no evidence that the
| registered person belongs to the academic Economics research community. We
| think that probably less than ten percent of the records will turn out to
| be spurious. In the future we will look at records that have not been
| updated for three years, and remove them if they do not contain any links
| to documents. The person concerned is free to register again.
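|
| The planned clean-up rule can be sketched as a simple predicate. The
| record layout below (PersonRecord with last_updated and document_links)
| is a hypothetical simplification chosen for illustration, not the ReDIF
| template itself, and the handles in the usage note are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PersonRecord:
    """Hypothetical, simplified view of a personal record."""
    handle: str
    last_updated: date
    document_links: list = field(default_factory=list)


def years_before(day: date, years: int) -> date:
    """Return the same calendar day the given number of years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        return day.replace(year=day.year - years, day=28)


def is_removal_candidate(record: PersonRecord, today: date,
                         max_age_years: int = 3) -> bool:
    """True if the record is stale and links to no documents."""
    cutoff = years_before(today, max_age_years)
    return record.last_updated <= cutoff and not record.document_links
```

| For example, a record last updated on 1 November 1999 with no document
| links would be a removal candidate on 1 May 2003, whereas the same
| record with at least one linked document would be kept.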
| In Figure 1,
| we list the registrants by year of the significant date. It
| appears that there are some registrants who choose the date of registration
| as the significant date. While we have not prohibited this, there is a
| concern that some registrants may forget their significant date. For basic
| security reasons, the significant date is not directly visible on the
| search pages. If registrants wish to update their records and do not know the
| significant date, they will take one of two actions:
| They will mail the HoPEc helpline to ask what the date is.
| They will register again and create a duplicate record.
|
| Both of these will create some manual maintenance work.
| Therefore we are pleased to see that the large majority
| of registrants have chosen a date that looks to us like
| it is their birth date. If it is, then the median
| RePEc registrant is a junior researcher in her thirties. This
| confirms data that we have gathered informally from other
| sources.
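|
| The two observations above, the tabulation by year and the registrants
| who reuse their registration date, can be computed along the following
| lines. This is a sketch with hypothetical function names; it assumes
| that each registration is available as a pair of significant date and
| registration date, which the text does not state.

```python
from collections import Counter
from datetime import date


def registrants_by_year(significant_dates):
    """Count registrants per year of their significant date, sorted by year."""
    counts = Counter(d.year for d in significant_dates)
    return dict(sorted(counts.items()))


def reused_registration_date(pairs):
    """Return the pairs whose significant date equals the registration date."""
    return [(sig, reg) for sig, reg in pairs if sig == reg]
```

| Registrants in the second group are the ones most likely to forget
| their significant date later on.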
|
|
| 7: Conclusions
| The Internet empowers those people who have access to it to create
| information architectures that are completely new. HoPEc represents such a
| radical innovation. Our demonstrated success in running HoPEc is grounds for
| optimism that we will be able to build a sophisticated relational academic
| documentation that will be open for public access on the Internet.
| To date, we have about 1,800 resources that have at least one registered
| author. RePEc user services use these data to group papers with the same
| author together. Other information services will greatly benefit
| from reliable author identification data. For example,
| a citation analysis service is planned that will gather
| citation data for registered authors.
|
| The work
| discussed here has received financial support from the Joint
| Information Systems Committee of the UK Higher Education Funding
| Councils through its
| Electronic Library Programme.
| Markus J.R. Klink is especially grateful for the support and helpful
| comments of his supervisor Stefan Kuhlins at Mannheim University.
|
|
|
|
|
|
|