Gentilly paper
- Introduction
-
Thomas Krichel
started work on this paper in Gentilly, France, on 2004-06-01.
This is the version of 2004-06-20.
-
This paper sets out a reform of the input operation of the
NEP: New Economics Papers
service. The input operation
is the process that starts with the gathering of new papers in
a nep-all issue and finishes with the moment a subject issue is
distributed to report subscribers. Accordingly,
this document does not cover
the process of adding new reports, changing editors,
or other operations that act on reports rather than on report issues.
-
This paper assumes that the reader is familiar with the
way NEP operates at the time of writing. It includes
some technical implementation details. Such details
are noted in italics and may be ignored by
readers without technical training.
- Motivation
-
A reform of the input system is required to improve
the observability of NEP. In the current system, actions of
editors are observed when the mail is sent out. We parse mail
logs of actual reports sent. This is a fragile system, because
mail user agents and mailers tend to do all
sorts of modifications to the report issue in the process
of its creation and transport. Examples
include the deployment of quoted-printable encoding,
attachment of HTML code, and changes in the actual
or reported character encoding.
As a result of
these changes, the parsing of reports is difficult.
In fact, all major informational components of report
issues, i.e., the issue date, the mailing date and the
handles, are difficult to find.
-
A reform of the input system is also necessary to improve
the quality of the nep-all data. In particular, too
many nep-all reports contain dead links to papers. Some
contain HTML entities. Other purported
links don't go to the full text itself, but to an
intermediate page from which the full-text can be downloaded.
Worse than these problems, some issues in the past have
contained papers that were already in an earlier nep-all
issue and this has caused considerable confusion for
the calculation of performance indicators for the NEP
service.
-
A reform of the input system may also help to reduce
the fluctuation in the size of nep-all. These fluctuations
imply that papers that appear in a large nep-all issue
have a smaller chance of appearing in subject reports than papers
that appear in small nep-all issues.
-
A reform of the input system is also desirable in order
to make changes to the formatting of the NEP reports easier
to implement. For this aim, it appears convenient to
separate contents and formatting of reports. The contents
of reports will be presented in the
Academic
Metadata Format (AMF), and the presentation will
be encoded in XSLT.
-
A reform will also make it possible to implement pre-sorting
of papers. See the section on pre-sorting.
- General procedures for subject editors
- The current operation, by which nep-all
issues are prepared by the general editor (henceforth: GE),
and subject editors prepare a report
on each nep-all issue, is continued.
-
Editors do not edit reports using their
own email system. Instead, they use a web interface
to prepare the report issue and request it to be
mailed out.
-
Access to the report generating web site (henceforth: regwes)
is authenticated with a user name and password system.
The user name of an
editor is the name of the report she edits, e.g. "nep-xxx".
Subject editors contact the GE for the setting of a
password.
The passwords that editors use are stored on the machine
hosting the NEP service in an unencrypted plain text file.
Regwes does use the https protocol, but because the passwords
are stored unencrypted, the security level is relatively low.
Editors should not use passwords
that are valuable. Editors cannot change passwords; only
the general editor can.
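-
The password check described above could be sketched as follows in Python.
This is only an illustration: the file layout (one "report:password" pair
per line, with "#" comment lines) and the function names load_passwords and
check_login are assumptions, since this paper does not specify them.

```python
import hmac


def load_passwords(text):
    """Parse hypothetical 'report:password' lines into a dictionary."""
    passwords = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        # split on the first colon only, so passwords may contain colons
        report, _, password = line.partition(":")
        passwords[report] = password
    return passwords


def check_login(passwords, report, password):
    """Return True if the password matches the one stored for the report."""
    stored = passwords.get(report)
    if stored is None:
        return False
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(stored.encode("utf-8"), password.encode("utf-8"))
```

Because the user name is the report name, one entry per report is enough,
which matches the rule that only one person is identified with a report.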
-
There is no support for multiple editors of the
same report. Editors are free to let trusted
individuals have their password to do the editing
for them, but only one person per report will
be identified with its contents at any one time.
-
When the issue arrives on the mailing list, it will appear
to come from an address with
the name of the editor. The Reply-To: header will point
to the editor's address. The From: header will have the name of
the editor followed by the address of the sending account.
The sending account is configurable in the implementation
software.
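-
As a rough illustration of the header rules above, the following Python
sketch builds the two headers. The function name sender_headers and the
example addresses are hypothetical; only the rule itself (editor name plus
sending account in From:, editor address in Reply-To:) comes from this paper.

```python
from email.utils import formataddr


def sender_headers(editor_name, editor_address, sending_account):
    """Build the From: and Reply-To: headers for a report issue."""
    return {
        # the editor's name, followed by the address of the sending account
        "From": formataddr((editor_name, sending_account)),
        # replies go to the editor, not to the sending account
        "Reply-To": editor_address,
    }


headers = sender_headers("Jane Doe", "jane@example.org", "nep-sender@example.org")
```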
- Every report issue has the subject line
"NEP report on XXX, YYYY-MM-DD, (NN papers)",
where XXX is the
subject of the report, YYYY, MM, and DD are
the year, month and day, observed in UTC, at the
time when the corresponding nep-all report was
created, and NN is the number of papers in the
report. The date does not reflect the time at which
the issue is mailed out.
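-
The subject line format can be illustrated with a short Python sketch. The
function name subject_line and the sample values are assumptions; the
creation time is expected as a timezone-aware datetime so that the
conversion to UTC is unambiguous.

```python
from datetime import datetime, timedelta, timezone


def subject_line(subject, nep_all_created, paper_count):
    """Format the subject line from the nep-all creation time, seen in UTC."""
    date = nep_all_created.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"NEP report on {subject}, {date}, ({paper_count} papers)"


# a nep-all issue created late in the evening, two hours behind UTC,
# carries the date of the following day in UTC
created = datetime(2004, 6, 20, 23, 30, tzinfo=timezone(timedelta(hours=-2)))
example = subject_line("XXX", created, 12)
```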
-
When an editor logs in to regwes, she will find
a list of nep-all issues that she has not yet dealt
with. Each issue appears as a textual link with
the anchor "YYYY-MM-DD" on the left and optionally
on the right as well.
This initial list of nep-all issues is called the
"all-view". From the all-view, the editor
chooses an issue to edit.
-
A nep-all issue may be pre-sorted for the editor; see
the section on pre-sorting. If it has
been pre-sorted, a link to the pre-sorted version
appears to the right next to the non-pre-sorted
version on the left.
-
When an editor chooses to edit a
report issue, she is presented with an HTML view
of the issue in her
browser. This lists all the
information about the papers in the nep-all
issue in one large HTML document. Each paper
has a check box for selecting or deselecting the
paper. Initially, none of the boxes is checked.
The editor includes a paper in the report by
checking its box. At the bottom of the list,
there are three buttons: "select all", "clear
all" and "move to sorting". "select all" will
select all papers, "clear all" will deselect all
papers, and "move to sorting" will move the
editor to the sorting screen.
-
When the editor has moved to sorting,
the editor sees only the list of selected
papers. The selection check boxes are
gone; instead the editor sees a PLUS, a text
box and a MINUS next to each paper.
The PLUS and MINUS are hyperlinks. If one of them
is clicked, the paper moves one position up or
one position down. At the bottom, there is a button
"use numbers". If that button is clicked,
the software uses the positive
integers the editor has
entered in the boxes. If a
box does not contain a positive
integer, its contents are
ignored. Otherwise
the software proceeds from the bottom,
moves any paper with a number to the place suggested
by that number, and then proceeds upwards
to the next paper as found when the process
started. This process can be repeated.
When the editor has finished sorting, she
presses a "view text" button.
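-
The "use numbers" step can be sketched in Python as below. This is one
possible reading of the description, not the actual implementation: the
function names are hypothetical, papers are assumed to be identified by
unique handles, places are counted from 1, and a number larger than the
list is assumed to mean "move to the end".

```python
def parse_position(text):
    """Return the positive integer in a text box, or None if there is none."""
    text = text.strip()
    if text.isascii() and text.isdigit() and int(text) > 0:
        return int(text)
    return None


def apply_numbers(papers, boxes):
    """Reorder papers according to the numbers entered in their boxes."""
    order = list(papers)
    # walk the papers bottom-up, in the order found when the process started
    for paper, box in reversed(list(zip(papers, boxes))):
        position = parse_position(box)
        if position is None:
            continue  # boxes without a positive integer are ignored
        order.remove(paper)
        # clamp to the end of the list (an assumption, see above)
        index = min(position, len(order) + 1) - 1
        order.insert(index, paper)
    return order


reordered = apply_numbers(["a", "b", "c", "d"], ["", "1", "x", "2"])
```

Because later moves can displace papers that were placed earlier, a paper
does not always end up at exactly the place its number suggests. This is
presumably why the process can be repeated.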
-
If the editor has pressed "view text", she will see an
HTML page that looks like a fixed-width font email.
At the same time a test
report is sent to the editor's address.
Editors are encouraged
to send a test issue first.
-
When an issue is ready for sending, a text view
of the report issue is generated.
This text
view is plain text in UTF-8 encoding. An
HTML view is also computed. Both are
packaged as multipart/alternative and
mailed together to subscribers.
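-
A minimal sketch of the packaging step, using the email package from the
Python standard library. The function name build_issue_message and the
sample contents are assumptions; the sketch only shows how a plain-text
view and an HTML view end up in one multipart/alternative message.

```python
from email.message import EmailMessage


def build_issue_message(subject, text_view, html_view):
    """Package the text view and the HTML view as multipart/alternative."""
    message = EmailMessage()
    message["Subject"] = subject
    # the plain-text view, encoded in UTF-8, comes first
    message.set_content(text_view, charset="utf-8")
    # adding an alternative turns the message into multipart/alternative
    message.add_alternative(html_view, subtype="html")
    return message


issue = build_issue_message(
    "NEP report on XXX, 2004-06-20, (1 papers)",
    "1. Résumé of a paper\n",
    "<html><body><ol><li>R&#233;sum&#233; of a paper</li></ol></body></html>\n",
)
```

Mail readers that cannot display HTML fall back to the first, plain-text
part.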
-
Once the issue has been sent out, the
editor is returned to the all-view screen.
The corresponding
nep-all report has disappeared from the
screen.
- General procedures for the GE
-
The GE, or, more generally, all the
people who have shell access to the machine
where NEP runs, can change any aspect of its behavior
by manipulating simple text files.
-
The GE uses shell access
to maintain the passwords
file with the passwords of editors.
-
The GE uses shell access to launch a script that
composes the nep-all report. This report is composed
as an AMF file. When the script that composes
the new nep-all report is finished,
the report can then be manually
edited by the GE to remove papers that appear
to be old. When finished, the GE checks whether the
file is still well-formed XML. Finally, the
GE launches the pre-sorting script.
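-
The well-formedness check after manual editing could be done with a few
lines of Python, as sketched below. The function name is_well_formed and
the sample element names are illustrative only. The check tests XML
well-formedness, not validity against the AMF specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# hypothetical fragments: the second one lost a closing tag during editing
good = is_well_formed("<amf><text/></amf>")
bad = is_well_formed("<amf><text></amf>")
```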
-
After the nep-all issue file
is created, the GE uses regwes to
compose the nep-all report. The subject editors
are informed about the new nep-all issue by
virtue of being members of nep-all. Details
on how they get there are to be found in the
NEP technical guide.
- Pre-sorting
-
In order to make life easier for the subject editors,
reports may be pre-sorted. Pre-sorting is a process
by which a computer program guesses
which documents are most likely to be included
in the subject report issue.
-
For the nep-all issue, pre-sorting writes a
random order of the papers.
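-
For illustration, the random ordering of a nep-all issue might look like
this in Python. The function name presort_nep_all, the optional rng
parameter and the sample handles are assumptions, not part of the design.

```python
import random


def presort_nep_all(handles, rng=None):
    """Return the paper handles of a nep-all issue in random order."""
    rng = rng or random.Random()
    shuffled = list(handles)  # copy, so the original order is kept intact
    rng.shuffle(shuffled)
    return shuffled


original = ["paper-1", "paper-2", "paper-3", "paper-4"]
presorted = presort_nep_all(original, random.Random(2004))
```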
-
At the time of writing, no subject
pre-sorting procedure
has been defined. However, the current setup has to
take account of pre-sorting because pre-sorting is likely
to take some time. It will take time to sort
the nep-all issue for one report, and it will
take a lot of time to sort it for all subjects.
-
When pre-sorting for a subject is finished, the
report editor is informed with an email that
a new report issue has been pre-sorted. While
pre-sorting is in progress, only
the non-pre-sorted version of the
report appears in the all-view screen.