Gentilly paper

  1. Introduction
    1. Thomas Krichel started work on this paper in Gentilly, France, on 2004-06-01. This is the version of 2004-06-20.
    2. This paper sets out a reform of the input operation of the NEP: New Economics Papers service. The input operation is the process that starts with the gathering of new papers in a nep-all issue and finishes at the moment a subject issue is distributed to report subscribers. Accordingly, this document does not cover adding new reports, changing editors, or other operations that act on reports rather than on report issues.
    3. This paper assumes that the reader is familiar with the way NEP operates at the time of writing. It includes some technical implementation details. Such details are noted in italics and may be ignored by readers without technical training.
  2. Motivation
    1. A reform of the input system is required to improve the observability of NEP. In the current system, actions of editors are observed only when the mail is sent out: we parse mail logs of the actual reports sent. This is a fragile system, because mail user agents and mailers tend to modify the report issue in many ways during its creation and transport. Examples include the use of quoted-printable encoding, the attachment of HTML code, and changes in the actual or reported character encoding. As a result of these changes, the parsing of reports is difficult. In fact, all major informational components of report issues, i.e., the issue date, the mailing date and the handles, are difficult to find.
    2. A reform of the input system is also necessary to improve the quality of the nep-all data. In particular, too many nep-all reports contain dead links to papers. Some contain HTML entities. Other purported links do not go to the full text itself, but to an intermediate page from which the full text can be downloaded. Worse, some past issues have contained papers that were already in an earlier nep-all issue, and this has caused considerable confusion in the calculation of performance indicators for the NEP service.
    3. A reform of the input system may also help to reduce the fluctuation in the size of nep-all. These fluctuations imply that papers that appear in a large nep-all issue have a smaller chance of appearing in subject reports than papers that appear in small nep-all issues.
    4. A reform of the input system is also desirable in order to make changes to the formatting of the NEP reports easier to implement. For this aim, it appears convenient to separate contents and formatting of reports. The contents of reports will be presented in the Academic Metadata Format (AMF), and the presentation will be encoded in XSLT.
    5. A reform will make it possible to implement pre-sorting of papers. See the section on pre-sorting.
  3. General procedures for subject editors
    1. The current operation, by which nep-all issues are prepared by the general editor (henceforth: GE) and subject editors prepare a report on each nep-all issue, is continued.
    2. Editors do not edit reports using their own email system. Instead, they use a web interface to prepare the report issue and request it to be mailed out.
    3. Access to the report generating web site (henceforth: regwes) is authenticated with a user name and password system. The user name of an editor is the name of the report she edits, e.g. "nep-xxx". Subject editors contact the GE for the setting of a password. The passwords that editors use are stored on the machine hosting the NEP service in an unencrypted plain text file. Regwes does use the https protocol. Thus, the security level is relatively low. Editors should not use passwords that are valuable. Editors cannot change passwords; only the general editor can.
    4. There is no support for multiple editors of the same report. Editors are free to give their password to trusted individuals who do the editing for them, but only one person per report will be identified with its contents at any one time.
    5. When the issue arrives on the mailing list, it will appear to come from an address with the name of the editor. The Reply-To: header will point to the editor. The From: header will have the name of the editor followed by the address of the sending account. The sending account is configurable in the implementation software.
    6. Every report issue has the subject line "NEP report on XXX, YYYY-MM-DD, (NN papers)", where XXX is the subject of the report, YYYY, MM, and DD are the year, month and day, observed in UTC at the time when the corresponding nep-all report was created, and NN is the number of papers in the report. The date does not reflect the time at which the issue is mailed out.
    7. When an editor logs in to regwes, she will find a list of nep-all issues that she has not yet dealt with. Each issue appears as a textual link with the anchor "YYYY-MM-DD" on the left and optionally on the right as well. This initial list of nep-all issues is called the "all-view". From the all-view, the editor chooses an issue to edit.
    8. A nep-all issue may be pre-sorted for the editor; see the section on pre-sorting. If it has been pre-sorted, a link to the pre-sorted version appears on the right, next to the link to the non-pre-sorted version on the left.
    9. When an editor chooses to edit a report issue, she is presented with an HTML view of the issue in her browser. This lists all the information about the papers in the nep-all issue in one large HTML document. Each paper has a check box for selecting or deselecting the paper. Initially, none of the boxes is checked. The editor includes a paper in the report by checking its box. At the bottom of the list, there are three buttons: "select all", "clear all" and "move to sorting". "select all" will select all papers, "clear all" will deselect all papers, and "move to sorting" will move the editor to the sorting screen.
    10. When the editor has moved to sorting, she sees only the list of selected papers. The selection check boxes are gone; instead, the editor sees a PLUS, a text box and a MINUS next to each paper. The PLUS and MINUS are hyperlinks. If one is clicked, the paper moves one place up or one place down. At the bottom, there is a button "use numbers". If that button is clicked, the software uses the positive integers that the editor has entered in the boxes. If a box does not contain a positive integer, its contents are ignored. Otherwise, the software starts at the bottom of the list, moves any paper with a number to the place suggested by that number, and then proceeds upwards to the next paper in the order found when the process started. This process can be repeated. When the editor has finished sorting, she presses a "view text" button.
    11. When the editor has pressed "view text", she will see an HTML page that looks like an email in a fixed-width font. At the same time, a test report is sent to the editor's address. Editors are encouraged to send a test issue first.
    12. When an issue is ready for sending, a text view of the report issue is generated. This text view is plain text in UTF-8 encoding. An HTML view is also computed. Both are packaged as multipart/alternative and mailed together to subscribers.
    13. Once the issue has been sent out, the editor is returned to the all-view screen. The corresponding nep-all report has disappeared from the screen.
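
The "use numbers" step in paragraph 10 of this section is described only in prose, so a small illustration may help. The Python sketch below is one possible reading of that paragraph, not the regwes implementation. The function name `apply_numbers`, the use of plain strings as paper identifiers, the assumption that identifiers are unique, and the clamping of over-large numbers to the end of the list are all assumptions made for the example.

```python
from typing import List, Sequence


def apply_numbers(papers: Sequence[str], boxes: Sequence[str]) -> List[str]:
    """Reorder papers using the text-box entries, one entry per paper.

    papers[i] is a (hypothetical) unique paper identifier and boxes[i] is
    the raw text the editor typed into the box next to that paper.
    """
    if len(papers) != len(boxes):
        raise ValueError("need exactly one box entry per paper")

    result = list(papers)
    # Visit the papers in the order they had when the process started,
    # beginning at the bottom of the list and working upwards.
    for paper, raw in zip(reversed(papers), reversed(boxes)):
        text = raw.strip()
        # Anything that is not a positive integer is ignored.
        if not (text.isascii() and text.isdigit()) or int(text) == 0:
            continue
        # Places are counted from 1. A number beyond the end of the list
        # is treated as "last place" (an assumption, not in the paper).
        target = min(int(text), len(result)) - 1
        result.remove(paper)
        result.insert(target, paper)
    return result


if __name__ == "__main__":
    order = apply_numbers(["p1", "p2", "p3", "p4"], ["", "4", "x", "1"])
    print(order)  # ['p4', 'p1', 'p3', 'p2']
```

Under this reading, a paper moved early in the pass can be displaced again by a later move, which is consistent with the remark that the process can be repeated until the editor is satisfied.
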
  4. General procedures for the GE
    1. The GE or, more generally, anyone who has shell access to the machine where NEP runs, can change any aspect of its behavior by manipulating simple text files.
    2. The GE uses shell access to maintain the passwords file with the passwords of editors.
    3. The GE uses shell access to launch a script that composes the nep-all report. This report is composed as an AMF file. When the script has finished, the GE can manually edit the report to remove papers that appear to be old. After editing, the GE checks whether the file is still well-formed XML. Finally, the GE launches the pre-sorting script.
    4. After the nep-all issue file is created, the GE uses regwes to compose the nep-all report. The subject editors are informed about the new nep-all issue by virtue of being members of nep-all. Details on how they get there are to be found in the NEP technical guide.
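
Paragraph 3 of this section asks the GE to confirm that the manually edited nep-all file is still well-formed XML. A minimal Python sketch of such a check, using only the standard library, might look as follows. The function name `is_well_formed` is hypothetical, and the element names in the sample data are placeholders rather than actual AMF vocabulary. The check tests well-formedness only; it does not validate the file against AMF.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if data parses as well-formed XML, False otherwise."""
    try:
        # Passing bytes lets the parser honour any encoding declaration.
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    good = b"<report><paper/></report>"      # placeholder element names
    bad = b"<report><paper></report>"        # <paper> is never closed
    print(is_well_formed(good), is_well_formed(bad))
```

For a file on disk, the same function can be applied to the bytes read from it, for example `is_well_formed(open(path, "rb").read())`, where `path` stands for wherever the nep-all issue file is kept.
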
  5. Pre-sorting
    1. In order to make life easier for the subject editors, reports may be pre-sorted. Pre-sorting is a process by which a computer program guesses which documents are most likely to be included in the subject report issue.
    2. For the nep-all issue, pre-sorting writes a random order of the papers.
    3. At the time of writing, no subject pre-sorting procedure has been defined. But the current setup has to take account of pre-sorting because pre-sorting is likely to take some time. It will take time to sort the nep-all issue for one report, and it will take a lot of time to sort it for all subjects.
    4. When pre-sorting for a subject is finished, the report editor is informed by email that a new report issue has been pre-sorted. While pre-sorting is in progress, only the non-pre-sorted version of the report appears in the all-view screen.
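
Paragraph 2 of this section states that, for the nep-all issue itself, pre-sorting writes the papers in a random order. As a rough illustration, and assuming that each paper is represented by a handle string, this could be sketched in Python as below. The function name `presort_nep_all` and the optional `seed` parameter are hypothetical; the seed is included only so that a run can be reproduced when testing.

```python
import random
from typing import List, Optional, Sequence


def presort_nep_all(handles: Sequence[str], seed: Optional[int] = None) -> List[str]:
    """Return the paper handles in a random order, leaving the input untouched."""
    rng = random.Random(seed)   # private generator; seed=None uses system entropy
    shuffled = list(handles)    # copy, so the caller's sequence is not modified
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # The handles below are made-up placeholders, not real paper handles.
    print(presort_nep_all(["h1", "h2", "h3", "h4", "h5"]))
```
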