Online Information Retrieval Techniques

last revised: 2011–09–21

This document is available in PDF format for US letter size paper and for A4 size paper. Do not print out this web page; you will get an incomplete document.

Course Description

This course will introduce the students to the theory of information retrieval and its application in large-scale commercial database systems and on the web.

Course objectives

On completing this course, students

The learning objective aimed at by this course is 3E, "Students will demonstrate appropriate techniques for identification, selection, acquisition, retrieval, evaluation and synthesis of information from a variety of information systems and services."


Students should have a basic command of the Microsoft Windows operating system because the machines in the lab run on this operating system.


Thomas Krichel
Palmer School of Library and Information Science
C.W. Post Campus of Long Island University
720 Northern Boulevard
Brookville, NY 11548–1300
work phone: +1–(516)299–2843

Class structure

Classes will be held on the C.W. Post campus of LIU, between 19:00 and 21:00. After class, students can stay on for guided practice. There will be a mixture of lectures and hands-on work in the lab. Provisional class details are:

1 2011–09–12 introduction to the course
2 2011–09–19 history of information retrieval
3 2011–09–26 preprocessing of records and queries
4 2011–10–03 the Boolean model of retrieval
5 2011–10–17 introduction to search and to Dialog
6 2011–10–24 the Dialog command language
7 2011–10–31 Dialog by example
8 2011–11–07 vector model and ProQuest
9 2011–11–14 Credo and Gale
10 2011–11–21 web information retrieval
11 2011–11–28 Google and Bing
12 2011–12–05 Google Scholar, Web of Knowledge, IR performance measures
13 2011–12–12 constructing a search interface

To print the slides in Microsoft PowerPoint, press Ctrl+P, then under "Print what" choose "Handouts", and under "Color/grayscale" choose "Pure Black and White". You can also use OpenOffice to print the slides.

Class mailing list

A mailing list for this class has been set up. Students who wish to stay informed are encouraged to sign up.


The instructor's PowerPoint slides are the prime reading material. The slides may point to other sources of reference as required.

Readings that the slides are derived from include van Rijsbergen (1979), Baeza-Yates and Ribeiro-Neto (1999), Manning and Raghavan and Schütze (2009) and maybe even Hock (2010).

Background readings on history include Lesk (1995), Schatz (1997) and Salton (1987).

Historically significant contributions include Luhn (1957), Spärck Jones (1972), and Salton and Wong and Yang (1975). Bourne (1963) provides an overview of technology in the early 1960s.

On Dialog, we use the Dialog Command Language Pocket Guide.

On search interfaces, we will use Hearst (2009).

Some reference questions to work on are available.


Each student will have to prepare a search exercise and report as detailed in the first lecture. It will count for 50% of the total grade. It is due on December 12. The report must not exceed five pages. Appendices are permitted, but may not get read. The remaining 50% will come from quizzes held at the start of each lecture except the first. Quizzes will last around ten minutes. The questions aim for a precise, short answer. The worst five quiz performances will be discarded when the average is being computed.
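The grading rule above can be sketched as a small calculation. This is only an illustration, not the instructor's official formula: it assumes that the report and every quiz are scored on a 0 to 100 scale, that all quizzes carry equal weight, and that the twelve quizzes correspond to lectures 2 to 13. The function name final_grade and its parameters are hypothetical.

```python
def final_grade(report_score, quiz_scores, dropped=5):
    """Combine a report score and quiz scores (assumed 0-100 scale each)."""
    if len(quiz_scores) <= dropped:
        raise ValueError("need more quizzes than the number dropped")
    # Discard the `dropped` lowest quiz scores and average the rest.
    kept = sorted(quiz_scores)[dropped:]
    quiz_average = sum(kept) / len(kept)
    # The report and the quiz average each count for 50% of the total.
    return 0.5 * report_score + 0.5 * quiz_average


# Hypothetical student: report scored 80, five missed quizzes, seven at 90.
example = final_grade(80, [0, 0, 0, 0, 0, 90, 90, 90, 90, 90, 90, 90])
```

Under these assumptions, a student who misses up to five quizzes loses nothing on the quiz half of the grade, since only the seven best results enter the average.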

Contact hours breakdown

As per New York State regulations, the course needs to contain 120 additional hours of work outside class. This is an estimated breakdown of that work by week, as ordered by the Palmer School director.
