11-741: Information Retrieval

Description:

This course studies the theory, design, and implementation of text-based information systems. The Information Retrieval core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling, link analysis), clustering algorithms, collaborative filtering, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems.

Prerequisites:

  • Programming and data structures at the level of 15-211 or higher.
  • Algorithms comparable to the undergraduate CS algorithms course (15-451) or higher.
  • Basic linear algebra (21-241 or 21-341).
  • Basic statistics (36-202) or higher.

Time & Location:

TR 12:00-1:20pm, 4215 Gates Hillman Complex

Instructors:

Jamie Callan and Yiming Yang

Instructor Office Hours:

By appointment

Teaching Assistant(s):

Anagha Kulkarni and Konstantin Salomatin

TA Office Hours:

By appointment

Textbook:

The textbook is Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008.

Other Readings:

Selected papers or book chapters will be assigned reading for some lectures. All will be available online and/or on reserve in the Engineering and Science Library, 4th floor, Wean Hall.

Course Notes:

Usually available online, occasionally distributed in lectures. Online access is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using VPN.

Homework:

One brief reading summary per week (1/2 to 1 page), and six problem sets or programming assignments. This is subject to change. Please see the submission guidelines.

Grading:

Grades are based on the reading summaries (10% total), the homework assignments (5%, 10%, 10%, 7%, 13%, and 5%), the midterm exam (20%), and the final exam (20%).
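As a quick check on the weights above, the following Python sketch confirms that they sum to 100% and shows how a weighted course score could be computed from them. It is only an illustration: the names `WEIGHTS`, `total_weight`, and `course_score` are hypothetical, and it assumes each component is scored as a fraction between 0 and 1 and that the six homework weights apply in the order listed. The actual grade computation is up to the instructors.

```python
# Weights (in percent) copied from the grading statement.
# Assumption: the six homework weights apply in the order listed.
WEIGHTS = {
    "reading_summaries": 10,
    "homework": [5, 10, 10, 7, 13, 5],
    "midterm": 20,
    "final": 20,
}


def total_weight(weights=WEIGHTS):
    """Return the sum of all component weights, in percent."""
    return (
        weights["reading_summaries"]
        + sum(weights["homework"])
        + weights["midterm"]
        + weights["final"]
    )


def course_score(summaries, homeworks, midterm, final, weights=WEIGHTS):
    """Return a weighted score in percent.

    Every score argument is a fraction in [0, 1]; `homeworks` is a list
    with one fraction per homework weight.
    """
    if len(homeworks) != len(weights["homework"]):
        raise ValueError("expected one score per homework weight")
    hw_part = sum(w * s for w, s in zip(weights["homework"], homeworks))
    return (
        weights["reading_summaries"] * summaries
        + hw_part
        + weights["midterm"] * midterm
        + weights["final"] * final
    )


if __name__ == "__main__":
    print(total_weight())  # 100
```

Under these assumptions, homework accounts for half of the grade (5 + 10 + 10 + 7 + 13 + 5 = 50%), with the remaining half split among the reading summaries and the two exams.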

Course Policies:

Attendance, Cheating, Laptop computers, Late homework, Recording & videotaping

Sitting In:

Approval from the instructors is required.

Syllabus:

The anticipated syllabus is below. It is subject to change.
 

| Lecture | Day | Important Events | Topic | Readings |
|---|---|---|---|---|
| 1. | 1/17 | | Course overview (slides) | |
| 2. | 1/19 | HW0 out | Introduction to ad-hoc search: Boolean retrieval (slides) | Ch 1 |
| 3. | 1/24 | | Index construction (slides) | Ch 2.0 - 2.2.1, 2.4 |
| 4. | 1/26 | HW1 out | Index construction (slides) | Ch 4 |
| 5. | 1/31 | Hadoop Recitation and Examples | Index construction (slides) | Ch 3.2, 5.1, 5.3, McCreadie |
| 6. | 2/2 | | Text representation (slides) | Ch 2.2.2 - 2.2.4 |
| 7. | 2/7 | HW1 due | Information needs and queries (original slides & presented slides) | Ch 19.4 |
| 8. | 2/9 | HW2 out | Evaluation | Ch 8 |
| 9. | 2/14 | | Evaluation (slides) | |
| 10. | 2/16 | | Retrieval models: Vector space (original slides & presented slides) | Ch 6, 7 |
| 11. | 2/21 | | Retrieval models: Probabilistic models (slides) | Ch 11 |
| 12. | 2/23 | | Retrieval models: Statistical language models (slides) | Ch 12, Zhai & Lafferty |
| 13. | 2/28 | HW2 due & HW3 out | Retrieval models: Statistical language models (slides) | Ch 10; Metzler |
| 14. | 3/1 | | Retrieval models: Structured documents, inference network (slides) | Agichtein |
| 15. | 3/6 | | Search log analysis (slides) | |
| | 3/8 | | Midterm Exam (answers) | Sample exams: 2009, 2008, 2007 |
| | 3/13 | | Spring Break! | |
| | 3/15 | | Spring Break! | |
| 16. | 3/20 | HW3 due | Learning to rank (slides) | Joachims KDD'02 |
| 17. | 3/22 | | Learning to rank | Yue SIGIR'07 |
| 18. | 3/27 | | Text Categorization (slides) | Ch 13 |
| 19. | 3/29 | HW4 out | Text categorization (slides, slides) | Ch 15 |
| 20. | 4/3 | | Retrieval models: HITS, PageRank (slides) | Ch 21 |
| 21. | 4/5 | | Retrieval models: Personalized and topic sensitive PageRank | Haveliwala WWW'02 |
| 22. | 4/10 | HW4 due, HW5 out | Query classification, federated search (slides) | Callan; Si and Callan; Arguello, et al |
| 23. | 4/12 | | Query classification, federated search | |
| 24. | 4/17 | | Significance Tests (slides) | Yang & Liu SIGIR'99 |
| | 4/19 | | Mid Semester Break | |
| 25. | 4/24 | HW5 due, HW6 out | Document clustering (slides) | Ch 16 |
| 26. | 4/26 | | Document clustering (slides) | Ch 17 |
| 27. | 5/1 | HW6 due | Collaborative filtering (slides) | Shardanand, CHI'95; Si & Jin, ICML'03 |
| 28. | 5/3 | | Collaborative filtering | |
| | 5/14 | 5:30-8:30pm (MM 103) | Final Exam | Sample Exam: 2009 |

Updated on January 9, 2012.

Jamie Callan, Yiming Yang