Information Retrieval
11-741

 

Description:

This course studies the theory, design, and implementation of text-based information systems. The Information Retrieval core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling, link analysis), clustering algorithms, collaborative filtering, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems.

Prerequisites:

  • Programming, data structures, and computer systems courses comparable to 15-211 and 15-213.
  • Algorithms comparable to the undergraduate CS algorithms course (15-451) or higher.
  • Basic linear algebra (21-241 or 21-341).
  • Basic statistics (36-202) or higher.

Time & Location:

TR 12:00-1:20pm, 4307 Gates Hillman Complex

Instructors:

Jamie Callan and Yiming Yang

Instructor Office Hours:

By appointment

Teaching Assistant(s):

Juan Manuel Caicedo Carvajal and Reyyan Yeniterzi

Mailing List:

11741-spring13@lists.andrew.cmu.edu. Please send your questions to this address, so that other students can receive the answers.
List web page

TA Office Hours:

By appointment

Textbook:

The textbook is Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008.

Other Readings:

Selected papers or book chapters will be assigned as reading for some lectures. All will be available online and/or on reserve in the Engineering and Science Library, 4th floor, Wean Hall.

Course Notes:

Usually available online, occasionally distributed in lectures. Online access is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using VPN.

Homework:

1 brief reading summary per week (1/2 - 1 page), and 6 problem sets or programming assignments. This is subject to change. Please see the submission guidelines.

Grading:

Grades are based on the reading summaries (10% total), five homework assignments (10% each), the midterm exam (20%), and the final exam (20%).

Course Policies:

Attendance, Cheating, Laptop computers, Late homework, Recording & videotaping

Sitting In:

Approval from the instructors is required.

Syllabus:

The anticipated syllabus is below. It is subject to change.
 

| Lecture | Day | Important Events | Topic | Readings |
|---|---|---|---|---|
| 1. | 1/15 | | Course overview (slides) | |
| 2. | 1/17 | | Introduction to ad-hoc search: Boolean retrieval (slides) | Ch 1 |
| 3. | 1/22 | | Index construction (slides) | Ch 2.0 - 2.2.1, 2.4 |
| 4. | 1/24 | | Index construction (slides) | Ch 4 |
| 5. | 1/29 | | Index construction (slides) | Ch 3.2, 5.1, 5.3, McCreadie |
| 6. | 1/31 | | Text representation (slides) | Ch 2.2.2 - 2.2.4 |
| 7. | 2/5 | HW1 out | Information needs and queries (slides) | Ch 19.4 |
| 8. | 2/7 | | Evaluation (slides) | Ch 8 |
| 9. | 2/12 | | Evaluation (slides) | |
| 10. | 2/14 | | Retrieval models: Vector space (slides) | Ch 6, 7 |
| 11. | 2/19 | | Retrieval models: Probabilistic models (slides) | Ch 11 |
| 12. | 2/21 | | Retrieval models: Statistical language models (slides) | Ch 12, Zhai & Lafferty |
| 13. | 2/26 | HW1 due | Retrieval models: Inference networks (slides) | Ch 10; Metzler |
| 14. | 2/28 | | Search log analysis (slides) | |
| 15. | 3/5 | HW2 out | Search log analysis (slides) | Agichtein |
| | 3/7 | | Midterm Exam (answers) | Sample exams: 2010, 2012 |
| | 3/12 | | Spring Break! | |
| | 3/14 | | Spring Break! | |
| 16. | 3/19 | | Document clustering (slides) | Ch 16 |
| 17. | 3/21 | HW2 due | Document clustering (slides) (recitation) | Ch 17 |
| 18. | 3/26 | HW3 out | Collaborative filtering (slides) | Shardanand, CHI'95; Si & Jin, ICML'03 |
| 19. | 3/28 | | Collaborative filtering (slides) (note) | |
| 20. | 4/2 | | Text Categorization (slides) (slides) | Ch 13 |
| 21. | 4/4 | HW3 due, HW4 out | Text categorization (slides) | Ch 15 |
| 22. | 4/9 | | Learning to rank (slides) | Joachims KDD'02 |
| 23. | 4/11 | | Learning to rank (slides) | Yue SIGIR'07 |
| 24. | 4/16 | | Link Analysis: HITS, PageRank (slides) | Ch 21 |
| | 4/18 | HW4 due, HW5 out | Mid Semester Break | |
| 25. | 4/23 | | Link Analysis: Personalized and Topic Sensitive PageRank (slides) | Haveliwala WWW'02 |
| 26. | 4/25 | | Significance Tests (slides) | Yang & Liu SIGIR'99 |
| 27. | 4/30 | | Federated and vertical search (slides) | Callan, Si and Callan |
| 28. | 5/2 | HW5 due | Federated and vertical search (slides I, slides II) | Arguello, et al |
| | 5/13 | 5:30 - 8:30 pm | Final Exam, Room: SH 125 | Sample Exam: 2009 |


Updated on January 10, 2013.
Jamie Callan, Yiming Yang