
Information Retrieval
11-741

 

Description:

This course studies the theory, design, and implementation of text-based information systems. The Information Retrieval core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling, link analysis), clustering algorithms, collaborative filtering, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems.

Prerequisites:

  • Programming, data structures, and computer systems courses comparable to 15-211 and 15-213.
  • Algorithms comparable to the undergraduate CS algorithms course (15-451) or higher.
  • Basic linear algebra (21-241 or 21-341).
  • Basic statistics (36-202) or higher.

Time & Location:

TR 12:00-1:20pm, 4307 Gates Hillman Complex

Instructors:

Jamie Callan and Yiming Yang

Instructor Office Hours:

By appointment

Teaching Assistant(s):

Siping Ji and Napat Luevisadpaibul

TA Office Hours:

By appointment

Discussion Forum:

See the 11-741 discussion at piazza.com. You must be invited to join this forum. We will invite everyone enrolled in the class on Friday, January 17.

Textbook:

The textbook is Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008.

Other Readings:

Selected papers or book chapters will be assigned reading for some lectures. All will be available online and/or on reserve in the Engineering and Science Library, 4th floor, Wean Hall.

Course Notes:

Usually available online, occasionally distributed in lectures. Online access is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using VPN.

Homework:

One brief reading summary per week (1/2 to 1 page), and six problem sets or programming assignments. This is subject to change. Please see the submission guidelines.

Grading:

Grades are based on the reading summaries (10% total), the six homework assignments (10% each), the midterm exam (15%), and the final exam (15%).

Course Policies:

Attendance, Cheating, Laptop computers, Late homework, Recording & videotaping

Sitting In:

Approval from the instructors is required.

Syllabus:

The anticipated syllabus is below. It is subject to change.
 

| Lecture | Day  | Important Events | Topic | Readings |
|---------|------|------------------|-------|----------|
| 1.      | 1/14 |                  | Course overview | |
| 2.      | 1/16 |                  | Introduction to ad-hoc search: Boolean retrieval | Ch 1 |
| 3.      | 1/21 | HW1 out          | Text representation | Ch 4 (out of sequence) |
| 4.      | 1/23 |                  | Search engine indexes | Ch 2.0 - 2.2.1, 2.4 |
| 5.      | 1/28 |                  | Evaluation | Ch 8 |
| 6.      | 1/30 |                  | Information needs and queries | Ch 19.4 |
| 7.      | 2/4  | HW1 due          | Retrieval models: Vector space | Ch 6, 7 |
| 8.      | 2/6  | HW2 out          | Retrieval models: Probabilistic models | Ch 11 |
| 9.      | 2/11 |                  | Retrieval models: Statistical language models | Ch 12, Zhai & Lafferty |
| 10.     | 2/13 |                  | Retrieval models: Inference networks | Ch 10; Metzler |
| 11.     | 2/18 |                  | Index construction | Ch 2.2.2 - 2.2.4 (out of sequence) |
| 12.     | 2/20 |                  | Index construction | Ch 3.2, 5.1, 5.3, McCreadie |
| 13.     | 2/25 |                  | Federated and vertical search | Si and Callan, Arguello et al. |
| 14.     | 2/27 | HW2 due, HW3 out | Relevance feedback and Search log analysis | |
| 15.     | 3/4  |                  | Search log analysis | Agichtein |
|         | 3/6  |                  | Midterm Exam | Sample exams: 2012, 2013 |
|         | 3/11 |                  | Spring Break! | |
|         | 3/13 |                  | Spring Break! | |
| 16.     | 3/18 |                  | Document clustering | Ch 16 |
| 17.     | 3/20 | HW3 due          | Document clustering | Ch 17 |
| 18.     | 3/25 | HW4 out          | Collaborative filtering | Su and Khoshgoftaar, AAI 2009 |
| 19.     | 3/27 |                  | Collaborative filtering | |
| 20.     | 4/1  |                  | Categorization (overview) | Ch 13 |
| 21.     | 4/3  |                  | Categorization (logistic regression algorithms, convexity, regularization) | |
| 22.     | 4/8  | HW4 due, HW5 out | Categorization (SVM) | Ch 15 |
|         | 4/10 |                  | Mid Semester Break | |
| 23.     | 4/15 |                  | Learning to rank (pairwise constrained optimization) | Joachims KDD'02 |
| 24.     | 4/17 |                  | Learning to rank (list-wise constrained optimization) | Yue SIGIR'07 |
| 25.     | 4/22 | HW5 due, HW6 out | Link Analysis: HITS, PageRank | Ch 21 |
| 26.     | 4/24 |                  | Link Analysis: Personalized and Topic Sensitive PageRank | Haveliwala WWW'02 |
| 27.     | 4/29 |                  | Principal Component Analysis and Singular Value Decomposition | Deerwester JASIS 1990 |
| 28.     | 5/1  | HW6 due          | Large-scale structured learning for classification | Gopal and Yang, KDD'03 |
|         | 5/9  | 8:30am - 11:30am | Final Exam, Room MM A14 | Sample Exam: 2012 |


Updated on January 17, 2014.
Jamie Callan, Yiming Yang