Information Retrieval
11-741

 

Description:

This course studies the theory, design, and implementation of text-based information systems. The Information Retrieval core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling, link analysis), clustering algorithms, collaborative filtering, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems.

Prerequisites:

  • Programming, data structures, and computer systems courses comparable to 15-211 and 15-213.
  • Algorithms comparable to the undergraduate CS algorithms course (15-451) or higher.
  • Basic linear algebra (21-241 or 21-341).
  • Basic statistics (36-202) or higher.

Time & Location:

TR 12:00-1:20pm, 4307 Gates Hillman Complex

Instructors:

Jamie Callan and Yiming Yang

Instructor Office Hours:

By appointment

Teaching Assistant(s):

Siping Ji and Napat Luevisadpaibul

TA Office Hours:

By appointment

Discussion Forum:

See the 11-741 discussion at piazza.com. You must be invited to join this forum. On Friday, January 17, we will send invitations to everyone enrolled in the class.

Textbook:

The textbook is Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008.

Other Readings:

Selected papers or book chapters will be assigned as reading for some lectures. All will be available online and/or on reserve in the Engineering and Science Library, 4th floor, Wean Hall.

Course Notes:

Usually available online, occasionally distributed in lectures. Online access is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using VPN.

Homework:

One brief reading summary per week (1/2 to 1 page), and six problem sets or programming assignments. This is subject to change. Please see the submission guidelines.

Grading:

Grades are based on the reading summaries (10% total), six homework assignments (10% each, 60% total), the midterm exam (15%), and the final exam (15%).

Course Policies:

Attendance, Cheating, Laptop computers, Late homework, Recording & videotaping

Sitting In:

Approval from the instructors is required.

Syllabus:

The anticipated syllabus is below. It is subject to change.
 

| Lecture | Day | Important Events | Topic | Readings |
|---|---|---|---|---|
| 1. | 1/14 | | Course overview (pdf) | |
| 2. | 1/16 | | Introduction to ad-hoc search: Boolean retrieval (pdf) | Ch 1 |
| 3. | 1/21 | HW1 out | Text representation (pdf) | Ch 4 (out of sequence) |
| 4. | 1/23 | | Search engine indexes (pdf1, pdf2) | Ch 2.0 - 2.2.1, 2.4 |
| 5. | 1/28 | | Evaluation (pdf1, pdf2) | Ch 8 |
| 6. | 1/30 | | Information needs and queries (pdf1, pdf2) | Ch 19.4 |
| 7. | 2/4 | HW1 due | Retrieval models: Vector space (pdf) | Ch 6, 7 |
| 8. | 2/6 | HW2 out | Retrieval models: Probabilistic models (pdf) | Ch 11 |
| 9. | 2/11 | | Retrieval models: Statistical language models (pdf, pdf) | Ch 12, Zhai & Lafferty |
| 10. | 2/13 | | Retrieval models: Inference networks (pdf) | Ch 10; Metzler |
| 11. | 2/18 | | Index construction (pdf) | Ch 2.2.2 - 2.2.4 (out of sequence) |
| 12. | 2/20 | | Index construction (pdf) | Ch 3.2, 5.1, 5.3, McCreadie |
| 13. | 2/25 | | Federated and vertical search (pdf) | Si and Callan; Arguello et al. |
| 14. | 2/27 | HW2 due, HW3 out | Relevance feedback (pdf) and Search log analysis (pdf) | |
| 15. | 3/4 | | Search log analysis (pdf) | Agichtein |
| | 3/6 | | Midterm Exam | Sample exams: 2012, 2013 |
| | 3/11 | | Spring Break! | |
| | 3/13 | | Spring Break! | |
| 16. | 3/18 | | Document clustering | Ch 16 |
| 17. | 3/20 | HW3 due | Document clustering (pdf) | Ch 17 |
| 18. | 3/25 | HW4 out | Collaborative filtering | Su and Khoshgoftaar, AAI 2009 |
| 19. | 3/27 | | Collaborative filtering (pdf) | |
| 20. | 4/1 | | Categorization (overview) | Ch 13 |
| 21. | 4/3 | | Categorization (logistic regression algorithms, convexity, regularization) (pdf) | |
| 22. | 4/8 | HW4 due, HW5 out | Categorization (SVM) (pdf) | Ch 15 |
| | 4/10 | | Mid Semester Break | |
| 23. | 4/15 | | Learning to rank (pairwise constrained optimization) (pdf) | Joachims KDD'02 |
| 24. | 4/17 | | Learning to rank (list-wise constrained optimization) (pdf) | Yue SIGIR'07 |
| 25. | 4/22 | HW5 due, HW6 out | Link Analysis: HITS, PageRank (pdf) | Ch 21 |
| 26. | 4/24 | | Link Analysis: Personalized and Topic Sensitive PageRank (pdf) | Haveliwala WWW'02 |
| 27. | 4/29 | | Principal Component Analysis and Singular Value Decomposition (pdf) | Deerwester JASIS 1990 |
| 28. | 5/1 | HW6 due | Large-scale structured learning for classification (pdf) | Gopal and Yang, KDD'03 |
| | 5/9 | 8:30am - 11:30am | Final Exam, Room MM A14 | Sample Exam: 2012 |


Updated on January 17, 2014.
Jamie Callan, Yiming Yang