Search Engines:
11-442 / 11-642
 
Description: This course studies the theory, design, and implementation of text-based search engines. The core components include statistical characteristics of text, representation of information needs and documents, several important retrieval models, and experimental evaluation. The course also covers common elements of commercial search engines, for example, integration of diverse search engines into a single search service ("federated search", "vertical search"), personalized search results, diverse search results, and sponsored search. The software architecture components include design and implementation of large-scale, distributed search engines.

This is a full-semester, lecture-oriented course worth 12 units.
Learning Objectives: By the end of the course, students are expected to have developed the skills listed below.
  • Recall and discuss well-known search engine architectures, methods of representing text documents, methods of representing information needs, and methods of evaluating search effectiveness;
  • Implement well-known retrieval algorithms and test them on standard datasets; and
  • Apply information retrieval techniques discussed in class to solve problems faced by governments and companies.
Skills are assessed by the homework assignments and by the midterm and final exams.
Eligibility: This course is open to all students who meet the prerequisites.
Prerequisites: This course requires good programming skills and an understanding of computer architectures and operating systems (e.g., memory vs. disk trade-offs). A basic understanding of probability, statistics, and linear algebra is helpful. Thus, students should have preparation comparable to that provided by the following CMU undergraduate courses:
  • 15-210, Parallel and Sequential Data Structures and Algorithms (required)
  • 15-213, Introduction to Computer Systems (required)
  • 15-451, Algorithm Design and Analysis (helpful)
  • 21-241, Matrix Algebra or 21-341, Linear Algebra (required)
  • 21-325, Probability (required)
  • 36-202, Basic Statistics (helpful)
Time & Location: Tu/Th 1:30-2:50, PH 100
Instructor: Jamie Callan
Teaching Assistants:
Hongyu Li (hongyul@andrew)
Qing Liu (qingl2@andrew)
Vallari Mehta (vallarim@andrew)
Arpita Pyreddy (mpyreddy@andrew): handles reading summaries
Ye (Charlotte) Qi (yeq@andrew)
Anshu Rajendra (anshur@andrew)
Varshini Ramaseshan (vramases@andrew): handles reading summaries
Saksham Singhal (sakshams@andrew)
Office hours:
Day | Time | Location | TA
Monday | 1:00-2:30 | GHC 5417 | Saksham
Monday | 4:30-6:00 | GHC 6404 | Anshu
Tuesday | 5:30-7:00 | GHC 6404 | Ye
Thursday | 4:30-6:00 | GHC 5417 | Vallari
Friday | 5:00-6:30 | GHC 5417 | Qing
Instructional Materials: The textbook is Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. You may use the printed copy or the online copy, but note that the reading instructions refer to the printed copy.

Additional selected readings are available through the class web page (this page).

Online access to some materials (additional readings, lecture notes, datasets, etc.) is restricted to the .cmu.edu domain. CMU users can get access from outside .cmu.edu (e.g., from home) using CMU's WebVPN Service.

A discussion forum is provided for students to ask questions, answer questions, and discuss class-related topics. You must register to access the discussion forum. Please provide a CMU email address when you join (you may use other email addresses, too). We will periodically remove students who do not have CMU email addresses.
Homework: 5 assignments that give hands-on experience with techniques discussed in class.
Grading: Weekly reading summaries (10% total), 5 homework assignments (10% each, 50% total), midterm exam (20%), final exam (20%).
Grading Scale: Grades are assigned using a curve.
Course policies: Attendance, Auditing, Laptops & mobile devices, Late homework, Pass/Fail, Plagiarism & cheating, Recording & videotaping, Waitlist
Syllabus (subject to revision):
Date | Topic | Readings
Jan 17 | Course overview (pdf) |
Jan 19 | Introduction to search: Exact-match retrieval (pdf); Reading summaries (pdf) | Ch 1, Ch 5.1
Jan 24 | Introduction to search: Query processing (pdf); HW1 out | Ch 2.4
Jan 26 | Introduction to search: Query processing (pdf); Software development guidelines (pdf) |
Jan 31 | Evaluating search effectiveness (pdf) | Ch 8-8.5
Feb 2 | Evaluating search effectiveness (pdf) |
Feb 7 | Document representation (pdf); HW1 due, HW2 out | Ch 2-2.2
Feb 9 | Best-match retrieval: VSM, BM25 (pdf) | Ch 6, Ch 11
Feb 14 | Best-match retrieval: Language models (pdf) | Ch 12
Feb 16 | Query structure: Information needs and queries (pdf) | Nguyen & Callan, 2011
Feb 21 | Query structure: Relevance and pseudo-relevance feedback (pdf); HW2 due, HW3 out | Ch 9
Feb 23 | Index creation (pdf) | Ch 4
Feb 28 | Index creation (pdf); Document priors (pdf) | Ch 7
Mar 2 | Index creation (pdf) |
Mar 7 | Document structure (pdf) | Ch 10
Mar 9 | Midterm exam | Sample Midterm 1, Sample Midterm 2
Mar 21 | Ranked retrieval: Feature-based models (pdf); HW3 due, HW4 out | Clarke Ch 11.7; Li, 2011
Mar 23 | Authority metrics (pdf) | Ch 21
Mar 28 | Page quality, web spam (pdf) | Santos, Ch 1-5
Mar 30 | Diversity (pdf) | Santos, Ch 6-7
Apr 4 | Diversity (pdf); HW4 due, HW5 out |
Apr 6 | Search log analysis (pdf) | Eickhoff et al., 2014
Apr 11 | Search log analysis (pdf) |
Apr 13 | Personalization (pdf) | Bennett et al., 2012
Apr 18 | Federated, aggregated, & vertical search (pdf); HW5 due | Si & Callan, 2003
Apr 25 | Federated, aggregated, & vertical search (pdf) | Arguello & Diaz, 2013
Apr 27 | No class |
May 2 | Enterprise search (pdf) |
May 4 | Final exam | Sample final
Advice From The Faculty:

This course is a lot of work. Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

If you find yourself struggling with the material or workload, please ask for help. All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty member, or family member you trust for help getting connected to the support that can help.


Copyright 2017, Carnegie Mellon University.
Updated on May 02, 2017
Jamie Callan