Search Engines:
11-442 / 11-642 / 11-742
 
 

Fall 2023

Course Description:
This lecture-oriented course studies the theory, design, and implementation of text-based search engines. The core components include statistical characteristics of text, representation of information needs and documents, several important retrieval models, and experimental evaluation. The course also covers common elements of commercial search engines, for example, integration of diverse search engines into a single search service (federated search, vertical search), personalized search results, and diverse search results. The software architecture components include design and implementation of large-scale, distributed search engines.

This is a full-semester course. The graduate sections (11-642 and 11-742) are worth 12 units. The undergraduate section (11-442) is worth 9 units.

The main difference between the three sections (11-442, 11-642, 11-742) is the amount of analysis, writing, and time required to complete homework assignments.
Learning Objectives: By the end of the course, students are expected to have developed the skills listed below.
  • Recall and discuss well-known search engine architectures, methods of representing text documents, methods of representing information needs, and methods of evaluating search effectiveness;
  • Implement well-known retrieval algorithms and test them on standard datasets; and
  • Apply information retrieval techniques discussed in class to solve problems faced by governments and companies.

Skills are assessed by homework assignments and exams.

Eligibility: This course is open to all students who meet the prerequisites.
Prerequisites: This course requires good programming skills and an understanding of computer architectures and operating systems (e.g., memory vs. disk trade-offs). A basic understanding of probability, statistics, and linear algebra is helpful. Thus students should have preparation comparable to the following CMU undergraduate courses.
  • 15-210, Parallel and Sequential Data Structures and Algorithms (required)
  • 15-213, Introduction to Computer Systems (required)
  • 15-451, Algorithm Design and Analysis (helpful)
  • 21-241, Matrices and Linear Transformations or 21-341, Linear Algebra (required)
  • 21-325, Probability (required)
  • 36-202, Methods for Statistics & Data Science (helpful)
Homework assignments are done in the Python programming language, so students must also have good Python programming skills.
Time & Location:
Tu/Th 9:30 AM - 10:50 AM, MI 348.
Enter from the Bellefield St entrance. The classroom is immediately on the right, before security.
Instructor: Jamie Callan
Teaching Assistants:
Priya Bagaria (pbagaria@andrew)
Garvit Gupta (garvitg@andrew)
Raghav Kapoor (raghavka@andrew)
Siddharth Basu (sbasu4@andrew)
Office Hours: Office hours begin September 5.
Monday, 12:00 - 1:30, TCS 349, Raghav
Tuesday, 11:00 - 12:30, Wean 3110, Siddharth
Wednesday, 4:00 - 5:30, GHC 5417, Priya
Friday, 4:30 - 6:00, Wean 3110, Garvit
Course Materials:
Lecture Slides: Copies of the lecture slides are posted on this page, usually within 24 hours.

Lecture Recordings: Lecture recordings are available for students who need to miss a lecture or two due to illness or travel. Contact the instructor to obtain a link for the lecture(s) that you need. Recordings are access-controlled. Use your Andrew ID and password to log in to YouTube.

Lecture recordings may unintentionally include your voice. Please alert the instructor if this is a problem for you.

Textbook: The textbook is Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. You may use the printed copy or the online copy, but note that the reading instructions refer to the printed copy.

Readings: There are additional selected readings, which will be available through the class web page (this page).

Piazza: A discussion forum is provided for students to ask questions, answer questions, and discuss class-related topics. The TAs monitor Piazza 11am-7pm Monday through Friday and 3-7pm on weekends. You must register yourself to access the discussion forum. Please provide a CMU email address when you join the discussion (you can use other email addresses, too). We will periodically remove people who do not have CMU email addresses.

Homework Services: A Homework Services web page provides information about your homework submissions and access to graded homework reports. Each individual homework has its own web pages that describe the assignment and provide access to automated testing services.

Restricted access: Online access to some materials (additional readings, lecture notes, datasets, etc.) is restricted to members of the CMU community. Students connecting from CMU local or VPN IP addresses have direct access. Other students can gain access using a password.
Homework: 5 assignments that give hands-on experience with techniques discussed in class. Homework must be done individually, and students may not share their work with other students. See the course Academic Integrity policy for more information.
Grading: 5 homework assignments (12% each, 60% total), midterm exam (20%), final exam (20%).
Grading Scale:
Grades are assigned using a curve. Typically the median GPA is about 3.5.
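The weights listed under Grading can be combined into a single pre-curve score. The sketch below is only an illustration: it assumes every component is scored on a 0-100 scale, the function name `weighted_score` is hypothetical, and the curve used to assign final grades is not modeled.

```python
# Percentage weights as stated under "Grading".
HOMEWORK_WEIGHT = 12   # percent, per homework assignment (5 assignments)
MIDTERM_WEIGHT = 20    # percent
FINAL_WEIGHT = 20      # percent
NUM_HOMEWORKS = 5


def weighted_score(homework_scores, midterm, final):
    """Return the pre-curve course score on a 0-100 scale.

    Assumption: every input score is on a 0-100 scale.
    """
    if len(homework_scores) != NUM_HOMEWORKS:
        raise ValueError(f"expected {NUM_HOMEWORKS} homework scores")
    total = (
        HOMEWORK_WEIGHT * sum(homework_scores)
        + MIDTERM_WEIGHT * midterm
        + FINAL_WEIGHT * final
    )
    return total / 100


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(weighted_score([90, 85, 100, 80, 95], midterm=88, final=92))  # 90.0
```

Because final grades are curved, this number indicates relative standing only; it does not map directly to a letter grade.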
Course Policies:
Academic Integrity, Attendance, Auditing, Laptops & mobile devices, Late homework, Pass/Fail, Waitlist
Syllabus:
Date Topic Readings
Aug 29, Course overview (pdf, mp4)
Aug 31, Introduction to search: Exact-match retrieval (pdf, mp4) Ch 1, Ch 5.1.1 - 5.1.2
Sep 4, HW1 out  
Sep 5, Introduction to search: QryEval (pdf, mp4) Ch 2.4.2
Sep 7, Introduction to search: Query processing (pdf, mp4) Ch 8-8.4 (early reading)
Sep 8, 3:00-4:30, HW1 support (optional), 300 S. Craig St, Room 265
Sep 12, Evaluating search effectiveness (pdf, mp4) Ch 8.5
Sep 14, Best-match retrieval: VSM, BM25 (pdf, mp4) Ch 6.2-6.4.2, 11.4.3
Sep 18, HW1 due, HW2 out  
Sep 19, Best-match retrieval: Language models (pdf, mp4); HW2 implementation (pdf) Ch 12.2-12.4
Sep 21, Query structure: Information needs and queries (pdf); Document representation (pdf) Nguyen & Callan, 2011
Sep 26, Document representation Ch 4, Ch 5.3-5.3.1, Ch 7.1.3
Sep 28, Query structure: Relevance and pseudo relevance feedback Ch 2.2
Oct 2, HW2 due, HW3 out  
Oct 3, Feature-based ranking models Li, 2011
Oct 5, Feature-based ranking models, Authority metrics, Large-scale indexing Ch 21
Oct 10, Index creation Ch 9-9.2.2
Oct 12, Midterm exam Sample Midterm
Oct 24, Large-scale indexing  
Oct 26, Search logs Ch 10-10.3
Oct 30, HW3 due, HW4 out  
Oct 31, Search logs
Nov 2, Diversity Santos, et al., 2010
Nov 9, Diversity Dang & Croft, 2012
Nov 13, HW4 due, HW5 out  
Nov 14, Personalization Bennett et al, 2012, Eickhoff et al, 2014
Nov 16, Neural ranking models  
Nov 21, Neural ranking models  
Nov 27, HW5 due  
Nov 28, Neural ranking models Dai & Callan, 2019b
Nov 30, Neural ranking models
Dec 5, Neural ranking models  
Dec 7, Final exam Sample Final
Accommodations for COVID-19: If you are unable to attend class, contact the instructor to receive a link to a recording of the lecture.

If the instructor is unable to attend class, students will be notified via email from Canvas that the lecture will be delivered via Zoom or replaced by a recorded lecture from last semester.
Accommodations for Students with Disabilities: If you have a disability and are registered with the Office of Disability Resources, I encourage you to use their online system to notify me of your accommodations and discuss your needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.
If You Are Having Difficulty:
If you are having difficulty in any of your courses, please consider reaching out to the Student Academic Success Center (SASC). SASC provides the following services.
  • Individual and small group coaching on the development of successful learning habits such as time management, stress reduction, and other skills (Academic Coaching).
  • Consultation for multilingual and international students (Language and Cross-Cultural Support).
  • Individual consultations and workshops to support excellence in communication of written texts, oral presentations, and data visualization (Communication Support).
Advice From The Faculty:
This course is a lot of work. Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

If you find yourself struggling with the material or workload, please ask for help. All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at https://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty member, or family member you trust for help getting connected to support.


Copyright 2023, Carnegie Mellon University.
Updated on September 21, 2023

Jamie Callan