Search Engines:
11-442 / 11-642 / 11-742
 
 

Fall 2020

Course Description:
This lecture-oriented course studies the theory, design, and implementation of text-based search engines. The core components include statistical characteristics of text, representation of information needs and documents, several important retrieval models, and experimental evaluation. The course also covers common elements of commercial search engines, for example, integration of diverse search engines into a single search service (federated search, vertical search), personalized search results, and diverse search results. The software architecture components include design and implementation of large-scale, distributed search engines.

This is a full-semester course. The graduate sections (11-642 and 11-742) are worth 12 units. The undergraduate section (11-442) is worth 9 units.

The main difference between the three sections (11-442, 11-642, 11-742) is the amount of analysis, writing, and time required to complete homework assignments.
Learning Objectives: By the end of the course, students are expected to have developed the skills listed below.
  • Recall and discuss well-known search engine architectures, methods of representing text documents, methods of representing information needs, and methods of evaluating search effectiveness;
  • Implement well-known retrieval algorithms and test them on standard datasets; and
  • Apply information retrieval techniques discussed in class to solve problems faced by governments and companies.
Skills are assessed by homework assignments and quizzes.
Eligibility: This course is open to all students who meet the prerequisites.
Prerequisites: This course requires good programming skills and an understanding of computer architectures and operating systems (e.g., memory vs. disk trade-offs). A basic understanding of probability, statistics, and linear algebra is helpful. Thus students should have preparation comparable to the following CMU undergraduate courses.
  • 15-210, Parallel and Sequential Data Structures and Algorithms (required)
  • 15-213, Introduction to Computer Systems (required)
  • 15-451, Algorithm Design and Analysis (helpful)
  • 21-241, Matrices and Linear Transformations or 21-341, Linear Algebra (required)
  • 21-325, Probability (required)
  • 36-202, Methods for Statistics & Data Science (helpful)
Homework assignments are done in the Java programming language, so students must also have good Java programming skills. Assignments will be done using Java 11.
Time & Location:
Tu/Th 9:50 AM - 11:10 AM, delivered via Zoom
Jamie will stay for 15 minutes after class to answer questions
Zoom instructions for this class, Zoom invitation (requires VPN or login to read), general Zoom instructions
Instructor: Jamie Callan
Teaching Assistants:
Linwei Henry Li (linweil@andrew)
Sharanya Chakravarthy (sharanyc@andrew)
Vinay Damodaran (vdamodar@andrew)
Yizhen Ma (yizhenm@andrew)
Yu-Ning Rebecca Huang (yuninghu@andrew)
Ziqi Deng (ziqideng@andrew)
Office hours: All times are in the Pittsburgh timezone (EDT, UTC -4).
Monday 3:00p Ziqi Zoom
Tuesday 1:00p Henry Zoom
Wednesday 1:00p Yizhen Zoom
Thursday 9:00p Sharanya Zoom
Friday 4:00p Rebecca Zoom
Course Materials:
Lectures: Recorded copies of lectures will be available through YouTube within 24 hours of the live lecture. Log into https://www.youtube.com/ using your Andrew ID and password.

Textbook: The textbook is Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. You may use the printed copy or the online copy, but note that the reading instructions refer to the printed copy.

Readings: There are additional selected readings, which will be available through the class web page (this page).

Piazza: A discussion forum is provided for students to ask questions, answer questions, and discuss class-related topics. You must register yourself to access the discussion forum. Please provide a CMU email address when you join the discussion (you can use other email addresses, too). We will periodically remove students who do not have CMU email addresses.

Restricted access: Online access to some materials (additional readings, lecture notes, datasets, etc.) is restricted to members of the CMU community. Students on CMU local and virtual private networking IP addresses have direct access. Other students can gain access using a password.
Homework: 6 assignments that give hands-on experience with techniques discussed in class. Homework must be done individually, and students may not share their work with other students. See the course Academic Integrity policy for more information.
Graded reports can be downloaded from the Homework Reports Service.
Grading: Weekly reading summaries (12% total), 6 homework assignments (11% each, 66% total), 6 quizzes (3⅔% each, 22% total)
Grading Scale:
Grades are assigned using a curve. Typically the median GPA is about 3.5.
Course Policies:
Academic Integrity, Attendance, Auditing, Late homework, Pass/Fail, Waitlist
Syllabus:
 
Date Topic Readings
Sep 1, Course overview (pdf, mp4)
Sep 3, Introduction to search: Exact-match retrieval (pdf, mp4) Ch 1, Ch 5.1.1 - 5.1.2
Sep 8, Introduction to search: Query processing (pdf, mp4)
Software development requirements (pdf)
HW1 out
Ch 2.4.2
Sep 10, Introduction to search: QryEval (pdf, mp4) Ch 8-8.4 (early reading)
Sep 11, 3:00p HW1 Recitation (optional) (mp4)
 
Sep 15, Evaluating search effectiveness (pdf, mp4) Ch 8.5
Sep 17, Best-match retrieval: VSM, BM25 (pdf, mp4) Ch 6.2-6.4.2, 11.4.3
Sep 22, Best-match retrieval: Language models (pdf, mp4)
HW1 due, HW2 out
Ch 12.2-12.4
Sep 24, Query structure: Information needs and queries, and HW2 implementation (pdf, mp4)
Nguyen & Callan, 2011
Sep 29, Document representation and Document priors (pdf, mp4)
Practice Quiz
 
Oct 1, Document representation (pdf, mp4) Ch 2.2
Oct 6, Query structure: Relevance and pseudo relevance feedback (pdf, mp4)
HW2 due, HW3 out
Ch 9-9.2.2
Oct 8, Index creation (pdf, mp4) Ch 4
Oct 13, Large-scale indexes (pdf, mp4)
Reading summaries (pdf, mp4)
Quiz 2
Ch 5.3-5.3.1, Ch 7.1.3
Oct 15, Document structure (pdf, mp4)
Oct 20, Document structure (pdf, mp4)
HW3 due, HW4 out
Ch 10-10.3
Oct 22, Ranked retrieval: Feature-based models (pdf, mp4) Li, 2011
Oct 27, Ranked retrieval: Feature-based models (pdf, mp4)
Authority metrics (pdf)
Quiz 3
Ch 21
Oct 29, Diversity (pdf, mp4)
 
Nov 2, HW4 due (Monday deadline)  
Nov 3, Class cancelled for election day
(Reading summaries are due Thursday this week)
HW5 out
Nov 5, Diversity (pdf, mp4) Santos, et al., 2010, Dang & Croft, 2012
Nov 10, Ranked retrieval: Neural models (pdf, mp4)
Quiz 4
 
Nov 12, Ranked retrieval: Neural models (pdf, mp4)
Web spam (pdf)
Guo, et al, 2016
Nov 17, Ranked retrieval: Neural models (pdf, mp4)
HW5 due, HW6 out
Dai & Callan, 2019a, Dai & Callan, 2019b
Nov 19, Search log analysis (pdf, mp4)  
Nov 24, Search log analysis (pdf, mp4)
Quiz 5
Eickhoff et al, 2014
Dec 1, Personalization Bennett et al, 2012
Dec 2, HW6 due (this is a one day extension)  
Dec 3, Evaluating search effectiveness  
Dec 8, Federated, aggregated, & vertical search Arguello & Diaz, 2013, Ch 1 - 1.3.1
Dec 10, Enterprise search
Quiz 6
 
Accommodations for Students with Disabilities: If you have a disability and are registered with the Office of Disability Resources, I encourage you to use their online system to notify me of your accommodations and discuss your needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.
Advice from the Faculty:
This course is a lot of work. Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

If you find yourself struggling with the material or workload, please ask for help. All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 or visit their website at https://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty member, or family member you trust for help getting connected to support.


Copyright 2020, Carnegie Mellon University.
Updated on November 24, 2020

Jamie Callan