Search Engines and Web Mining

Description:
This course provides a comprehensive introduction to the theory and implementation of algorithms for organizing and searching large text collections. The first half of the course studies text search engines for enterprise and Web environments, using the open-source Lucene and Indri search engines as working examples. The second half studies text mining techniques such as recommender systems, clustering, and categorization. Programming assignments give hands-on experience with document ranking, search evaluation, categorization of documents into browsing hierarchies, and related topics.

Eligibility:
This course is open to all students who meet the prerequisites, except students in the LTI's MLT and PhD programs. Students in those programs can instead take 11-741, Information Retrieval, which is more research-oriented; this course focuses on current practice.

Prerequisites:
15-211, Fundamental Data Structures and Algorithms; and either 21-241, Matrix Algebra, or 21-341, Linear Algebra.

Time & Location:
Tu/Th, 12:00-1:20, GHC 4215

Instructor(s):
|
|
Teaching Assistant:
|
|
Instructional Materials:
The textbook is Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze (Cambridge University Press, 2008). It can be purchased at the CMU bookstore. Selected additional readings are available online or on reserve in the Engineering and Science Library, 4th floor, Wean Hall. Online access to some materials is restricted to the .cmu.edu domain; CMU users can access them from outside .cmu.edu (e.g., from home) through CMU's WebVPN service.

Homework:
Homework consists of programming projects and/or problem sets.

Grading:
60% homework (6 programming assignments), 20% midterm exam, 20% final exam.

Course policies:
|
|
Syllabus:
|
|
Updated on May 30, 2013
Jamie Callan and Yiming Yang