Special Topic: Search Engines and Web Mining
|
|
Description: |
This course provides a comprehensive introduction to the theory and implementation of algorithms for organizing and searching large text collections. The first half of the course studies text search engines for enterprise and Web environments; the open-source Indri search engine is used as a working example. The second half studies text mining techniques such as clustering, categorization, and information extraction. Programming assignments give hands-on experience with document ranking algorithms, categorizing documents into browsing hierarchies, and related topics. |
|
Prerequisites: |
15-211, Fundamental Data Structures and Algorithms. 21-241, Matrix Algebra, or 21-341, Linear Algebra. |
|
Time & Location: |
Tu/Th 12:00 - 1:20, Wean Hall 5310 |
|
Instructor(s): |
|
|
Teaching Assistant: |
|
|
Instructional Materials: |
The textbook is Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. It can be purchased at the CMU book store. Selected additional readings are available online or on reserve in the Engineering and Science Library, 4th floor, Wean Hall. Online access to some materials is restricted to the .cmu.edu domain. Members of the CMU community can reach them from outside .cmu.edu (e.g., from home) using CMU's WebVPN Service. |
|
Homework: |
Homework consists of programming projects and/or problem sets. |
|
Grading: |
50% homework (2 programming, 2 written), 10% quizzes, 20% midterm, 20% final. |
|
Course policies: |
|
|
Syllabus: |
|
HW3 is out; it is due Nov 11 (Part 1) and Nov 19 (Part 2).
|
|
|
Updated on September 29, 2009
Jamie Callan and Yiming Yang