
Text Analytics:

Description: Many organizations need to analyze large amounts of text to discover useful information. For example, a company may want to monitor how the public discusses its products in social media, or a forensics team may need to discover the contents of disk drives seized by law enforcement. This course provides students with an understanding of common and emerging methods of organizing, summarizing, and analyzing large collections of unstructured and lightly structured text ('text analytics'). The focus is on algorithms and techniques; however, the course also provides an introduction to open-source software tools.
This is a 6 unit course. It is offered during the second half of the Fall (Mini-2) and Spring (Mini-4) semesters.
Learning Objectives: By the end of the course, students are expected to have developed the following skills. Skills are assessed by the homework assignments and the final exam.
  • Recall and discuss common methods of conducting exploratory and predictive analysis of text information;
  • Use search engines and common open-source software to perform common methods of exploratory and predictive analysis; and
  • Apply text analysis techniques discussed in class to solve problems faced by governments and companies.
Prerequisites: None
Time & Location: 2014 Fall Mini A2, M/W 4:30 - 5:50, location TBD
Instructor: Jamie Callan
Teaching Assistants: Bob Fang, TBD
Office hours: TBD
Discussion Forum: See the 95-865 discussion forum. You must be invited to join this forum. We will invite everyone enrolled in the class on Friday, March 28.
Instructional Materials: Online access to some materials is restricted to the domain. CMU people can get access from outside (e.g., from home) using CMU's WebVPN Service.
Recorded Lectures: Recorded lectures are available via the Heinz College video catalog. An Andrew ID is required.
Homework: 3 assignments that give hands-on experience with techniques discussed in class.
Grading: 3 assignments (3 x 25%) and a final exam (25%).
Grading Scale: Grades are assigned using a curve.
Course policies: Attendance, Laptops & mobile devices, Late homework, Plagiarism & cheating, Recording & videotaping
Syllabus (subject to revision):  
  1. Oct 20: Course overview and introduction to text analytics
  2. Oct 22: Exploratory analysis: Frequency and co-occurrence
  3. Oct 27: Text representation: Turning text into features
    HW1 out
  4. Oct 29: Exploratory analysis: Clustering
  5. Nov 3: Exploratory analysis: Topic models
  6. Nov 5: Predictive analysis: Categorization
    HW1 due, HW2 out
  7. Nov 10: Predictive analysis: Categorization
  8. Nov 12: Predictive analysis: Categorization
  9. Nov 17: Predictive analysis: Categorization
    Reading: Feldman
  10. Nov 19: Predictive analysis: Sentiment analysis
    HW2 due, HW3 out
  11. Nov 24: Predictive analysis: Sentiment analysis
  12. Dec 1: Case studies: Expert finding
  13. Dec 3: Case studies: E-Discovery
    HW3 due
  14. TBD: Final exam (Sample final)

Updated on July 10, 2014
Jamie Callan