
Text Analytics:

Description: Many organizations need to analyze large amounts of text to discover useful information. For example, a company may want to monitor how the public discusses its products in social media, or a forensics team may need to discover the contents of disk drives seized by law enforcement. This course provides students with an understanding of common and emerging methods of organizing, summarizing, and analyzing large collections of unstructured and lightly-structured text ('text analytics'). The focus is on algorithms and techniques; however, the course also provides an introduction to open-source software tools.
This is a 6-unit course. It is offered during the second half of the Fall (Mini-2) and Spring (Mini-4) semesters.
Learning Objectives: By the end of the course, students are expected to have developed the following skills. Skills are assessed by the homework assignments and the final exam.
  • Recall and discuss common methods of conducting exploratory and predictive analysis of text information;
  • Use search engines and common open-source software to perform common methods of exploratory and predictive analysis; and
  • Apply text analysis techniques discussed in class to solve problems faced by governments and companies.
Prerequisites: None
Time & Location: 2014 Spring Mini A4, Tu/Th 10:30 - 11:50, HBH 1004
2014 Fall Mini A2, M/W 4:30 - 5:50, location TBD
Instructor: Jamie Callan
Teaching Assistants: Bob Fang, Chloe Siyu Pan
Office hours: Section A4 (Pittsburgh): Tuesday 3:30 - 4:30pm, Thursday 3:30 - 4:30pm, Friday 3:30 - 4:30pm. HBH A115.
Section K4 (Adelaide): Wednesday 9 - 11am, Saturday 11am - noon (Adelaide timezone). Contact Chloe in a Google Hangout or on Skype (amemori-p).
Schedule changes are posted on the discussion forum.
Discussion Forum: See the 95-865 discussion forum. You must be invited to join this forum. We will invite everyone enrolled in the class on Friday, March 28.
Instructional Materials: Online access to some materials is restricted to the CMU domain. CMU people can get access from outside (e.g., from home) using CMU's WebVPN Service.
Recorded Lectures: Recorded lectures are available via the Heinz College video catalog. An Andrew ID is required.
Homework: 3 assignments that give hands-on experience with techniques discussed in class.
Grading: 3 assignments (3 x 25%) and a final exam (25%).
Grading Scale: Grades are assigned using a curve.
Course policies: Attendance, Laptops & mobile devices, Late homework, Plagiarism & cheating, Recording & videotaping
Syllabus (subject to revision):  
  1. Mar 18: Course overview and introduction to text analytics (pdf)
  2. Mar 20: Exploratory analysis: Frequency and co-occurrence (pdf)
  3. Mar 25: Text representation: Turning text into features (pdf)
    HW1 out
  4. Mar 27: Exploratory analysis: Clustering (pdf)
  5. Apr 1: Exploratory analysis: Topic models (pdf)
  6. Apr 3: Predictive analysis: Categorization (pdf)
    HW1 due, HW2 out
  7. Apr 8: Predictive analysis: Categorization (pdf)
  8. Apr 10: Predictive analysis: Categorization (pdf)
  9. Apr 15: Predictive analysis: Categorization (pdf)
    Reading: Feldman
  10. Apr 17: Predictive analysis: Sentiment analysis (pdf)
    HW2 due, HW3 out
  11. Apr 22: Predictive analysis: Sentiment analysis
  12. Apr 24: Case studies: Expert finding
  13. Apr 29: Case studies: E-Discovery
  14. May 1: Final exam (in class) (Sample final)
    HW3 due

Updated on December 20, 2013
Jamie Callan