Text Analytics
Description: Many organizations need to analyze large amounts of text to discover useful information. For example, a company may want to monitor how the public discusses its products in social media, or a forensics team may need to discover the contents of disk drives seized by law enforcement. This course provides students with an understanding of common and emerging methods of organizing, summarizing, and analyzing large collections of unstructured and lightly-structured text ('text analytics'). The focus is on algorithms and techniques; however, the course also provides an introduction to open-source software tools.
This is a 6 unit course. It is offered during the second half of the Fall (Mini-2) and Spring (Mini-4) semesters.
Learning Objectives: By the end of the course, students are expected to have developed the following skills. Skills are assessed by the homework assignments and the final exam.
  • Recall and discuss common methods of conducting exploratory and predictive analysis of text information;
  • Use search engines and common open-source software to perform common methods of exploratory and predictive analysis; and
  • Apply text analysis techniques discussed in class to solve problems faced by governments and companies.
Prerequisites: None
Time & Location: 2014 Fall Mini A2, Tu/Th 4:30 - 5:50, HBH 1002
Instructor: Jamie Callan
Teaching Assistant: Ben Xiaobin He (xiaobinh@andrew)
Office hours: M/W, 10:30-11:30, HBH A020B.
Discussion Forum: A discussion forum is provided for students to ask questions, answer questions, and discuss class-related topics. You will need a Piazza account to use the discussion forum. Please provide a CMU email address when you join the 95-865 discussion (you can use other email addresses, too). We will periodically remove students who do not have CMU email addresses.
Instructional Materials: Online access to some materials is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using CMU's WebVPN Service.
Homework: 3 assignments that give hands-on experience with techniques discussed in class.
Grading: 3 assignments (3 x 25%) and a final exam (25%).
Grading Scale: Grades are assigned using a curve.
Course policies: Attendance, Laptops & mobile devices, Late homework, Plagiarism & cheating, Recording & videotaping
Syllabus (subject to revision):
Date    Topic                                                     Reading
Oct 21  Course overview and introduction to text analytics (pdf)
Oct 23  Text representation: Turning text into features (pdf)
Oct 28  Exploratory analysis: Frequency and co-occurrence (pdf)
Oct 30  Exploratory analysis: Co-occurrence and clustering (pdf)
        HW1 out
Nov 4   Exploratory analysis: Clustering (pdf)
Nov 6   Predictive analysis: Categorization (pdf)
        HW1 due, HW2 out
Nov 11  Predictive analysis: Categorization (pdf)
Nov 13  Predictive analysis: Categorization (pdf)
Nov 18  Predictive analysis: Sentiment analysis (pdf)             Feldman
Nov 20  Predictive analysis: Sentiment analysis (pdf)
        HW2 due, HW3 out
Nov 25  Tools: Search engines as language databases (pdf)
Dec 2   Case studies: Expert finding
Dec 4   Case studies: E-Discovery
        HW3 due
Dec 11  Final exam, 8:30am, HBH 1004 (sample final)

Copyright 2014, Carnegie Mellon University.
Updated on November 25, 2014
Jamie Callan