Text Analytics:
95-865 (A)
Description: Many organizations need to analyze large amounts of text to discover useful information. For example, a company may want to monitor how the public discusses its products in social media, or a forensics team may need to discover the contents of disk drives seized by law enforcement. This course provides students with an understanding of common and emerging methods of organizing, summarizing, and analyzing large collections of unstructured and lightly-structured text ('text analytics'). The focus is on algorithms and techniques; however, the course also provides an introduction to open-source software tools.
This is a 6 unit course. It is offered during the second half of the Fall (Mini-2) and Spring (Mini-4) semesters.
Learning Objectives: By the end of the course, students are expected to have developed the following skills. Skills are assessed by the homework assignments and the final exam.
  • Recall and discuss common methods of conducting exploratory and predictive analysis of text information;
  • Use search engines and common open-source software to perform common methods of exploratory and predictive analysis; and
  • Apply text analysis techniques discussed in class to solve problems faced by governments and companies.
Prerequisites: None
Time & Location: Fall Mini A2, Tu/Th 4:30 - 5:50, HBH 1002
Instructor: Jamie Callan
Teaching Assistants: Evan Leibowitz (evanleibowitz@cmu)
Jaimie Stein (jlstein@andrew)
Office Hours:
Day | Time | Location | TA
Monday | 4:30-6:00 | HbH Rotunda | Evan
Friday | 11:00-12:30 | HbH 2007B | Jaimie
Discussion Forum: A discussion forum is provided for students to ask questions, answer questions, and discuss class-related topics. You will need a Piazza account to use the discussion forum. Please provide a CMU email address when you join the 95-865 discussion (you can use other email addresses, too). We will periodically remove students who do not have CMU email addresses.
Instructional Materials: Some lectures have assigned readings from Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. The links next to each lecture provide access to an online version of the text.

Some lectures have assigned readings from other papers, as shown in the link next to the lecture.

Online access to some materials is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using CMU's WebVPN Service.
Homework: 3 assignments that give hands-on experience with techniques discussed in class.
Grading: 3 assignments (3 x 25%) and a final exam (25%).
Grading Scale: Grades are assigned using a curve.
Course policies: Attendance, Auditing, Laptops & mobile devices, Late homework, Pass/fail, Plagiarism & cheating, Recording & videotaping, Waitlist
Syllabus (subject to revision):
Date | Topic | Reading
Oct 25 | Course overview and introduction to text analytics (pdf)
Oct 27 | Exploratory analysis: Frequency analysis (pdf); HW1 out | Ch 2.0 - 2.2
Nov 1 | Exploratory analysis: Co-occurrence analysis (pdf)
Nov 3 | Exploratory analysis: Clustering (pdf); Homework guidelines (pdf) | Ch 16
Nov 8 | Exploratory analysis: Clustering (pdf); HW1 due | Ch 17
Nov 10 | Predictive analysis: Categorization (pdf); HW2 out | Ch 14.0-14.3
Nov 15 | Predictive analysis: Categorization (recorded lecture) (pdf) | Ch 13
Nov 17 | Predictive analysis: Categorization (recorded lecture) (pdf) | Ch 15.0-15.3
Nov 22 | Predictive analysis: Sentiment analysis (pdf); HW2 due, HW3 out (docx, pdf)
Nov 29 | Predictive analysis: Sentiment analysis (pdf)
Dec 1 | Tools: Search engines as language databases (pdf)
Dec 6 | Case studies: Expert finding (pdf); Review (pdf); HW3 due
Dec 8 | Final exam | Sample final 1, Sample final 2
Advice From The Faculty:

This course is a lot of work. Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

If you find yourself struggling with the material or workload, please ask for help. All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

Copyright 2016, Carnegie Mellon University.
Updated on December 06, 2016
Jamie Callan