Human Language Technologies
("Words for Nerds")

Description: During the last decade, computers have begun to understand human languages. Web search engines, language analysis programs, machine translation systems, speech recognition, and speech synthesis are used every day by tens of millions of people in a wide range of situations and applications. This course covers the fundamental statistical and symbolic algorithms that enable computers to work with human language, from text processing to understanding speech and language. It provides detailed coverage of current techniques, their successes, their limitations, and current research directions. Homework assignments give hands-on experience with four different language technologies: document ranking algorithms for Web search, grammars for language understanding, learning machine translation models, and tuning a speech synthesis system to sound as natural as possible.
Instructor(s): Jamie Callan, Alan Black, and Alon Lavie
Teaching Assistant: Brian Langner
Prerequisites: 15-211 and 15-212 are prerequisites for SCS undergraduates.
Availability: Open to juniors and seniors in the SCS undergraduate program, and graduate students in a CMU Master's degree program. Open to other students with the consent of an instructor. LTI Ph.D. students are welcome to audit but may not take the course for credit.
Materials: The required text for the course is Jurafsky & Martin, "Speech and Language Processing", Prentice Hall (ISBN 0-13-095069-6). You may buy the current edition at the bookstore. An online draft of a new edition is also available. You may use whichever edition you want, but if you use the new edition, (i) be aware that it is incomplete and may not contain all of the sections assigned in class, and (ii) you are responsible for mapping old section and page numbers to the new edition.

There will also be selected additional readings, which will be available online or placed on reserve in the Engineering and Science Library, 4th floor, Wean Hall.

Online access to some materials is restricted to the .cmu.edu domain. CMU people can get access from outside .cmu.edu (e.g., from home) using CMU's IP Extension Service.

Homework: Homework consists of two components: brief weekly reading assignments and four programming projects.
Grading: 10% class participation, 40% programming projects, 10% readings homework, 20% midterm, 20% final.
Course policies: Late homework, Cheating
Time: Tues/Thurs 3:00-4:20
Location: NSH 3002
Syllabus, 2005:
Lecture, Important, Subject, Who
Automatic Processing of Language (1 lecture)
1. 8/29 Course Overview (pdf)
Readings: M&S 1.0-1.3
Information Retrieval (7 lectures)
2. 8/31 Introduction to IR: Tasks, text representation and indexing (pdf)
Readings: M&S 1.4, 2.2.3, 5.0-5.2, 5.4-5.5
Remedial readings (optional): M&S 2.0-2.2.2
3. 9/5 Text representation and indexing (pdf)
Readings: M&S 4.2-4.4
4. 9/7 Search engines (pdf)
Readings: M&S 15.0-15.2
5. 9/12 Search engines (pdf) jpc
6. 9/14 HW1 out Search evaluation (pdf) jpc
7. 9/19 Clustering, k-NN text classification (pdf)
Readings: (You read these last week - don't turn in this week)
M&S 14.0-14.2.1, 16.0-16.1, 16.4-16.5
8. 9/21 Open-domain question answering (pdf)
Readings: Prager, SIGIR '00; Voorhees, TREC-9
Natural Language Processing (8 lectures)
9. 9/26 Information extraction (pdf)
Readings: Riloff and Schmelzenbach; Cardie, AI Magazine, 1997
10. 9/28 Formal languages (pdf)
Required readings: J&M Chapter 2, 9.0-9.2, 9.11
11. 10/3 HW1 due Subsentential processing (pdf)
Readings: J&M Chapter 8
12. 10/5 Parsing I: CFG Parsing (pdf)
Readings: J&M 10.0-10.4
13. 10/10 HW2 out Semantics (pdf)
Readings: J&M 14.1-14.3
14. 10/12 Parsing II: Unification (pdf) sw for al
Readings: J&M 11.0-11.2, 11.4-11.5
15. 10/17 Ambiguity resolution (pdf)
Readings: J&M 12.0-12.3
16. 10/19 Natural language generation (ppt)
Readings: Reiter & Dale 1997
17. 10/24 Midterm exam
Study materials: 2003 midterm + answers
Multi-Language Applications (6 lectures)
18. 10/26 Machine translation history: Overview of approaches (pdf)
Readings: J&M Chapter 21
19. 10/31 HW2 due, HW3 out Building MT Resources (pdf) awb
20. 11/2 Statistical MT and Example-Based MT (pdf)
Readings: M&S 13.0-13.1.1, 13.1.5-13.4
21. 11/7 Knowledge-Based MT (ppt) al
22. 11/9 MT Evaluation (ppt)
Readings: Papineni, et al, ACL-02
23. 11/14 AVENUE: Learning-based MT Approaches for Languages with Limited Resources (pdf)
Speech-to-Speech Translation (pdf)
Readings: Probst, et al, MTJ 17(4)
Speech (6 lectures)
24. 11/16 HW3 due (Nov 19) Introduction to speech (pdf) awb
25. 11/21 HW4 out Spoken dialog systems (pdf) awb
26. 11/28 Speech synthesis, part I (ppt)
Readings: J&M 4.1, 4.6, 4.7, 4.9, 7.8
27. 11/30 Speech synthesis, part II (pdf) awb
28. 12/5 Speech recognition, part I (pdf)
Readings: J&M Chapter 7 (except 7.8)
29. 12/7 HW4 due Speech recognition, part II (pdf) awb
12/11 Final Exam, 5:30-8:30pm, Porter Hall A18A
Study materials: Review FAQ from 2003, 2004 final
Readings Key:
J&M: Jurafsky & Martin. "Speech and Language Processing".
M&S: Manning and Schütze. "Foundations of Statistical Natural Language Processing".
Faculty Key:
awb: Prof. Alan Black
jpc: Prof. Jamie Callan
al: Prof. Alon Lavie

Updated on September 27, 2006