Final Review FAQ Page

Last Modified: Dec 7, 2003 11:39 PM
kct@cs.cmu.edu

1. In statistical MT we can get P(target language) by using language models. But how do we get P(source|target)? As an example (from the slides), for English and French we have E = argmax_e P(e) * P(f|e).
If I understand your question, we find the argmax using some kind of stack search or dynamic programming. The actual algorithm will depend on the translation model and how complex it is. (Also see pp. 486-487 of Manning & Schütze.)
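The decision rule in the question can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the candidate sentences, the probability values, and the helper name best_translation are all made up. A real system cannot enumerate candidates like this and would use the stack search or dynamic programming mentioned above.

```python
import math

# Hypothetical toy language model P(e); the numbers are made up.
language_model = {
    "the house": 0.6,
    "house the": 0.05,
    "the home": 0.35,
}

# Hypothetical toy translation model P(f | e) for f = "la maison".
translation_model = {
    ("la maison", "the house"): 0.5,
    ("la maison", "house the"): 0.5,
    ("la maison", "the home"): 0.3,
}


def best_translation(f, candidates):
    """Return the candidate e that maximizes log P(e) + log P(f | e)."""
    def score(e):
        return math.log(language_model[e]) + math.log(translation_model[(f, e)])
    return max(candidates, key=score)


print(best_translation("la maison", list(language_model)))
```

Note how the language model breaks the tie between "the house" and "house the": the toy translation model scores them equally, so word order is decided by P(e) alone.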

2. Can you elaborate on the difference between diphone synthesis and unit selection synthesis?
In unit selection, the synthesis unit chosen for a particular system may be any of several sizes: half-phones, phones, diphones, syllables, etc. The key idea is that we store multiple examples of the same unit in different contexts (where context may be some combination of adjoining phonemes and perhaps prosodic features, e.g. emphasized or non-emphasized). We can cluster the examples to find representative units for acoustically different variants. With diphone synthesis, we use only one example of each phone-phone transition and do not have different versions of the diphone depending on context. Unit selection systems are more difficult to build and require more labelled data, but may (though not necessarily) produce much better quality than diphone synthesis.
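The contrast can be sketched as two lookup tables: one recording per diphone versus several context-tagged recordings per unit. The file names, context features, and the simple mismatch count below are hypothetical and chosen only for illustration; a real unit selection system would use richer costs.

```python
# Diphone synthesis: exactly one recorded example per phone-phone transition.
diphone_inventory = {"s-n": "sn_001.wav", "n-u": "nu_001.wav"}

# Unit selection: several examples of the same unit, each tagged with the
# context it was recorded in (hypothetical features).
unit_inventory = {
    "n-u": [
        {"file": "nu_017.wav", "prev": "s", "emphasized": False},
        {"file": "nu_042.wav", "prev": "s", "emphasized": True},
        {"file": "nu_108.wav", "prev": "k", "emphasized": True},
    ],
}


def select_diphone(unit):
    """Diphone lookup: the context is ignored because there is no choice."""
    return diphone_inventory[unit]


def select_unit(unit, prev, emphasized):
    """Pick the stored example whose recorded context best matches the target."""
    def mismatch(candidate):
        # Count how many context features differ from the target context.
        return (candidate["prev"] != prev) + (candidate["emphasized"] != emphasized)
    return min(unit_inventory[unit], key=mismatch)["file"]


print(select_diphone("n-u"))
print(select_unit("n-u", prev="s", emphasized=True))
```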

3. What's the difference between co-articulation and reduction?
*Co-articulation: This means moving more than one articulator at once. When you speak, the parts of your speech system can make several different types of "articulator" movements to form different phonemes: various tongue movements (dental, velar, etc.), lip movements, and so on. Co-articulation means that your mouth, while forming part of one phoneme with one type of articulator, is getting ready to transition to the next sound by setting up a second articulator at the same time. This may or may not have much effect on the sound being produced by the first articulator.
For example, in the word "snoozing", when your mouth is forming the "n" sound, your lips are already rounded to get ready for the upcoming "oo". Change the word to "sneezing" and you'll see that your lips get ready for the "ee" during the "n" even though the "n" sound is exactly the same for both words. This is co-articulating the "n" with the following vowel.
*Reduction: This refers to shortening or reducing the sound of a phoneme. This often happens because the "correct" sound is more difficult to articulate in some context. Reduction usually refers to a vowel, but it may also happen for some consonants. A vowel reduction example is the word "for" in the phrase "What's for dinner?". The word "for" is said differently in the phrase than when you say it all by itself. In the phrase, the "o" vowel is shorter and the word sounds more like "fer". This is not lazy speech; it is a natural adjustment to make your mouth's job easier.

4. What are the Transfer and Interlingua approaches to Knowledge-Based Machine Translation (KBMT)?
These are two different approaches to KBMT. The transfer approach assumes that translation is better or easier with a customized meaning representation for each language pair. The interlingua approach assumes that generality matters more, and uses a single shared meaning representation for all languages.

5. What is categorization-based CLIR?
For categorization-based Cross-Lingual Information Retrieval, we assign every document (a web page, say) a set of terms from some controlled set of categories (for example, a BMW page might have the keywords: german, luxury cars, automobiles). If we have such categories for languages A and B, with a dictionary to translate between them, we can find pages in the other language just by mapping the keywords to that language.
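The keyword mapping described above can be sketched as follows. The page names, German category labels, dictionary entries, and the overlap-count ranking are all hypothetical and only illustrate the idea.

```python
# Hypothetical category labels assigned to pages in language A (English)
# and language B (German).
pages_en = {
    "bmw-en.html": {"german", "luxury cars", "automobiles"},
    "pasta-en.html": {"italian", "recipes"},
}
pages_de = {
    "bmw-de.html": {"deutsch", "luxusautos", "automobile"},
    "brot-de.html": {"deutsch", "rezepte"},
}

# Hypothetical dictionary between the two controlled category vocabularies.
en_to_de = {
    "german": "deutsch",
    "luxury cars": "luxusautos",
    "automobiles": "automobile",
    "recipes": "rezepte",
    "italian": "italienisch",
}


def cross_lingual_search(query_categories, dictionary, target_pages):
    """Map categories into the target language, then rank pages by overlap."""
    mapped = {dictionary[c] for c in query_categories if c in dictionary}
    scored = [(len(mapped & cats), page) for page, cats in target_pages.items()]
    # Keep only pages sharing at least one category, best match first.
    return [page for score, page in sorted(scored, reverse=True) if score > 0]


print(cross_lingual_search({"luxury cars", "automobiles"}, en_to_de, pages_de))
```

No text of the pages themselves is translated here; only the small controlled vocabulary needs a dictionary.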

6. Can you clarify the difference between example-based MT (EBMT) and statistical MT? They seem very similar. When is one preferable over the other?
Example-based MT is a framework that relies on the assumption that, to translate string X from language A to B, we can make use of a bilingual A-B aligned training corpus. This corpus is assumed to contain some pretty good approximations (if not identical copies) of X in language A, together with corresponding target sentences in language B, which we try to combine to construct a translation of X. These combination methods are generally but not exclusively statistical.
Statistical MT systems assume a broader choice of models that may or may not involve such direct use of a bilingual aligned corpus. For example, a statistical MT system might try to learn how syntax trees map from language A to B, and choose the most likely transformation path given string X. This would involve single-word dictionaries and perhaps parallel examples of sentence parses of various kinds.
Example-based and statistical methods are less linguistically general in the sense that they both rely on particular corpora to extract their translation rules. However, both have the advantage that they learn automatically, as opposed to relying on hand-crafted rules, which are expensive to create. Whether you choose EBMT or a statistical MT approach will be partially decided by the amount and nature of the training data available. Statistical MT models range from the very simple (e.g. unigram) to the very complex. The more complex the model, the more parameters need to be trained, and correspondingly more training data will be needed. Example-based models assume a bilingual corpus large enough to contain close examples of the future text to be translated. Also see http://www.compapp.dcu.ie/~away/EBMT.html
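The first step of the example-based approach, finding a close approximation of X in the aligned corpus, can be sketched as below. The tiny corpus and the word-level similarity from Python's difflib are assumptions made for illustration; the sketch covers only retrieval, not the step of combining fragments into a final translation.

```python
from difflib import SequenceMatcher

# Hypothetical tiny English-French aligned corpus.
aligned_corpus = [
    ("what is for dinner", "qu'est-ce qu'on mange ce soir"),
    ("where is the station", "où est la gare"),
    ("where is the hotel", "où est l'hôtel"),
]


def closest_example(source, corpus):
    """Return (similarity, source_side, target_side) of the nearest example."""
    def similarity(pair):
        # Compare word sequences rather than characters.
        return SequenceMatcher(None, source.split(), pair[0].split()).ratio()
    best = max(corpus, key=similarity)
    return similarity(best), best[0], best[1]


score, src, tgt = closest_example("where is the train station", aligned_corpus)
print(round(score, 2), src, "->", tgt)
```

If the best similarity is low, the corpus does not contain a close enough example, which is the situation where EBMT is a poor fit for the available data.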