Can we include arbitrary LISP code in the grammar?
Answer: No. Although the grammar file is eventually processed and included in the execution of the LISP-based parser, you must use only the grammar description format described in the homework.
What is the difference between '=' and '=c'?
Answer: The '=' is the unification operator, and it actually causes unification
to be performed between feature structures.
The '=c' can be thought of as a 'check-unification' operator. It merely
verifies that the stated relation holds, but does _not_ perform unification
itself. If the check fails, the unification for that rule fails.
Use '=c' when checking the value of a feature that is known to have already been assigned. '=c' is most useful in the lexical grammar rules that interface with English morphology, such as:
(<N> <--> (%)
((x0 <= (parse-word (x1 value)))
((x0 cat) =c N)))
The function "parse-word" is designed to return a feature structure
with a feature "cat" that has already been assigned a value. The
rule just needs to verify that this value is the expected POS
(in this case, a noun). The normal '=' operator should be used in most other cases. In particular, it should be used for unifying two feature paths, such as in:
((x0 title) = (x4 title))
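The practical difference between the two operators can be sketched in Python. This is a simplified model, not the parser's implementation: it assumes a feature structure is a flat dict of atomic values, and the helper names `unify_assign` and `check` are hypothetical. The point it illustrates is that '=' may fill in a missing value, while '=c' only succeeds if the value is already there.

```python
from typing import Any, Dict, Optional

# Simplified model (assumption): a feature structure is a flat dict mapping a
# feature name to an atomic value such as "N" or "*past".
FS = Dict[str, Any]

def unify_assign(fs: FS, feature: str, value: Any) -> Optional[FS]:
    """Model of ((x0 feature) = value): fill the feature in if it is unset,
    otherwise require the existing value to match. None means failure."""
    if feature not in fs:
        result = dict(fs)  # copy, so a failed rule leaves the input untouched
        result[feature] = value
        return result
    return fs if fs[feature] == value else None

def check(fs: FS, feature: str, value: Any) -> Optional[FS]:
    """Model of ((x0 feature) =c value): succeed only if the feature is
    already present with that value; never adds anything."""
    return fs if feature in fs and fs[feature] == value else None

# A word as parse-word might return it (hypothetical contents).
noun = {"cat": "N", "root": "show"}
print(check(noun, "cat", "N"))       # succeeds: the value is already there
print(check({}, "cat", "N"))         # None: =c will not fill in a value
print(unify_assign({}, "cat", "N"))  # {'cat': 'N'}: = fills it in
```

A real unifier also handles nested structures and path-to-path equations such as ((x0 title) = (x4 title)) through structure sharing, which this sketch leaves out.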
What is * used for in the feature values?
Answer: The tradition
followed in this grammar is that fixed "atomic" values that are
not names are explicitly distinguished by prefixing them with a "*".
For example, *query and *count are atomic values. This is just a
convention that we followed. Names, years, and "root" features are
open-class and are stored as strings. Unfortunately, there is a bit
of inconsistency, which may be causing some confusion: the "GENRE"
and "MEDIUM" features could have been represented either way (they
are closed-class), but we chose to represent them as strings rather
than atoms. The "wh-word" roots, on the other hand, are represented as
atoms rather than strings (e.g. *who rather than "who"). There are
explicit grammar rules for parsing them (they are not listed in the
lexicon).
The verb 'to broadcast', which is in the lexicon, has both
a regular and an irregular form. Does our grammar need to be able to parse
both forms?
Answer: The test sentences will only use the irregular form of broadcast,
as in the example parses. That is, only sentences like

The show was broadcast...
The show was broadcasted...

need to be parsed, and usage like

They broadcast the show yesterday.
They broadcasted the show...

will not be in the test set.
Can we see the test questions at least briefly before the assignment is
due?
Answer: No. :)
How can we tell the difference between a name and a date?
Answer: (Alon Lavie) The provided "get-name" function already correctly handles
dates/numbers in the input, but does not distinguish between dates and "real"
name words. You can distinguish real names by the fact that names in the input
should be capitalized, so rules that look for names should require the
@CAP to be present before the name. There would be a problem if a movie name
is itself a year/number (e.g. 2001), as in "who starred in 2001". This is
a real ambiguity, but because the title is a number, it would not generate a
@CAP in the input even when used as a name.
(There won't be such questions in the unseen test set...)
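The @CAP convention can be illustrated with a small Python sketch. It assumes the input reaches the grammar as a token list in which @CAP appears immediately before each capitalized word; `label_tokens` is a hypothetical helper that mimics what name rules requiring @CAP would accept, not part of the provided code. The second call shows why "2001" is ambiguous: without a preceding @CAP it looks exactly like a year.

```python
from typing import List, Tuple

CAP = "@CAP"  # marker assumed to precede each capitalized input word

def label_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each word as 'name' (preceded by @CAP), 'number' (all digits,
    e.g. a year), or 'word'. A hypothetical helper, not part of the parser."""
    labels: List[Tuple[str, str]] = []
    cap_pending = False
    for tok in tokens:
        if tok == CAP:
            cap_pending = True  # the next word is a candidate name
            continue
        if cap_pending:
            labels.append((tok, "name"))
        elif tok.isdigit():
            labels.append((tok, "number"))
        else:
            labels.append((tok, "word"))
        cap_pending = False
    return labels

print(label_tokens(["who", "starred", "in", CAP, "Casablanca"]))
# The title "2001" carries no @CAP, so it cannot be told apart from a year:
print(label_tokens(["who", "starred", "in", "2001"]))
```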
Is there a way to put a NOT constraint on values?
Answer: Yes. It isn't needed for the homework, but it might simplify some things depending on your approach. For example:
((f0 tense) =c (*NOT* *past))
Or, with multiple values in the list:
((f0 pronoun) =c (*NOT* *he *she *it *they))
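The intended reading of a *NOT* constraint can be modelled in a few lines of Python. This is a hypothetical sketch, not the parser's code: it represents the constraint as a list whose first element is "*NOT*", and it assumes that, as with any '=c' check, a feature with no value fails. The real parser's behaviour for an unset feature may differ.

```python
from typing import Any, Dict

NOT = "*NOT*"

def check_constraint(fs: Dict[str, Any], feature: str, constraint: Any) -> bool:
    """Hypothetical model of ((f0 feature) =c constraint).

    A constraint that is a list starting with "*NOT*" succeeds when the
    feature's value is none of the listed values; any other constraint
    must match the value exactly.
    """
    if feature not in fs:
        return False  # assumption: =c needs the feature to be present
    value = fs[feature]
    if isinstance(constraint, list) and constraint and constraint[0] == NOT:
        return value not in constraint[1:]
    return value == constraint

print(check_constraint({"tense": "*present"}, "tense", [NOT, "*past"]))  # True
print(check_constraint({"pronoun": "*she"}, "pronoun",
                       [NOT, "*he", "*she", "*it", "*they"]))            # False
```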
How different will the test questions be? I'm worried about adding too
many possibly useless rules to my grammar.
Answer: The movies, actors, etc. will change, but the grammatical structure
will be quite similar, and the lexicon will contain no words not seen in the
training examples. Basically, your rules should be general enough to handle
re-combinations of the basic building blocks that recur in the questions. It'll
be a reasonable test.
Are terms in [] just for our understanding or should they be parsed, too?
Answer: The brackets were just for illustration; your parse rules
don't need to handle them. (Sentences to be parsed by the server should
not include brackets or the question mark.)
Which is better, having no parse for some sentences, or having incorrect
as well as correct parses for some sentences?
Answer: The latter. I think it's safe to say that the grading points for having,
say, 1 correct and 1 incorrect parse will be higher than zero (which is what you'd
get for no parse at all).
The sentences are not parsed at all. In debug mode, the parser's output looks like the following:
Error: The variable *STATE-ARRAY* is unbound.
Fast links are on: do (si::use-fast-links nil) for debugging
Error signalled by SETF.
Backtrace: system:universal-error-handler > evalhook > pstr > let >
dcp > p > progn > cond > parse-list > do > initialize >
SETF
What does it mean?
Answer: When you see the above error message, the most likely cause is unmatched parentheses in one or more of your grammar rules. Sometimes missing parentheses or parentheses in the wrong positions may also result in incorrect parses. Check the help file for more information on the correct format of grammar rules.
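Since unmatched parentheses are the usual culprit, a quick standalone check can help locate them. The Python sketch below is a hypothetical helper, not part of the parser. It assumes the grammar file uses Lisp-style `;` line comments and double-quoted strings, and it does not understand escaped quotes or `#| ... |#` block comments.

```python
from typing import List, Tuple

def find_paren_errors(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for unmatched parentheses.

    Parentheses inside double-quoted strings and after a ';' comment
    marker are ignored; strings may span several lines.
    """
    errors: List[Tuple[int, str]] = []
    stack: List[int] = []  # line numbers of currently open '('
    in_string = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if in_string:
                if ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == ';':
                break  # the rest of this line is a comment
            elif ch == '(':
                stack.append(lineno)
            elif ch == ')':
                if stack:
                    stack.pop()
                else:
                    errors.append((lineno, "unmatched ')'"))
    # Anything still open at end of file was never closed.
    for lineno in stack:
        errors.append((lineno, "unclosed '('"))
    return errors
```

For example, `find_paren_errors(open("my-grammar.gra").read())` (the file name is a placeholder) returns a list of `(line, message)` pairs, and an empty list means every parenthesis is matched. Balanced parentheses can still be in the wrong places, which this check cannot detect.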
Can I use epsilon-rules such as (<A> <--> ( )) in the grammar?
Answer: The parser does not support epsilon-rules in the grammar, so they are not allowed.
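If an epsilon-rule was meant to make a constituent optional, the usual workaround is to write one rule with the constituent and one without it. The Python sketch below generates those alternatives for a right-hand side. Marking optional symbols with a trailing `?` is a hypothetical notation used only here, not part of the grammar format. In a real grammar each variant also needs its own equations, with the x-indices renumbered to match the constituents that remain.

```python
from itertools import product
from typing import List, Optional

def expand_optionals(rhs: List[str]) -> List[List[str]]:
    """Expand a right-hand side whose optional symbols end in '?' into
    every alternative that does not need an epsilon-rule."""
    choices: List[List[Optional[str]]] = []
    for sym in rhs:
        if sym.endswith("?"):
            choices.append([sym[:-1], None])  # keep the symbol or drop it
        else:
            choices.append([sym])
    variants: List[List[str]] = []
    for combo in product(*choices):
        kept = [s for s in combo if s is not None]
        if kept:  # an empty right-hand side would itself be an epsilon-rule
            variants.append(kept)
    return variants

print(expand_optionals(["<DET>?", "<N>"]))  # [['<DET>', '<N>'], ['<N>']]
```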