Frequently Asked Questions
Question: The query expansion algorithm occasionally selects terms that break the query parser, for example, "www.weather.com" (because periods are used to indicate fields) and "1,2,3,4". Do I need to upgrade my query parser to handle these terms?
Answer: You may discard any terms that break your query parser.
Question: If the software is using a field (e.g., title) for expansion, what should it do if the field is empty (has length 0)?
Answer: Skip the document.
Question: What are the time limits?
Answer: The time limit for HW2 is 5x the time required by Jamie's program. See the HW2 testing page for the time taken by Jamie's program for different test conditions.
Question: How can I look up a term's ctf efficiently?
Answer: This information is available in two ways. Choose the method that is the best fit for your use.
- TermVector.totalStemFreq: This method is convenient and efficient if you are already working with the TermVector for a document.
- Idx.getTotalTermFreq: This method is more convenient if you want the frequency of the term in several fields.
Question: What is win:tie:loss?
Answer: This metric compares an experimental system with a baseline system.
- Win is the number of queries that are better in the experimental system than in the baseline system.
- Loss is the number of queries that are worse in the experimental system than in the baseline system.
- Tie is the number of queries that have similar accuracy in both systems.
Usually a margin is used to make these decisions. For example, if the absolute value of the relative difference (|(experimental - baseline) / baseline|) is less than 2%, the difference is too small to be meaningful, so the query is counted as a tie; otherwise, it is a win or a loss.
Win:tie:loss can be computed for any metric. MAP is most common.
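The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not the course's grading code: the function name win_tie_loss, the dictionary inputs, and the handling of a baseline score of 0 (where the relative difference is undefined) are all assumptions made for this example. It assumes both systems were scored on the same set of queries.

```python
def win_tie_loss(baseline, experimental, margin=0.02):
    """Count (wins, ties, losses) over per-query metric values.

    baseline, experimental: dicts mapping query id -> metric value
    (for example, average precision). Hypothetical input format.
    margin: relative differences smaller than this count as ties.
    """
    wins = ties = losses = 0
    for qid, base in baseline.items():
        exp = experimental[qid]
        if base == 0:
            # Relative difference is undefined when the baseline is 0.
            # Assumption for this sketch: compare the raw values instead.
            if exp > base:
                wins += 1
            elif exp < base:
                losses += 1
            else:
                ties += 1
            continue
        rel_diff = (exp - base) / base
        if abs(rel_diff) < margin:
            ties += 1      # difference too small to be meaningful
        elif rel_diff > 0:
            wins += 1      # experimental system is better
        else:
            losses += 1    # experimental system is worse
    return wins, ties, losses


# Made-up per-query scores, for illustration only.
baseline_ap = {"q1": 0.50, "q2": 0.40, "q3": 0.20, "q4": 0.0}
experimental_ap = {"q1": 0.505, "q2": 0.50, "q3": 0.10, "q4": 0.0}
print("%d:%d:%d" % win_tie_loss(baseline_ap, experimental_ap))
```

With these made-up scores, q1 changes by only 1% and q4 is unchanged, so both are ties; q2 improves by 25% (a win) and q3 drops by 50% (a loss).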
If the FAQ hasn't answered your question, please search the Piazza forum to see if someone has already answered your question before you ask it.