HW1: Lexical Retrieval
Due February 2, 11:59pm ET
Assignment Overview
The purpose of this assignment is to learn about Boolean retrieval and evaluation of retrieval results. This assignment consists of several parts.
- Set up an Anaconda software development environment with tools that you will need for several homework assignments.
- Add new retrieval capabilities to the QryEval search engine.
- Conduct experiments with your search engine.
- Write a report about your work.
- Upload your work to the course website for grading.
Note: A password is required to access some course resources. You may obtain a password here. Use your Andrew email address as your userid.
1. New Capabilities
You are provided with a simple search engine (QryEval) that supports a few query operators and the Unranked Boolean retrieval model. It expects a single command line parameter, which is a path to a .param file that configures the search engine to conduct an experiment.
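For reference, a run typically looks something like the following. The parameter names and paths shown here are only illustrative; the Design Guide defines the parameters your system must actually support.

python QryEval.py HW1-Exp-1.1a.param

where the .param file contains key=value pairs such as:

indexPath=./index
queryFilePath=./HW1-Exp-1.1a.qry
retrievalAlgorithm=UnrankedBoolean
trecEvalOutputPath=./HW1-Exp-1.1a.teIn
trecEvalOutputLength=1000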
This assignment adds new configuration parameters, new query operators, and new retrieval models. It implements capabilities discussed in Lectures 1, 2, and 3.
See the Design Guide for detailed instructions about how to implement these capabilities.
2. Experiments
After your system is developed, you must conduct experiments that investigate the effectiveness of the different retrieval models and document representations.
Your experiments will be conducted with short and long descriptions of the same information needs. Short, terse queries were common in web search until a few years ago. Recently, longer, more grammatical queries have become more common. Your experiments will investigate different methods of transforming natural language into queries for different types of retrieval models.
2.1. Experiment: Structure in Boolean Queries
The first experiment compares retrieval accuracy using three methods of transforming short and long natural language queries to structured queries for Boolean retrieval methods.
- Automatic with #AND: Use a single #AND operator to transform natural language into structured queries (a small transformation sketch appears after this list).
- Automatic with #NEAR/3: Use a single #NEAR/3 operator to transform natural language into structured queries.
- Manual: Manually create structured queries for Boolean retrieval models. Your goal is to improve over the baseline #AND and #NEAR/3 queries. You may decide how and where to apply the query operators to improve results, but:
- Every query must contain some form of structure;
- Every query operator (AND, OR, NEAR, SYN, WINDOW) must be used in at least two queries; and
- Every field (title, body, url, keywords, inlink) must be used in at least two queries.
- FAQ:
- It isn't necessary to use all operators or fields in the same query.
- You are not restricted to terms in the original query. You may add synonyms or other vocabulary if you wish.
- You may delete terms that you think are not helpful.
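If you script the automatic transformations, each one simply wraps the query terms in a single operator. A minimal Python sketch follows; the qid:text query-file format is an assumption, so check the format of the query files you are given.

def wrap_query(query_line, operator):
    # '10:apple pie recipe' -> '10:#AND( apple pie recipe )'
    qid, text = query_line.split(':', 1)
    return '{}:#{}( {} )'.format(qid, operator, ' '.join(text.split()))

print(wrap_query('10:apple pie recipe', 'AND'))     # automatic #AND query
print(wrap_query('10:apple pie recipe', 'NEAR/3'))  # automatic #NEAR/3 query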
There are two types of natural language queries (short, long) × three methods of forming queries (#AND, #NEAR/3, manual) × two retrieval methods (Unranked Boolean, Ranked Boolean), for a total of 12 experimental results.
Use trecEvalOutputLength=1000 for the experiments.
For each test (each query set), report the following information:
- MRR;
- Precision at ranks 10, 20, and 30;
- MAP@1K;
- NDCG at ranks 10, 20, and 30;
- Recall at ranks 100 and 1000;
- Running time (minutes:seconds).
These metrics address accuracy at different points in the ranking; how well a method finds relevant documents, even if it does not rank them well; and a method's computational cost. As a group, they provide a variety of ways to investigate the behavior and effectiveness of different approaches to search.
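To make the first two metrics concrete, here are simplified, binary-relevance versions of precision at rank k and reciprocal rank. This is a sketch for intuition only; report the values produced by the evaluation tool specified for the assignment.

def precision_at_k(rels, k):
    # rels: 0/1 relevance judgments for one ranking, best document first
    return sum(rels[:k]) / k

def reciprocal_rank(rels):
    # 1/rank of the first relevant document; 0 if none is retrieved.
    # MRR is the mean of this value over all queries in the test set.
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

rels = [0, 1, 0, 1, 1]            # hypothetical judgments for one query
print(precision_at_k(rels, 5))    # 0.6
print(reciprocal_rank(rels))      # 0.5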
Undergraduates: Your goal is to get familiar with the two ranking algorithms and to get a little hands-on experience with forming structured queries and running reproducible experiments. Grading is based primarily on whether your queries demonstrate an understanding of the use of different query operators and document fields.
Graduate students: You have the same goals as the undergraduates, but you should also try to develop general strategies for forming queries that beat the unstructured baselines. Don't obsess over achieving high accuracy. Grading is based on the quality of your strategies, not the accuracy of your results. Some interesting hypotheses may not work. Generate-and-test can eventually find good queries, but it is a poor strategy unless one has infinite time.
2.2. Experiment: BM25 Ranking
The documents in your index have five representations: body, title, url (terms extracted from the url), keywords (terms supplied by the document author), and inlink (terms used in hyperlinks from other documents to this document). The second experiment compares the value of the body and title representations for Ranked Boolean and BM25.
The second experiment also examines the value of the Sequential Dependency Model (SDM) approach to automatically forming structured queries for BM25.
- body vs title: Form queries for each retrieval model (Ranked Boolean, BM25) using either the body field or the title field (e.g., "apple.title"). Use each retrieval method's default query operator (i.e., #AND for Ranked Boolean, #SUM for BM25).
- SDM: For BM25, use the SDM to automatically form queries. Use weights of (0.7, 0.2, 0.1) for the three elements of SDM queries. For example:
apple pie recipe --> #WSUM( 0.7 #SUM( apple pie recipe ) 0.2 #SUM( #NEAR/1( apple pie ) #NEAR/1( pie recipe )) 0.1 #SUM( #WINDOW/8( apple pie ) #WINDOW/8( pie recipe )))
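If you generate the SDM queries with a script, the transformation just emits the three components with the weights above. A minimal sketch, assuming whitespace-tokenized query text and ignoring the single-term edge case:

def sdm_query(terms, w=(0.7, 0.2, 0.1)):
    # Build the unigram, #NEAR/1, and #WINDOW/8 components of an SDM query.
    bigrams = list(zip(terms, terms[1:]))
    unigrams = '#SUM( {} )'.format(' '.join(terms))
    nears = '#SUM( {} )'.format(' '.join('#NEAR/1( {} {} )'.format(a, b) for a, b in bigrams))
    windows = '#SUM( {} )'.format(' '.join('#WINDOW/8( {} {} )'.format(a, b) for a, b in bigrams))
    return '#WSUM( {} {} {} {} {} {} )'.format(w[0], unigrams, w[1], nears, w[2], windows)

print(sdm_query('apple pie recipe'.split()))
# Produces the #WSUM query shown in the example above.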
There are two types of natural language queries (short, long) × two fields (body, title) × two retrieval methods (Ranked Boolean, BM25) plus SDM queries for two types of natural language queries (short, long), for a total of 10 experimental results. You may reuse Ranked Boolean body field results from Experiment 1 (Exp-1.2a, Exp-1.2d) instead of running those again.
2.3. Experiment: Multifield BM25
BM25 allows evidence from different representations of the document content (e.g., body, title, ...) to be combined to create a better estimate of how well a concept (e.g., a query term) or a query matches the document. For example, the unstructured query "The Time Traveler's Wife" can be transformed into the following structured query.
#WSUM (
0.2 #SUM(time.url traveler.url wife.url)
0.3 #SUM(time.title traveler.title wife.title)
0.5 #SUM(time.body traveler.body wife.body))
Transform the short and long queries into structured queries that use #WSUM to combine evidence from different document representations (fields). Test four weight combinations for the different document representations (i.e., create four structured query sets for short and long queries).
Note: Within a structured query set, the same weights are used for all queries (i.e., do not form query-specific weights). Use the same weighting strategies for short and long queries.
Each field (body, title, url, keywords, inlink) must be used (must have a non-zero weight) in at least one experiment.
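If you script this transformation as well, it is the same idea with per-field #SUMs inside a #WSUM. A minimal sketch; the weights shown are placeholders, not a recommended setting.

def multifield_query(terms, field_weights):
    # field_weights: e.g. {'url': 0.2, 'title': 0.3, 'body': 0.5}
    parts = []
    for field, weight in field_weights.items():
        scoped = ' '.join('{}.{}'.format(t, field) for t in terms)
        parts.append('{} #SUM( {} )'.format(weight, scoped))
    return '#WSUM( {} )'.format(' '.join(parts))

print(multifield_query(['time', 'traveler', 'wife'],
                       {'url': 0.2, 'title': 0.3, 'body': 0.5}))
# Produces a query equivalent to the example above.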
3. The Report
11-442 students must submit a brief report that contains a statement of collaboration and originality, their queries, and their experimental results. A template is provided in Microsoft Word and pdf formats.
11-642 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.
See the grading information document for information about how experiments and reports are graded.
4. Submission Instructions
Create a .zip file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW1-Report.pdf and place it in the same directory that contains your software (e.g., the directory that contains QryEval.py).
Reproducibility: Your submission must include files in the QryEval directory that enable all of your experiments to be reproduced. They must follow a naming convention so that the homework testing service can find them. The naming convention is approximately <hwid>-Exp-<experiment id>.<filetype>, for example, HW1-Exp-3.1a.qry and HW1-Exp-3.1a.param. See the report template (above) for guidance about how to name the files for each experiment.
Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.
5. Grading
Your report is uploaded to software that automatically parses it into sections to improve grading speed. Parsing fails if your report does not have the expected structure. A few points will be deducted if your report does not follow the structure of the report template.
Undergraduates: 88% autograding of your software, 12% for the queries and experimental results.
Graduate students: 50% autograding of your software, 50% for the queries, experimental results, and analysis of experimental results. See the grading information document for information about how experiments and reports are graded.
FAQ
If you have questions not answered here, see the HW1 FAQ and the Homework Testing FAQ.