The purpose of this assignment is to learn about Boolean retrieval and evaluation of retrieval results. This assignment consists of several parts.
A password is required to access some course resources. You may obtain a password here. Use your Andrew email address as your userid.
The homework assignments are done in Python using an Anaconda development environment with a standard set of libraries and tools. If you do not already have Anaconda on your computer, download and install it.
Next, define an Anaconda environment that contains the same libraries and tools that will be used to test and grade your software.
The QryEval software and the HW1-Train-xxx.param parameter files assume the following directory structure.
Directory | Contents | Source
---|---|---
QryEval/ | QryEval 4.2 and the subdirectories below | You create
GRADING_DIR/ | Results from the reference system | Download zip or tgz
INPUT_DIR/ | Data files, other software | Download index (see below); download trec_eval (Mac (FAQ), Linux, Windows); download .qrel file
LIB_DIR/ | Java jar files | Download zip or tgz
OUTPUT_DIR/ | Output from your software | You create (initially empty)
TEST_DIR/ | Public ("training") test cases | Download zip or tgz
Note: Most .zip or .tgz files unpack into a directory that contains a set of files. Unzip it into the QryEval directory, not a subdirectory. For example, QryEval/TEST_DIR/TEST_DIR/HW1-Train-0.param or QryEval/QryEval/QryEval.py is a mistake. Correct configurations do not have two directories with the same name in a path.
This assignment creates a search engine called QryEval that implements Unranked and Ranked Boolean retrieval with several query operators. You may extend our initial QryEval search engine (less work) or develop your own from scratch (more work).
Your search engine must parse structured or unstructured queries, and use a document-at-a-time architecture (as discussed in Lectures 2, 3, and 4) to rank documents for each query. Search results are written to a file in trec_eval format. The sample QryEval software (QryEval-4.2.1.zip or QryEval-4.2.1.tgz) provides a working example.
Documents are stored in a Lucene index. Lucene is a widely used open-source search engine, popular both on its own and as the foundation of ElasticSearch, Solr, and Pyserini. Lucene is written in Java, so your search engine is a combination of Python and Java software.
The example software includes a set of Python classes that make it easier to develop a document-at-a-time architecture and access Lucene indexes. See the documentation for details.
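As a rough illustration of the document-at-a-time pattern (not the QryEval API; see the documentation for the actual classes), the core idea is to advance all query arguments' inverted lists in lockstep and score each document once. Here the "posting lists" are just hypothetical sorted lists of internal document ids:

```python
# Sketch of document-at-a-time evaluation over two hypothetical posting
# lists. Real QryEval classes wrap Lucene inverted lists; here each
# posting list is simply a sorted list of internal document ids.

def daat_and(postings_a, postings_b):
    """Return the doc ids that appear in both posting lists,
    advancing both lists in lockstep (document-at-a-time)."""
    matches = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            matches.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return matches

print(daat_and([1, 3, 5, 9], [2, 3, 9, 12]))  # [3, 9]
```

The same lockstep traversal generalizes to any number of arguments and to other operators; only the match condition and the scoring change.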
Your search engine must support the following capabilities.
Ranking algorithms:
Query operators:
Unstructured queries must use a default query operator. #AND is the default operator for the Unranked and Ranked Boolean retrieval models.

Note: After you implement the #AND operator, you will need to change the default operator for the UnrankedBoolean retrieval model from #OR to #AND.
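One way to think about the difference between the two retrieval models (a sketch under common conventions, not the QryEval implementation): under Unranked Boolean every matching document receives the same score, while under Ranked Boolean the #AND operator typically scores a match by the minimum of its arguments' scores.

```python
# Hypothetical #AND scoring for a single document that matches all of
# the operator's arguments. arg_scores holds the score each argument
# assigns to that document.

def and_score(arg_scores, ranked):
    if ranked:
        return min(arg_scores)  # Ranked Boolean: MIN combination
    return 1.0                  # Unranked Boolean: every match scores 1

print(and_score([3.0, 7.0, 5.0], ranked=True))   # 3.0
print(and_score([3.0, 7.0, 5.0], ranked=False))  # 1.0
```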
Document fields: Your software must support the four fields provided by the index. (Provided in QryEval.)
The field name (if any) is specified in the query using a suffix-based syntax of the form 'term.field', as in 'apple.title'. If the query does not specify a field (e.g., if the user types 'apple'), your software should default to using the 'body' field.
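A minimal sketch of the suffix-based syntax (illustration only; the actual parser in QryEval may differ, and the field names here beyond 'body' and 'title' are placeholders):

```python
# Split 'term.field' tokens such as 'apple.title' into (term, field),
# defaulting to the 'body' field when no field suffix is present.

DEFAULT_FIELD = "body"
KNOWN_FIELDS = {"body", "title", "url", "keywords"}  # placeholder field names

def parse_term(token):
    if "." in token:
        term, field = token.rsplit(".", 1)
        if field in KNOWN_FIELDS:
            return term, field
    return token, DEFAULT_FIELD

print(parse_term("apple.title"))  # ('apple', 'title')
print(parse_term("apple"))        # ('apple', 'body')
```

Checking the suffix against the set of known fields avoids misreading tokens that merely contain a period.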
See the Design Guide for advice about how to implement these capabilities.
Your search engine must accept a single command-line argument, which is a path to a parameter file in json format. The parameter file configures your software for testing.
Your software must support the following parameters.
In HW1, there are two output length parameters. ranker.outputLength sets the maximum number of documents returned by the ranker. trecEvalOutputLength sets the maximum number of documents written to the trec_eval file. These may seem redundant. The need for different parameters is more apparent in HW3-HW5, when the ranking pipeline becomes longer.
An example parameter file is provided. When we test your software, we will use parameter files in the same format. You will need to write your own parameter files to do your experiments.
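A minimal sketch of reading the json parameter file named on the command line (the parameter names shown are taken from this assignment; check the provided example parameter file for the full set):

```python
# Read the json parameter file whose path is the program's single
# command-line argument.

import json
import sys

def read_parameters(path):
    with open(path) as f:
        params = json.load(f)
    # e.g., params.get("ranker.outputLength"), params.get("trecEvalOutputLength")
    return params

if __name__ == "__main__" and len(sys.argv) > 1:
    params = read_parameters(sys.argv[1])
    print(sorted(params.keys()))
```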
The search engine writes rankings to the trecEvalOutputPath file in a format that enables trec_eval to produce evaluation reports. The provided QryEval software already does this correctly; the information below is provided to help you understand the format and contents.
Matching documents must be sorted by their scores (primary key, descending order) and their external document ids (secondary key, alphabetic order; used for tie scores).
When a query retrieves no documents, write a line with a non-existent document id (e.g., 'dummy' or 'Nonexistent_Docid') and a score of 0.
The last field of each line is a run identifier. It may be anything. You may find this convenient for labeling different experiments.
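Putting the rules above together, here is a hedged sketch of writing one query's ranking in trec_eval format (QryEval already implements this; the run identifier 'run_1' and the helper name are illustrative):

```python
# Write one query's results in trec_eval format:
#   query_id  Q0  external_doc_id  rank  score  run_id
# Results are sorted by score (descending), breaking ties by external
# document id (ascending alphabetic order).

import io

def write_trec_eval(out, query_id, results, run_id="run_1", max_len=1000):
    """results: list of (external_doc_id, score) pairs."""
    if not results:
        # A query with no matches still gets one line, with a dummy id.
        out.write(f"{query_id} Q0 dummy 1 0 {run_id}\n")
        return
    ordered = sorted(results, key=lambda r: (-r[1], r[0]))
    for rank, (docid, score) in enumerate(ordered[:max_len], start=1):
        out.write(f"{query_id} Q0 {docid} {rank} {score} {run_id}\n")

buf = io.StringIO()
write_trec_eval(buf, "10", [("doc-b", 2.0), ("doc-a", 2.0), ("doc-c", 5.0)])
print(buf.getvalue(), end="")
```

Note how the two tied documents (score 2.0) are ordered alphabetically by external document id, as the format requires.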
There are three ways to test your software.
Download the test cases (zip or tgz) and evaluation software to your computer. Run the tests on your computer and check results locally. This is the fastest option. Mac, Linux, and Windows trec_eval executables are available. You can also download the trec_eval software from GitHub and compile your own. Run the command shown below.

```
trec_eval-9.0.4 -m num_q -m num_ret -m num_rel -m num_rel_ret -m map -m recip_rank -m P -m ndcg -m ndcg_cut cw09a.adhoc.1-200.qrel.indexed <test case>.teIn
```

Store the output in a .teOut file.
Download the test cases (zip or tgz) to your computer. Run the tests on your computer, save the results to a file in trec_eval format, and upload the file to the trec_eval web service.
Package your source code in a .zip file, and upload the file to the homework testing service. This is the slowest option, but we use this service to assess your software when we grade your homework, so we strongly suggest that you use it at least a couple of times before making your submission for grading.
These web services are accessed via the HW1 testing web page.
The corpus is 552,682 documents from the ClueWeb09 dataset. Documents are stored in a Lucene index. The index is provided in two formats. Use whichever you are more comfortable with.
The index is large. Download it as soon as possible.
You must do experiments with three query sets.
Undergraduates: Your goal is to get familiar with the two ranking algorithms and to get a little hands-on experience with forming structured queries and running reproducible experiments. Grading is based primarily on whether your queries demonstrate an understanding of the use of different query operators and document fields.
Graduate students: You have the same goals as the undergraduates, but you should also try to develop general strategies for forming queries that beat the unstructured baselines. Don't obsess over achieving high accuracy. Grading is based on the quality of your strategies, not the accuracy of your results. Some interesting hypotheses may not work. Generate-and-test works, but it is a poor strategy unless one has infinite time.
Use trecEvalOutputLength=1000 for your experiments.
You must test each query set (BOW with AND, BOW with NEAR/3, Structured) with each retrieval algorithm (Unranked Boolean, Ranked Boolean), so you will have six sets of experimental results.
For each test (each query set) you must report the following information:
Reproducibility: Your submission must include files in the QryEval directory that enable all of your experiments to be reproduced. They must follow a naming convention so that the homework testing service can find them. The naming convention is approximately <hwid>-Exp-<experiment id>.<filetype>, for example, HW1-Exp-3a.qry and HW1-Exp-3a.param. See the report template (below) for guidance about how to name the files for each experiment.
11-442 students must submit a brief report that contains a statement of collaboration and originality, their queries, and their experimental results. A template is provided in Microsoft Word and pdf formats.
11-642 and 11-742 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.
See the grading information document for information about how experiments and reports are graded.
Create a .zip file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW1-Report.pdf and place it in the same directory that contains your software (e.g., the directory that contains QryEval.py).
Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.
Your report is uploaded to software that automatically parses it into sections to improve grading speed. Parsing fails if your report does not have the expected structure. A few points will be deducted if your report does not follow the structure of the report template.
Undergraduates: 88% autograding of your software, 12% for the queries and experimental results.
Graduate students: 50% autograding of your software, 50% for the queries, experimental results, and analysis of experimental results. See the grading information document for information about how experiments and reports are graded.
If you have questions not answered here, see the HW1 FAQ and the Homework Testing FAQ.
Copyright 2024, Carnegie Mellon University.
Updated on January 31, 2024