The purpose of this assignment is to learn about Boolean retrieval and evaluation of retrieval results. This assignment consists of several parts.
A password is required to access some course resources. You may obtain a password here. Use your Andrew email address as your userid.
The homework assignments are done in Python using an Anaconda development environment with a standard set of libraries and tools. If you do not already have Anaconda on your computer, download and install it.
Next, define an Anaconda environment that contains the same libraries and tools that will be used to test and grade your software.
conda env create -f 11x42-25S-a.yml
Initialize the software development directory. This can be done in one of two ways.
The simplest method is to download init_dir.zip or init_dir.tgz (4.4G compressed, 7.0G uncompressed). It creates and populates the following directory structure.
Or, you can create it manually, using the files in the Source column below.
Directory | Contents | Source | |||
---|---|---|---|---|---|
QryEval/ | QryEval software | .zip or .tgz | |||
INPUT_DIR/ | Data files, other software | Document index (.zip or
.tgz) Relevance judgments (.qrel) | |||
LIB_DIR/ | Java jar files | .zip or .tgz | |||
OUTPUT_DIR/ | Output from your software | Initially empty | |||
TEST_DIR/ | Public ("training") test cases | .zip or .tgz |
The QryEval software, the homework parameter files, and the homework testing system expect this directory structure. When your software is tested, the current working directory will be the QryEval directory.
Finally, download the trec_eval executable, and put it in the QryEval/INPUT_DIR directory. There are different versions for Mac (FAQ); Linux; and Windows
At this point, you should be able to run one of the HW1 public ("training") cases.
conda activate 11x42-25S-a
|
The output should be the following.
-- Ranker: UnrankedBoolean --
|
There should also be a file HW1-Train-0.teIn in the OUTPUT_DIR directory.
This assignment adds several new capabilities to the QryEval search engine.
Your search engine must parse structured or unstructured queries, and use a document-at-a-time architecture (as discussed in Lectures 1, 2, and 3) to rank documents for each query. Search results are written to a file in trec_eval format. The sample QryEval software (QryEval-4.4.zip or QryEval-4.4.tgz) provides a working example.
Documents are stored in a Lucene index. Lucene is a widely-used open-source search engine, both on its own, and as the foundation of ElasticSearch, Solr, and Pyserini. Lucene is written in Java, so your search engine is a combination of Python and Java software.
The example software includes a set of Python classes that make it easier to develop a document-at-a-time architecture and access Lucene indexes. See the documentation for details.
Your search engine must support the following capabilities.
Retrieval models:
Query operators:
Unstructured queries must use a default query operator. #AND
is the default operator for Unranked and Ranked Boolean retrieval
models. #SUM is the default operator for the BM25 retrieval model.
Note: After you implement the #AND operator, you will need
to change the default operator for the UnrankedBoolean retrieval
model from #OR to #AND.
Document fields: Your software must support the five fields provided by the index. (Provided in QryEval.)
The field name (if any) is specified in the query using a suffix-based syntax of the form 'term.field', as in 'apple.title'. If the query does not specify a field (e.g., if the user types 'apple'), your software should default to using the 'body' field.
See the Design Guide for advice about how to implement these capabilities.
Your search engine must accept a single command-line argument, which is a path to a parameter file in json format. The parameter file configures your software for testing.
Your software must support the following parameters.
In HW1, there are two output length parameters. ranker.outputLength sets the maximum number of documents returned by the ranker. trecEvalOutputLength sets the maximum number of documents written to the trec_eval file. These may seem redundant. The need for different parameters is more apparent in HW3-HW5, when the ranking pipeline becomes longer.
See the HW1 Testing page for example parameter files.
The search engine writes rankings to the trecEvalOutputPath file in a format that enables trec_eval to produce evaluation reports. It already does this correctly. The information below is provided just to help you understand the format and contents.
Matching documents must be sorted by their scores (primary key, descending order) and their external document ids (secondary key, alphabetic order; used for tie scores).
When a query retrieves no documents, write a line with a non-existent document id (e.g., 'dummy' or 'Nonexistent_Docid') and a score of 0.
The last field of each line is a run identifier. It may be anything. You may find this convenient for labeling different experiments.
There are three ways to test your software.
Download the test cases (zip
or tgz) and evaluation software to your
computer. Run the tests on your computer and check results
locally. This is the fastest option.
Mac,
Linux and
Windows
executables are available.
You can also download the
trec_eval
software from Github and compile your own. Run the command as
shown below.
trec_eval-9.0.4 -m num_q -m num_ret -m num_rel -m num_rel_ret -m map -m recip_rank -m P -m ndcg -m ndcg_cut -m recall.100,500,1000 cw09a.adhoc.1-200.qrel.indexed <test case>.teIn
Store the output in a .teOut file.
Download the test cases (zip or tgz) to your computer. Run the tests on your computer, save the results to a file in trec_eval format, and upload the file to the trec_eval web service.
Package your source code in a .zip file, and upload the file to the homework testing service. This is the slowest option, but we use this service to assess your software when we grade your homework, so we strongly suggest that you use it at least a couple of times before making your submission for grading.
These web services are accessed via the HW1 testing web page.
After your system is developed, you must conduct experiments that investigate the effectiveness of the different retrieval models and document representations.
Conduct your experiments with this bow query set. Some experiments require you to transform the bag-of-words queries into structured queries.
The first experiment investigates the use of default query structure and other types of structure in Boolean queries. Compare retrieval accuracy using three query sets.
Test each query set (BOW with AND, BOW with NEAR/3, Structured) with each retrieval algorithm (unranked Boolean, ranked Boolean), thus there will be 6 sets of experimental results.
Use trecEvalOutputLength=1000 for the experiments.
For each test (each query set), report the following information:
Undergraduates: Your goal is to get familiar with the two ranking algorithms and to get a little hands-on experience with forming structured queries and running reproducible experiments. Grading is based primarily on whether your queries demonstrate an understanding of the use of different query operators and document fields.
Graduate students: You have the same goals as the undergraduates, but you should also try to develop general strategies for forming queries that beat the unstructured baselines. Don't obsess over achieving high accuracy. Grading is based on the the quality of your strategies, not the accuracy of your results. Some interesting hypotheses may not work. Generate-and-test works well, but is a poor strategy unless one has infinite time.
The documents in your index have five representations: body, title, url (terms extracted from the url), keywords (terms supplied by the document author), and inlink (terms used in hyperlinks from other documents to this document). The second experiment examines the value of these different fields using the Ranked Boolean and BM25 retrieval models. Test each representation individually using the bag-of-words queries. Thus there will be 10 sets of experimental results (2 retrieval models × 5 document representations).
BM25 allows evidence from different representations of the document content (e.g., body, title, ...) to be combined to create a better estimate of how well a concept (e.g., a query term) or a query matches the document. For example, the unstructured query "The Time Traveler's Wife" can be transformed into the following structured query.
#WSUM ( 0.2 #SUM(time.url traveler.url wife.url) 0.3 #SUM(time.title traveler.title wife.title) 0.5 #SUM(time.body traveler.body wife.body))
Transform the bow query set into a structured query that uses #WSUM to combine evidence from different document representations (fields). Test four weight combinations for using the different document representations (i.e., create four structured query sets). Note: In one structured query set, the same weights are used for all queries (i.e., do not form query-specific weights).
Each field (body, title, url, keywords, inlink) must be used (must have a non-zero weight) in at least one experiment.
Reproducibility: Your submission must include files in the QryEval directory that enable all of your experiments to be reproduced. They must follow a naming convention so that the homework testing service can find them. The naming convention is approximately <hwid>-Exp-<experiment id>.<filetype>, for example, HW1-Exp-3.1a.qry and HW1-Exp-3.1a.param. See the report template (below) for guidance about how to name the files for each experiment.
Parameter files: Pro tip: You must write parameter files for your experiments, which can be tedious. You may find it helpful to write a script to generate .param files automatically, e.g., CSV file (each column is a parameter, each row is an experiment) → set of .param files. Create a Python structure for each row, convert it to json, and write to a file. This is entirely optional.
11-442 students must submit a brief report that contains a statement of collaboration and originality, their queries, and their experimental results. A template is provided in Microsoft Word and pdf formats.
11-642 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.
See the grading information document for information about how experiments and reports are graded.
Create a .zip file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW1-Report.pdf and place it in the same directory that contains your software (e.g., the directory that contains QryEval.py).
Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.
Your report is uploaded to software that automatically parses it into sections to improve grading speed. Parsing fails if your report does not have the expected structure. A few points will be deducted if your report does not follow the structure of the report template.
Undergraduates: 88% autograding of your software, 12% for the queries and experimental results.
Graduate students: 50% autograding of your software, 50% for the queries, experimental results, and analysis of experimental results. See the grading information document for information about how experiments and reports are graded.
If you have questions not answered here, see the HW1 FAQ and the Homework Testing FAQ.
Copyright 2025, Carnegie Mellon University.
Updated on January 20, 2025