Search Engines:
11-442 / 11-642
 
 

HW3: Learning to Rank
Due Oct 28, 11:59pm

 

Assignment Overview

The purpose of this assignment is to gain experience with using machine learning algorithms to train feature-based retrieval models.

 

1. New Retrieval Capabilities

Ranking with a feature-based learning-to-rank architecture requires that your system have three new capabilities: i) features that indicate how well a document satisfies a query, ii) the ability to train a model that combines feature values into a ranking score, and iii) the ability to use the model to rank documents for new queries.

1.1. Search Engine Architecture

The QryEval search engine supports a reranking architecture and provides an initial, unfinished Reranker class. HW3 requires you to develop a learning-to-rank (LTR) reranker. HW4 requires you to develop a neural reranker. Capabilities that you expect to be general should be implemented in the Reranker class. Capabilities that you expect to be specific to a particular algorithm should be implemented in a class for that algorithm (e.g., RerankWithLtr).

The learning-to-rank reranker needs to support the following capabilities.

Below are three optional updates to the QryEval software that may help guide development of the learning-to-rank software. You are not required to use these updates (they do not change how your software is tested).

  1. Reranker.py: We recommend changing the names of the LTR and BERT (HW4) rerankers to make it clearer that there is no inheritance relationship between them. This involves two changes to your Reranker.py software:

  2. Reranker.py: Delete "Indri" from comments (2x), because it is not used.

  3. RerankWithLtr.py: An outline to guide the development of your LTR reranker.

See the Design Guide for advice about how to implement these capabilities.
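The overall design is up to you, but the sketch below illustrates one way RerankWithLtr.py might divide the work between training and reranking. The class name comes from the assignment; every method and parameter name is a hypothetical illustration, not part of the provided skeleton.

    # A minimal sketch of an LTR reranker, assuming the Reranker architecture in
    # the provided QryEval skeleton.  Method and parameter names are hypothetical.
    class RerankWithLtr:
        """Rerank an initial BM25 ranking with a learned feature-based model."""

        def __init__(self, parameters):
            # Store LTR-specific parameters (e.g., feature list, toolkit paths).
            self.parameters = parameters

        def train(self):
            # 1. Read the training queries and relevance assessments.
            # 2. Generate a feature vector for each <query, document> pair.
            # 3. Write the feature vectors to a file in SVMrank/RankLib format.
            # 4. Call the external toolkit to learn a model.
            pass

        def rerank(self, query, initial_ranking):
            # 1. Generate feature vectors for the top n documents of the
            #    initial ranking.
            # 2. Call the external toolkit to score the feature vectors.
            # 3. Sort the documents by the new scores and return the new ranking.
            pass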

1.2 Features

Your program must implement the following features.

Use what you have learned in the course so far to guide your development of custom features. Feature quality is part of your grade (all students), and points will be deducted for features that are trivial (e.g., idf²) or show a lack of understanding.

You have considerable discretion about what features you develop, but there needs to be a good reason why your feature might be expected to make a difference. Your features are hypotheses about what information improves search accuracy. There are many options, including the vector space model; the distance among query terms in the document; and some clever treatment of inlink text. Use your imagination. It is not necessary that your features actually improve accuracy (although we hope that they will). Your hypotheses are important, not the success of your hypotheses.

Note: Query-only features make little difference with these learning algorithms, and thus will not receive credit.

Note to undergraduates: Feature quality is part of your grade, too.
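
As an illustration of the kind of hypothesis a custom feature can encode, the sketch below computes a simple term-proximity feature: the average of the minimum distances between occurrences of adjacent query term pairs in the document body. The term_positions structure (a map from term to a sorted list of positions) is an assumption; how you obtain positions depends on your index API, and this is just one possible feature, not a required or recommended one.

    def min_pair_distance(positions_a, positions_b):
        """Smallest absolute distance between any position in a and any in b."""
        best, i, j = float('inf'), 0, 0
        while i < len(positions_a) and j < len(positions_b):
            best = min(best, abs(positions_a[i] - positions_b[j]))
            if positions_a[i] < positions_b[j]:
                i += 1
            else:
                j += 1
        return best

    def proximity_feature(query_terms, term_positions):
        """Average min distance between adjacent query term pairs (lower is better)."""
        distances = []
        for t1, t2 in zip(query_terms, query_terms[1:]):
            if t1 in term_positions and t2 in term_positions:
                distances.append(min_pair_distance(term_positions[t1],
                                                   term_positions[t2]))
        if not distances:
            return 0.0   # missing-value convention; choose one and document it
        return sum(distances) / len(distances)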

1.3 Machine Learning Toolkits

This assignment uses two machine learning toolkits that have similar capabilities.

  1. SVMrank consists of two C++ software applications: svm_rank_learn and svm_rank_classify. Binaries for Mac, Linux, and Windows are provided, or you may compile your own version using the source code. Our Linux binary requires a recent version of gcc. The .param file indicates where these executables are stored.

  2. RankLib provides several pairwise and listwise algorithms for training models. The RankLib .jar file is included in the lucene-8.1.1 directory that you downloaded for HW1.

These toolkits read data from files (e.g., to train a model, to calculate document scores) and write data to files (e.g., the new document scores). Both toolkits use the same file formats.
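
Both toolkits read feature vectors in the same LETOR-style format, one line per <query, document> pair: <label> qid:<qid> 1:<value> 2:<value> ... # <externalDocId>, with feature ids in increasing order. The sketch below writes one such line and calls svm_rank_learn through subprocess; the executable path and the -c value are placeholders that should come from your .param file.

    import subprocess

    def write_feature_vector(out_file, label, qid, features, external_doc_id):
        """Write one feature vector line in the shared SVMrank/RankLib format."""
        pairs = ' '.join(f'{fid}:{value}' for fid, value in sorted(features.items()))
        out_file.write(f'{label} qid:{qid} {pairs} # {external_doc_id}\n')

    def train_svmrank(svm_rank_learn_path, c, train_file, model_file):
        """Train a model with svm_rank_learn; arguments come from your parameters."""
        subprocess.run([svm_rank_learn_path, '-c', str(c), train_file, model_file],
                       check=True)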

1.4 Parameters

Your software must support all of the parameters used in previous homework, as well as the new parameters described below.

1.5 Relevance Assessments

The ltr:trainingQrelsFile parameter identifies a file of relevance assessments. Use this <qid, docid, label> data to generate training data. The relevance assessments were produced in different years of the TREC conference. Some queries are evaluated on a two-point scale.

0: not relevant
1: relevant

Some queries are evaluated on a five-point scale.

-2: spam (not relevant) - Treat this label as 0
0: not relevant
1: relevant
2: highly relevant
3: key (page or site is comprehensive and should be a top search result)
4: nav (page is a navigational result for the query; query meant "go here")

Your software should handle all of these labels.
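
One simple way to handle the -2 (spam) label is to clamp it to 0 when the relevance assessments are read. A minimal sketch, assuming the standard TREC qrels layout of <qid> <iteration> <externalDocId> <label> per line:

    def read_qrels(path):
        """Read relevance judgments, mapping the spam label (-2) to 0."""
        judgments = {}                        # qid -> {externalDocId: label}
        with open(path) as f:
            for line in f:
                qid, _, doc_id, label = line.split()
                judgments.setdefault(qid, {})[doc_id] = max(int(label), 0)
        return judgments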

1.6 Output

Your software must write search results to a file in trec_eval input format, as it did for previous homework. It must also write the training and testing feature vectors to files, as described above.
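
Recall that each line of trec_eval input has the form <qid> Q0 <externalDocId> <rank> <score> <runId>. A minimal sketch of writing one query's reranked results, with hypothetical variable names:

    def write_results(out_file, qid, ranking, run_id):
        """Write one query's reranked results in trec_eval input format."""
        for rank, (external_doc_id, score) in enumerate(ranking, start=1):
            out_file.write(f'{qid} Q0 {external_doc_id} {rank} {score} {run_id}\n')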

1.7 Testing Your Software

Use the HW3 Testing Page to access the trec_eval and homework testing services.

You may do local testing on your laptop, as you did for HW1 and HW2. The HW3 test cases and grading files (combined into a single directory) are available for download (zip, tgz). HW3 uses the same .qrel file used for HW2.

 

2. Experiments (11-442 Students)

You must conduct an experiment to demonstrate the effectiveness of your custom features. You will use the set of training queries (all of them) to train a model that can be used to re-rank documents for any query. (HW3-train.qry, HW3-train.qrel)

Use BM25 to generate an initial ranking for each test query, and then use the trained model to re-rank the top n documents.

Conduct experiments that examine the effects of each of your custom features. There will be six experiments with each learning algorithm.

Use a reranking depth of 100.

Do your experiments with the HW3-Exp queries.

 

3. Experiments (11-642 Students)

You must conduct experiments and an analysis that investigate the effectiveness of your custom features and the learning-to-rank approach in different situations.

In each experiment, you will use the set of training queries (all of them) to train a model that can be used to re-rank documents for any query. (HW3-train.qry, HW3-train.qrel)

Use a reranking depth of 100.

Use BM25 to generate an initial ranking for each test query, and then use the trained model to re-rank the top n documents.

3.1 Learning to Rank Baselines

Use your existing BM25 and RankedBoolean implementations to produce two baseline document rankings for the HW3-Exp queries.

Also use your learning-to-rank software to train models that use different types of features.

The learning-to-rank experiments are conducted with all three LTR algorithms, to enable you to see whether different algorithms have similar behavior with each type of feature.

Test your models on the HW3-Exp queries. Discuss the trends that you observe; whether the learned retrieval models behaved as you expected; how the learned retrieval models compare to the baseline methods and the full feature set; and any other observations that you may have.

3.2 Custom Features

Conduct experiments that examine the effects of each of your custom features. There will be six experiments with each learning algorithm.

Test your models on the HW3-Exp queries. Discuss the trends that you observe, focusing on the contribution of your custom features to LTR Base features for each learning algorithm.

3.3 Feature Combinations

Experiment with four different combinations of features, trying to find a small set of features that delivers accurate results. We do not expect you to investigate all combinations of features. Your goal is to investigate the effectiveness of different groups of features, and to discard any that do not improve accuracy.

You may use whichever of the three learning algorithms you choose based on your prior experiments.

Discuss the trends that you observe; whether the learned retrieval models behaved as you expected; how the learned retrieval models compare to the baseline methods and the full feature set; and any other observations that you may have.

 

4. The Report

11-442 students must submit a brief report that contains a statement of collaboration and originality and describes your custom features. A template is provided in Microsoft Word and PDF formats. The report must follow the structure provided in the template.

11-642 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and PDF formats. The report must follow the structure provided in the template.

 

5. Submission Instructions

Create a .zip file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW3-Report.pdf and place it in the same zip file directory that contains your software (e.g., the directory that contains QryEval.java).

Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.

The Homework Services web page provides information about your homework submissions and access to graded homework reports.

 

6. Grading

The grading requirements and advice are the same as for HW1.

 

FAQ

If you have questions not answered here, see the HW3 FAQ and the Homework Testing FAQ.


Copyright 2024, Carnegie Mellon University.
Updated on October 21, 2024

Jamie Callan