Search Engines:
11-442 / 11-642
 

HW5: Diversification
Due Nov 25, 11:59pm

 

Assignment Overview

The purpose of this assignment is to gain experience with two algorithms that diversify search engine rankings.

 

1. New Retrieval Capabilities

This homework adds diversification reranking with xQuAD and PM-2 to your reranking pipeline.

1.1. Search Engine Architecture

Reranking with explicit diversification algorithms such as xQuAD and PM-2 requires a new reranker (RerankWithDiversity) that has three capabilities:

To keep things simple, the initial ranker is BM25 and the diversification reranker is reranker_1. This assignment does not use PRF, LTR, or BERT.

See the Design Guide for advice about how to implement these capabilities.
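As a rough illustration of the greedy selection that xQuAD performs, the sketch below reranks a small candidate set. It is not the course's reference implementation: the function name xquad_rerank and its arguments are hypothetical, the scores are assumed to be already normalized to the range [0, 1], and every intent is assumed to be equally important.

```python
def xquad_rerank(initial, intent_scores, lam, depth):
    """Greedy xQuAD-style reranking (illustrative sketch only).

    initial:       dict docid -> P(d|q), assumed normalized to [0, 1]
    intent_scores: list of dicts, one per intent, docid -> P(d|q_i)
    lam:           lambda; 0 keeps the initial order, 1 uses diversity only
    depth:         maximum number of documents to return
    """
    n_intents = len(intent_scores)
    weight = 1.0 / n_intents          # assumption: uniform intent importance
    uncovered = [1.0] * n_intents     # product of (1 - P(d_j|q_i)) over chosen docs
    candidates = set(initial)
    ranking = []

    def score(d):
        diversity = sum(weight * s.get(d, 0.0) * u
                        for s, u in zip(intent_scores, uncovered))
        return (1.0 - lam) * initial[d] + lam * diversity

    while candidates and len(ranking) < depth:
        # sorted() makes tie-breaking deterministic (smallest docid wins)
        best = max(sorted(candidates), key=score)
        ranking.append(best)
        candidates.remove(best)
        for i, s in enumerate(intent_scores):
            uncovered[i] *= 1.0 - s.get(best, 0.0)
    return ranking


# Toy data (made up): d1 and d2 cover intent 0, d3 covers intent 1.
toy_initial = {"d1": 0.5, "d2": 0.4, "d3": 0.1}
toy_intents = [{"d1": 0.9, "d2": 0.8}, {"d3": 0.7}]
print(xquad_rerank(toy_initial, toy_intents, lam=0.0, depth=3))
print(xquad_rerank(toy_initial, toy_intents, lam=1.0, depth=3))
```

With these toy scores, λ=0 reproduces the initial order, while λ=1 promotes d3 above d2 because intent 0 is already largely covered by d1.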

1.2. Parameters

Your software must support all of the parameters used in previous homework, as well as the new parameters described below.

1.3. Output

Your software must write search results to a file in trec_eval input format, as it did for previous homework.
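For reference, a trec_eval input line conventionally has six whitespace-separated fields: query id, the literal Q0, document id, rank, score, and run id. The sketch below formats one query's results that way. The function name format_trec_eval, the default run id, and the number of decimal places are assumptions made for illustration; follow the requirements from previous homework where they differ.

```python
def format_trec_eval(query_id, ranked, run_id="run-1"):
    """Return trec_eval input lines for one query (illustrative sketch).

    ranked: list of (docid, score) pairs, already sorted best-first.
    """
    lines = []
    for rank, (docid, score) in enumerate(ranked, start=1):
        lines.append(f"{query_id} Q0 {docid} {rank} {score:.12f} {run_id}")
    return lines


# Hypothetical query id and document ids.
for line in format_trec_eval("157", [("doc-A", 0.5), ("doc-B", 0.25)]):
    print(line)
```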

1.4. Testing Your Software

Use the HW5 Testing Page to access the trec_eval and homework testing services.

You may do local testing on your laptop, as you did for earlier homework. The HW5 test cases and grading files (combined into a single directory) are available for download (zip, tgz).

This assignment uses two different .qrels files.

The adhoc .qrels contain no information about the different intents of each query; the diversity .qrels do.

 

2. Experiments

After your system is developed, you must conduct experiments and analyses that investigate the effectiveness of the two diversification algorithms in different situations.

Conduct your experiments with the queries and query intents listed below.

You may select BM25 parameters based on your prior experience. Use trecEvalOutputLength=1000 for all of your experiments.

Use the P-IA@10, P-IA@20, and alpha-NDCG@20 diversification metrics and the MRR, P@10, P@20, NDCG@20, and MAP@1K relevance metrics to analyze the experimental results.
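To make the intent-aware metrics more concrete, the sketch below computes P-IA@k as the mean of the per-intent P@k values. It assumes uniform intent weights and binary relevance, and the function name and data layout are hypothetical. Use the trec_eval service for the numbers you report.

```python
def precision_ia_at_k(ranking, intent_qrels, k):
    """Intent-aware precision at k (illustrative sketch only).

    ranking:      list of docids, best first
    intent_qrels: dict intent id -> set of docids relevant to that intent
    """
    if not intent_qrels or k <= 0:
        return 0.0
    top = ranking[:k]
    # P@k for each intent; positions beyond the ranking count as non-relevant
    per_intent = [sum(1 for d in top if d in relevant) / k
                  for relevant in intent_qrels.values()]
    return sum(per_intent) / len(per_intent)


# Toy judgments (made up): intent 1 has two relevant documents, intent 2 has one.
toy_ranking = ["d1", "d2", "d3", "d4"]
toy_qrels = {1: {"d1", "d2"}, 2: {"d3"}}
print(precision_ia_at_k(toy_ranking, toy_qrels, k=4))
```

Here intent 1 has P@4 = 0.5 and intent 2 has P@4 = 0.25, so P-IA@4 is 0.375.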

2.1. Experiment: Diversification & relevance baselines (Everyone)

Conduct an experiment that examines the effects of PM-2 and xQuAD on BM25. Use the following diversification parameter values:

2.2. Experiment: The effect of λ on PM-2 and xQuAD (11-642)

Conduct an experiment that examines the effect of the λ parameter on PM-2 and xQuAD. Test the following values for λ: 0.0, 0.25, 0.5, 0.75, and 1.0.
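To show where λ enters PM-2, the sketch below follows the usual proportional-seat formulation: at each rank the intent with the largest quotient is chosen, and λ trades that intent off against the others when a document is selected. This is a sketch under stated assumptions, not the course's reference implementation. The name pm2_rerank and its arguments are hypothetical, scores are assumed to be normalized to [0, 1], and votes are assumed to be uniform across intents.

```python
def pm2_rerank(candidates, intent_scores, lam, depth):
    """Greedy PM-2-style reranking (illustrative sketch only).

    candidates:    iterable of docids from the initial ranking
    intent_scores: list of dicts, one per intent, docid -> P(d|q_i)
    lam:           lambda; weight given to the currently selected intent
    depth:         maximum number of documents to return
    """
    n = len(intent_scores)
    votes = [1.0 / n] * n      # assumption: uniform votes per intent
    seats = [0.0] * n          # portion of the ranking each intent has received
    remaining = set(candidates)
    ranking = []

    while remaining and len(ranking) < depth:
        quotients = [v / (2.0 * s + 1.0) for v, s in zip(votes, seats)]
        top = max(range(n), key=lambda i: quotients[i])  # most under-served intent

        def score(d):
            own = quotients[top] * intent_scores[top].get(d, 0.0)
            others = sum(quotients[i] * intent_scores[i].get(d, 0.0)
                         for i in range(n) if i != top)
            return lam * own + (1.0 - lam) * others

        # sorted() makes tie-breaking deterministic (smallest docid wins)
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        remaining.remove(best)

        # Share the newly filled seat among intents in proportion to coverage.
        total = sum(s.get(best, 0.0) for s in intent_scores)
        if total > 0.0:
            for i in range(n):
                seats[i] += intent_scores[i].get(best, 0.0) / total
    return ranking


# Toy data (made up): d1 and d2 cover intent 0, d3 covers intent 1.
pm2_intents = [{"d1": 0.9, "d2": 0.8}, {"d3": 0.7}]
for lam in (0.5, 1.0):
    print(lam, pm2_rerank(["d1", "d2", "d3"], pm2_intents, lam, depth=3))
```

On this toy data both λ values place d3 second, because intent 1 has received no seats after d1 is selected.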

2.3. Experiment: The effect of the reranking depth (11-642)

Conduct an experiment that examines the effect of the reranking depth parameters on PM-2 and xQuAD. Select five reranking depths that you think may produce interesting results. You may set λ based on the results of your previous experiment.

 

3. The Report

11-442 students must submit a brief report that contains a statement of collaboration and originality and their experimental results. A template is provided in Microsoft Word and PDF formats. The report must follow the structure provided in the template.

11-642 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and PDF formats. The report must follow the structure provided in the template.

 

4. Submission Instructions

Create a .zip or .tar file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW5-Report.pdf and place it in the same zip file directory that contains your software (e.g., the directory that contains QryEval.py).

Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.

The Homework Services web page provides information about your homework submissions and access to graded homework reports.

 

5. Grading

The grading requirements and advice are the same as for HW1.

 

FAQ

If you have questions not answered here, see the Frequently Asked Questions file. If your question is not answered there, please contact the TA or the instructor.


Copyright 2024, Carnegie Mellon University.
Updated on November 12, 2024

Jamie Callan