Search Engines:
11-442 / 11-642 / 11-742
 
CMU logo
 

HW4: Diversification
Due Apr 1, 11:59pm

 

Assignment Overview

The purpose of this assignment is to gain experience with two algorithms that diversify search engine rankings.

 

1. New Retrieval Capabilities

This homework adds diversification reranking with xQuAD and PM-2 to the ranking pipeline that you developed in HW1-HW3. Reranking with explicit diversification algorithms such as xQuAD and PM-2 requires that your system have three new capabilities: i) generate rankings for each query intent (interpretation), ii) possibly scale the ranking scores, and iii) use xQuAD or PM-2 (Lecture 16) to rerank the top documents.

1.1. Search Engine Architecture

HW4 requires development of a new reranker that supports the following capabilities.

For each query q:

Note: Your software must support two different sources for the initial document ranking: Your HW2 system, and the file specified by diversityInitialRankingFile. This allows us to isolate the effects of the diversity algorithm from how well your HW2 system worked.

See the Design Guide for advice about how to implement these capabilities.

1.1. Scaling Document Scores

PM-2 and xQuAD can be used with any retrieval algorithm. However, both algorithms expect document scores to be probabilities in the range [0.0, 1.0]. These algorithms do not work properly when document scores are outside of this range. BM25 scores are outside of this range. Most Indri implementations in this class probably produce scores in the range [0.0, 1.0]; however Indri implementations that calculate log-based scores produce scores outside of this range.

Scores from a single ranking can be converted to probabilities by calculating the sum of document scores in the ranking, and then dividing scores by that sum of scores value. However, when diversifying the results for query query q with intents qi, all of the rankings should be normalized with the same value, so that document scores remain comparable after scaling. Thus, for query q, calculate the sum of scores value for each ranking independently, and then use the maximum value to normalize all of the rankings.

For example, given the following BM25 ranking for query q and intent scores for q1 and q2:

Rank q q1 q2
1 7.788927555 14.67540193 11.25618291
2 7.772853851 14.65528709 11.28977573
3 7.772853851 14.71282333 11.28977573
4 7.753368378 14.65376401 11.23132658
Sum 31.08800364 58.69727635 45.06706095

The maximum sum of scores is 58.69727635. Thus, divide the document scores in all three of these rankings by 58.69727635.

Only do this calculation with the top reranker_1:rerankDepth documents in each ranking. Documents below this depth are ignored by diversification, so their scores do not play a role in scaling. If you use them anyway, your scores (and thus your diversified rankings) will be a little different from ours.

For a more detailed example, see these input rankings and the corresponding scaling values.

The scaling of one query (e.g., query 151) is independent of the scaling of other queries (e.g., query 152).

1.2. Input

Your software must support all of the parameters used in previous homework, as well as the new parameters described below.

1.3. Output

Your software must write search results to a file in trec_eval input format, as it did for previous homework.

1.4. Testing Your Software

Use the HW4 Testing Page to access the trec_eval and homework testing services.

You may do local testing on your laptop, as you did for HW1-HW3. The HW4 test cases and grading files (combined into a single directory) are available for download (zip, tgz).

This assignment uses two different .qrel files.

The adhoc .qrels do not know about the different intents for each query, but the diversity .qrels do.

 

2. Experiments

After your system is developed, you must conduct experiments and analyses that investigate the effectiveness of the two diversification algorithms in different situations.

Conduct your experiments with the queries and queries intents listed below.

Pro Tip: Most of the experiments involve changes to the parameter values of the diversification algorithms. The initial rankings of the queries and intents -- the input to PM2 and xQuAD -- are identical for most of the experiments. Your experiments will run much (much) faster if you generate those rankings once, store them in files, and read those files for your experiments. Use the ranker.inRankFile:Path and reranker_<n>.diversity:inRankFile:Path parameters to specify the files that contain the initial rankings for the query and intents. Your experiments will be reproducible if you upload the files with the initial rankings and also the .param and .qry files used to generate them. Call these files HW4-Exp-inRank.* (.param, .qry, .inRank) and HW4-Exp-inRankIntent.* (.param, .qry, .inRankIntent).

2.1. Experiment: Diversification & relevance baselines (Everyone)

Conduct an experiment that examines the effects of PM2 and xQuAD on Indri and BM25. Use the following parameter values:

Use the P-IA@10, P-IA@20, and alpha-NDCG@20 diversification metrics and the P@10, P@20, P@30, and MAP relevance metrics to analyze the experimental results.

2.2. Experiment: The effect of λ on PM2 and xQuAD (11-642)

Conduct an experiment that examines the effect of the λ parameter on PM2 and xQuAD when used with BM25 and Indri. Use the same parameter values used for the first two experiments, except for λ. Test the following values for λ: 0.0, 0.33, 0.67, and 1.0.

Use the P-IA@10, P-IA@20, and alpha-NDCG@20 diversification metrics to analyze the experimental results. You may also consider the effects on relevace metrics, however it is not required.

2.3. Experiment: The effect of the re-ranking depth (11-642)

Conduct an experiment that examines the effect of the re-ranking depth parameters on Indri + PM2, BM25 + PM2, and BM25 + xQuAD. Use the same parameter values used for the first two experiments, except for the reranking depth parameter. Select four reranking depths that you think may produce interesting results.

Use the P-IA@10, P-IA@20, and alpha-NDCG@20 diversification metrics to analyze the experimental results. Your analysis may also consider the effects on relevace metrics, however it is not required.

 

3. The Report

11-442 students must submit a brief report that contains a statement of collaboration and originality, and their experimental results. A template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.

11-642 and 11-742 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.

 

4. Submission Instructions

Create a .zip or .tar file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW4-Report.pdf and place it in the same zip file directory that contains your software (e.g., the directory that contains QryEval.py).

Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.

 

5. Grading

The grading requirements and advice are the same as for HW1.

 

FAQ

If you have questions not answered here, see the Frequently Asked Questions file. If your question is not answered there, please contact the TA or the instructor.


Copyright 2024, Carnegie Mellon University.
Updated on March 25, 2024

Jamie Callan