Language Technologies Institute, Carnegie Mellon University
We present a learning-to-rank approach to resource selection. We develop features for resource ranking and present a training approach that does not require human judgments. Our method is well-suited to environments with a large number of resources, such as selective search. It improves on the state of the art in resource selection for selective search and is statistically equivalent to exhaustive search even on recall-oriented metrics such as MAP@1000, an area in which selective search was previously lacking.
Shard Ranking Lists for ClueWeb09-B, TREC 2009-2012 Web Track queries
Shard Ranking Lists for Gov2, TREC 2004-2006 Terabyte Track queries
Shard partitions were from:
Shard Partition for ClueWeb09-B
Ranking list files are named {trec, mqt, aol}_{full, fast}.sharlist (the six resulting file names are enumerated in the sketch after this list):
trec: L2R-TREC method in the paper. Model trained with trec relevance judgments, cross-validation.
mqt: L2R-MQT method in the paper. Model trained with 1,000 queries from the TREC Million Query Track, using overlap-based labels.
aol: L2R-AOL method in the paper. Model trained with 1000 AOL queries, using overlap-based labels.
full: Model with all features.
fast: Model with the FAST feature set.
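For convenience, a minimal Python sketch (illustrative only; the file names follow directly from the naming convention above) that enumerates the six ranking list files:

    from itertools import product

    # Enumerate the six file names implied by the
    # {trec, mqt, aol}_{full, fast}.sharlist naming convention.
    for model, features in product(("trec", "mqt", "aol"), ("full", "fast")):
        print(f"{model}_{features}.sharlist")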
In the ranking list files, each line is formatted as QID shardID1 shardID2 shardID3... (a parsing sketch follows this list):
QID: query id from TREC
shardID1 shardID2...: the shard ranking for the query.
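A minimal Python sketch for loading one of these files into a dictionary mapping query id to its shard ranking (the file name and query id in the usage comment are examples, not part of the release):

    def load_sharlist(path):
        """Parse a .sharlist file into {query id: list of shard IDs in ranked order}."""
        rankings = {}
        with open(path) as f:
            for line in f:
                fields = line.split()
                if not fields:
                    continue
                qid, shard_ids = fields[0], fields[1:]
                rankings[qid] = shard_ids
        return rankings

    # Example usage (file name follows the naming convention above):
    #   rankings = load_sharlist("trec_full.sharlist")
    #   top5 = rankings[some_qid][:5]   # top-5 shards for a given TREC query id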
This research was supported by National Science Foundation (NSF) grant IIS-1302206. Yubin Kim is the recipient of the Natural Sciences and Engineering Research Council of Canada PGS-D3 (438411). Any opinions, findings, and conclusions in this paper are the authors' and do not necessarily reflect those of the sponsors.
Updated on August 15, 2017