Language Technologies Institute, Carnegie Mellon University
We present a learning-to-rank approach to resource selection. We develop features for resource ranking and present a training approach that does not require human judgments. Our method is well-suited to environments with a large number of resources, such as selective search. It improves on the state of the art in resource selection for selective search and is statistically equivalent to exhaustive search even on recall-oriented metrics such as MAP@1000, an area in which selective search was previously lacking.
Shard Ranking Lists for ClueWeb09-B, TREC 2009-2012 Web Track queries
Shard Ranking Lists for Gov2, TREC 2004-2006 Terabyte Track queries
Shard partitions were from:
Shard Partition for ClueWeb09-B
Ranking list files are named {trec, mqt, aol}_{full, fast}.sharlist (the six resulting file names are enumerated in the sketch after this list):
trec: L2R-TREC method in the paper. Model trained with trec relevance judgments, cross-validation.
mqt: L2R-MQT method in the paper. Model trained with 1,000 queries from the TREC Million Query Track, using overlap-based labels.
aol: L2R-AOL method in the paper. Model trained with 1000 AOL queries, using overlap-based labels.
full: Model with all features.
fast: Model with the FAST feature set.
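For convenience, a minimal Python sketch (illustrative only; the file names follow directly from the naming convention above) that enumerates the six ranking list files:

    from itertools import product

    # Enumerate the six file names implied by the
    # {trec, mqt, aol}_{full, fast}.sharlist naming convention.
    for model, features in product(("trec", "mqt", "aol"), ("full", "fast")):
        print(f"{model}_{features}.sharlist")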
In the ranking list files, each line is formatted as QID shardID1 shardID2 shardID3... (a parsing sketch follows this list):
QID: query id from TREC
shardID1 shardID2...: the shard ranking for the query.
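A minimal Python sketch for loading one of these files into a dictionary mapping query id to its shard ranking (the file name and query id in the usage comment are examples, not part of the release):

    def load_sharlist(path):
        """Parse a .sharlist file into {query id: list of shard IDs in ranked order}."""
        rankings = {}
        with open(path) as f:
            for line in f:
                fields = line.split()
                if not fields:
                    continue
                qid, shard_ids = fields[0], fields[1:]
                rankings[qid] = shard_ids
        return rankings

    # Example usage (file name follows the naming convention above):
    #   rankings = load_sharlist("trec_full.sharlist")
    #   top5 = rankings[some_qid][:5]   # top-5 shards for a given TREC query id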
This research was supported by National Science Foundation (NSF) grant IIS-1302206. Yubin Kim is the recipient of the Natural Sciences and Engineering Research Council of Canada PGS-D3 (438411). Any opinions, findings, and conclusions in this paper are the authors' and do not necessarily reflect those of the sponsors.
Updated on August 15, 2017