Language Technologies Institute, Carnegie Mellon University
Petuum Inc.
Language Technologies Institute, Carnegie Mellon University
RMIT University
Selective search architectures use resource selection algorithms such as Rank-S or Taily to rank index shards and determine how many to search for a given query. Most prior research evaluated solutions by their ability to improve efficiency without significantly reducing early-precision metrics such as P@5 and NDCG@10.
This paper recasts selective search as an early stage of a multistage retrieval architecture, which makes recall-oriented metrics more appropriate. A new algorithm is presented that predicts the number of shards that must be searched for a given query in order to meet recall-oriented goals. Decoupling shard ranking from deciding how many shards to search clarifies efficiency vs. effectiveness trade-offs, and enables them to be optimized independently. Experiments on two corpora demonstrate the value of this approach.
Shard Reference Ranking Lists for ClueWeb09-B, TREC 2009-2012 Web Track queries (test labels)
Shard Reference Ranking Lists for Gov2, TREC 2007-2008 Million Query Track queries (training labels)
Shard Reference Ranking Lists for Gov2, TREC 2004-2005 Terabyte Track queries (test labels)
Shard partitions were from:
Shard Partition for ClueWeb09-B
In the shard ranking lists, each line is formatted as: QID shardID1 shardID2 shardID3 ... :: K_precision, K_recall (a parsing sketch follows the field descriptions below)
QID: the query ID from the TREC topics
shardID1 shardID2 ...: the reference shard ranking for the query
K_precision: the number of top-ranked shards that must be searched for this query to meet the precision-oriented goal
K_recall: the number of top-ranked shards that must be searched for this query to meet the recall-oriented goal
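The following is a minimal parsing sketch for this line format, assuming whitespace-separated shard IDs and the "::" separator described above; the file name and the exact shard ID strings are hypothetical placeholders, not part of the released data description.

def parse_reference_rankings(path):
    """Yield (qid, shard_ranking, k_precision, k_recall) for each line of a ranking list file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split the shard ranking from the two K values.
            ranking_part, k_part = line.split("::")
            fields = ranking_part.split()
            qid, shard_ranking = fields[0], fields[1:]
            k_precision, k_recall = (int(k) for k in k_part.split(","))
            yield qid, shard_ranking, k_precision, k_recall

if __name__ == "__main__":
    # Hypothetical file name; substitute one of the released ranking list files.
    for qid, shards, k_p, k_r in parse_reference_rankings("cw09b_test_labels.txt"):
        # The first K_recall shards are those needed to meet the recall-oriented goal.
        print(qid, shards[:k_r])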
This research was supported by National Science Foundation (NSF) grant IIS-1302206 and the Australian Research Council's Discovery Projects Scheme (DP170102231). Any opinions, findings, and conclusions in this paper are the authors' and do not necessarily reflect those of the sponsors.
Updated on September 21, 2018