An Empirical Study of Learning to Rank for Entity Search  
 

Jing Chen

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

Chenyan Xiong

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

Jamie Callan

Language Technologies Institute

School of Computer Science

Carnegie Mellon University



Abstract

This work investigates the effectiveness of learning to rank methods for entity search. Entities are represented by multi-field documents constructed from their RDF triples, and field-based text similarity features are extracted for query-entity pairs. State-of-the-art learning to rank methods learn models for ad-hoc entity search. Our experiments on an entity search test collection based on DBpedia confirm that learning to rank methods are as powerful for ranking entities as for ranking documents, and establish a new state-of-the-art for accuracy on this benchmark dataset.

 

Presentations

This work was presented in the short paper poster session of the 2016 SIGIR Conference in Pisa, Italy, on July 19, 2016. [pdf]

This work was presented at the poster session of the CMU LTI Open House on February 26, 2016. [pps]

 

Datasets

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and Wikidata and make this information available on the Web. Our research uses DBpedia version 3.7. The processed data used in our experiments is provided below.

dbpedia.trecweb.zip

The entity descriptions in TrecWeb format

(2.5 GB compressed, 11 GB uncompressed)

dbpedia.nt.zip

The grouped DBpedia RDF file in N-Triples format

(3.7 GB compressed, 49 GB uncompressed)
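
As a rough illustration of how multi-field entity documents might be assembled from an N-Triples file, the sketch below groups triples by subject and places each object value in a field. It is not the code used in our experiments: the regular expression only handles simple `<subject> <predicate> object .` lines, the function names are made up for this example, and the predicate-to-field mapping (`FIELD_OF_PREDICATE`) is a hypothetical placeholder rather than the field definitions used in the paper.

```python
import re
from collections import defaultdict

# Matches one simple N-Triples statement: <subject> <predicate> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

# Hypothetical predicate-to-field mapping, for illustration only.
# Predicates not listed here fall back to a generic "attributes" field.
FIELD_OF_PREDICATE = {
    'http://www.w3.org/2000/01/rdf-schema#label': 'names',
    'http://purl.org/dc/terms/subject': 'categories',
}


def parse_triple(line):
    """Return (subject, predicate, object), or None for blank, comment, or unmatched lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def object_text(obj):
    """Reduce an object term to plain text (escape sequences are not decoded)."""
    if obj.startswith('"'):
        # Literal: keep the text between the first and last quote,
        # dropping any @lang tag or ^^datatype suffix.
        return obj[1:obj.rfind('"')]
    if obj.startswith('<') and obj.endswith('>'):
        # URI: use the last path or fragment segment as readable text.
        return re.split(r'[/#]', obj[1:-1])[-1].replace('_', ' ')
    return obj


def group_entities(lines):
    """Build {subject URI: {field name: [text values]}} from N-Triples lines."""
    entities = defaultdict(lambda: defaultdict(list))
    for line in lines:
        triple = parse_triple(line)
        if triple is None:
            continue
        subject, predicate, obj = triple
        field = FIELD_OF_PREDICATE.get(predicate, 'attributes')
        entities[subject][field].append(object_text(obj))
    return entities
```

Because `group_entities` accepts any iterable of lines, an open file handle can be passed directly, although for a file of this size the resulting dictionary may not fit in memory and a streaming variant that relies on the triples already being grouped by subject would be preferable.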

 

We also provide experimental and evaluation results, including the re-ranking result files of 3 baselines and 2 learning to rank (LeToR) models. All files are in the Indri format:

RankSVM results

SemSearch ES

ListSearch

INEX-LD

QALD2

All Queries

CA results

SemSearch ES

ListSearch

INEX-LD

QALD2

All Queries
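
For readers who want to load the result files listed above, the following is a minimal sketch of a reader. It assumes that "Indri format" here means the six-column TREC-style run output (query ID, the literal `Q0`, entity ID, rank, score, run tag), which should be checked against the downloaded files; the function name `read_run` is made up for this example.

```python
from collections import defaultdict


def read_run(lines):
    """Parse run lines into {query_id: [(entity_id, score), ...]}, best score first.

    Assumes six whitespace-separated columns per line:
    query_id Q0 entity_id rank score run_tag
    """
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, entity_id, _rank, score, _run_tag = parts
        run[query_id].append((entity_id, float(score)))
    for query_id in run:
        # Re-sort by score so the result does not depend on file order.
        run[query_id].sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)
```

A file can be read with `with open(path) as handle: run = read_run(handle)`, where `path` points to one of the downloaded result files.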

 

Our work uses the DBpedia query sets and qrels file organized by Krisztian Balog. Please check this link to download them.

 

 

Updated on December 20, 2016

Jing Chen