Language Technologies Institute, School of Computer Science, Carnegie Mellon University
This work investigates the effectiveness of learning to rank methods for entity search. Entities are represented by multi-field documents constructed from their RDF triples, and field-based text similarity features are extracted for query-entity pairs. State-of-the-art learning to rank methods are then used to train ranking models for ad-hoc entity search. Our experiments on an entity search test collection based on DBpedia confirm that learning to rank methods are as powerful for ranking entities as for ranking documents, and establish a new state of the art in accuracy on this benchmark dataset.
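To make the feature pipeline concrete, below is a minimal Python sketch of extracting field-based text similarity features for a query-entity pair. The field names, tokenization, and the two example features (coordinate match and a smoothed language-model score) are illustrative assumptions, not the exact feature set used in the paper.

```python
from collections import Counter
import math

def tokenize(text):
    """Lowercase whitespace tokenization (a simplification)."""
    return text.lower().split()

def field_features(query, field_text):
    """Two simple text similarity features for one query/field pair."""
    q_tokens = tokenize(query)
    f_counts = Counter(tokenize(field_text))
    f_len = sum(f_counts.values()) or 1
    coord_match = sum(1 for t in set(q_tokens) if t in f_counts)
    lm_score = sum(math.log((f_counts[t] + 0.5) / (f_len + 1.0)) for t in q_tokens)
    return {"coord_match": coord_match, "lm_score": lm_score}

def entity_features(query, entity_fields):
    """Concatenate per-field features into one vector for a query-entity pair."""
    feats = {}
    for field_name, text in entity_fields.items():
        for feat_name, value in field_features(query, text).items():
            feats[f"{field_name}:{feat_name}"] = value
    return feats

# A hypothetical entity whose fields were built from its RDF triples.
entity = {
    "names": "Carnegie Mellon University",
    "attributes": "private research university in Pittsburgh Pennsylvania",
    "related_entity_names": "Language Technologies Institute School of Computer Science",
}
print(entity_features("carnegie mellon computer science", entity))
```

The resulting per-field feature vector is what a learning to rank model would consume for each query-entity pair.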
Our work was presented in the short-paper poster session of the 2016 SIGIR Conference in Pisa, Italy, on July 19, 2016. [pdf]
Our work was presented at the poster session of the CMU LTI Open House on February 26, 2016. [pps]
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and Wikidata and make this information available on the Web. Our experiments use DBpedia version 3.7. We provide the processed data from our experiments below.
The entity descriptions in TrecWeb format (2.5 GB compressed, 11 GB uncompressed)
The grouped DBpedia RDF file in N-Triples format (3.7 GB compressed, 49 GB uncompressed)
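For reference, here is a minimal sketch of how grouped N-Triples for one entity might be folded into a multi-field, TrecWeb-style document. The field-assignment rules and tag names below are illustrative assumptions; the downloadable files above define the actual format used in our experiments.

```python
import re

TRIPLE_RE = re.compile(r'<([^>]+)> <([^>]+)> (.+) \.\s*$')

def local_name(uri):
    """Last path segment of a URI, used as a readable token."""
    return re.split(r'[/#]', uri.rstrip('/'))[-1]

def entity_to_trecweb(entity_uri, triple_lines):
    """Group an entity's predicate-object pairs into a few text fields."""
    fields = {"names": [], "attributes": [], "related": []}
    for line in triple_lines:
        m = TRIPLE_RE.match(line)
        if not m or m.group(1) != entity_uri:
            continue
        predicate, obj = m.group(2), m.group(3)
        if obj.startswith('"'):                          # literal object
            text = obj.split('"')[1]
            target = "names" if local_name(predicate) in ("label", "name") else "attributes"
            fields[target].append(text)
        else:                                            # URI object: keep its local name
            fields["related"].append(local_name(obj.strip('<>')).replace('_', ' '))
    doc = ["<DOC>", f"<DOCNO>{local_name(entity_uri)}</DOCNO>"]
    for field, values in fields.items():
        doc.append(f"<{field}>{' '.join(values)}</{field}>")
    doc.append("</DOC>")
    return "\n".join(doc)

triples = [
    '<http://dbpedia.org/resource/Carnegie_Mellon_University> <http://www.w3.org/2000/01/rdf-schema#label> "Carnegie Mellon University"@en .',
    '<http://dbpedia.org/resource/Carnegie_Mellon_University> <http://dbpedia.org/ontology/city> <http://dbpedia.org/resource/Pittsburgh> .',
]
print(entity_to_trecweb("http://dbpedia.org/resource/Carnegie_Mellon_University", triples))
```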
Experimental and evaluation results are provided, including the re-ranking result files of 3 baselines and 2 learning to rank (LeToR) models. All files are in the standard Indri run format:
RankSVM results
Coordinate Ascent (CA) results
Our work adopts the DBpedia query sets and qrels file compiled by Krisztian Balog. Please check this link to download them.
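The snippet below is a minimal sketch of loading an Indri-format run file and the qrels file and computing precision@10. The file names are placeholders and the metric is only an example; it is not necessarily the evaluation metric reported in the paper.

```python
from collections import defaultdict

def load_run(path):
    """Run lines: query_id Q0 entity_id rank score run_name."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            qid, _, docno, rank, _score, _run = line.split()
            run[qid].append((int(rank), docno))
    return {qid: [d for _, d in sorted(ranked)] for qid, ranked in run.items()}

def load_qrels(path):
    """Qrels lines: query_id 0 entity_id relevance_grade."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            qid, _, docno, rel = line.split()
            qrels[qid][docno] = int(rel)
    return qrels

def precision_at_k(run, qrels, k=10):
    """Mean fraction of relevant entities in each query's top k."""
    scores = []
    for qid, ranking in run.items():
        rel = qrels.get(qid, {})
        hits = sum(1 for docno in ranking[:k] if rel.get(docno, 0) > 0)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0

run = load_run("ranksvm.run")        # placeholder file names
qrels = load_qrels("dbpedia.qrels")
print(f"P@10 = {precision_at_k(run, qrels):.4f}")
```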
Updated on December 20, 2016