Language Technologies Institute, School of Computer Science, Carnegie Mellon University
This work investigates the effectiveness of learning to rank methods for entity search. Entities are represented by multi-field documents constructed from their RDF triples, and field-based text similarity features are extracted for query-entity pairs. State-of-the-art learning to rank methods are then used to train ranking models for ad-hoc entity search. Our experiments on an entity search test collection based on DBpedia confirm that learning to rank methods are as powerful for ranking entities as for ranking documents, and establish a new state of the art in accuracy on this benchmark dataset.
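To make the feature pipeline concrete, below is a minimal Python sketch of extracting field-based text similarity features for a query-entity pair. The field names, tokenization, and the two example features (coordinate match and a smoothed language-model score) are illustrative assumptions, not the exact feature set used in the paper.

```python
from collections import Counter
import math

def tokenize(text):
    """Lowercase whitespace tokenization (a simplification)."""
    return text.lower().split()

def field_features(query, field_text):
    """Two simple text similarity features for one query/field pair."""
    q_tokens = tokenize(query)
    f_counts = Counter(tokenize(field_text))
    f_len = sum(f_counts.values()) or 1
    coord_match = sum(1 for t in set(q_tokens) if t in f_counts)
    lm_score = sum(math.log((f_counts[t] + 0.5) / (f_len + 1.0)) for t in q_tokens)
    return {"coord_match": coord_match, "lm_score": lm_score}

def entity_features(query, entity_fields):
    """Concatenate per-field features into one vector for a query-entity pair."""
    feats = {}
    for field_name, text in entity_fields.items():
        for feat_name, value in field_features(query, text).items():
            feats[f"{field_name}:{feat_name}"] = value
    return feats

# A hypothetical entity whose fields were built from its RDF triples.
entity = {
    "names": "Carnegie Mellon University",
    "attributes": "private research university in Pittsburgh Pennsylvania",
    "related_entity_names": "Language Technologies Institute School of Computer Science",
}
print(entity_features("carnegie mellon computer science", entity))
```

The resulting per-field feature vector is what a learning to rank model would consume for each query-entity pair.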
Our work was presented in the short-paper poster session of the 2016 SIGIR Conference in Pisa, Italy, on July 19, 2016. [pdf]
Our work was presented at the poster session of the CMU LTI Open House on February 26, 2016. [pps]
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and Wikidata and make this information available on the Web. Our experiments use DBpedia version 3.7. We provide the processed data from our experiments below.
The entity descriptions in TrecWeb format (2.5 GB compressed, 11 GB uncompressed)
The grouped DBpedia RDF file in N-Triples format (3.7 GB compressed, 49 GB uncompressed)
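For reference, here is a minimal sketch of how grouped N-Triples for one entity might be folded into a multi-field, TrecWeb-style document. The field-assignment rules and tag names below are illustrative assumptions; the downloadable files above define the actual format used in our experiments.

```python
import re

TRIPLE_RE = re.compile(r'<([^>]+)> <([^>]+)> (.+) \.\s*$')

def local_name(uri):
    """Last path segment of a URI, used as a readable token."""
    return re.split(r'[/#]', uri.rstrip('/'))[-1]

def entity_to_trecweb(entity_uri, triple_lines):
    """Group an entity's predicate-object pairs into a few text fields."""
    fields = {"names": [], "attributes": [], "related": []}
    for line in triple_lines:
        m = TRIPLE_RE.match(line)
        if not m or m.group(1) != entity_uri:
            continue
        predicate, obj = m.group(2), m.group(3)
        if obj.startswith('"'):                          # literal object
            text = obj.split('"')[1]
            target = "names" if local_name(predicate) in ("label", "name") else "attributes"
            fields[target].append(text)
        else:                                            # URI object: keep its local name
            fields["related"].append(local_name(obj.strip('<>')).replace('_', ' '))
    doc = ["<DOC>", f"<DOCNO>{local_name(entity_uri)}</DOCNO>"]
    for field, values in fields.items():
        doc.append(f"<{field}>{' '.join(values)}</{field}>")
    doc.append("</DOC>")
    return "\n".join(doc)

triples = [
    '<http://dbpedia.org/resource/Carnegie_Mellon_University> <http://www.w3.org/2000/01/rdf-schema#label> "Carnegie Mellon University"@en .',
    '<http://dbpedia.org/resource/Carnegie_Mellon_University> <http://dbpedia.org/ontology/city> <http://dbpedia.org/resource/Pittsburgh> .',
]
print(entity_to_trecweb("http://dbpedia.org/resource/Carnegie_Mellon_University", triples))
```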
Experimental and evaluation results are provided, including the re-ranking result files of 3 baselines and 2 learning to rank (LeToR) models. All files are in the standard Indri run format:
RankSVM results
Coordinate Ascent (CA) results
Our work adopts the DBpedia query sets and qrels file compiled by Krisztian Balog. Please check this link to download them.
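The snippet below is a minimal sketch of loading an Indri-format run file and the qrels file and computing precision@10. The file names are placeholders and the metric is only an example; it is not necessarily the evaluation metric reported in the paper.

```python
from collections import defaultdict

def load_run(path):
    """Run lines: query_id Q0 entity_id rank score run_name."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            qid, _, docno, rank, _score, _run = line.split()
            run[qid].append((int(rank), docno))
    return {qid: [d for _, d in sorted(ranked)] for qid, ranked in run.items()}

def load_qrels(path):
    """Qrels lines: query_id 0 entity_id relevance_grade."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            qid, _, docno, rel = line.split()
            qrels[qid][docno] = int(rel)
    return qrels

def precision_at_k(run, qrels, k=10):
    """Mean fraction of relevant entities in each query's top k."""
    scores = []
    for qid, ranking in run.items():
        rel = qrels.get(qid, {})
        hits = sum(1 for docno in ranking[:k] if rel.get(docno, 0) > 0)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0

run = load_run("ranksvm.run")        # placeholder file names
qrels = load_qrels("dbpedia.qrels")
print(f"P@10 = {precision_at_k(run, qrels):.4f}")
```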
Updated on December 20, 2016