Word-Entity Duet Representations for Document Ranking

 

Chenyan Xiong, Language Technologies Institute, Carnegie Mellon University

Jamie Callan, Language Technologies Institute, Carnegie Mellon University

Tie-Yan Liu, Microsoft Research

 

 

 

Abstract

This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval. In this work, the query and documents are modeled by word-based representations and entity-based representations. Ranking features are generated by the interactions between the two representations, incorporating information from the word space, the entity space, and the cross-space connections through the knowledge graph. To handle the uncertainties from the automatically constructed entity representations, an attention-based ranking model, AttR-Duet, is developed. With back-propagation from ranking labels, the model learns simultaneously how to demote noisy entities and how to rank documents with the word-entity duet. Results on the TREC Web Track ad-hoc task demonstrate that all of the four-way interactions in the duet are useful, the attention mechanism successfully steers the model away from noisy entities, and together they significantly outperform both word-based and entity-based learning to rank systems.

 

Dataset

The whole dataset is available as duet.zip.

The data folder includes the annotated queries and documents: q_tagme_ana.json and cw_doc_tagme_ana.json are the TagMe-annotated TREC Web Track queries and candidate documents used in this work. Each line of either file is a JSON-serialized dictionary for one query or document. The field names are self-explanatory.
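Since each line of these files is described as one JSON dictionary, they can be read one record at a time. The following is a minimal sketch, not part of the released code; the helper name parse_json_lines is hypothetical, and no field names are assumed because the page does not list them.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dictionary per non-empty line (hypothetical helper).

    Assumes each non-empty line is a complete JSON object, as the
    dataset description states.
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, if any
            yield json.loads(line)
```

To inspect the actual field names, one could open q_tagme_ana.json, pass the file object to parse_json_lines, and print the sorted keys of the first record.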

qrel.all is the Category B query relevance file obtained from the official TREC website.
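A short sketch of loading the relevance judgments follows. It assumes qrel.all uses the usual four-column TREC qrels layout (topic, iteration, document id, relevance grade), which this page does not confirm; the function name parse_qrels is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map topic -> {document id -> relevance grade}.

    Assumption: whitespace-separated lines of the form
    "topic iteration doc_id grade". Other lines are skipped.
    """
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        topic, _iteration, doc_id, grade = parts
        qrels[topic][doc_id] = int(grade)
    return dict(qrels)
```

Passing an open file handle for qrel.all to parse_qrels would give a nested dictionary that can be looked up by topic and document id.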

 

Ranking Result includes the results of all baselines and our methods on ClueWeb09 and ClueWeb12.

Each sub-directory in either folder corresponds to one method. It includes evaluation results (gdeval.pl outputs) at different depths; for example, cw09/Att-LeToR-Duet/eval.d20 holds the NDCG@20 and ERR@20 for Att-LeToR-Duet. The final evaluation result at depth 20 is the file eval, and the final ranking result is the file trec.
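The per-method evaluation files could be summarized with a sketch like the one below. It assumes the files keep the comma-separated layout that gdeval.pl normally prints (a header such as runid,topic,ndcg@20,err@20, one row per topic, and a final row whose topic is amean); that layout is an assumption about these files, and read_gdeval_mean is a hypothetical helper name.

```python
import csv
from typing import Dict, Iterable


def read_gdeval_mean(lines: Iterable[str]) -> Dict[str, float]:
    """Return the metric columns of the 'amean' row as floats.

    Assumption: the input is gdeval.pl-style CSV with 'runid' and
    'topic' columns followed by one column per metric.
    """
    for row in csv.DictReader(lines):
        if row.get("topic") == "amean":
            return {
                name: float(value)
                for name, value in row.items()
                if name not in ("runid", "topic")
            }
    raise ValueError("no 'amean' row found")
```

For example, opening cw09/Att-LeToR-Duet/eval.d20 and passing the file object to read_gdeval_mean would return that method's mean scores keyed by the metric names in the header.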

 

The ClueWeb09 SDM runs were kindly shared by Laura Dietz and Jeff Dalton. They were originally used in the EQFE work.

 

The TransE embeddings were provided by Han Xu from Tsinghua University. They can be found on their GitHub.

 

 

Bibtex:

@inproceedings{xiong2017duet,

  title={Word-Entity Duet Representations for Document Ranking},

  author={Xiong, Chenyan and Callan, Jamie and Liu, Tie-Yan},

  booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)},

  pages = {763--772},

  year = {2017},

  organization={ACM}

}

 

 

Acknowledgements:

This research is sponsored by National Science Foundation grant IIS-1422676, Google through its support of the Worldly Knowledge and Using Freebase for Improved Information Retrieval, and a Fellowship from the Allen Institute for Artificial Intelligence. Any opinions, findings, conclusions or recommendations expressed on this Web site are those of the authors, and do not necessarily reflect those of the sponsors.

 

 

Updated on August 20, 2017

Chenyan Xiong