Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search

 

Zhuyun Dai, Chenyan Xiong, Jamie Callan

Language Technologies Institute, Carnegie Mellon University

Zhiyuan Liu

Department of Computer Science and Technology, Tsinghua University

 

Abstract

This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search. Instead of exact matching query and document n-grams, Conv-KNRM uses Convolutional Neural Networks to represent n-grams of various lengths and soft matches them in a unified embedding space. The n-gram soft matches are then utilized by the kernel pooling and learning-to-rank layers to generate the final ranking score. Conv-KNRM can be learned end-to-end and fully optimized from user feedback. The learned model's generalizability is investigated by testing how well it performs in a related domain with small amounts of training data. Experiments on English search logs, Chinese search logs, and TREC Web track tasks demonstrated consistent advantages of Conv-KNRM over prior neural IR methods and feature-based methods.

Datasets

 

K-NRM Word Embeddings

Sogou-Log

Bing-Log

 

Conv-KNRM Word Embeddings and Convolution Filters

Sogou-Log

Bing-Log

 

Data Format

 

vocab

A vocabulary file that maps words to integer IDs.

 

embedding

The word embedding file, containing one embedding vector per word in the vocabulary.
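To illustrate how the two files work together, here is a minimal lookup sketch: words are mapped to integer IDs via the vocabulary, and the IDs index rows of the embedding matrix. The toy vocabulary, the `<unk>` fallback token, and the random weights below are illustrative assumptions, not the released data.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding matrix; the released vocab and
# embedding files supply the real mapping and weights.
vocab = {"<unk>": 0, "neural": 1, "ranking": 2}
embedding = np.random.default_rng(0).standard_normal((len(vocab), 300))

def embed(words):
    # Map each word to its integer ID (unknown words fall back to <unk>),
    # then take the corresponding rows of the embedding matrix.
    ids = [vocab.get(w, vocab["<unk>"]) for w in words]
    return embedding[ids]

vecs = embed(["neural", "ranking", "oov-word"])
print(vecs.shape)  # (3, 300)
```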

 

filter1, filter2, filter3

The convolution filters and bias weights for h-grams, h = 1, 2, 3. The filters and biases convert word embeddings into h-gram embeddings. For each h, 128 filters were used, each of size h * embedding_size. A filter computes a weighted sum of all elements in h consecutive words' embeddings, producing one real number; the 128 filters together give a 128-dimensional vector for the h-gram. A 128-dimensional bias vector is then added, and an element-wise ReLU is applied to produce the final h-gram embedding.
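The convolution described above can be sketched as follows. This is a minimal numpy implementation under stated assumptions (filters stored as a `(num_filters, h * embedding_size)` matrix, one output vector per window of h consecutive words); the released files may use a different storage layout.

```python
import numpy as np

def hgram_embeddings(word_embs, filters, bias, h):
    """Convert word embeddings into h-gram embeddings.

    word_embs: (seq_len, emb_dim)  word embedding matrix
    filters:   (num_filters, h * emb_dim)  convolution filters
    bias:      (num_filters,)  bias vector
    Returns a (seq_len - h + 1, num_filters) matrix of h-gram embeddings.
    """
    seq_len, emb_dim = word_embs.shape
    out = []
    for i in range(seq_len - h + 1):
        # Concatenate h consecutive word embeddings into one vector; each
        # filter weighted-sums its elements into one real number.
        window = word_embs[i:i + h].reshape(-1)             # (h * emb_dim,)
        out.append(np.maximum(filters @ window + bias, 0))  # add bias, ReLU
    return np.stack(out)

# Toy example: 5 words, 300-dim embeddings, 128 bigram (h=2) filters.
rng = np.random.default_rng(0)
embs = rng.standard_normal((5, 300))
filters = rng.standard_normal((128, 2 * 300))
bias = rng.standard_normal(128)
bigrams = hgram_embeddings(embs, filters, bias, h=2)
print(bigrams.shape)  # (4, 128)
```

In practice this is equivalent to a 1-D convolution over the word sequence followed by a bias and ReLU, which is how frameworks such as PyTorch or TensorFlow would express it.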

Acknowledgements

This research was supported by National Science Foundation (NSF) grant IIS-1422676. We thank Shane Culpepper for sharing the Bing search log with us. Any opinions, findings, and conclusions in this paper are the authors' and do not necessarily reflect those of the sponsors.

 

Download Paper PDF

 

Updated on November 26, 2017
