HW2 requires you to understand and make several extensions to the QryEval search engine architecture. We recommend the following development path.
Start with BM25, which is the simpler of the two retrieval models. There are five main steps.
- Create the RetrievalModelBM25 class. The ranker.BM25:xxx parameters need to find their way from the .param file into this class. The default query operator is #SUM.
- RetrievalModelBM25.
- Create the QrySopSum class. Copy the QrySopOr class and adjust how scores are calculated.
- Modify the QryParser class to recognize #SUM operators in queries.
Indri is more complex. It is easiest to tackle Indri in two stages. In the first stage, follow an implementation process that is similar to what was used for BM25.
- Create the RetrievalModelIndri class. The ranker.Indri:xxx parameters need to find their way from the .param file into this class. The default query operator is #AND.
- RetrievalModelIndri.
- Add a getDefaultScore function to QrySopScore. See the HW2 Implementation slides.
- Modify the QrySopAnd class. It needs three adjustments.
  - RetrievalModelIndri.
  - RetrievalModelIndri.
  - Add a getDefaultScore function. See the HW2 Implementation slides.
Once the Indri #AND operator is working properly, repeat that same process for the #WAND and #WSUM operators. There are two steps.
- Create the QrySopWand and QrySopWsum classes by copying the QrySopAnd class, deleting the parts that support the boolean retrieval models, and adjusting how scores are calculated.
- Modify QryParser to recognize the #WSUM and #WAND operators. This is not difficult, but it does require a little care. Weights are determined by their position in the query. #WSUM(2 iphone 1 14) is a valid query (here 14 is a query term, not a weight). Note that Lucene's lexical analyzer may convert some lexical tokens into two query terms, e.g., Lucene turns "near-death" into "near" and "death". Do not do your own lexical processing of terms, because your choices may not match Lucene's. Just allow for the possibility that one token may produce more than one term. The user query #WSUM(2 near-death 1 experience) should be parsed into the query #WSUM(2 near 2 death 1 experience).
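The weight handling described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not QryParser's actual code: the class WeightedArgs and the methods expandWeights and analyze are hypothetical names. The analyze helper only lowercases and splits on hyphens as a stand-in for Lucene's analyzer so that the example runs without Lucene; in the real parser, the terms should come from Lucene. The sketch also assumes flat arguments that alternate weight and token, with no nested operators.

```java
import java.util.ArrayList;
import java.util.List;

public class WeightedArgs {

  // Hypothetical stand-in for Lucene's analyzer. It only lowercases and
  // splits on hyphens so that this example runs without Lucene.
  static List<String> analyze(String token) {
    List<String> terms = new ArrayList<>();
    for (String t : token.toLowerCase().split("-")) {
      if (!t.isEmpty()) {
        terms.add(t);
      }
    }
    return terms;
  }

  // Takes the argument string of a weighted operator, e.g.
  // "2 near-death 1 experience", and repeats each weight for every
  // term that its token produces.
  static String expandWeights(String args) {
    String[] parts = args.trim().split("\\s+");
    if (parts.length % 2 != 0) {
      throw new IllegalArgumentException("Expected weight/token pairs: " + args);
    }
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < parts.length; i += 2) {
      String weight = parts[i];        // even positions are weights
      Double.parseDouble(weight);      // throws NumberFormatException if not numeric
      for (String term : analyze(parts[i + 1])) {  // odd positions are tokens
        if (out.length() > 0) {
          out.append(' ');
        }
        out.append(weight).append(' ').append(term);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println("#WSUM(" + expandWeights("2 near-death 1 experience") + ")");
    System.out.println("#WSUM(" + expandWeights("2 iphone 1 14") + ")");
  }
}
```

Because position alone decides what is a weight, the 14 in the second query is passed to the analyzer as a token and is never interpreted as a weight.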
Implement the WINDOW operator. There are three main steps.
- Use the QryIopNear class to guide your implementation of a QryIopWindow class.
That's about it.
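As a rough illustration of the QryIopWindow step, the matching loop over one document might look like the sketch below. It rests on assumptions that are not stated here: that the arguments may match in any order, that a match requires 1 + max - min <= n over the current positions, and that the largest position is recorded for each match. Check these against the HW2 Implementation slides. The class WindowSketch and the method windowMatches are hypothetical and operate on plain arrays rather than on QryEval's inverted lists.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {

  // positions[i] holds the sorted positions of the i-th argument in one document.
  // Returns the recorded position of each window match (assumed: the largest position).
  static List<Integer> windowMatches(int[][] positions, int n) {
    List<Integer> matches = new ArrayList<>();
    if (positions.length == 0) {
      return matches;
    }
    int[] ptr = new int[positions.length];  // current index into each argument's positions

    while (true) {
      int minArg = -1;
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;

      for (int i = 0; i < positions.length; i++) {
        if (ptr[i] >= positions[i].length) {
          return matches;                   // one argument is exhausted: no more matches
        }
        int p = positions[i][ptr[i]];
        if (p < min) {
          min = p;
          minArg = i;
        }
        if (p > max) {
          max = p;
        }
      }

      if (1 + max - min > n) {
        ptr[minArg]++;                      // window too wide: advance the smallest position
      } else {
        matches.add(max);                   // match: record it and advance every argument
        for (int i = 0; i < ptr.length; i++) {
          ptr[i]++;
        }
      }
    }
  }

  public static void main(String[] args) {
    int[][] positions = { {1, 10, 20}, {3, 25} };
    System.out.println(windowMatches(positions, 3));
    System.out.println(windowMatches(positions, 6));
  }
}
```

Unlike a NEAR-style loop, which typically requires the arguments to appear in order, this sketch compares only the smallest and largest current positions.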
If you have questions not answered here, see the HW2 FAQ.
Copyright 2024, Carnegie Mellon University.
Updated on February 07, 2024