Search Engines:
11-442 / 11-642
HW1: Lexical Retrieval Design Guide
HW1 requires you to understand and make several extensions to the
QryEval search engine architecture. It can be a little difficult to
know where to start. We recommend the following path.
Your Unranked Boolean retrieval model supports the #OR
operator. Extend it to cover the #AND operator. There are four main
steps.
- Create the QrySopAnd class. Copy the QrySopOr class and change
the logic of the matching criteria (docIteratorHasMatch) from "match
any argument" to "match all arguments".
- Consider whether you need to adjust how scores are calculated.
- Modify the QryParser to recognize #AND operators in queries.
- Change the default query operator from #OR to #AND.
Unranked Boolean is now finished.
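The change from "match any argument" to "match all arguments" can be sketched outside QryEval. The example below is a minimal illustration, not the QryEval code: it assumes each query argument is just a sorted list of docids, and the function names match_any and match_all are hypothetical. In QryEval the same idea is expressed inside docIteratorHasMatch using the arguments' document iterators.

```python
def match_any(postings):
    """#OR-style matching: docids that appear in at least one argument."""
    return sorted(set().union(*postings))


def match_all(postings):
    """#AND-style matching: docids that appear in every argument.

    Each argument is a sorted list of docids (an assumption made for
    this sketch). One cursor per argument is advanced until all
    cursors point at the same docid.
    """
    if not postings or any(not p for p in postings):
        return []
    cursors = [0] * len(postings)
    matches = []
    while all(c < len(p) for c, p in zip(cursors, postings)):
        current = [p[c] for c, p in zip(cursors, postings)]
        target = max(current)
        if all(docid == target for docid in current):
            # Every argument is positioned on the same document.
            matches.append(target)
            cursors = [c + 1 for c in cursors]
        else:
            # Advance each argument that is behind the largest docid.
            cursors = [c + 1 if p[c] < target else c
                       for c, p in zip(cursors, postings)]
    return matches
```

The #AND loop stops as soon as any argument runs out of documents, because no later document can match all arguments.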
Implement the Ranked Boolean retrieval model. There are four main steps.
- Create a RetrievalModelRankedBoolean class. Copy the
RetrievalModelUnrankedBoolean class and make changes as
necessary.
- Extend the Ranker class to support RetrievalModelRankedBoolean.
- Modify the #SCORE operator to generate scores for the Ranked
Boolean retrieval model. #SCORE needs to produce a score for the
current matching document. A #SCORE operator always has one argument
(._args[0]) that is a QryIopXxx operator. Look at the QryIop class
to see what functions all QryIopXxx query operators support. One of
them gives you access to tf and term locations.
- Depending on how you implemented #OR and #AND score
calculations, you may need to adjust QrySopOr and QrySopAnd
to support the Ranked Boolean retrieval model.
Ranked Boolean is now finished.
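The score calculations for Ranked Boolean can be sketched as follows. This is an illustration under stated assumptions, not the required implementation: it assumes the usual Ranked Boolean convention in which #SCORE returns the term frequency, #OR returns the maximum of its matching arguments' scores, and #AND returns the minimum. The function names are hypothetical, and a score of None stands for "this argument does not match the document". Check the homework description for the definitions you are expected to use.

```python
def score_term(tf):
    """#SCORE under Ranked Boolean: assumed to be the term frequency."""
    return float(tf)


def score_or(arg_scores):
    """#OR: maximum over the arguments that match the document.

    Arguments that do not match are passed as None and ignored.
    """
    matched = [s for s in arg_scores if s is not None]
    if not matched:
        raise ValueError("#OR needs at least one matching argument")
    return max(matched)


def score_and(arg_scores):
    """#AND: minimum over the arguments, all of which must match."""
    if not arg_scores or any(s is None for s in arg_scores):
        raise ValueError("#AND needs every argument to match")
    return min(arg_scores)
```

Under Unranked Boolean every matching document receives the same score, which is why the #OR and #AND score code may need a branch on the retrieval model.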
Implement BM25. There are five main steps, but the work is similar
to what you did for Ranked Boolean.
- Create a RetrievalModelBM25 class. Note that the BM25 retrieval
model has parameters (b, k1). The retrieval model is a
convenient place to store them because it is passed through the
retrieval pipeline.
- Extend the Ranker class to support RetrievalModelBM25.
- Modify the #SCORE operator to generate scores for the BM25
retrieval model.
- Create the QrySopSum and QrySopWsum classes.
- Modify the QryParser to recognize #SUM and #WSUM in queries.
BM25 is now finished.
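The role of the b and k1 parameters can be seen in a small standalone sketch. The formula below is one common variant of Okapi BM25 (an RSJ-style idf floored at zero, multiplied by a length-normalized tf weight, with no query-term-frequency component); the homework may specify a different variant, so treat it as an assumption. The function names, the argument names, and the default values k1=1.2 and b=0.75 are illustrative choices, not taken from QryEval.

```python
import math


def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs,
                    k1=1.2, b=0.75):
    """Score one query term for one document (assumed BM25 variant).

    tf          -- term frequency in the document
    df          -- number of documents that contain the term
    doc_len     -- length of the document (in terms)
    avg_doc_len -- average document length in the collection
    num_docs    -- number of documents in the collection
    """
    # RSJ-style idf, floored at zero so very common terms never
    # contribute a negative score.
    idf = max(0.0, math.log((num_docs - df + 0.5) / (df + 0.5)))
    # tf weight: saturates as tf grows; b controls how strongly long
    # documents are penalized.
    tf_weight = tf / (tf + k1 * ((1 - b) + b * doc_len / avg_doc_len))
    return idf * tf_weight


def bm25_sum(term_scores):
    """#SUM-style combination: add the scores of the query terms."""
    return sum(term_scores)
```

Storing k1 and b on the retrieval model object means the #SCORE operator can read them wherever the model is passed, rather than relying on global state.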
Finally, implement the #NEAR operator. There are three main steps.
- Use the QryIopSyn class to guide your implementation of a
QryIopNear class.
- The evaluate method is the heart of each QryIopXxx operator. It
uses the arguments (which are always QryIopXxx operators that have
inverted lists) to produce a new inverted list.
- Modify the QryParser to recognize #NEAR operators in queries.
If you have done it correctly, #NEAR works for all three retrieval
models.
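The way evaluate builds a new inverted list from its arguments can be sketched with plain Python data. This is a hedged illustration, not the QryIopNear code: it assumes each argument's inverted list is a dict that maps a docid to a sorted list of term locations, it uses one simple greedy matching strategy in which each location takes part in at most one match, and the names near_locations and near_inverted_list are hypothetical. The homework may define #NEAR matching differently in its details.

```python
def near_locations(position_lists, n):
    """Match locations within one document.

    position_lists holds one sorted list of locations per argument.
    A match requires the arguments to occur in order, each at most n
    locations after the previous one. The location of the last
    argument is recorded for each match.
    """
    if not position_lists or any(not p for p in position_lists):
        return []
    cursors = [0] * len(position_lists)
    matches = []
    while all(c < len(p) for c, p in zip(cursors, position_lists)):
        in_window = True
        for i in range(1, len(position_lists)):
            prev = position_lists[i - 1][cursors[i - 1]]
            cur = position_lists[i]
            # Move argument i past the location of argument i-1.
            while cursors[i] < len(cur) and cur[cursors[i]] <= prev:
                cursors[i] += 1
            if cursors[i] == len(cur):
                return matches
            if cur[cursors[i]] - prev > n:
                in_window = False
                break
        if in_window:
            matches.append(position_lists[-1][cursors[-1]])
            # Each location is used once, so advance every argument.
            cursors = [c + 1 for c in cursors]
        else:
            # Too far apart: try the next location of the first argument.
            cursors[0] += 1
    return matches


def near_inverted_list(arg_postings, n):
    """Build the new inverted list: docid -> matched locations."""
    common = set(arg_postings[0]).intersection(*arg_postings[1:])
    result = {}
    for docid in sorted(common):
        locations = near_locations([p[docid] for p in arg_postings], n)
        if locations:
            result[docid] = locations
    return result
```

Because the result has the same shape as any other inverted list (documents, locations, and therefore a tf per document), the #SCORE operator can score it without knowing that it came from #NEAR, which is why the operator works across retrieval models.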
FAQ
If you have questions not answered here, see the HW1 FAQ.
Copyright 2025, Carnegie Mellon University.
Updated on January 20, 2025
Jamie Callan