The purpose of this assignment is to gain experience with dense passage retrieval (DPR) and retrieval augmented generation (RAG). This assignment consists of several parts.
HW5 adds two new capabilities to your QryEval search engine:
Dense vector retrieval is a new first-stage ranking option, serving a purpose similar to Ranked Boolean and BM25. Architecturally, it is just another ranker at the start of your ranking pipeline.
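For intuition, here is a minimal sketch of how a dense ranker scores documents, assuming query and document embeddings have already been computed (the array names, dimensions, and inner-product scoring are illustrative, not a required implementation):

import numpy as np

# Illustrative precomputed embeddings: one query vector and one row per document.
q_emb = np.random.rand(768)
doc_embs = np.random.rand(1000, 768)

# Score every document by vector similarity (inner product here),
# then keep the top 100, just as any other first-stage ranker would.
scores = doc_embs @ q_emb
top_docs = np.argsort(-scores)[:100]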
Retrieval augmented generation adds a new stage to your search engine architecture. Your system should treat RAG as an agent that consumes the results of the ranking pipeline. Thus, your HW5 system will consist of three stages.

ranker → rerankers (when specified) → agent (RAG)
A production system might support several different types of agent, similar to how your system now supports two rerankers (Ltr and Bert). Yours will have just the RAG agent.
See the Design Guide for more information about how to implement these new capabilities.
Depending on your platform, you may need to upgrade your Conda environment or install a new one. Updating is easiest, so try that first. If it fails, try installing the new environment.
Upgrade: pip install SentencePiece
You may upgrade your Python to 3.9, if that helps.
New environment:
conda env create -f 11x42-25S-b.yml
This is the environment that the homework testing service will use for HW5.
Windows users: see the FAQ if you get an error message when your system does generation. You may need to make a small adjustment to your conda installation.
Your system's new capabilities are configured by the new parameters shown below.
The RAG agent writes results to a file in a format understood by squad_eval, the standard evaluation software for the SQuAD dataset. You must write the software that produces this output. We refer to this as a .qaIn file.
The .qaIn file is a simple JSON format that consists of question ids and answer values. An example is shown below.
{"56ddde6b9a695914005b962a": "norway", "56ddde6b9a695914005b962c": "16th century", ... }
The file contains one key/value pair for each question. Python's json library is a convenient way to write this file.
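For example, here is a minimal sketch of writing a .qaIn file with Python's json library (the two entries and the output path are illustrative):

import json

# Map each question id to the answer string your RAG agent generated.
answers = {
    "56ddde6b9a695914005b962a": "norway",
    "56ddde6b9a695914005b962c": "16th century",
}

# Write one key/value pair per question in the .qaIn (JSON) format.
with open("experiment.qaIn", "w") as f:
    json.dump(answers, f)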
This assignment is done with the ClueWeb09 inverted file index that you have used all semester, as well as with the following new data files.
Use the HW5 Testing Page to access the trec_eval and homework testing services.
You may do local testing on your laptop, as you did for HW1, HW2, and HW4. The HW5 test cases and grading files (combined into a single directory) are available for download (zip, tgz).
Conduct experiments and analyses that investigate the effectiveness of vector-based retrieval and retrieval augmented generation in different situations. Test your models on the HW5 questions.
You will have an opportunity to test a default generation prompt and several custom prompts. The default prompt, prompt 1, is defined as follows.
question: {question} context: {context}

The context is a passage selected from the top-ranked document.
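As a rough illustration, here is a minimal sketch of filling the default prompt and generating an answer with a Hugging Face seq2seq model (the model choice, placeholder question and context, and generation settings are assumptions, not required values):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative model choice; the experiments below use t5-base and flan-t5-base.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

question = "When was the printing press invented?"  # illustrative question
context = "..."                                     # passage selected from the top-ranked document

# Default prompt (prompt 1).
prompt = f"question: {question} context: {context}"

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=32)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)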
All students must conduct a set of reproducible experiments. Undergraduate students must write brief reports that document their work. Graduate students must write longer reports that analyze the experimental results and draw conclusions.
The first experiment examines the effects of passage selection strategies on a baseline retrieval augmented generation system.
The model and prompt are configured as follows.
The search engine returns documents, but usually it is impractical to pass an entire document to the generator. Instead, a passage is selected from the document. Investigate passage sizes from 25 to 200 tokens using firstp and bestp passage selection. Set psgCnt = 6 for bestp passage selection.
Report results in four tables: {firstp, bestp} × {t5-base, flan-t5-base}.
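To make the two strategies concrete, here is a minimal sketch of firstp and bestp selection, assuming the document has already been tokenized and using term overlap with the question as an illustrative passage score (the helper names and scoring choice are assumptions, not the required design):

def make_passages(doc_tokens, psg_len):
    # Split the document into consecutive passages of psg_len tokens.
    return [doc_tokens[i:i + psg_len] for i in range(0, len(doc_tokens), psg_len)]

def firstp(doc_tokens, psg_len):
    # firstp: the passage is simply the first psg_len tokens of the document.
    return doc_tokens[:psg_len]

def bestp(doc_tokens, psg_len, question_tokens):
    # bestp: score every candidate passage and keep the highest-scoring one.
    # Term overlap with the question is an illustrative score, not the required one.
    q_terms = set(question_tokens)
    return max(make_passages(doc_tokens, psg_len),
               key=lambda p: len(q_terms & set(p)))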
The second experiment investigates the effect of the prompt on two LLMs. Compare the default prompt to four custom prompts that you develop. Use two LLMs (flan-t5-base, t5-base) to help you identify trends.
It is not necessary for your custom prompts to improve on the default prompt. Experiments are evaluated on the quality and practicality of the hypotheses that are explored.
Configure the system (psgCnt, psgLen, etc) based upon your conclusions about the first experiment.
The result is two tables of experimental results.
The last experiment investigates the effect of ranking accuracy on answer quality produced by flan-t5-base and t5-base. Select one retrieval augmented generation configuration (passage selection, prompt, etc) based on your prior experiments. Investigate how that configuration performs when the input ranking is varied.
Test your retrieval augmented generation system with rankings produced by the following ranking pipelines. You have the freedom to set parameters however you wish unless indicated otherwise.
The result is two tables of experimental results.
Warning: This experiment is the most time-consuming of the three. Be sure to leave yourself enough time to complete it.
11-442 students must submit a report that contains a statement of collaboration and originality, and their experimental results. A template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.
11-642 students must write a report that describes their work and their analysis of the experimental results. A report template is provided in Microsoft Word and pdf formats. The report must follow the structure provided in the template.
Create a .zip file that contains your software, following the same requirements used for interim software submissions. Name your report yourAndrewID-HW5-Report.pdf and place it in the same zip file directory that contains your software (e.g., the directory that contains QryEval.java).
Submit your homework by checking the "Final Submission" box in the homework testing service. We will run a complete set of tests on your software, so you do not need to select tests to run. If you make several final submissions, we will grade your last submission.
The Homework Services web page provides information about your homework submissions and access to graded homework reports.
The grading requirements and advice are the same as for HW1.
If you have questions not answered here, see the HW5 FAQ and the Homework Testing FAQ.
Copyright 2025, Carnegie Mellon University.
Updated on April 10, 2025