Search Engines:
11-442 / 11-642
 
 

HelloHW5

 

helloHW5 is a short Python script that demonstrates the use of two new software capabilities: dense retrieval and answer generation. It requires i) a new or upgraded Conda environment, and ii) some additional files in your INPUT_DIR.

Conda Environment

Depending on your platform, you may need to upgrade your Conda environment or install a new one. Updating is easiest, so try that first. If it fails, try installing the new environment.

Directory Requirements

Put helloHW5.py in the QryEval directory that you used for HW1, HW2, and HW4.

Add the following new files to your INPUT_DIR.

  1. co-condenser-marco-retriever (.zip or .tgz): A co-condenser model for encoding queries and passages as dense vectors. 387 MB compressed, 418 MB uncompressed.
  2. index-cw09-faiss-t32b300-Fp: A FAISS dense vector index for ClueWeb09. 1.6 GB.
  3. flan-t5-base (.zip or .tgz): A Flan-T5-Base large instruction-following language model. 3.6 GB compressed, 3.9 GB uncompressed.
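
Before running the script, it may help to confirm that the three new entries are in place. The following is a minimal sketch, not part of helloHW5.py. It assumes that the two archives were uncompressed into entries named exactly as in the list above, and "INPUT_DIR" is a placeholder for your own path. The function name missing_inputs is hypothetical.

```python
from pathlib import Path

# Names taken from the list above. Assumption: the archives were
# uncompressed into entries with exactly these names.
REQUIRED = [
    "co-condenser-marco-retriever",
    "index-cw09-faiss-t32b300-Fp",
    "flan-t5-base",
]


def missing_inputs(input_dir, required=REQUIRED):
    """Return the required names that do not exist under input_dir."""
    root = Path(input_dir)
    return [name for name in required if not (root / name).exists()]


if __name__ == "__main__":
    # "INPUT_DIR" is a placeholder; substitute the path you actually use.
    for name in missing_inputs("INPUT_DIR"):
        print("missing:", name)
```

If nothing is printed, all three entries were found.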

helloHW5

Run the software from the QryEval directory. The command is shown first, followed by sample output.

python helloHW5.py

==> Retrieval <==
Query: Do cigarettes cause cancer?
Internal docids: [225540 249847 166432 447714 155923 246000 489120 529022 451014 287559]
Scores: [178.26248 174.80592 174.6508 174.16243 173.93791 173.78023 173.71877
173.70064 173.60355 173.49403]

==> Retrieval augmented generation <==
Question: Do cigarettes cause cancer?
Answer 1 (no retrieval): no
Answer 2 (w/retrieval): Smoking may be a cause of breast cancer, argued Swiss medical expert Alfredo Mor
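
The two stages in the output above can be illustrated with a toy example. This is a rough sketch of the general idea only, not the code in helloHW5.py, which uses the co-condenser encoder, the FAISS index, and Flan-T5. The sketch assumes inner-product scoring over hand-written vectors, and the prompt layout in build_prompt is a hypothetical format chosen for illustration.

```python
def dense_search(query_vec, doc_vecs, k=2):
    """Rank documents by inner product with the query vector."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]
    return ranked, [scores[i] for i in ranked]


def build_prompt(question, passages=()):
    """Prepend retrieved passages, if any, to the question (hypothetical format)."""
    if not passages:
        return f"Question: {question}"
    context = "\n".join(passages)
    return f"Context: {context}\nQuestion: {question}"


# Toy 3-dimensional vectors; real encoders typically use hundreds of dimensions.
doc_vecs = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.9, 0.1, 0.0]]
doc_text = ["passage A", "passage B", "passage C", "passage D"]
query_vec = [1.0, 0.2, 0.0]

docids, scores = dense_search(query_vec, doc_vecs, k=2)
print("Internal docids:", docids)
print("Scores:", scores)

question = "Do cigarettes cause cancer?"
print(build_prompt(question))                                  # no retrieval
print(build_prompt(question, [doc_text[i] for i in docids]))   # with retrieval
```

The only difference between the two prompts is the retrieved context, which is what lets a generator give a different answer with retrieval than without it.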

Feedback

Please send Jamie a quick email (callan@cs.cmu.edu) that contains the following information.

  1. Information about your laptop: amount of RAM and operating system. Mac users, please also say which architecture you have (for example, Intel or Apple silicon).
  2. Whether the software successfully ran to completion.
  3. A copy or brief summary of any warning or error messages.
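
The laptop details requested in item 1 can be collected with a few lines of Python. This is a minimal sketch that uses only the standard library. The function name system_report is hypothetical, and the RAM lookup relies on os.sysconf, which is assumed to work on Linux but may be unavailable on other platforms, in which case the value is reported as None.

```python
import os
import platform


def system_report():
    """Collect OS, architecture, and (where available) RAM in GB."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "architecture": platform.machine(),  # e.g. x86_64 or arm64
    }
    try:
        # Assumption: these sysconf names exist on Linux; they may be
        # missing on other platforms, and os.sysconf is absent on Windows.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["ram_gb"] = round(ram_bytes / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        report["ram_gb"] = None  # look it up in the system settings instead
    return report


if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```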

Thanks for your help!


Copyright 2025, Carnegie Mellon University.
Updated on March 29, 2025

Jamie Callan