Can we see the full TREC topics?
Yes. q100-200.topics.txt
I'm having trouble using a script to interact with the trec_eval or ndeval service.
sample trec_eval service script.
sample ndeval service script.
I am confused about scaling.
Rank | q | q1 | q2
---|---|---|---
1 | d1 | d1 | d1
2 | d2 | d2 | d3
3 | d3 | d5 | d5
4 | d4 | d4 | d6
--- maxInputRankingsLength --- | | |
5 | d5 | d3 | d2
6 | d6 | d6 | d4
7 | d7 | d7 | d7
(Answer provided by Qing Liu, a champion TA from the past.)
Do xQuAD and PM2 produce rankings that have monotonically decreasing scores?
For xQuAD, yes.
For PM2, usually, but not always. For example, PM2 can produce this ranking:
    :   :   :   :   :   :   :
    157 Q0 clueweb09-en0010-50-26285 19 0.004970394127 reference
    157 Q0 clueweb09-enwp03-34-00477 20 0.005192763939 reference
    :   :   :   :   :   :   :
PM2 creates a document ranking. It does not guarantee that the ranking will be preserved if the list is sorted by the scores. This isn't a problem for your software (you don't need to sort the list), but it will cause small problems when trec_eval and ndeval evaluate the ranking, because both sort the documents by score rather than using the rank column.
Your software must produce monotonically decreasing scores in the PM2 ranking. It does not matter what the scores are. One simple solution is something like this:
    if score(rank_{i+1}) >= score(rank_i)
        score(rank_{i+1}) = score(rank_i) * 0.999
This is a source of small scoring problems that cause much frustration, because it has just a small effect on just a few test cases. Beware.
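The adjustment can be sketched in Python as follows. This is only an illustration, not the required implementation: the function name `enforce_decreasing` and the `factor` parameter are hypothetical, and the sketch assumes all scores are strictly positive (multiplying by 0.999 does not lower a zero or negative score).

```python
def enforce_decreasing(scores, factor=0.999):
    """Return a copy of `scores` in which each score is strictly below the one before it.

    `scores` is a list of floats in rank order (rank 1 first).
    Assumption: every score is strictly positive.
    """
    fixed = list(scores)
    for i in range(1, len(fixed)):
        # If the score at this rank is not lower than the previous one,
        # replace it with a slightly smaller copy of the previous score.
        if fixed[i] >= fixed[i - 1]:
            fixed[i] = fixed[i - 1] * factor
    return fixed


# The two scores from the example ranking (ranks 19 and 20).
print(enforce_decreasing([0.004970394127, 0.005192763939]))
```

The document order is untouched; only the scores change, so sorting by score reproduces the ranking that PM2 built.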
Sometimes when PM2 is constructing a ranking, it reaches a point
where all of the intent scores for all of the remaining
candidate documents are zero. This causes all of the diversified
document scores to be zero. How should PM2 handle this case?
Complete the ranking using the remaining candidate documents in relevance
order. Assign them scores that maintain the ranking order (i.e.,
diversified document scores > relevance-only document scores).
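One possible way to do this is sketched below in Python. The function name `complete_ranking`, the `(doc_id, score)` tuple layout, and the `factor` parameter are all hypothetical, and the sketch assumes the diversified scores are strictly positive. It is an illustration under those assumptions, not the required implementation.

```python
def complete_ranking(diversified, remaining, factor=0.999):
    """Append the remaining candidates, in relevance order, below the diversified documents.

    diversified: list of (doc_id, score) already selected by PM2, in rank order.
    remaining:   list of (doc_id, relevance_score) candidates not yet selected.
    Assumption: the diversified scores are strictly positive.
    """
    ranking = list(diversified)
    # Start below the smallest diversified score so that every appended
    # document scores lower than every diversified document.
    # The 1.0 fallback for an empty list is an arbitrary choice.
    last = min((score for _, score in ranking), default=1.0)
    # Relevance order: highest relevance score first (ties keep input order).
    for doc_id, _ in sorted(remaining, key=lambda pair: pair[1], reverse=True):
        last *= factor
        ranking.append((doc_id, last))
    return ranking


ranked = complete_ranking(
    diversified=[("d1", 0.5), ("d3", 0.4)],
    remaining=[("d2", 0.1), ("d5", 0.9)],
)
print([doc_id for doc_id, _ in ranked])
```

The relevance scores are used only to order the remaining documents. The scores written to the ranking are new values chosen to keep the order intact.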
If the FAQ hasn't answered your question, please search the Piazza forum to see if someone has already answered your question before you ask it.
Copyright 2024, Carnegie Mellon University.
Updated on November 20, 2024