collection.tsv.1: first half of the original MS MARCO passage ranking collection
collection.tsv.2: second half of the original MS MARCO passage ranking collection
(The collection is split into two halves so they can be processed in parallel. You can use more splits.)
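
If you want more than two splits, the following is a minimal sketch of one way to do it. It assumes you have the full, unsplit collection as a single file with one passage per line; the function name split_collection and the choice of contiguous, near-equal line counts are illustrative assumptions, not the procedure used to produce the two halves above. Output files follow the same numbering convention (.1, .2, ...).

```python
import itertools
from pathlib import Path


def split_collection(path, num_splits):
    """Split a line-oriented file into num_splits contiguous parts.

    Hypothetical helper: writes <path>.1 ... <path>.<num_splits>
    next to the input file and returns the list of output paths.
    """
    path = Path(path)

    # First pass: count lines so each part gets a near-equal share.
    with path.open("rb") as f:
        total = sum(1 for _ in f)
    base, extra = divmod(total, num_splits)

    outputs = []
    # Second pass: copy consecutive blocks of lines into each part.
    # Binary mode keeps the bytes (encoding, line endings) unchanged.
    with path.open("rb") as src:
        for i in range(num_splits):
            # The first `extra` parts take one additional line each.
            size = base + (1 if i < extra else 0)
            out_path = path.with_name(f"{path.name}.{i + 1}")
            with out_path.open("wb") as dst:
                dst.writelines(itertools.islice(src, size))
            outputs.append(out_path)
    return outputs
```

For example, split_collection("collection.tsv", 4) would write collection.tsv.1 through collection.tsv.4, assuming the unsplit file is named collection.tsv.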

collection_jsonl.zip: the same MS MARCO passage ranking collection, in JSON format.

myalltrain.relevant.docterm_recall: training file