The TREC-2013 Crowdsourcing Dataset



The 2013 TREC Crowdsourcing Track used subsets of the ClueWeb12 dataset. The track organizers extracted the documents from ClueWeb12 and distributed them as tgz files as a convenience to track participants. These files were prepared by Gaurav Baruah, Gabriella Kazai, and Mark D. Smucker, who organized the 2013 TREC Crowdsourcing Track.

There are two versions of the dataset. Version 1.0 is the original dataset; some of its documents were inadvertently truncated during the extraction process. Version 1.1 contains the complete versions of those documents and is the current version of the dataset.

Version 1.1:

Version 1.0:

 

Updated on August 22, 2013.
Maintained by David Pane and Jamie Callan.