Virtual SIGSAND 2020

Crowdsourcing for Repurposable Data: What We Lose When We Train Our Crowds

May 12, 2020

Authors: Jeffrey Parsons, Roman Lukyanenko, Shawn Ogunseye

Tags: 2020

Users of crowdsourced data expect that knowledge of the domain of a data crowdsourcing task will improve the data that their contributors provide, so they train potential participants on the task to be performed. We carried out an experiment to test how training affects data quality and data repurposability: the capacity of data to flexibly accommodate both anticipated and unanticipated uses. Eighty-four contributors, who were trained explicitly (using rules), trained implicitly (using exemplars), or left untrained, reported sightings of artificial insects and other entities in a simulated citizen science project. We find that training contributors offers no information quality or data repurposability advantages. Trained contributors reported fewer differentiating attributes and fewer total attributes of the entities they observed. Trained contributors are therefore less likely to report data that can lead to discoveries. We discuss the implications of our findings for the design of inclusive data crowdsourcing systems.

Cite as:
Ogunseye S., Parsons J., Lukyanenko R. (2020). “Crowdsourcing for Repurposable Data: What We Lose When We Train Our Crowds,” in AIS SIGSAND, Virtually in West Palm Beach, FL, United States, May 22, 2020.