51046 |
Creator |
8eb9378b0e3dcd225dfc47fcdc9b35f4 |
51046 |
Creator |
ext-a375ad27f9bba79eaff7219da857a8f7 |
51046 |
Creator |
ext-1bd20fb2a8369a734b45a1b584d8833b |
51046 |
Creator |
ext-1cd6e5f0578cd1928a68eccf77b9903c |
51046 |
Creator |
ext-4a193ef2ae87a5e2160020b06c423d42 |
51046 |
Creator |
ext-7e2d742437412763d805bb99d671f62a |
51046 |
Creator |
ext-8edebc372e764cc21362813fb4a3015b |
51046 |
Creator |
ext-dee261f51438816dd34233bc799d20cc |
51046 |
Date |
2016-07-14 |
51046 |
Is Part Of |
repository |
51046 |
Is Part Of |
p21576912 |
51046 |
abstract |
We present an incremental Bayesian model that resolves key issues of crowd size and
data quality for consensus labeling. We evaluate our method using data collected from
a real-world citizen science program, BeeWatch, which invites members of the public
in the United Kingdom to classify (label) photographs of bumblebees as one of 22 possible
species. The biological recording domain poses two key and hitherto unaddressed challenges
for consensus models of crowdsourcing: (1) the large number of potential species makes
classification difficult, and (2) this is compounded by limited crowd availability,
stemming from both the inherent difficulty of the task and the lack of relevant skills
among the general public. We demonstrate that consensus labels can be reliably found
in such circumstances with very small crowd sizes of around three to five users (i.e.,
through group sourcing). Our incremental Bayesian model, which minimizes crowd size
by re-evaluating the quality of the consensus label following each species identification
solicited from the crowd, is competitive with a Bayesian approach that uses a larger
but fixed crowd size and outperforms majority voting. These results have important
ecological applicability: biological recording programs such as BeeWatch can sustain
themselves even when resources such as taxonomic experts, who confirm the identifications
made by photo submitters, are scarce (as is typically the case), and feedback can be provided
to submitters in a timely fashion. More generally, our model provides benefits to
any crowdsourced consensus labeling task where there is a cost (financial or otherwise)
associated with soliciting a label. |
51046 |
authorList |
authors |
51046 |
issue |
4 |
51046 |
status |
peerReviewed |
51046 |
uri |
http://data.open.ac.uk/oro/document/633709 |
51046 |
volume |
7 |
51046 |
type |
AcademicArticle |
51046 |
type |
Article |
51046 |
label |
Siddharthan, Advaith; Lambin, Christopher; Robinson, Anne-Marie; Sharma, Nirwan;
Comont, Richard; O’Mahony, Elaine; Mellish, Chris and Van Der Wal, René (2016). Crowdsourcing
Without a Crowd: Reliable Online Species Identification Using Bayesian Models to Minimize
Crowd Size. ACM Transactions on Intelligent Systems and Technology, 7(4), article
no. 45. |
51046 |
Publisher |
ext-2af1883e4bbfa0356fcedb366171cb38 |
51046 |
Title |
Crowdsourcing Without a Crowd: Reliable Online Species Identification Using Bayesian
Models to Minimize Crowd Size |
51046 |
in dataset |
oro |