A Sample Selection Method for Cost Reduction in Crowd Computing

Mohammadi, Jafar | 2016

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 48475 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Rabiee, Hamid Reza
  7. Abstract:
    The goal of crowd labeling is to find the labels of given samples by harnessing human brainpower. Since crowd workers are not necessarily experts, the labels they provide are rather noisy and error-prone. This challenge is usually addressed by collecting multiple labels for each sample and aggregating them to estimate its true label. Although this mechanism yields high-quality labels, it is not cost-effective. Adaptive methods assume that only some samples are challenging and need more labels; they spend the budget more wisely by collecting the required labels iteratively. Following the adaptive approach, we use statistical latent models to model and analyze the collected labels, and low-rank matrix factorization methods to estimate unseen labels.
    In the statistical part, we propose an adaptive method that first estimates, for each sample, the expected “gain” of requesting a new label, and then requests a new label for the sample with the highest expected gain. We give two definitions of gain: the “reduction in the uncertainty of the estimated true labels” and the “change in the probability that the estimated true labels are correct”. To compute these gains, we propose a general probabilistic model of which all the surveyed methods are special cases.
    In the low-rank part, the goal is to estimate unseen labels. We first address the main problems of using current low-rank matrix factorization methods to estimate unseen crowd labels. We then show that detecting wrong labels makes it possible to estimate unseen labels with low-rank methods, and we propose a factorization method that simultaneously detects wrong crowd labels and estimates the unseen ones. Together, the estimated and collected labels enable simple adaptive methods to behave like more complex ones.
    Finally, using real and synthetic datasets, we show that the proposed methods estimate true labels more accurately than previously proposed methods.
  8. Keywords: Crowdsourcing ; Labeling Algorithm ; Adaptive Method ; Low-Rank Matrix ; Statistical Latent Model ; Low-Rank Matrix Factorization
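The adaptive, gain-driven label collection described in the abstract can be illustrated for the simplest possible case. This is a minimal sketch, not the thesis's general probabilistic model: it assumes binary labels in {+1, -1}, a uniform prior on each true label, and every worker being correct with the same known probability `p` (0.5 < p < 1). `expected_gain` implements the first gain definition (expected reduction in the entropy of the posterior over the true label); `expected_correctness_gain` is one hypothetical reading of the second definition (expected increase in the posterior probability that the estimated label is correct). All function names are illustrative.

```python
import math
from typing import Callable, List


def entropy(pi: float) -> float:
    """Binary entropy (in bits) of a Bernoulli(pi) variable."""
    return -sum(x * math.log2(x) for x in (pi, 1.0 - pi) if x > 0)


def posterior_pos(labels: List[int], p: float) -> float:
    """P(true label = +1 | labels) under a uniform prior; each worker is correct w.p. p (assumed)."""
    net = sum(labels)  # (#+1 labels) - (#-1 labels); only the difference matters here
    return 1.0 / (1.0 + ((1.0 - p) / p) ** net)


def estimate_label(labels: List[int], p: float) -> int:
    """Aggregate one sample's crowd labels into an estimated true label."""
    return 1 if posterior_pos(labels, p) >= 0.5 else -1


def _outcomes(labels: List[int], p: float):
    """Current posterior, predictive P(next label = +1), and posteriors after each outcome."""
    pi = posterior_pos(labels, p)
    q = pi * p + (1.0 - pi) * (1.0 - p)
    return pi, q, posterior_pos(labels + [1], p), posterior_pos(labels + [-1], p)


def _p_correct(pi: float) -> float:
    """Probability that the label estimated from posterior pi is correct."""
    return max(pi, 1.0 - pi)


def expected_gain(labels: List[int], p: float) -> float:
    """Gain 1: expected reduction in posterior entropy from requesting one more label."""
    pi, q, up, down = _outcomes(labels, p)
    return entropy(pi) - (q * entropy(up) + (1.0 - q) * entropy(down))


def expected_correctness_gain(labels: List[int], p: float) -> float:
    """Gain 2 (hypothetical reading): expected increase in P(estimated label is correct)."""
    pi, q, up, down = _outcomes(labels, p)
    return q * _p_correct(up) + (1.0 - q) * _p_correct(down) - _p_correct(pi)


def select_next(all_labels: List[List[int]], p: float,
                gain: Callable[[List[int], float], float] = expected_gain) -> int:
    """Index of the sample whose next label has the highest expected gain."""
    gains = [gain(labels, p) for labels in all_labels]
    return max(range(len(gains)), key=gains.__getitem__)
```

For example, with `p = 0.7` and samples labeled `[1, 1, 1]`, `[1, -1]` and `[1]`, `select_next` picks the tied sample `[1, -1]` under either gain. The second gain is exactly zero for `[1]`, since one more label cannot reverse its current estimate. A full adaptive run would repeat selection, request one label, append it, and stop when the budget is spent.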
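The idea of detecting wrong labels while estimating unseen ones with a low-rank model can be illustrated with a simplified sketch. It assumes binary labels in {+1, -1} and a one-coin worker model, under which the expected sample-by-worker label matrix has rank one: E[L_ij] = (2p_j - 1) y_i. The sketch swaps in a generic heuristic (a ridge-regularized rank-1 alternating least squares fit, alternated with flagging observed labels whose sign contradicts the fit); it is not the factorization method proposed in the thesis. The names `rank1_als` and `detect_and_complete`, the ridge weight `lam`, and the sign-disagreement flagging rule are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Rows are samples, columns are workers; entries are +1/-1 labels or None (unseen).
LabelMatrix = List[List[Optional[int]]]


def sign(x: float) -> int:
    return 1 if x >= 0 else -1


def rank1_als(M: LabelMatrix, mask: List[List[bool]], iters: int = 50,
              lam: float = 0.1) -> Tuple[List[float], List[float]]:
    """Fit M[i][j] ~ u[i] * v[j] on entries where mask[i][j] is True (ridge-regularized ALS)."""
    n, m = len(M), len(M[0])
    u, v = [0.0] * n, [1.0] * m  # start by trusting every worker equally
    for _ in range(iters):
        for i in range(n):  # update sample factors with worker factors fixed
            cols = [j for j in range(m) if mask[i][j]]
            u[i] = sum(v[j] * M[i][j] for j in cols) / (sum(v[j] ** 2 for j in cols) + lam)
        for j in range(m):  # update worker factors with sample factors fixed
            rows = [i for i in range(n) if mask[i][j]]
            v[j] = sum(u[i] * M[i][j] for i in rows) / (sum(u[i] ** 2 for i in rows) + lam)
    return u, v


def detect_and_complete(M: LabelMatrix, rounds: int = 3, iters: int = 50, lam: float = 0.1):
    """Alternate a rank-1 fit with excluding labels that contradict it.

    Returns (matrix with unseen entries filled, flagged (i, j) positions, estimated true labels).
    """
    n, m = len(M), len(M[0])
    trusted = [[M[i][j] is not None for j in range(m)] for i in range(n)]
    flagged: Set[Tuple[int, int]] = set()
    u, v = rank1_als(M, trusted, iters, lam)
    for _ in range(rounds):
        # Heuristic: a label is suspicious if its sign disagrees with the low-rank prediction.
        suspicious = {(i, j) for i in range(n) for j in range(m)
                      if trusted[i][j] and M[i][j] * u[i] * v[j] < 0}
        if not suspicious:
            break
        flagged |= suspicious
        for i, j in suspicious:
            trusted[i][j] = False  # refit without the suspicious labels
        u, v = rank1_als(M, trusted, iters, lam)
    # Unseen entries are predicted by the sign of the rank-1 model; observed ones are kept.
    completed = [[M[i][j] if M[i][j] is not None else sign(u[i] * v[j])
                  for j in range(m)] for i in range(n)]
    # Assumes most workers beat random guessing, which fixes the global sign of u.
    orient = sign(sum(v))
    truth = [orient * sign(u[i]) for i in range(n)]
    return completed, flagged, truth
```

For instance, on the 4 x 3 matrix `[[1, 1, 1], [1, None, 1], [-1, -1, 1], [-1, -1, -1]]`, the sketch flags the inconsistent label at position `(2, 2)`, fills the unseen entry `(1, 1)` with `+1`, and estimates the true labels as `[1, 1, -1, -1]`. Excluding flagged labels before refitting keeps a wrong label from pulling the low-rank fit toward itself.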