Skip to main content

Top-k and clustering with noisy comparisons

Publication ,  Journal Article
Davidson, S; Khanna, S; Milo, T; Roy, S
Published in: ACM Transactions on Database Systems
December 30, 2014

We study the problems of max/top-k and clustering when the comparison operations may be performed by oracles whose answer may be erroneous. Comparisons may either be of type or of value: given two data elements, the answer to a type comparison is "yes" if the elements have the same type and therefore belong to the same group (cluster); the answer to a value comparison orders the two data elements. We give efficient algorithms that are guaranteed to achieve correct results with high probability, analyze the cost of these algorithms in terms of the total number of comparisons (i.e., using a fixed-cost model), and show that they are essentially the best possible. We also show that fewer comparisons are needed when values and types are correlated, or when the error model is one in which the error decreases as the distance between the two elements in the sorted order increases. Finally, we examine another important class of cost functions, concave functions, which balances the number of rounds of interaction with the oracle with the number of questions asked of the oracle. Results of this article form an important first step in providing a formal basis for max/top-k and clustering queries in crowdsourcing applications, that is, when the oracle is implemented using the crowd. We explain what simplifying assumptions are made in the analysis, what results carry to a generalized crowdsourcing setting, and what extensions are required to support a full-fledged model.

Duke Scholars

Altmetric Attention Stats
Dimensions Citation Stats

Published In

ACM Transactions on Database Systems

DOI

EISSN

1557-4644

ISSN

0362-5915

Publication Date

December 30, 2014

Volume

39

Issue

4

Related Subject Headings

  • Information Systems
  • 4609 Information systems
  • 4605 Data management and data science
  • 4009 Electronics, sensors and digital hardware
  • 0806 Information Systems
  • 0804 Data Format
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Davidson, S., Khanna, S., Milo, T., & Roy, S. (2014). Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems, 39(4). https://doi.org/10.1145/2684066
Davidson, S., S. Khanna, T. Milo, and S. Roy. “Top-k and clustering with noisy comparisons.” ACM Transactions on Database Systems 39, no. 4 (December 30, 2014). https://doi.org/10.1145/2684066.
Davidson S, Khanna S, Milo T, Roy S. Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems. 2014 Dec 30;39(4).
Davidson, S., et al. “Top-k and clustering with noisy comparisons.” ACM Transactions on Database Systems, vol. 39, no. 4, Dec. 2014. Scopus, doi:10.1145/2684066.
Davidson S, Khanna S, Milo T, Roy S. Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems. 2014 Dec 30;39(4).

Published In

ACM Transactions on Database Systems

DOI

EISSN

1557-4644

ISSN

0362-5915

Publication Date

December 30, 2014

Volume

39

Issue

4

Related Subject Headings

  • Information Systems
  • 4609 Information systems
  • 4605 Data management and data science
  • 4009 Electronics, sensors and digital hardware
  • 0806 Information Systems
  • 0804 Data Format