Scholars@Duke publication: Top-k and clustering with noisy comparisons

Top-k and clustering with noisy comparisons

Publication , Journal Article

Davidson, S; Khanna, S; Milo, T; Roy, S

Published in: ACM Transactions on Database Systems

December 30, 2014

We study the problems of max/top-k and clustering when the comparison operations may be performed by oracles whose answer may be erroneous. Comparisons may either be of type or of value: given two data elements, the answer to a type comparison is "yes" if the elements have the same type and therefore belong to the same group (cluster); the answer to a value comparison orders the two data elements. We give efficient algorithms that are guaranteed to achieve correct results with high probability, analyze the cost of these algorithms in terms of the total number of comparisons (i.e., using a fixed-cost model), and show that they are essentially the best possible. We also show that fewer comparisons are needed when values and types are correlated, or when the error model is one in which the error decreases as the distance between the two elements in the sorted order increases. Finally, we examine another important class of cost functions, concave functions, which balances the number of rounds of interaction with the oracle with the number of questions asked of the oracle. Results of this article form an important first step in providing a formal basis for max/top-k and clustering queries in crowdsourcing applications, that is, when the oracle is implemented using the crowd. We explain what simplifying assumptions are made in the analysis, what results carry to a generalized crowdsourcing setting, and what extensions are required to support a full-fledged model.

Duke Scholars

Author Sudeepa Roy Computer Science

Altmetric Attention Stats

Dimensions Citation Stats

Published In

ACM Transactions on Database Systems

DOI

10.1145/2684066

EISSN

1557-4644

ISSN

0362-5915

Publication Date

December 30, 2014

Volume

Issue

Related Subject Headings

Information Systems
4609 Information systems
4605 Data management and data science
4009 Electronics, sensors and digital hardware
0806 Information Systems
0804 Data Format

Citation

APA

Chicago

ICMJE

MLA

NLM

Davidson, S., Khanna, S., Milo, T., & Roy, S. (2014). Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems, 39(4). https://doi.org/10.1145/2684066

Davidson, S., S. Khanna, T. Milo, and S. Roy. “Top-k and clustering with noisy comparisons.” ACM Transactions on Database Systems 39, no. 4 (December 30, 2014). https://doi.org/10.1145/2684066.

Davidson S, Khanna S, Milo T, Roy S. Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems. 2014 Dec 30;39(4).

Davidson, S., et al. “Top-k and clustering with noisy comparisons.” ACM Transactions on Database Systems, vol. 39, no. 4, Dec. 2014. Scopus, doi:10.1145/2684066.

Davidson S, Khanna S, Milo T, Roy S. Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems. 2014 Dec 30;39(4).

Published In

ACM Transactions on Database Systems

DOI

10.1145/2684066

EISSN

1557-4644

ISSN

0362-5915

Publication Date

December 30, 2014

Volume

Issue

Related Subject Headings

Information Systems
4609 Information systems
4605 Data management and data science
4009 Electronics, sensors and digital hardware
0806 Information Systems
0804 Data Format