Schemaless join for result set preferences

Publication, Conference
Gao, C; Pei, J; Wang, J; Chang, Y
Published in: Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017
November 8, 2017

In many applications, such as data integration and big data analytics, one has to integrate data from multiple sources without detailed and accurate schema information. The state of the art focuses on matching attributes among sources based on information derived from the data in those sources. However, the best join result according to a method's own predetermined criteria may not serve a user's best interest. In this paper, we tackle the challenge from a novel angle and investigate how to join schemaless tables so as to best meet a user preference. We identify a set of essential preferences that are useful in various scenarios, such as minimizing the number of tuples in outer join results and maximizing the entropy of the joining key's distribution. We also develop a systematic method to compute the best join predicate by optimizing an objective function that represents a user preference. We conduct extensive experiments on four large datasets and compare against four baselines from the state of the art in schema matching and attribute clustering. The experimental results show that our algorithm significantly outperforms the baselines in accuracy in all cases while consuming comparable running time.
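To make the two example preferences in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes tables are lists of equal-length tuples, restricts candidate join predicates to single-column equality, and searches them by brute force. The function names (`outer_join_size`, `join_key_entropy`, `best_predicate`) are hypothetical, and defining the entropy over the key distribution of the inner-join result is an assumption, since the abstract does not specify it.

```python
from collections import Counter
from itertools import product
from math import log2


def outer_join_size(left, right, i, j):
    """Tuple count of the full outer join on left[i] == right[j]."""
    lc = Counter(row[i] for row in left)
    rc = Counter(row[j] for row in right)
    # Each shared key contributes (left count) x (right count) joined tuples.
    matched = sum(lc[k] * rc[k] for k in lc.keys() & rc.keys())
    # Unmatched rows on either side each appear once, padded with nulls.
    left_only = sum(c for k, c in lc.items() if k not in rc)
    right_only = sum(c for k, c in rc.items() if k not in lc)
    return matched + left_only + right_only


def join_key_entropy(left, right, i, j):
    """Shannon entropy (bits) of the key distribution in the inner join.

    Assumption: the abstract does not define which distribution is meant.
    """
    lc = Counter(row[i] for row in left)
    rc = Counter(row[j] for row in right)
    weights = [lc[k] * rc[k] for k in lc.keys() & rc.keys()]
    total = sum(weights)
    if total == 0:
        return 0.0  # empty join: treat entropy as zero
    return -sum(w / total * log2(w / total) for w in weights)


def best_predicate(left, right, score, maximize):
    """Brute-force search over single-column equality predicates."""
    pairs = product(range(len(left[0])), range(len(right[0])))
    pick = max if maximize else min
    return pick(pairs, key=lambda p: score(left, right, *p))


if __name__ == "__main__":
    # Toy tables with no column names, as in a schemaless setting.
    left = [(1, "a"), (2, "b"), (3, "c")]
    right = [("a", 10), ("b", 20), ("d", 30)]
    print(best_predicate(left, right, outer_join_size, maximize=False))
    print(best_predicate(left, right, join_key_entropy, maximize=True))
```

Each call returns a pair of column indices (left column, right column). Exhaustive enumeration is only practical for narrow tables; the paper describes a systematic method for computing the best predicate, which this sketch does not attempt to reproduce.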

Published In

Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017

DOI

10.1109/IRI.2017.26

Publication Date

November 8, 2017

Volume

2017-January

Start / End Page

569 / 578
 

Citation

APA: Gao, C., Pei, J., Wang, J., & Chang, Y. (2017). Schemaless join for result set preferences. In Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017 (Vol. 2017-January, pp. 569–578). https://doi.org/10.1109/IRI.2017.26
Chicago: Gao, C., J. Pei, J. Wang, and Y. Chang. “Schemaless join for result set preferences.” In Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017, 2017-January:569–78, 2017. https://doi.org/10.1109/IRI.2017.26.
ICMJE: Gao C, Pei J, Wang J, Chang Y. Schemaless join for result set preferences. In: Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017. 2017. p. 569–78.
MLA: Gao, C., et al. “Schemaless join for result set preferences.” Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017, vol. 2017-January, 2017, pp. 569–78. Scopus, doi:10.1109/IRI.2017.26.
NLM: Gao C, Pei J, Wang J, Chang Y. Schemaless join for result set preferences. Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017. 2017. p. 569–578.
