
Synthesizing Linked Data under Cardinality and Integrity Constraints

Publication, Conference
Gilad, A; Patwa, S; Machanavajjhala, A
Published in: Proceedings of the ACM SIGMOD International Conference on Management of Data
January 1, 2021

The generation of synthetic data is useful for multiple purposes, from testing applications to benchmarking to privacy preservation. Generating the links between relations, subject to cardinality constraints (CCs) and integrity constraints (ICs), is an important aspect of this problem. Given instances of two relations, where one has a foreign key dependence on the other and is missing its foreign key ($FK$) values, and two types of constraints, namely (1) CCs that apply to the join view and (2) ICs that apply to the table with missing $FK$ values, our goal is to impute the missing $FK$ values such that the constraints are satisfied. We provide a novel framework for the problem based on declarative CCs and ICs. We further show that the problem is NP-hard and propose a novel two-phase solution that guarantees the satisfaction of the ICs. Phase I yields an intermediate solution accounting for the CCs alone and relies on a hybrid approach based on CC types: for one type, the problem is modeled as an Integer Linear Program; for the others, we describe an efficient and accurate solution. We then combine the two solutions. Phase II augments this solution by incorporating the ICs and uses a coloring of the conflict hypergraph to infer the values of the $FK$ column. Our extensive experimental study shows that our solution scales well as the data size and the number of constraints increase. We further show that our solution maintains low error rates for the CCs.
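The coloring idea behind Phase II can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes a simplified setting in which every IC is pairwise (two rows that must not receive the same $FK$ value), so the conflict hypergraph reduces to an ordinary graph, and it skips Phase I entirely, choosing the first non-conflicting key instead. The tables `persons` and `houses`, the IC in `violates_ic`, the greedy largest-first ordering, and the relative-error measure in `cc_error` are all hypothetical. Candidate $FK$ values act as colors, so any proper coloring satisfies the ICs by construction; the CC is only measured afterwards on the join view.

```python
from itertools import combinations

# Hypothetical child table: each person's FK (hid) to a household is missing.
persons = [
    {"pid": 1, "rel": "head", "age": 45},
    {"pid": 2, "rel": "spouse", "age": 43},
    {"pid": 3, "rel": "head", "age": 30},
    {"pid": 4, "rel": "child", "age": 5},
    {"pid": 5, "rel": "head", "age": 62},
]
# Hypothetical parent table, keyed by hid (the candidate FK values).
houses = {"h1": {"area": "urban"}, "h2": {"area": "rural"}, "h3": {"area": "urban"}}


def violates_ic(a, b):
    """Hypothetical pairwise IC: two household heads may not share an hid."""
    return a["rel"] == "head" and b["rel"] == "head"


def conflict_graph(rows):
    """Connect every pair of rows that must receive different FK values."""
    graph = {r["pid"]: set() for r in rows}
    for a, b in combinations(rows, 2):
        if violates_ic(a, b):
            graph[a["pid"]].add(b["pid"])
            graph[b["pid"]].add(a["pid"])
    return graph


def color_fks(rows, graph, keys):
    """Greedy coloring: FK values are colors, adjacent rows get different ones.

    Rows with more conflicts are colored first (largest-first ordering).
    Each row takes the first key not used by an already-colored neighbor.
    """
    order = sorted(rows, key=lambda r: len(graph[r["pid"]]), reverse=True)
    fk = {}
    for r in order:
        taken = {fk[n] for n in graph[r["pid"]] if n in fk}
        free = [k for k in keys if k not in taken]
        if not free:
            raise ValueError(f"no IC-consistent FK value for row {r['pid']}")
        fk[r["pid"]] = free[0]
    return fk


def cc_error(rows, parents, fk, predicate, target):
    """Hypothetical relative error of one CC: matching join-view tuples vs target."""
    count = sum(1 for r in rows if predicate(r, parents[fk[r["pid"]]]))
    return abs(count - target) / max(target, 1)


def minor_urban(person, house):
    """Hypothetical CC predicate: a person under 18 living in an urban house."""
    return person["age"] < 18 and house["area"] == "urban"


graph = conflict_graph(persons)
fk = color_fks(persons, graph, list(houses))
print(fk)  # {1: 'h1', 3: 'h2', 5: 'h3', 2: 'h1', 4: 'h1'}

# Hypothetical CC target: 2 persons under 18 live in urban houses.
print(cc_error(persons, houses, fk, minor_urban, 2))  # 0.5
```

Because `color_fks` ignores the CCs, the toy CC ends up off target (relative error 0.5) even though every IC holds. In the framework the abstract describes, Phase I's intermediate solution is what steers the $FK$ choices toward the CC targets before Phase II enforces the ICs; this sketch only shows why a proper coloring satisfies the ICs by construction.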


Published In

Proceedings of the ACM SIGMOD International Conference on Management of Data

DOI

10.1145/3448016.3457242

ISSN

0730-8078

Publication Date

January 1, 2021

Start / End Page

619 / 631

Citation

APA: Gilad, A., Patwa, S., & Machanavajjhala, A. (2021). Synthesizing Linked Data under Cardinality and Integrity Constraints. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 619–631). https://doi.org/10.1145/3448016.3457242
Chicago: Gilad, A., S. Patwa, and A. Machanavajjhala. “Synthesizing Linked Data under Cardinality and Integrity Constraints.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, 619–31, 2021. https://doi.org/10.1145/3448016.3457242.
ICMJE: Gilad A, Patwa S, Machanavajjhala A. Synthesizing Linked Data under Cardinality and Integrity Constraints. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2021. p. 619–31.
MLA: Gilad, A., et al. “Synthesizing Linked Data under Cardinality and Integrity Constraints.” Proceedings of the ACM SIGMOD International Conference on Management of Data, 2021, pp. 619–31. Scopus, doi:10.1145/3448016.3457242.
NLM: Gilad A, Patwa S, Machanavajjhala A. Synthesizing Linked Data under Cardinality and Integrity Constraints. Proceedings of the ACM SIGMOD International Conference on Management of Data. 2021. p. 619–631.