Efficiently Cleaning Structured Event Logs: A Graph Repair Approach

Publication, Journal Article
Huang, R; Wang, J; Song, S; Lin, X; Zhu, X; Pei, J
Published in: ACM Transactions on Database Systems
March 14, 2023

Event data are often dirty owing to varying recording conventions or simple system errors. These errors can seriously harm real applications, leading to inaccurate provenance answers, poor profiling results, or interesting patterns in event data being concealed. There is thus a strong demand for cleaning dirty event data. While existing event data cleaning techniques view event logs as sequences, structural information does exist among events, such as the task-passing relationships between staff members in workflows or the invocation relationships among micro-services in application performance monitoring. We argue that such structural information improves not only the accuracy of repairing inconsistent events but also computational efficiency. Notably, both the structure and the names (labels) of events may be inconsistent. In real applications, an unsound structure is not repaired automatically, since handling a structural error requires manual effort from business actors; it is, however, highly desirable to automatically repair inconsistent event names introduced by recording mistakes. In this article, we first prove that the inconsistent label repairing problem is NP-complete. We then propose a graph repair approach for (1) detecting unsound structures and (2) repairing inconsistent event names. Efficient pruning techniques and two heuristic solutions are also presented. Extensive experiments over real and synthetic datasets demonstrate both the effectiveness and the efficiency of our proposal.
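To make the label-repair setting concrete, here is a purely illustrative toy sketch; it is not the authors' algorithm. It assumes a hypothetical specification `ALLOWED` listing the (source label, target label) pairs permitted on an edge of the event graph, and it exhaustively searches for the relabeling that changes the fewest event names while satisfying every edge. The names `ALLOWED`, `LABELS`, `violations`, and `min_label_repair` are all hypothetical. The exhaustive search grows exponentially with the number of events; the article's pruning techniques and heuristics are not reproduced here.

```python
from itertools import product

# Hypothetical specification (an assumption for illustration): the
# (source label, target label) pairs permitted on an edge of the event graph.
ALLOWED = {("submit", "review"), ("review", "approve"), ("review", "reject")}
# Candidate event names: every label that appears in the specification.
LABELS = sorted({a for a, _ in ALLOWED} | {b for _, b in ALLOWED})


def violations(labels, edges, allowed):
    """Return the edges whose endpoint labels form a pair not in `allowed`."""
    return [(u, v) for u, v in edges if (labels[u], labels[v]) not in allowed]


def min_label_repair(labels, edges, allowed, alphabet):
    """Try every relabeling and keep one that satisfies all edges while
    changing the fewest event names (hypothetical brute-force helper).

    Returns (repaired_labels, cost), or (None, None) if no relabeling works,
    which in this toy model means relabeling alone cannot fix the graph.
    Runtime is exponential: len(alphabet) ** len(labels) candidates.
    """
    nodes = sorted(labels)
    best, best_cost = None, None
    for candidate in product(alphabet, repeat=len(nodes)):
        relabeled = dict(zip(nodes, candidate))
        if violations(relabeled, edges, allowed):
            continue  # some edge still breaks the specification
        cost = sum(relabeled[n] != labels[n] for n in nodes)
        if best_cost is None or cost < best_cost:
            best, best_cost = relabeled, cost
    return best, best_cost


# Usage: a three-event log where "reveiw" is a recording typo.
events = {"e1": "submit", "e2": "reveiw", "e3": "approve"}
edges = [("e1", "e2"), ("e2", "e3")]
print(min_label_repair(events, edges, ALLOWED, LABELS))
# → ({'e1': 'submit', 'e2': 'review', 'e3': 'approve'}, 1)
```

In this toy log, both edges touching `e2` initially violate the specification, and renaming that single event is the cheapest fix. A graph whose edges cannot all be satisfied by any labeling (for example, a two-event cycle under this specification) returns `(None, None)`.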

Published In

ACM Transactions on Database Systems

DOI

10.1145/3571281

EISSN

1557-4644

ISSN

0362-5915

Publication Date

March 14, 2023

Volume

48

Issue

1

Related Subject Headings

  • Information Systems
  • 4609 Information systems
  • 4605 Data management and data science
  • 4009 Electronics, sensors and digital hardware
  • 0806 Information Systems
  • 0804 Data Format
 

Citation

APA: Huang, R., Wang, J., Song, S., Lin, X., Zhu, X., & Pei, J. (2023). Efficiently Cleaning Structured Event Logs: A Graph Repair Approach. ACM Transactions on Database Systems, 48(1). https://doi.org/10.1145/3571281
Chicago: Huang, R., J. Wang, S. Song, X. Lin, X. Zhu, and J. Pei. “Efficiently Cleaning Structured Event Logs: A Graph Repair Approach.” ACM Transactions on Database Systems 48, no. 1 (March 14, 2023). https://doi.org/10.1145/3571281.
ICMJE: Huang R, Wang J, Song S, Lin X, Zhu X, Pei J. Efficiently Cleaning Structured Event Logs: A Graph Repair Approach. ACM Transactions on Database Systems. 2023 Mar 14;48(1).
MLA: Huang, R., et al. “Efficiently Cleaning Structured Event Logs: A Graph Repair Approach.” ACM Transactions on Database Systems, vol. 48, no. 1, Mar. 2023. Scopus, doi:10.1145/3571281.
NLM: Huang R, Wang J, Song S, Lin X, Zhu X, Pei J. Efficiently Cleaning Structured Event Logs: A Graph Repair Approach. ACM Transactions on Database Systems. 2023 Mar 14;48(1).
