Computing Shapley Values for Dynamic Data
Data valuation is a core function in data markets and cooperative data sharing. Shapley value is a widely used approach to fairly measure the contribution of data points towards a collective utility (e.g., a machine learning model trained from the data). However, computing Shapley values is known to be in general #P-hard due to the exponential utility evaluation. Furthermore, the presence of dynamic data poses additional challenges due to the prohibitively expensive cost of recomputing from scratch. In this paper, we study the problem of Dynamic Shapley Value Computation, which focuses on updating Shapley values when dynamically adding or deleting data points. For adding, to prune redundant computation of overlapping model utilities, we propose the pivot-based algorithm that can reduce half the computation time in expectation. We also propose delta-based algorithms to capture Shapley value changes, which require only a smaller sample size to converge. For deleting, we present the YN-NN algorithm that derives the new Shapley values from precomputed utilities efficiently. Based on Shapley value changes, we give another version of the delta-based algorithm for deleting data points. Besides, we propose heuristic algorithms that draw on experimental observations for addition, deletion, and hybrid scenarios. Extensive experimental results demonstrate the efficiency and effectiveness of our proposed algorithms.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Related Subject Headings
- Information Systems
- 46 Information and computing sciences
- 08 Information and Computing Sciences
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Related Subject Headings
- Information Systems
- 46 Information and computing sciences
- 08 Information and Computing Sciences