Dynamic Shapley Value Computation
With the prevalence of data-driven research, data valuation has attracted attention from the computer science field. How to appraise a single datum becomes an imperative problem, especially in the context of machine learning. Shapley value is widely used to fairly measure the contribution of data points in machine learning since it is the unique definition that satisfies all four desired properties: balance, symmetry, additivity, and zero element. However, computing Shapley value is known to be a #P-hard problem. As data is subject to changes, dynamic data exists pervasively in real-world scenarios. Pricing such dynamic data is more challenging due to the prohibitively expensive cost of recalculation from scratch. In this paper, we study the problem of Dynamic Shapley Value Computation, which updates Shapley value when dynamically adding/deleting data points. For adding data points, to prune unnecessary computation of overlapping model utilities, we propose the pivot-based algorithm that can reduce half computation time in general. We also propose the delta-based algorithm to capture Shapley value changes, which requires a smaller sample size to converge. For deleting data points, we present the YN-NN algorithm that derives the new Shapley value from the data structure of precomputed model utilities in an efficient way. Based on Shapley value changes, we give another version of the delta-based algorithm for deleting data points. Besides, we propose heuristic algorithms to draw on experimental observations for both adding and deleting data points. Extensive experimental results demonstrate the efficiency and effectiveness of our proposed algorithms.