Scholars@Duke publication: Classification with label noise: a Markov chain sampling framework

Publication, Journal Article

Zhao, Z; Chu, L; Tao, D; Pei, J

Published in: Data Mining and Knowledge Discovery

September 1, 2019

The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, the labels of instances are often not highly reliable due to the presence of label noise. Training effective classifiers in the presence of label noise is a challenging task that enjoys many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain where each state uniquely represents a set of randomly sampled instances. We show that the Markov chain has a unique stationary distribution, which puts much larger probability weights on the states dominated by correctly labeled instances than the states dominated by mislabeled instances. We propose a Markov Chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is further used to compute the mislabeling probability for each instance, and train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.
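The sampling idea in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm: the scoring function, the swap proposal, the Metropolis acceptance rule, the nearest-centroid classifier, and every name and parameter below (`estimate_mislabeling`, `state_score`, `k`, `beta`, `burn_in`) are assumptions made for illustration. Each state is a fixed-size set of instances, states whose classifier agrees with more of the given labels get more weight, and an instance's mislabeling estimate is the fraction of sampled states whose classifier disagrees with its given label.

```python
import math
import random


def nearest_centroid_predict(train_idx, xs, labels):
    """Fit a 1-D nearest-centroid classifier on train_idx and predict every x."""
    groups = {}
    for i in train_idx:
        groups.setdefault(labels[i], []).append(xs[i])
    centroids = {c: sum(v) / len(v) for c, v in groups.items()}
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in xs]


def state_score(state, xs, labels):
    """Assumed score: agreement of the state's classifier with the given labels."""
    preds = nearest_centroid_predict(state, xs, labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def estimate_mislabeling(xs, labels, k=6, steps=2000, burn_in=200,
                         beta=20.0, seed=0):
    """Metropolis sampling over size-k instance sets (illustrative only)."""
    rng = random.Random(seed)
    n = len(xs)
    state = rng.sample(range(n), k)
    score = state_score(state, xs, labels)
    disagree = [0] * n
    kept = 0
    for t in range(steps):
        # Symmetric proposal: swap one member of the state for an outsider.
        outsiders = [i for i in range(n) if i not in state]
        proposal = list(state)
        proposal[rng.randrange(k)] = rng.choice(outsiders)
        new_score = state_score(proposal, xs, labels)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if rng.random() < math.exp(min(0.0, beta * (new_score - score))):
            state, score = proposal, new_score
        if t >= burn_in:
            kept += 1
            preds = nearest_centroid_predict(state, xs, labels)
            for i in range(n):
                disagree[i] += preds[i] != labels[i]
    # Fraction of sampled states that contradict each instance's given label.
    return [d / kept for d in disagree]


# Toy data: two well-separated 1-D clusters with two flipped labels.
xs = [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)]
noisy = [0] * 10 + [1] * 10
noisy[0], noisy[10] = 1, 0  # instances 0 and 10 are deliberately mislabeled
scores = estimate_mislabeling(xs, noisy)
print([round(s, 2) for s in scores])
```

On this toy data the two flipped instances should receive clearly higher estimates than the clean ones, because most sampled sets are dominated by correctly labeled instances. A classifier with probabilistic outputs could replace the hard nearest-centroid predictions by accumulating predicted probabilities instead of disagreement counts.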

Duke Scholars

Author: Jian Pei, Computer Science

Published In

Data Mining and Knowledge Discovery

DOI

10.1007/s10618-018-0592-8

EISSN

1573-756X

ISSN

1384-5810

Publication Date

September 1, 2019

Volume

33

Issue

5

Start / End Page

1468 / 1504

Related Subject Headings

Artificial Intelligence & Image Processing
46 Information and computing sciences
0806 Information Systems
0804 Data Format
0801 Artificial Intelligence and Image Processing

Citation

APA

Zhao, Z., Chu, L., Tao, D., & Pei, J. (2019). Classification with label noise: a Markov chain sampling framework. Data Mining and Knowledge Discovery, 33(5), 1468–1504. https://doi.org/10.1007/s10618-018-0592-8

Chicago

Zhao, Z., L. Chu, D. Tao, and J. Pei. “Classification with label noise: a Markov chain sampling framework.” Data Mining and Knowledge Discovery 33, no. 5 (September 1, 2019): 1468–1504. https://doi.org/10.1007/s10618-018-0592-8.

ICMJE

Zhao Z, Chu L, Tao D, Pei J. Classification with label noise: a Markov chain sampling framework. Data Mining and Knowledge Discovery. 2019 Sep 1;33(5):1468–504.

MLA

Zhao, Z., et al. “Classification with label noise: a Markov chain sampling framework.” Data Mining and Knowledge Discovery, vol. 33, no. 5, Sept. 2019, pp. 1468–504. Scopus, doi:10.1007/s10618-018-0592-8.

NLM

Zhao Z, Chu L, Tao D, Pei J. Classification with label noise: a Markov chain sampling framework. Data Mining and Knowledge Discovery. 2019 Sep 1;33(5):1468–1504.
