Robust Classification with Noisy Labels for Manufacturing Applications: A Hybrid Approach Based on Active Learning and Data Cleaning
Classification is an important machine learning technique that is attracting growing interest in various manufacturing applications. Learning an accurate classifier generally requires a large, perfectly labeled training dataset. However, such "golden" labels are not only expensive but also difficult to collect in practice. To facilitate accurate classification in the presence of noisy labels, we propose a novel hybrid method based on active learning and data cleaning. Specifically, we first train an initial classifier on the noisily labeled data. Based on its predictions, the most informative samples are queried for manual annotation. To correct the remaining incorrect labels, we then self-label the unqueried samples using both the true labels provided by human experts and the labels predicted by the initial classifier. Experimental results on two industrial datasets demonstrate that the proposed approach achieves higher accuracy than conventional methods.
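The three steps outlined in the abstract (initial training on noisy labels, querying informative samples, self-labeling the rest) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: a nearest-centroid model stands in for the initial classifier, "most informative" is taken to mean the smallest decision margin, and the self-labeling rule (accept the classifier's prediction only when it agrees with the expert label of the nearest queried sample) is hypothetical. The names `fit_centroids`, `predict_with_margin`, `clean_labels`, `oracle`, and `budget` are likewise invented here, as is the toy data.

```python
from math import dist


def fit_centroids(X, y):
    """Stand-in initial classifier (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def predict_with_margin(centroids, x):
    """Return (predicted label, margin); a small margin marks an uncertain sample."""
    ranked = sorted(centroids, key=lambda label: dist(x, centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = dist(x, centroids[runner_up]) - dist(x, centroids[best])
    return best, margin


def clean_labels(X, noisy_y, oracle, budget):
    """Hypothetical hybrid of active learning and data cleaning."""
    # Step 1: train the initial classifier on the noisy labels.
    centroids = fit_centroids(X, noisy_y)
    preds = [predict_with_margin(centroids, x) for x in X]

    # Step 2: query the `budget` least-confident samples for expert annotation.
    order = sorted(range(len(X)), key=lambda i: preds[i][1])
    queried = order[:budget]
    expert = {i: oracle(i) for i in queried}

    # Step 3: self-label the unqueried samples (assumed rule): accept the
    # prediction only if it matches the expert label of the nearest queried
    # sample; otherwise keep the original noisy label.
    cleaned = list(noisy_y)
    for i, label in expert.items():
        cleaned[i] = label
    for i in order[budget:]:
        nearest = min(queried, key=lambda j: dist(X[i], X[j]))
        if preds[i][0] == expert[nearest]:
            cleaned[i] = preds[i][0]
    return cleaned, queried


# Toy one-dimensional data (invented for illustration); two labels are flipped.
X = [(0.0,), (1.0,), (2.0,), (4.0,), (5.0,), (7.0,), (8.0,), (9.0,)]
true_y = [0, 0, 0, 0, 1, 1, 1, 1]
noisy_y = [0, 1, 0, 0, 1, 1, 0, 1]

# The "human expert" is simulated by looking up the true label.
cleaned, queried = clean_labels(X, noisy_y, oracle=lambda i: true_y[i], budget=2)
print(queried, cleaned)
```

On this toy data the two samples nearest the class boundary are queried, and both flipped labels are corrected by the self-labeling step. The sketch assumes at least two classes appear in the noisy labels and that `budget` is at least 1; a real application would substitute a stronger classifier and an acquisition function suited to the data.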