Semi-Supervised Root-Cause Analysis with Co-Training for Integrated Systems
The increasing complexity of integrated systems has exacerbated the challenges of system diagnosis. To tackle these challenges, root-cause analysis based on machine learning has been proposed in recent years. However, most of these methods rely on large amounts of data with root-cause labels, which are often unavailable or difficult to obtain. In this paper, we propose a semi-supervised root-cause analysis method with co-training that requires only a small set of labeled data. Using a random forest as the learning kernel, the proposed co-training technique leverages the unlabeled data by automatically pre-labeling a subset of it and retraining each decision tree. In addition, several novel techniques are proposed to avoid over-fitting and to determine hyper-parameters. Two case studies based on industrial designs demonstrate that the proposed approach significantly outperforms state-of-the-art methods, reducing the labeling effort required from human experts by up to 43%.
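To make the co-training idea concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes one common reading of the scheme: for each tree, the remaining trees vote on every unlabeled sample, samples on which they agree with high confidence are pre-labeled, and the tree is retrained on the labeled data plus those pre-labeled samples. One-split decision stumps stand in for full decision trees, bagging is left out so that the toy run is deterministic, and all names (`fit_stump`, `vote`, `co_train`, `threshold`, the labels "A" and "B") are hypothetical. The over-fitting safeguards and hyper-parameter selection mentioned above are not modeled.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, f):
    """Fit a one-split tree on feature f; returns (f, threshold, left, right)."""
    values = sorted({x[f] for x in X})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold: midpoint of neighbouring values
        left = [lab for x, lab in zip(X, y) if x[f] <= t]
        right = [lab for x, lab in zip(X, y) if x[f] > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # constant feature: fall back to the majority label
        m = majority(y)
        return (f, 0.0, m, m)
    return (f,) + best[1:]


def predict(stump, x):
    f, t, left, right = stump
    return left if x[f] <= t else right


def vote(stumps, x):
    """Majority label and the fraction of stumps that agree with it."""
    label, n = Counter(predict(s, x) for s in stumps).most_common(1)[0]
    return label, n / len(stumps)


def co_train(X_lab, y_lab, X_unl, n_trees=5, rounds=3, threshold=0.8, seed=0):
    """Retrain each tree on labeled data plus samples pre-labeled by the other trees."""
    rng = random.Random(seed)
    # Assumption: each tree sees one randomly chosen feature (random subspace).
    feats = [rng.randrange(len(X_lab[0])) for _ in range(n_trees)]
    forest = [fit_stump(X_lab, y_lab, f) for f in feats]
    for _ in range(rounds):
        retrained = []
        for i, f in enumerate(feats):
            others = forest[:i] + forest[i + 1:]
            X_aug, y_aug = list(X_lab), list(y_lab)
            for x in X_unl:
                label, conf = vote(others, x)
                if conf >= threshold:  # pre-label only confident samples
                    X_aug.append(x)
                    y_aug.append(label)
            retrained.append(fit_stump(X_aug, y_aug, f))
        forest = retrained
    return forest


# Toy data: two well-separated hypothetical root causes, "A" and "B".
X_lab = [(0, 1), (1, 0), (10, 9), (9, 10)]
y_lab = ["A", "A", "B", "B"]
X_unl = [(2, 1), (1, 2), (0, 0), (2, 2), (8, 9), (9, 8), (10, 10), (8, 8)]
forest = co_train(X_lab, y_lab, X_unl)
print(vote(forest, (1, 1)), vote(forest, (9, 9)))  # ('A', 1.0) ('B', 1.0)
```

In this sketch the confidence `threshold` controls how aggressively unlabeled data is pre-labeled: a lower value adds more samples per round but raises the risk of reinforcing wrong labels, which is the kind of over-fitting the abstract says the proposed techniques are meant to avoid.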