Failure prediction based on anomaly detection for complex core routers

Published

Conference Paper

© 2018 ACM. Data-driven prognostic health management is essential to ensure high reliability and rapid error recovery in commercial core router systems. The effectiveness of prognostic health management depends on whether failures can be accurately predicted with sufficient lead time. This paper describes how time-series analysis and machine-learning techniques can be used to detect anomalies and predict failures in complex core router systems. First, both a feature-categorization-based hybrid method and a changepoint-based method have been developed to detect anomalies in time-varying features with different statistical characteristics. Next, an SVM-based failure predictor is developed to predict both categories and lead time of system failures from collected anomalies. A comprehensive set of experimental results is presented for data collected during 30 days of field operation from over 20 core routers deployed by customers of a major telecom company.
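
As a rough illustration of what changepoint-based anomaly detection on a time-varying feature can look like, the sketch below applies a two-sided CUSUM test to a single numeric series. This is not the method from the paper, whose details are not given in the abstract; CUSUM is swapped in as a generic, well-known changepoint technique. The function name `cusum_changepoints` and the parameters `baseline_len`, `drift`, and `threshold` are hypothetical choices made for this example only.

```python
from statistics import mean, stdev


def cusum_changepoints(series, baseline_len=20, drift=0.5, threshold=5.0):
    """Return indices where a two-sided CUSUM statistic exceeds `threshold`.

    Illustrative only: the first `baseline_len` samples are assumed to be
    anomaly-free and are used to estimate the normal mean and spread.
    """
    if baseline_len < 2 or len(series) <= baseline_len:
        raise ValueError("need at least two baseline samples plus one more")

    baseline = series[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1.0  # avoid division by zero on a flat baseline

    upper = lower = 0.0  # cumulative evidence of an upward / downward shift
    alarms = []
    for i in range(baseline_len, len(series)):
        z = (series[i] - mu) / sigma
        # `drift` is subtracted so that small fluctuations do not accumulate.
        upper = max(0.0, upper + z - drift)
        lower = max(0.0, lower - z - drift)
        if upper > threshold or lower > threshold:
            alarms.append(i)
            upper = lower = 0.0  # restart accumulation after raising an alarm
    return alarms


if __name__ == "__main__":
    # Synthetic feature: stable around 10, then a level shift to 14.
    feature = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [10.0, 10.1, 9.9] + [14.0] * 5
    print(cusum_changepoints(feature))
```

In a pipeline of the kind the abstract outlines, alarm indices such as these would be the "collected anomalies" passed on to a downstream classifier; how the paper encodes them for its SVM-based predictor is not stated here.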

Cited Authors

  • Jin, S; Zhang, Z; Chakrabarty, K; Gu, X

Published Date

  • November 5, 2018

Published In

International Standard Serial Number (ISSN)

  • 1092-3152

International Standard Book Number 13 (ISBN-13)

  • 9781450359504

Digital Object Identifier (DOI)

  • 10.1145/3240765.3243476

Citation Source

  • Scopus