Empirical comparison of techniques for automated failure diagnosis
Automated diagnosis of the causes of system failures from monitoring data is an active area of research at the intersection of systems and machine learning. In this paper, we identify three tasks that form key building blocks of automated diagnosis: (1) identifying distinct states of the system using monitoring data; (2) retrieving monitoring data from past system states that are similar to the current state; and (3) pinpointing attributes in the monitoring data that indicate the likely cause of a system failure. We provide, to our knowledge, the first apples-to-apples comparison of both classical and state-of-the-art techniques for these three tasks. Such studies are vital to the consolidation and growth of the field. Our study is based on a variety of failures injected into a multitier Web service. We present the resulting empirical insights and identify research opportunities.