An Empirical Exploratory Analysis of Failure Sequences in a Commodity Operating System
A fundamental need in software reliability engineering is to comprehend how software systems fail, which requires understanding the dynamics that govern different types of failure manifestation. In this paper, we present an exploratory study of multiple-event failures, searching for systematic patterns in sequences of failures recorded in the logs of a commodity operating system. The study is based on real failure data collected from hundreds of computers. The major contribution of this paper is the proposed method for discovering patterns of failure sequences and their attributes. The method is generic enough to be applied to any other software system with minor changes. The empirical findings of this study include 153 distinct patterns of OS failure sequences, along with statistical analyses of their properties.