A Multisite Characterization Study on Failure Causes in System and Applications Software
A fundamental aspect of software reliability engineering is understanding how software failures manifest, which requires identifying and comprehending their causes and effects. In this paper, we perform ex-post analyses of field software failure data, aiming to characterize failure causes. The failures analyzed were collected from hundreds of computer systems located in different workplaces. For each failure cause, we consider several aspects, such as its type, its context, the software layer involved, and the code in which it manifested. We found that 84% of the failure causes were related to memory addressing, responsiveness, and exception handling. The rate of the prevalent failure causes appears to correlate with how long the failed code had been running. Regardless of the type of failed code, the prevalent failure causes were programming-related.