Monitoring and mitigating software aging on IBM cloud controller system

Publication, Conference
Sukhwani, H; Matias, R; Trivedi, KS; Rindos, A
Published in: Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017
November 14, 2017

As enterprises continue to move their workloads from traditional server-room environments to private cloud-based systems, companies like IBM have an increasing desire and ability to centrally monitor these systems on behalf of their customers and proactively mitigate potential failure scenarios. In this paper, we investigate failures caused by software aging affecting an enterprise-class cloud controller system. We describe a service that continuously analyzes the key system and application metrics from customer systems, identifies potential aging-related failure scenarios within the next two days, and generates a list of tasks for the development-operations team at IBM to mitigate the potential failures. To help the team prioritize the tasks, we propose a scheme that assigns a severity to each task. From our analysis of two months of offline data, we find that the generated tasks have a precision of around 0.80 and a recall of 1, which means that our approach did not miss any aging-related failure event and that around 80% of the flagged failure events were true.
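The precision and recall figures quoted in the abstract follow the standard definitions: precision is the fraction of generated tasks that correspond to real failure events, and recall is the fraction of real failure events that were flagged. The sketch below only illustrates that arithmetic. The function name and the counts (8 true positives, 2 false positives, 0 false negatives) are hypothetical values chosen to reproduce the quoted ratios; they are not data from the paper.

```python
def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts of flagged tasks."""
    flagged = true_positives + false_positives   # tasks the service generated
    actual = true_positives + false_negatives    # failure events that really occurred
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, NOT taken from the paper: 10 generated tasks,
# 8 of them matching real aging-related failures, and no missed failures.
precision, recall = precision_recall(true_positives=8,
                                     false_positives=2,
                                     false_negatives=0)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With zero false negatives the recall is 1 regardless of how many false alarms are raised, which is why a recall of 1 alone does not say how many generated tasks were spurious; the precision carries that information.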

Published In

Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017

DOI

10.1109/ISSREW.2017.65

Publication Date

November 14, 2017

Start / End Page

266 / 272
 

Citation

APA

Sukhwani, H., Matias, R., Trivedi, K. S., & Rindos, A. (2017). Monitoring and mitigating software aging on IBM cloud controller system. In Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017 (pp. 266–272). https://doi.org/10.1109/ISSREW.2017.65

Chicago

Sukhwani, H., R. Matias, K. S. Trivedi, and A. Rindos. “Monitoring and mitigating software aging on IBM cloud controller system.” In Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017, 266–72, 2017. https://doi.org/10.1109/ISSREW.2017.65.

ICMJE

Sukhwani H, Matias R, Trivedi KS, Rindos A. Monitoring and mitigating software aging on IBM cloud controller system. In: Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017. 2017. p. 266–72.

MLA

Sukhwani, H., et al. “Monitoring and mitigating software aging on IBM cloud controller system.” Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017, 2017, pp. 266–72. Scopus, doi:10.1109/ISSREW.2017.65.

NLM

Sukhwani H, Matias R, Trivedi KS, Rindos A. Monitoring and mitigating software aging on IBM cloud controller system. Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017. 2017. p. 266–272.
