Rethinking Software Fault Tolerance

Publication, Journal Article
Trivedi, KS; Grottke, M; Lopez, JA
Published in: IEEE Transactions on Reliability
March 1, 2024

Traditional software fault tolerance makes use of design-diversity-based redundancy. While proven to be effective, the independent development of multiple versions of a program or component is connected with high costs. This article shows that failures caused by so-called Mandelbugs (i.e., software faults whose activation and/or error propagation depends on the system environment) can often be treated by generating or forcing a new or modified execution environment. In the case of aging-related bugs, a subtype of Mandelbugs, failures can be postponed/prevented via a proactive technique known as software rejuvenation. Indeed, techniques based on environmental diversity, such as retry, reboot, or failover to an identical replica, are successfully used in practice. We discuss two such real-case examples, the IBM Session Initiation Protocol (SIP) Application Server cluster and Avaya gateway servers.
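
As a rough illustration of the environmental-diversity idea described in the abstract, the sketch below retries an operation on the same replica and then fails over to an identical one. It is a minimal sketch under stated assumptions, not the mechanism used in the IBM or Avaya systems: the names `TransientFailure`, `call_with_environmental_diversity`, and `make_flaky` are hypothetical, and the example assumes that environment-dependent failures can be recognized and signaled as a distinct exception type.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientFailure(Exception):
    """Hypothetical marker for a failure assumed to stem from a Mandelbug."""


def call_with_environmental_diversity(
    replicas: Sequence[Callable[[], T]],
    retries_per_replica: int = 2,
    delay_s: float = 0.0,
) -> T:
    """Run the same operation on identical replicas, retrying before failing over."""
    last_error: Optional[Exception] = None
    for replica in replicas:  # failover: same code, different execution environment
        for _ in range(retries_per_replica):  # retry: same replica, later in time
            try:
                return replica()
            except TransientFailure as exc:
                last_error = exc
                time.sleep(delay_s)  # give timing/resource conditions a chance to change
    raise RuntimeError("all replicas failed") from last_error


def make_flaky(failures_before_success: int, result: str) -> Callable[[], str]:
    """Build a stand-in operation that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def operation() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientFailure("environment-dependent failure")
        return result

    return operation


if __name__ == "__main__":
    # A single retry is enough when the failure disappears on re-execution.
    print(call_with_environmental_diversity([make_flaky(1, "primary")]))
    # When the primary keeps failing, the call fails over to an identical replica.
    print(call_with_environmental_diversity(
        [make_flaky(5, "primary"), make_flaky(0, "replica")]
    ))
```

No design diversity is involved here: every replica runs the same code, and recovery relies only on the execution environment differing between attempts. Failures that recur deterministically on every attempt would not be masked by this approach.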

Published In

IEEE Transactions on Reliability

DOI

10.1109/TR.2023.3330787

EISSN

1558-1721

ISSN

0018-9529

Publication Date

March 1, 2024

Volume

73

Issue

1

Start / End Page

67 / 72

Related Subject Headings

  • Operations Research
  • 4612 Software engineering
  • 4010 Engineering practice and education
  • 0906 Electrical and Electronic Engineering
  • 0803 Computer Software
 

Citation

APA

Trivedi, K. S., Grottke, M., & Lopez, J. A. (2024). Rethinking Software Fault Tolerance. IEEE Transactions on Reliability, 73(1), 67–72. https://doi.org/10.1109/TR.2023.3330787

Chicago

Trivedi, K. S., M. Grottke, and J. A. Lopez. “Rethinking Software Fault Tolerance.” IEEE Transactions on Reliability 73, no. 1 (March 1, 2024): 67–72. https://doi.org/10.1109/TR.2023.3330787.

ICMJE

Trivedi KS, Grottke M, Lopez JA. Rethinking Software Fault Tolerance. IEEE Transactions on Reliability. 2024 Mar 1;73(1):67–72.

MLA

Trivedi, K. S., et al. “Rethinking Software Fault Tolerance.” IEEE Transactions on Reliability, vol. 73, no. 1, Mar. 2024, pp. 67–72. Scopus, doi:10.1109/TR.2023.3330787.

NLM

Trivedi KS, Grottke M, Lopez JA. Rethinking Software Fault Tolerance. IEEE Transactions on Reliability. 2024 Mar 1;73(1):67–72.
