
Hybrid reliability modeling of fault-tolerant computer systems

Publication, Journal Article
Trivedi, K; Dugan, JB; Geist, R; Smotherman, M
Published in: Computers and Electrical Engineering
January 1, 1984

Current technology allows sufficient redundancy in fault-tolerant computer systems to ensure that the failure probability due to exhaustion of spares is low. Consequently, the major cause of failure is the inability to correctly detect, isolate, and reconfigure when faults are present. Reliability estimation tools must be flexible enough to accurately model this critical fault-handling behavior and yet remain computationally tractable. This paper discusses reliability modeling techniques based on a behavioral decomposition that provides tractability by separating the reliability model along temporal lines into nearly disjoint fault-occurrence and fault-handling submodels. An Extended Stochastic Petri Net (ESPN) model provides the needed flexibility for representing the fault-handling behavior, while a nonhomogeneous Markov chain accounts for the possibly non-Poisson fault-occurrence behavior. Since the submodels are separate, the ESPN submodel, in which all time constants are of the same order of magnitude, can be simulated. The nonhomogeneous Markov chain is solved analytically, and the result is a hybrid model. The method of coverage factors, used to combine the submodels, is generalized to more accurately reflect the fault-handling effectiveness within the fault-occurrence model. However, due to approximations made in the aggregation of the two submodels and inaccurate estimation of component failure rates and other model parameters, errors can still arise in the subsequent reliability predictions. The accuracy of the model predictions is evaluated analytically, and error bounds on the system reliability are produced. These modeling techniques have been implemented in the HARP (Hybrid Automated Reliability Predictor) program. © 1985.
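
The idea of combining a simulated fault-handling submodel with an analytically solved fault-occurrence submodel through a coverage factor can be illustrated with a toy sketch. This is not HARP and not an ESPN: it assumes a hypothetical two-unit (duplex) system, a constant per-unit failure rate (a homogeneous Markov chain rather than the nonhomogeneous one in the abstract), and a fault-handling process reduced to three independent success probabilities for detection, isolation, and reconfiguration. All function and parameter names are illustrative assumptions.

```python
import math
import random


def simulate_coverage(p_detect, p_isolate, p_reconfigure, trials, seed=0):
    """Estimate the coverage factor c by Monte Carlo simulation.

    Assumption: a fault is covered only if detection, isolation, and
    reconfiguration all succeed, each modeled as an independent
    Bernoulli trial. A real fault-handling model would be far richer.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        if (rng.random() < p_detect
                and rng.random() < p_isolate
                and rng.random() < p_reconfigure):
            covered += 1
    return covered / trials


def duplex_reliability(t, failure_rate, coverage):
    """Closed-form reliability of a two-unit system with imperfect coverage.

    States: both units up -> one unit up -> failed. The first fault
    occurs at rate 2*lam and is covered with probability `coverage`
    (otherwise the system fails at once); the surviving unit then
    fails at rate lam. Solving the chain gives
        R(t) = 2c*exp(-lam*t) + (1 - 2c)*exp(-2*lam*t).
    """
    lam = failure_rate
    return (2.0 * coverage * math.exp(-lam * t)
            + (1.0 - 2.0 * coverage) * math.exp(-2.0 * lam * t))


if __name__ == "__main__":
    # Simulated fault-handling submodel feeds the analytic submodel.
    c = simulate_coverage(0.99, 0.98, 0.97, trials=20000)
    for hours in (10.0, 100.0, 1000.0):
        print(hours, duplex_reliability(hours, 1e-4, c))
```

With perfect coverage (c = 1) the formula reduces to the familiar parallel-system result, and with c = 0 the first fault is always fatal, which shows how strongly the coverage factor can dominate the prediction when spares are plentiful.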

Published In

Computers and Electrical Engineering

DOI

10.1016/0045-7906(84)90004-1

ISSN

0045-7906

Publication Date

January 1, 1984

Volume

11

Issue

2-3

Start / End Page

87 / 108

Related Subject Headings

  • Electrical & Electronic Engineering
  • 4606 Distributed computing and systems software
  • 4602 Artificial intelligence
  • 4008 Electrical engineering
  • 0906 Electrical and Electronic Engineering
  • 0805 Distributed Computing
  • 0803 Computer Software
 

Citation

APA: Trivedi, K., Dugan, J. B., Geist, R., & Smotherman, M. (1984). Hybrid reliability modeling of fault-tolerant computer systems. Computers and Electrical Engineering, 11(2–3), 87–108. https://doi.org/10.1016/0045-7906(84)90004-1
Chicago: Trivedi, K., J. B. Dugan, R. Geist, and M. Smotherman. “Hybrid reliability modeling of fault-tolerant computer systems.” Computers and Electrical Engineering 11, no. 2–3 (January 1, 1984): 87–108. https://doi.org/10.1016/0045-7906(84)90004-1.
ICMJE: Trivedi K, Dugan JB, Geist R, Smotherman M. Hybrid reliability modeling of fault-tolerant computer systems. Computers and Electrical Engineering. 1984 Jan 1;11(2–3):87–108.
MLA: Trivedi, K., et al. “Hybrid reliability modeling of fault-tolerant computer systems.” Computers and Electrical Engineering, vol. 11, no. 2–3, Jan. 1984, pp. 87–108. Scopus, doi:10.1016/0045-7906(84)90004-1.
NLM: Trivedi K, Dugan JB, Geist R, Smotherman M. Hybrid reliability modeling of fault-tolerant computer systems. Computers and Electrical Engineering. 1984 Jan 1;11(2–3):87–108.