Hybrid reliability modeling of fault-tolerant computer systems
Current technology allows sufficient redundancy in fault-tolerant computer systems to ensure that the failure probability due to exhaustion of spares is low. Consequently, the major cause of failure is the inability to correctly detect, isolate, and reconfigure when faults are present. Reliability estimation tools must be flexible enough to accurately model this critical fault-handling behavior and yet remain computationally tractable. This paper discusses reliability modeling techniques based on a behavioral decomposition that provides tractability by separating the reliability model along temporal lines into nearly disjoint fault-occurrence and fault-handling submodels. An Extended Stochastic Petri Net (ESPN) model provides the needed flexibility for representing the fault-handling behavior, while a nonhomogeneous Markov chain accounts for the possibly non-Poisson fault-occurrence behavior. Since the submodels are separate, the ESPN submodel, in which all time constants are of the same order of magnitude, can be simulated. The nonhomogeneous Markov chain is solved analytically, and the result is a hybrid model. The method of coverage factors, used to combine the submodels, is generalized to more accurately reflect the fault-handling effectiveness within the fault-occurrence model. However, because of approximations made in the aggregation of the two submodels and inaccurate estimation of component failure rates and other model parameters, errors can still arise in the subsequent reliability predictions. The accuracy of the model predictions is evaluated analytically, and error bounds on the system reliability are produced. These modeling techniques have been implemented in the HARP (Hybrid Automated Reliability Predictor) program. © 1985.
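The decomposition described in the abstract can be illustrated with a small sketch. It is not HARP's algorithm: the fault-handling submodel here is a simple race between two exponential timers (a stand-in for an ESPN simulation), the system is an assumed two-unit duplex, the combination uses a single classical coverage factor rather than the generalized form the paper develops, and the fault-occurrence chain is integrated numerically rather than solved analytically. All function names and parameter values are hypothetical.

```python
import math
import random


def estimate_coverage(detect_rate, propagate_rate, trials=100_000, seed=0):
    """Simulate the fault-handling submodel (assumed: a two-timer race).

    A fault counts as covered when detection and recovery finish before
    the error propagates. Both times are assumed exponential.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        t_detect = rng.expovariate(detect_rate)
        t_propagate = rng.expovariate(propagate_rate)
        if t_detect < t_propagate:
            covered += 1
    return covered / trials


def weibull_hazard(t, scale, shape):
    """Time-varying fault rate; shape == 1 gives a constant rate 1/scale.

    Assumes shape >= 1 so the hazard is finite at t = 0.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def duplex_reliability(t_end, hazard, coverage, steps=2000):
    """Solve the fault-occurrence chain for an assumed duplex system.

    States: p2 (both units up), p1 (one unit up); failure is absorbing.
    A fault in state 2 leads to state 1 with probability `coverage`
    and to failure otherwise. Integrated with classical RK4.
    """
    def deriv(t, p2, p1):
        lam = hazard(t)
        return (-2.0 * lam * p2, 2.0 * lam * coverage * p2 - lam * p1)

    h = t_end / steps
    t, p2, p1 = 0.0, 1.0, 0.0
    for _ in range(steps):
        k1 = deriv(t, p2, p1)
        k2 = deriv(t + h / 2, p2 + h / 2 * k1[0], p1 + h / 2 * k1[1])
        k3 = deriv(t + h / 2, p2 + h / 2 * k2[0], p1 + h / 2 * k2[1])
        k4 = deriv(t + h, p2 + h * k3[0], p1 + h * k3[1])
        p2 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p1 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return p2 + p1


if __name__ == "__main__":
    # Fault handling runs on a much faster time scale than fault occurrence,
    # which is why the two submodels can be treated separately.
    c = estimate_coverage(detect_rate=100.0, propagate_rate=1.0)
    r = duplex_reliability(
        1000.0, lambda t: weibull_hazard(t, scale=1000.0, shape=1.5), c
    )
    print(f"coverage ~ {c:.4f}, reliability at t=1000 ~ {r:.4f}")
```

The simulated coverage feeds the fault-occurrence model as a single number, so the stiff combination of fast handling rates and slow fault rates never appears in one state space. With a constant hazard the duplex chain has a closed-form solution, which gives a convenient check on the numerical integration.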
Related Subject Headings
- Electrical & Electronic Engineering
- 4606 Distributed computing and systems software
- 4602 Artificial intelligence
- 4008 Electrical engineering
- 0906 Electrical and Electronic Engineering
- 0805 Distributed Computing
- 0803 Computer Software