Sensitivity analysis of reliability and performability measures for multiprocessor systems
Traditional evaluation techniques for multiprocessor systems use Markov chains and Markov reward models to compute measures such as mean time to failure, reliability, performance, and performability. In this paper, the authors discuss the extension of Markov models to include parametric sensitivity analysis. Such analysis can guide system optimization, identify parts of a system model that are sensitive to error, and locate system reliability and performability bottlenecks. As examples, they consider three models of a system with 16 processors and 16 memories, in which an interconnection network provides communication between the processors and the memories: two crossbar network models and an Omega network model. For these models, they examine the sensitivity of the mean time to failure, unreliability, and performability to changes in component failure rates, and they use these sensitivities to identify bottlenecks in the three system models.
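The abstract does not give the models' equations, so the following is only a minimal sketch of parametric sensitivity analysis for one measure, the mean time to failure (MTTF) of an absorbing Markov chain. It assumes a hypothetical two-processor system (not one of the paper's 16-processor models) with per-processor failure rate `lam` and repair rate `mu`, which fails when both processors are down. With transient generator $Q_T$ and initial distribution $\pi_T(0)$, the expected times $\tau$ spent in the transient states satisfy $\tau Q_T = -\pi_T(0)$ and $\mathrm{MTTF} = \sum_i \tau_i$. Differentiating with respect to a parameter $\theta$ gives $(\partial \tau / \partial \theta)\, Q_T = -\tau\, (\partial Q_T / \partial \theta)$, which is a second linear solve with the same matrix. The helper names (`solve_left`, `transient_generator`, `d_generator_d_lam`, `mttf_and_sensitivity`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve_left(a: Matrix, b: List[float]) -> List[float]:
    """Solve the row-vector system x @ a = b by Gaussian elimination with
    partial pivoting applied to the transposed system a^T x^T = b^T."""
    n = len(a)
    # Augmented matrix [a^T | b].
    m = [[a[j][i] for j in range(n)] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def transient_generator(lam: float, mu: float) -> Matrix:
    # Hypothetical model. Transient states: 0 = both processors up,
    # 1 = one processor up. The "both down" state is absorbing and omitted.
    return [[-2.0 * lam, 2.0 * lam],
            [mu, -(lam + mu)]]


def d_generator_d_lam(lam: float, mu: float) -> Matrix:
    # Elementwise derivative of transient_generator with respect to lam.
    return [[-2.0, 2.0],
            [0.0, -1.0]]


def mttf_and_sensitivity(q: Matrix, dq: Matrix, p0: List[float]):
    """Return (MTTF, dMTTF/dtheta) for transient generator q, its
    derivative dq with respect to theta, and initial distribution p0."""
    n = len(q)
    # tau Q = -p0: expected time spent in each transient state.
    tau = solve_left(q, [-p for p in p0])
    # dtau Q = -tau dQ: same matrix, new right-hand side.
    rhs = [-sum(tau[i] * dq[i][j] for i in range(n)) for j in range(n)]
    dtau = solve_left(q, rhs)
    return sum(tau), sum(dtau)


if __name__ == "__main__":
    lam, mu = 0.001, 0.1
    mttf, dmttf = mttf_and_sensitivity(
        transient_generator(lam, mu), d_generator_d_lam(lam, mu), [1.0, 0.0])
    # Scaled sensitivity: relative change in MTTF per relative change in lam.
    print(mttf, dmttf, lam * dmttf / mttf)
```

For this small model the results can be checked against the closed forms $\mathrm{MTTF} = (3\lambda + \mu)/(2\lambda^2)$ and $\partial\,\mathrm{MTTF}/\partial\lambda = -(3\lambda + 2\mu)/(2\lambda^3)$. Because raw derivatives with respect to rates of very different magnitudes are hard to compare, one common convention (assumed here, not taken from the paper) is to rank parameters by the scaled sensitivity $\theta\,(\partial M/\partial\theta)/M$ and treat the parameter with the largest magnitude as a candidate bottleneck.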