Computing the number of calls dropped due to failures
Defects per million (DPM), defined as the number of calls out of a million dropped due to failures, is an important service (un)reliability measure for telecommunication systems. Most previous research derives the DPM from a steady-state system availability model. In this paper, we develop a novel method for DPM computation that takes into account not only system availability but also the impact of the service application and the transient behavior of failure recovery. We illustrate this approach using a real system, the IBM SIP SLEE cluster. Our method accounts for software and hardware failures, different stages of recovery, different phases of call flow, retry attempts, and the interactions between call flow and failure/recovery behavior. © 2010 IEEE.
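
For orientation, the conventional steady-state estimate that the abstract contrasts itself with can be sketched as below. This is not the method developed in the paper. It is a minimal illustration that assumes a two-state (up/down) availability model and assumes that a call is dropped exactly when it arrives while the system is down. The function names and the MTTF/MTTR values are hypothetical.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a two-state (up/down) model.

    Assumption: failures and repairs alternate, with mean time to failure
    mttf_hours and mean time to repair mttr_hours.
    """
    return mttf_hours / (mttf_hours + mttr_hours)


def dpm_from_availability(availability: float) -> float:
    """Steady-state DPM estimate: expected dropped calls per million.

    Assumption: a call is lost if and only if the system is down, so the
    loss fraction equals the steady-state unavailability (1 - A).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    a = steady_state_availability(mttf_hours=10_000.0, mttr_hours=1.0)
    print(f"availability = {a:.6f}, DPM = {dpm_from_availability(a):.1f}")
```

An estimate of this kind ignores the recovery stages, call-flow phases, and retry attempts that the abstract lists, which is the gap the proposed method is meant to address.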