Computing defects per million in cloud caused by virtual machine failures with replication
Virtual machines (VMs) are used in cloud computing systems to handle user requests for service. A typical user request goes through several cloud-service-provider-specific processing steps from the instant it is submitted until the service is completed. While the service is being provided, VM failures can cause the user's request to be dropped. To mitigate the adverse impact of VM failures, cold, warm, or hot replication can be used. In this paper, we model the system with a structure-state process to characterize the failure-recovery behavior of a VM in a cloud that uses one of these replication schemes. We use a service-oriented dependability metric called Defects Per Million (DPM), defined as the number of user requests dropped out of a million. The structure-state process approach is used to analyze the job completion time distribution, and we subsequently compute the DPM by counting the number of requests that exceed the specified deadline. The effectiveness of the replication schemes is demonstrated through numerical results.
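
As a minimal illustration of the DPM metric described above, the following Python sketch counts the requests whose completion time exceeds a deadline and scales the fraction to one million. It is not the structure-state analysis used in the paper: the function name `defects_per_million`, the sample completion times, and the deadline value are hypothetical and chosen only to show the arithmetic, under the assumption that a request missing its deadline counts as dropped.

```python
def defects_per_million(completion_times, deadline):
    """Estimate DPM from a sample of job completion times.

    Assumption (illustrative only): a request counts as a defect
    when its completion time is strictly greater than the deadline.
    """
    if not completion_times:
        raise ValueError("need at least one completion time")
    # Count the requests that exceed the specified deadline.
    missed = sum(1 for t in completion_times if t > deadline)
    # Scale the observed fraction of missed requests to one million.
    return 1_000_000 * missed / len(completion_times)


# Hypothetical sample: five completion times (seconds), deadline of 2.0 s.
sample_times = [0.8, 1.2, 0.5, 2.4, 0.9]
dpm = defects_per_million(sample_times, deadline=2.0)
print(dpm)  # 200000.0 (one of five requests misses the deadline)
```

In an analytical setting the same quantity would be obtained from the completion time distribution as one million times the probability that the completion time exceeds the deadline, rather than from a finite sample.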