Kishor S. Trivedi

Hudson Distinguished Professor of Electrical and Computer Engineering
Electrical and Computer Engineering
Box 90291, Durham, NC 27708-0291
534 Research Drive, 401 Wilkinson, Durham, NC 27708-0291

Selected Publications


Cross-project concurrency bug prediction using domain-adversarial neural network

Journal Article Journal of Systems and Software · August 1, 2024 In recent years, software bug prediction has shown to be effective in narrowing down the potential bug modules and boosting the efficiency and precision of existing testing and analysis tools. However, due to its non-deterministic nature and low presence, ... Full text Cite

Reliability and Availability Analysis in Practice: Toward Multilevel Models for Complex Systems

Journal Article Computer · April 1, 2024 This article discusses model-driven methods with analytic-numeric solutions. In addition to traditional non-state-space and state-space methods, multilevel methods are explored using real case studies. Challenges met while developing and solving dependabil ... Full text Cite

Reliability and Availability Assessment

Conference IEEE Transactions on Reliability · March 1, 2024 Given heavy dependence on man-made systems in our daily lives, reliability and availability of these systems clearly gain great importance. Together with methods of enhancing reliability and availability of systems, methods of quantitative assessment of th ... Full text Cite

Rethinking Software Fault Tolerance

Journal Article IEEE Transactions on Reliability · March 1, 2024 Traditional software fault tolerance makes use of design-diversity-based redundancy. While proven to be effective, the independent development of multiple versions of a program or component is connected with high costs. This article shows that failures cau ... Full text Cite

Probability models applied to reliability and availability engineering

Chapter · January 1, 2024 Our daily lives are dependent on various technological systems that may be mission-critical, safety-critical, or business-critical. Reliability and availability are crucial attributes and, thus, key requirements that should be considered during the entire ... Full text Cite

Understanding NFV-Enabled Vehicle Platooning Application: A Dependability View

Journal Article IEEE Transactions on Cloud Computing · October 1, 2023 This paper aims to use analytical modeling technique to quantitatively study the dependability of Vehicle Platooning Application, which consists of Multiple Sub-Services (VPP-MSS) to achieve its functionality. Each sub-service (SS), based on network functi ... Full text Cite

Impact of Service Function Aging on the Dependability for MEC Service Function Chain

Journal Article IEEE Transactions on Dependable and Secure Computing · July 1, 2023 The Multi-access Edge Computing (MEC) and Network Function Virtualization (NFV) integrated architecture is a key enabling platform for 5G to run multiple customized services in the form of service function chain (SFC) configured as an ordered set of servic ... Full text Cite

Model-Driven Dependability Assessment of Microservice Chains in MEC-Enabled IoT

Journal Article IEEE Transactions on Services Computing · July 1, 2023 Multi-access edge computing (MEC)-enabled Internet of Things (IoT) is considered as a promising paradigm to deliver computation-intensive and delay-sensitive services to users. IoT service requests can be served by multiple microservices (MSs) that form a ... Full text Cite

Editorial: Software Reliability and Dependability Engineering

Journal Article IEEE Transactions on Dependable and Secure Computing · July 1, 2023 As software plays an increasingly important role in our lives, it is essential to maintain its reliability, and generally dependability. Software bugs can cause huge financial losses and dangerous accidents; the safety risks from software are underscored t ... Full text Cite

Guest Editorial Special Section on Applied Software Aging and Rejuvenation

Journal Article IEEE Transactions on Emerging Topics in Computing · July 1, 2023

Towards UAV-Based MEC Service Chain Resilience Evaluation: A Quantitative Modeling Approach

Journal Article IEEE Transactions on Vehicular Technology · April 1, 2023 Unmanned aerial vehicle (UAV) and network function virtualization (NFV) facilitate the deployment of multi-access edge computing (MEC). In the UAV-based MEC (UMEC) network, virtualized network function (VNF) can be implemented as a lightweight container ru ... Full text Cite

Effect of Epistemic Uncertainty in Markovian Reliability Models

Chapter · January 1, 2023 This chapter introduces the moment-based epistemic uncertainty propagation in Markov models. The epistemic uncertainty in Markov models introduces the uncertainty of model parameters, and it can be propagated by regarding parameters as random variables. Th ... Full text Cite

DeepSIM: Deep Semantic Information-Based Automatic Mandelbug Classification

Journal Article IEEE Transactions on Reliability · December 1, 2022 Understanding and predicting types of bugs are of practical importance for developers to improve the testing efficiency and take appropriate steps to address bugs in software releases. However, due to the complex conditions under which faults manifest and ... Full text Cite

Quantitative understanding serial-parallel hybrid sfc services: a dependability perspective

Journal Article Peer-to-Peer Networking and Applications · July 1, 2022 Network function virtualization (NFV) has been explored to be integrated with multi-access edge computing (MEC) to facilitate the development of 5G (fifth-generation) network. Latency-sensitive applications can be deployed as serial-parallel hybrid service ... Full text Cite

Aging, Fast and Slow

Journal Article Computer · May 1, 2022 Software can show symptoms of two different types of aging. Sometimes, it is even subject to both types. ...

Job Completion Time Under Migration-Based Dynamic Platform Technique

Journal Article IEEE Transactions on Services Computing · January 1, 2022 Migration-based Dynamic Platform (MDP) technique, a type of Moving Target Defense (MTD) techniques, defends against sophisticated cyber-attacks by randomly and dynamically selecting a platform for executing service/job. Security defense mechanisms protect ... Full text Cite

Service Availability Analysis in a Virtualized System: A Markov Regenerative Model Approach

Journal Article IEEE Transactions on Cloud Computing · January 1, 2022 With the rapid and wide development and deployment of system virtualization, service availability analysis has become increasingly important in a virtualized system (VS) which suffers from software aging. Software rejuvenation techniques can be applied to ... Full text Cite

Availability Analysis of Systems Deploying Sequences of Environmental-Diversity-Based Recovery Methods

Journal Article IEEE Transactions on Reliability · September 1, 2021 Mandelbug-caused software failures are significant threats to system availability, especially in the context of mission-critical and safety-critical systems. However, there is still no systematic method for keeping the software free from Mandelbugs before ... Full text Cite

SINR-Based Analysis of IEEE 802.11p/bd Broadcast VANETs for Safety Services

Journal Article IEEE Transactions on Network and Service Management · September 1, 2021 The safety-critical applications of vehicular ad hoc networks (VANETs) require high reliability and low transmission latency. IEEE 802.11p and IEEE 802.11bd are two standards proposed for such vehicular communication systems. In this paper, we propose an e ... Full text Cite

Resilience-Driven Quantitative Analysis of Vehicle Platooning Service

Journal Article IEEE Transactions on Vehicular Technology · June 1, 2021 Vehicle platooning can be applied to cooperative downloading and uploading (CDU) services through the cooperation between lead vehicle and non-lead vehicles. CDU service can be completed cooperatively by containers constructed in vehicles of the vehicle pl ... Full text Cite

ARES: A Framework for Management of Aging and Rejuvenation in Softwarized Networks

Journal Article IEEE Transactions on Network and Service Management · June 1, 2021 The recent trend of network softwarization suggests a radical shift in the implementation of traditional network intelligence. In Software Defined Networking (SDN), for instance, the control plane functions of forwarding devices are outsourced to the contr ... Full text Cite

Quantitative Security Evaluation of Intrusion Tolerant Systems with Markovian Arrivals

Journal Article IEEE Transactions on Reliability · June 1, 2021 Intrusion tolerance is an ability to keep the correct service by masking the intrusion based on fault-tolerant techniques. With the rapid development of virtualization, the virtual machine (VM)-based intrusion tolerance scheme has been developed according ... Full text Cite

S-ADA: Software as an Autonomous, Dependable and Affordable System

Conference Proceedings - 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2021 · June 1, 2021 ADA is a popular programming language that was named after Lady Ada Lovelace (1815-1852) and recommended by Department of Defense, USA, for development of large scale safety-critical software systems. In this Fast Abstract, ADA is reinterpreted as Autonomo ... Full text Cite

Modeling and Evaluation of Multi-Hop Wireless Networks Using SRNs

Journal Article IEEE Transactions on Network Science and Engineering · January 1, 2021 As multi-hop wireless networks are attracting more attention, the need to evaluate their performance becomes essential. In order to evaluate the performance metrics of multi-hop wireless networks, including sending and receiving rates of a node as well as ... Full text Cite

Transient Security and Dependability Analysis of MEC Micro Datacenter under Attack

Conference Proceedings - Annual Reliability and Maintainability Symposium · January 1, 2021 A Multi-access Edge Computing (MEC) micro data center (MEDC) consists of multiple MEC hosts close to endpoint devices. MEC service is delivered by instantiating a virtualization system (e.g., Virtual Machines or Containers) on a MEC host. MEDC faces more n ... Full text Cite

A Multisite Characterization Study on Failure Causes in System and Applications Software

Conference Brazilian Symposium on Computing System Engineering, SBESC · January 1, 2021 A fundamental aspect of software reliability engineering is to understand how software failures manifest, identifying and comprehending their causes and effects. In this paper, we perform ex-post analyses of field software failure data, looking to characte ... Full text Cite

A Statistical Approach to Predict Operating System Failures Based on Multiple Failures Association

Conference Brazilian Symposium on Computing System Engineering, SBESC · November 24, 2020 Empirical studies have shown robust evidence of OS failure patterns characterized by multiple combinations of failure events composed of the same or different failure types. In this paper, we present a statistical approach to predict OS failures based on m ... Full text Cite

Reliability and availability analysis in practice

Chapter · November 16, 2020 Reliability and availability are key attributes of technical systems. Methods of quantifying these attributes are thus essential during all phases of system lifecycle. Data (measurement)-driven methods are suitable for components or subsystems but, for the ... Full text Cite

Chapter 1: Software Aging and Rejuvenation: A Genesis - Extended Abstract

Conference Proceedings - 2020 IEEE 31st International Symposium on Software Reliability Engineering Workshops, ISSREW 2020 · October 1, 2020 This talk summarizes the genesis of software aging and rejuvenation as presented in the handbook of software aging and rejuvenation. It also lays out possible future directions to reflect the content of the concluding chapter of the handbook. ... Full text Cite

DASON: Dependability Assessment Framework for Imperfect Distributed SDN Implementations

Journal Article IEEE Transactions on Network and Service Management · June 1, 2020 In Software Defined Networking (SDN), network programmability is enabled through a logically centralized control plane. Production networks deploy multiple controllers for scalability and reliability reasons, which in turn rely on distributed consensus pro ... Full text Cite

Stress Testing with Influencing Factors to Accelerate Data Race Software Failures

Journal Article IEEE Transactions on Reliability · March 1, 2020 Software failures caused by data race bugs have always been major concerns in parallel and distributed systems, despite significant efforts spent in software testing. Due to their nondeterministic and hard-to-reproduce features, when evaluating systems' op ... Full text Cite

Markov Regenerative Models of WebServers for Their User-Perceived Availability and Bottlenecks

Journal Article IEEE Transactions on Dependable and Secure Computing · January 1, 2020 The Internet world is moving toward a scenario where users and applications have very diverse service expectation, making the current best-effort model inadequate and limiting. To be able to design high-availability service systems, it is essential to cons ... Full text Cite

Analytical modeling of performance indices under epistemic uncertainty applied to cloud computing systems

Journal Article Future Generation Computer Systems · January 1, 2020 The extent of epistemic uncertainty in modeling and analysis of complex systems is ever growing, mainly due to increasing levels of the openness, heterogeneity and versatility in cloud-based applications that are being adopted in critical sectors, like ban ... Full text Cite

Analyzing Software Rejuvenation Techniques in a Virtualized System: Service Provider and User Views

Journal Article IEEE Access · January 1, 2020 Virtualization technology has promoted the fast development and deployment of cloud computing, and is now becoming an enabler of Internet of Everything. Virtual machine monitor (VMM), playing a critical role in a virtualized system, is software and hence i ... Full text Cite

Handbook of software aging and rejuvenation: Fundamentals, methods, applications, and future directions

Book · January 1, 2020 The Handbook of Software Aging and Rejuvenation provides a comprehensive overview of the subject, making it indispensable to graduate students as well as professionals in the field. It begins by introducing fundamental concepts, definitions, and the histor ... Full text Cite

Future directions for software aging and rejuvenation research

Chapter · January 1, 2020 In this chapter we present a summary of some future directions for software aging and rejuvenation research. ...

Preface

Book · January 1, 2020

Software aging and rejuvenation: A genesis

Chapter · January 1, 2020 Software aging and rejuvenation originated at AT&T Bell Labs. A significant amount of research on the topic also took place at Duke University and Università degli Studi di Napoli Federico II. We present here a historical perspective on this topic as viewe ... Full text Cite

An Empirical Study of Fault Triggers in the Linux Operating System: An Evolutionary Perspective

Journal Article IEEE Transactions on Reliability · December 1, 2019 This paper presents an empirical study of 5741 bug reports for the Linux kernel from an evolutionary perspective, with the aim of obtaining a deep understanding of bug characteristics in the Linux operating system. Bug classification is performed based on ... Full text Cite

An Empirical Exploratory Analysis of Failure Sequences in a Commodity Operating System

Conference Brazilian Symposium on Computing System Engineering, SBESC · November 1, 2019 A fundamental need for software reliability engineering is to comprehend how software systems fail, which means understanding the dynamics that govern different types of failure manifestation. In this paper, we present an exploratory study on multiple-even ... Full text Cite

Hierarchical Stochastic Models for Performance, Availability, and Power Consumption Analysis of IaaS Clouds

Journal Article IEEE Transactions on Cloud Computing · October 1, 2019 Infrastructure as a Service (IaaS) is one of the most significant and fastest growing fields in cloud computing. To efficiently use the resources of an IaaS cloud, several important factors such as performance, availability, and power consumption need to b ... Full text Cite

Supervised Representation Learning Approach for Cross-Project Aging-Related Bug Prediction

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · October 1, 2019 Software aging, which is caused by Aging-Related Bugs (ARBs), tends to occur in long-running systems and may lead to performance degradation and increasing failure rate during software execution. ARB prediction can help developers discover and remove ARBs, ... Full text Cite

Rejuvenation and the age of information

Conference Proceedings - 2019 IEEE 30th International Symposium on Software Reliability Engineering Workshops, ISSREW 2019 · October 1, 2019 Two decades after the seminal paper on software aging and rejuvenation appeared in 1995, a new concept and metric referred to as the age of information (AoI) has been gaining attention from practitioners and the research community. In this vision paper, ou ... Full text Cite

Studying Aging-Related Bug Prediction Using Cross-Project Models

Journal Article IEEE Transactions on Reliability · September 1, 2019 In long running systems, software tends to encounter performance degradation and increasing failure rate during execution. This phenomenon has been named software aging, which is caused by aging-related bugs (ARBs). Testing resource allocation can be optim ... Full text Cite

Two-level rejuvenation for android smartphones and its optimization

Journal Article IEEE Transactions on Reliability · June 1, 2019 The Android operating system (OS) is a sophisticated man-made system and is the dominant OS in the current smartphone market. Due to the accumulation of errors in the system internal state and the incremental consumption of resources, such as the Dalvik he ... Full text Cite

Performance evaluation of epidemic content retrieval in DTNs with restricted mobility

Journal Article IEEE Transactions on Network and Service Management · June 1, 2019 In some applicable scenarios, such as community patrolling, mobile nodes are restricted to move only in their own communities. Exploiting the meetings of the nodes within the same community and the nodes within the neighboring communities, a delay tolerant ... Full text Cite

2nd Workshop on Education and Practice of Performance Engineering: WEPPE'19 Chairs' Welcome

Conference ICPE 2019 - Companion of the 2019 ACM/SPEC International Conference on Performance Engineering · April 4, 2019

Software Aging and Software Rejuvenation

Conference Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering · April 4, 2019

Performance Engineering Education

Conference Companion of the 2019 ACM/SPEC International Conference on Performance Engineering · March 27, 2019

Quantitative security analysis of a dynamic network system under lateral movement-based attacks

Journal Article Reliability Engineering and System Safety · March 1, 2019 Malicious lateral movement-based attacks have become a potential risk for many systems, bringing highly likely threats to critical infrastructures and national security. When launching this kind of attacks, adversaries first compromise a fraction of the ta ... Full text Cite

Systems Modelling: Methodologies and Tools

Chapter · January 1, 2019 Modern systems implement multiple and complex operations to manage the user demand, thereby ensuring adequate quality levels. They are usually made of a collection of interconnected (autonomous) subsystems, with a common goal to be pursued, that are percei ... Full text Cite

Optimal periodic software rejuvenation policies based on interval reliability criteria

Journal Article Reliability Engineering and System Safety · December 1, 2018 Software aging often affects the performance of software systems and may eventually cause them to fail. A complementary approach to handle transient software failures due to the software aging is called software rejuvenation. It is a preventive and proacti ... Full text Cite

Performance modeling of hyperledger fabric (permissioned blockchain network)

Conference NCA 2018 - 2018 IEEE 17th International Symposium on Network Computing and Applications · November 26, 2018 Hyperledger Fabric (HLF) is an open-source implementation of a distributed ledger platform for running smart contracts in a modular architecture. In this paper, we present a performance model of Hyperledger Fabric v1.0+ using Stochastic Reward Nets (SRN). ... Full text Cite

Survivability model for security and dependability analysis of a vulnerable critical system

Conference Proceedings - International Conference on Computer Communications and Networks, ICCCN · October 9, 2018 This paper aims to analyze transient security and dependability of a vulnerable critical system, under vulnerability-related attack and two reactive defense strategies, from a severe vulnerability announcement until the vulnerability is fully removed from ... Full text Cite

Characterizing machines lifecycle in Google data centers

Journal Article Performance Evaluation · October 1, 2018 Due to the increasing need for computational power, the market has shifted towards big centralized data centers. Understanding the nature of the dynamics of these data centers from machine and job/task perspective is critical to design efficient data cente ... Full text Cite

Performability-based workflow scheduling in grids

Journal Article Computer Journal · October 1, 2018 In this paper, the performance of a grid resource is modeled and evaluated using stochastic reward nets (SRNs), wherein the failure–repair behavior of its processors is taken into account. The proposed SRN is used to compute the blocking probability and se ... Full text Cite

Effective modeling approach for iaas data center performance analysis under heterogeneous workload

Journal Article IEEE Transactions on Cloud Computing · October 1, 2018 Heterogeneity prevails not only among physical machines but also among workloads in real IaaS Cloud data centers (CDCs). The heterogeneity makes performance modeling of large and complex IaaS CDCs even more challenging. This paper considers the scenario wh ... Full text Cite

Assessing the Maturity of SDN Controllers with Software Reliability Growth Models

Journal Article IEEE Transactions on Network and Service Management · September 1, 2018 In software defined networking (SDN), critical control plane functions are offloaded to a software entity known as the SDN controller. Today's SDN controllers are complex software systems, owing to heterogeneity of networks and forwarding devices they supp ... Full text Cite

Keynote Paper: Parametric Uncertainty Propagation through Dependability Models

Conference Proceedings - 8th Latin-American Symposium on Dependable Computing, LADC 2018 · July 2, 2018 The uncertainty propagation is to investigate the effect of errors in model input parameters on the system output measure in probability models. In this paper, we present a moment-based approach of the uncertainty propagation of model input parameters. The ... Full text Cite

Robust Prediction Of Treatment Times In Concurrent Patient Care.

Journal Article Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference · July 2018 Outpatient centers comprised of many concurrent clinics increasingly see higher patient volumes. In these centers, decisions to improve clinic flow must account for the high degree of interdependence when critical personnel or equipment is shared between c ... Full text Cite

Model-based sensitivity analysis of IaaS cloud availability

Journal Article Future Generation Computer Systems · June 1, 2018 The increasing shift of various critical services towards Infrastructure-as-a-Service (IaaS) cloud data centers (CDCs) creates a need for analyzing CDCs’ availability, which is affected by various factors including repair policy and system parameters. This ... Full text Cite

Epistemic Uncertainty Propagation in Power Models

Journal Article Electronic Notes in Theoretical Computer Science · May 9, 2018 Data-centers have recently experienced a fast growth in energy demand, mainly due to cloud computing, a paradigm that lets the users access shared computing resources (e.g., servers, storage, etc.). Several techniques have been proposed in order to allevia ... Full text Cite

Performability Modeling for RAID Storage Systems by Markov Regenerative Process

Journal Article IEEE Transactions on Dependable and Secure Computing · January 1, 2018 This paper presents a performability model for RAID storage systems using Markov regenerative process to compare different RAID architectures. While homogeneous Markov models are extensively used for reliability analysis of RAID storage systems, the memory ... Full text Cite

Transient performance analysis of smart grid with dynamic power distribution

Journal Article Information Sciences · January 1, 2018 Transient performance analysis of power distribution network (PDN) after a failure occurrence could facilitate the better design of smart grid. Researchers have proposed analytical models and the numerical solutions to analyze the PDN's transient behaviors ... Full text Cite

Monitoring and mitigating software aging on IBM cloud controller system

Conference Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017 · November 14, 2017 As enterprises continue to move their workloads from traditional server-room environments to private cloud-based systems, there is an increasing desire and ability for companies like IBM to centrally monitor the systems on behalf of their customers to proa ... Full text Cite

Understanding the Impacts of Influencing Factors on Time to a DataRace Software Failure

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · November 14, 2017 Datarace is a common problem on shared-memory parallel computers, including multicores. Due to its dependence on the thread scheduling scheme of its execution environment, the time to a datarace failure is usually very long. How to accelerate the occurrenc ... Full text Cite

Experience Report: Fault Triggers in Linux Operating System: From Evolution Perspective

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · November 14, 2017 Linux operating system is a complex system that is prone to suffer failures during usage, and increases difficulties of fixing bugs. Different testing strategies and fault mitigation methods can be developed and applied based on different types of bugs, wh ... Full text Cite

Performance modeling of PBFT consensus process for permissioned blockchain network (hyperledger fabric)

Conference Proceedings of the IEEE Symposium on Reliable Distributed Systems · October 13, 2017 While Blockchain network brings tremendous benefits, there are concerns whether their performance would match up with the mainstream IT systems. This paper aims to investigate whether the consensus process using Practical Byzantine Fault Tolerance (PBFT) c ... Full text Cite

Reliability and Availability Engineering Modeling, Analysis, and Applications

Book · August 3, 2017 This is the ideal self-study guide for students, researchers and practitioners in engineering and computer science. ...

Epistemic uncertainty propagation in a Weibull environment for a two-core system-on-chip

Conference 2017 2nd International Conference on System Reliability and Safety, ICSRS 2017 · July 2, 2017 Epistemic uncertainty analysis accounts for inaccurate input parameters and evaluates how such uncertainty propagates to output measures. In this work we will focus on Weibull distributions, in particular the one related to the reliability of multi-core sy ... Full text Cite

An empirical study of software reliability in SDN controllers

Conference 2017 13th International Conference on Network and Service Management, CNSM 2017 · July 1, 2017 Software Defined Networking (SDN) exposes critical networking decisions, such as traffic routing or enforcement of the critical security policies, to a software entity known as the SDN controller. Controller software, as written by humans, is intrinsically ... Full text Cite

An empirical investigation of fault triggers in android operating system

Conference Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · May 5, 2017 The growing popularity and complexity of Android operating system makes it prone to suffer failures during usage, which increases difficulties of fixing bugs. Different strategies and mitigation methods can be developed and applied based on different types ... Full text Cite

A novel approach for software vulnerability classification

Conference Proceedings - Annual Reliability and Maintainability Symposium · March 29, 2017 Software vulnerability analysis plays a critical role in the prevention and mitigation of software security attacks, and vulnerability classification constitutes a key part of this analysis. This paper proposes a new approach for software vulnerability cla ... Full text Cite

Transient performance & availability modeling in high volume outpatient clinics

Conference Proceedings - Annual Reliability and Maintainability Symposium · March 29, 2017 High volume outpatient clinics such as eye care centers cannot afford excessive delays, especially when due to limited resources, time, or overhead. Modeling tools from reliability & maintainability practice may provide the means to better assess where imp ... Full text Cite

Automated life cycle processing for complex medical imaging devices

Conference Proceedings - Annual Reliability and Maintainability Symposium · March 29, 2017 Medical imaging systems from major modalities such as Magnetic Resonance Imaging or X-Ray Computed Tomography are complex devices subject to various types of maintenance. Medical device companies that develop these systems often monitor and maintain system ... Full text Cite

Availability modeling and analysis of a virtualized system using stochastic reward nets

Conference Proceedings - 2016 16th IEEE International Conference on Computer and Information Technology, CIT 2016, 2016 6th International Symposium on Cloud and Service Computing, IEEE SC2 2016 and 2016 International Symposium on Security and Privacy in Social Networks and Big Data, SocialSec 2016 · March 10, 2017 Availability is one of the key requirements for modern networked system. Availability of a virtualized system can be modelled and analyzed using stochastic models. In our previous work, availability of a virtualized system was modeled using a hierarchical ... Full text Cite

Application-level scheme to enhance VANET event-driven multi-hop safety-related services

Conference 2017 International Conference on Computing, Networking and Communications, ICNC 2017 · March 10, 2017 In this paper, we focus on the design and analysis of channel access in vehicular ad hoc networks (VANETs) for event-driven multi-hop safety services. First, a novel channel access scheme that incorporates an application-level distance (timer)-based rebroa ... Full text Cite

Redundant Eucalyptus Private Clouds: Availability Modeling and Sensitivity Analysis

Journal Article Journal of Grid Computing · March 1, 2017 Cloud computing infrastructures are designed to be accessible anywhere and anytime. This requires various fault tolerance mechanisms for coping with software and hardware failures. Hierarchical modeling approaches are often used to evaluate the availabilit ... Full text Cite

Analytical Model and Performance Evaluation of Long-Term Evolution for Vehicle Safety Services

Journal Article IEEE Transactions on Vehicular Technology · March 1, 2017 In a traffic jam or dense vehicle environment, vehicular ad hoc networks (VANETs) cannot meet the safety requirement due to serious packet collisions. The traditional cellular network solves packet collisions but suffers from long end-to-end delay. Third-G ... Full text Cite

An approach for resiliency quantification of large scale systems

Conference Performance Evaluation Review · March 1, 2017 We quantify the resiliency of large scale systems upon changes encountered beyond the normal system behavior. Formal definitions for resiliency and change are provided together with general steps for resiliency quantification and a set of resiliency metric ... Full text Cite

Efficient computation of the mean time to security failure in cyber physical systems

Conference ValueTools 2016 - 10th EAI International Conference on Performance Evaluation Methodologies and Tools · January 1, 2017 In this paper, we present a computationally efficient technique for calculating the mean time to security failure (MTTSF) of a mobile cyber physical system (CPS). The CPS analyzed here has been comprehensively studied by other authors using stochastic rewa ... Full text Cite

Parametric sensitivity and uncertainty propagation in dependability models

Conference ValueTools 2016 - 10th EAI International Conference on Performance Evaluation Methodologies and Tools · January 1, 2017 Input parameters of dependability models are often not known accurately. Two principal methods of dealing with such parametric uncertainty are: sensitivity analysis and uncertainty propagation. This paper is an initial attempt to link the two approaches. T ... Full text Cite

Resiliency quantification for large scale systems: An IaaS cloud use case

Conference ValueTools 2016 - 10th EAI International Conference on Performance Evaluation Methodologies and Tools · January 1, 2017 We quantify the resiliency of large scale systems upon changes encountered beyond the normal system behavior. General steps for resiliency quantification are shown and resiliency metrics are defined to quantify the effects of changes. The proposed approach ... Full text Cite

Model-Based Survivability Analysis of a Virtualized System

Conference Proceedings - Conference on Local Computer Networks, LCN · December 22, 2016 Transient survivability analysis of a virtualized system (VS) is critical to the wide deployment of cloud services. The existing research of VS availability and/or reliability focused on the steady-state analysis. This paper presents a model and the closed ... Full text Cite

The Relationship between Software Bug Type and Number of Factors Involved in Failures

Conference Proceedings - 2016 IEEE 27th International Symposium on Software Reliability Engineering Workshops, ISSREW 2016 · December 16, 2016 Previous studies have defined different types of software bugs based on their complexity and reproducibility. Simple bugs, which involve only direct factors and are often easy to reproduce, have been called 'Bohrbugs', while complex bugs, with at least one ... Full text Cite

Software Aging Detection Based on Differential Analysis: An Experimental Study

Conference Proceedings - 2016 IEEE 27th International Symposium on Software Reliability Engineering Workshops, ISSREW 2016 · December 16, 2016 In this study we evaluate the applicability of the differential software analysis approach to detect memory leaks under a real workload. For this purpose, we used three different versions of a widely used software application, where one version was used as ... Full text Cite

Optimization of two-granularity software rejuvenation policy based on the Markov regenerative process

Journal Article IEEE Transactions on Reliability · December 1, 2016 Software rejuvenation is a proactive software control technique that is used to improve a computing system performance when it suffers from software aging. In this paper, a two-granularity inspection-based software rejuvenation policy, which works as a clo ... Full text Open Access Cite

Assessing survivability to support power grid investment decisions

Journal Article Reliability Engineering and System Safety · November 1, 2016 The reliability of power grids has been subject of study for the past few decades. Traditionally, detailed models are used to assess how the system behaves after failures. Such models, based on power flow analysis and detailed simulations, yield accurate c ... Full text Cite

DSN 2016 Tutorial: Reliability and Availability Modeling in Practice

Conference Proceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2016 · September 22, 2016 Full text Cite

Reliability and survivability of vehicular ad hoc networks: An analytical approach

Journal Article Reliability Engineering and System Safety · September 1, 2016 Vehicular ad hoc network (VANET) is a technology that facilitates communication between vehicles by creating a 'mobile Internet'. The system aims at ensuring road safety and achieving secured commutation. For this reason, reliability and survivability of t ... Full text Cite

Probability and Statistics with Reliability, Queuing and Computer Science Applications

Book · September 1, 2016 This updated and revised edition of the popular classic relates fundamental concepts in probability and statistics to the computer sciences and engineering. The author uses Markov chains and other statistical tools to illustrate processes in reliability of ... Full text Cite

Analysis methods for performance & availability in critical care medicine

Conference Proceedings - Annual Reliability and Maintainability Symposium · April 5, 2016 Operations of critical care departments in health systems are increasingly reliant on the availability of interoperable medical devices. Many large health care systems have fully transitioned in recent years to uniform electronic health record platforms, i ... Full text Cite

Reliability models of chronic kidney disease

Conference Proceedings - Annual Reliability and Maintainability Symposium · April 5, 2016 With the rise in quantifiable approaches to health care, lessons from reliability modeling provide new avenues for improving patient outcomes. Describing the development of conditions leading to organ system failure provides visceral motivation for quantif ... Full text Cite

Recovery from Software Failures Caused by Mandelbugs

Journal Article IEEE Transactions on Reliability · March 1, 2016 Software failures are still a major concern in mission- and enterprise-critical contexts, despite significant efforts spent in software testing. In fact, while software testing is effective against easily-reproducible bugs (Bohrbugs), it is considerably le ... Full text Cite

A Scalable Optimization Framework for Storage Backup Operations Using Markov Decision Processes

Conference Proceedings - 2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing, PRDC 2015 · January 4, 2016 Explosive growth of data generation and increasing reliance of business analysis on massive data make data loss more damaging than ever before. Thus it has also become a critical issue for businesses to protect important data effectively. In a system with ... Full text Cite

Reliability and performance of general two-dimensional broadcast wireless network

Journal Article Performance Evaluation · January 2016 Featured Publication Full text Cite

Survivability analysis of a computer system under an advanced persistent threat attack

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2016 Computer systems are potentially targeted by cybercriminals by means of specially crafted malicious software called Advanced Persistent Threats (APTs). As a consequence, any security attribute of the computer system may be compromised: disruption of servic ... Full text Cite

Survivability quantification for networks

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2016 Survivability is a critical attribute of modern computer and communication systems. The assessment of survivability is mostly performed in a qualitative manner and thus cannot meet the need for more precise and solid evaluation of service loss or degradati ... Cite

Software Reliability Analysis of NASA Space Flight Software: A Practical Experience

Conference IEEE International Conference on Software Quality, Reliability and Security: Proceedings · January 2016 In this paper, we present the software reliability analysis of the flight software of a recently launched space mission. For our analysis, we use the defect reports collected during the flight software development. We find that this software was developed ... Full text Cite

Modeling of VANET for BSM safety messaging at intersections with non-homogeneous node distribution

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2016 This paper presents a new analytic model for the performance and reliability of safety-related message broadcast in vehicular ad hoc networks (VANETs) at intersections with non-homogeneous Poisson process (NHPP) for more general road traffic and node distr ... Full text Cite

Largeness Avoidance in Availability Modeling using Hierarchical and Fixed-point Iterative Techniques

Journal Article International Journal of Performability Engineering · December 2, 2015 Cite

Quantification of system survivability

Journal Article Telecommunication Systems · December 1, 2015 Featured Publication Survivability is a concept that describes the capability of a system to achieve timely recovery after the occurrence of undesired events. It is more general and detailed than many terms, such as RTO and RPO, that have a similar goal. Survivability is capab ... Full text Cite

Workshop on Model Based Design for Cyber-Physical Systems (MB4CP)

Conference Proceedings of the International Conference on Dependable Systems and Networks · September 14, 2015 This paper provides a summary of the First International Workshop on Model Based Design for Cyber-Physical Systems (MB4CP 2015) in conjunction with DSN 2015 conference in Rio de Janeiro, Brazil. ... Full text Cite

Emulating environment-dependent software faults

Conference Proceedings - 1st International Workshop on Complex Faults and Failures in Large Software Systems, COUFLESS 2015 · August 5, 2015 The interaction of software with its execution environment is an underestimated cause of complex faults activation and systems failure. This paper discusses a possible framework to emulate anomalous environment conditions in order to assess the impact of t ... Full text Cite

Survivability as a generalization of recovery

Conference 2015 11th International Conference on the Design of Reliable Communication Networks, DRCN 2015 · July 2, 2015 Social infrastructure systems such as communication, transportation, power and water supply systems are now facing various types of threats including component failures, security attacks and natural disasters, etc. Whenever such undesirable events occur, i ... Full text Cite

Markov chain models and applications

Chapter · April 22, 2015 Modeling is a fundamental aspect of the design process of a complex system, as it allows the designer to compare different architectural choices as well as predict the behavior of the system under varying input traffic, service, fault and prevention parame ... Full text Cite

Future research directions in design of reliable communication systems

Journal Article Telecommunication Systems · March 27, 2015 Featured Publication In this position paper on reliable networks, we discuss new trends in the design of reliable communication systems. We focus on a wide range of research directions including protection against software failures as well as failures of communication systems ... Full text Cite

Performability evaluation of grid environments using stochastic reward nets

Journal Article IEEE Transactions on Dependable and Secure Computing · March 1, 2015 In this paper, performance of grid computing environment is studied in the presence of failure-repair of the resources. To achieve this, in the first step, each of the grid resource is individually modeled using Stochastic Reward Nets (SRNs), and mean resp ... Full text Cite

Defects per million computation in service-oriented environments

Journal Article IEEE Transactions on Services Computing · January 1, 2015 Traditional system-oriented dependability metrics like reliability and availability do not fully reflect the impact of system failure-repair behavior in service-oriented environments. The telecommunication systems community prefers to use Defects Per Milli ... Full text Cite

An SRN-based resiliency quantification approach

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2015 Resiliency is often considered as a synonym for fault tolerance and reliability/availability. We start from a different definition of resiliency as the ability to deliver services when encountering unexpected changes. Semantics of change is of extreme impor ... Full text Cite

On-line algorithms for division and multiplication

Chapter · January 1, 2015 In this paper, on-line algorithms for division and multiplication are developed. It is assumed that the operands as well as the result flow through the arithmetic unit in a digit-by-digit, most significant digit first fashion. The use of a redundant digit ... Full text Cite

Semi-Markov models of composite web services for their performance, reliability and bottlenecks

Journal Article IEEE Transactions on Services Computing · January 1, 2015 When combining several services into a composite service, it is non-trivial to determine, prior to service deployment, performance and reliability values of the composite service. Moreover, once the service is deployed, it is often the case th ... Full text Cite

Software maintenance optimization based on Stackelberg game methods

Conference Proceedings - IEEE 25th International Symposium on Software Reliability Engineering Workshops, ISSREW 2014 · December 12, 2014 Application servers (AS) of virtualized platform may suffer from software aging problem. In this paper, we first formulate the system model including three virtual machines. Two of them act as the main servers, and the third machine acts as the backup node ... Full text Cite

Performability comparison of Lustre and HDFS for MR applications

Journal Article Proceedings - IEEE 25th International Symposium on Software Reliability Engineering Workshops, ISSREW 2014 · December 12, 2014 With its simple principles to achieve parallelism and fault tolerance, the Map-reduce framework has captured wide attention, from traditional high performance computing to marketing organizations. The most popular open source implementation of this framewo ... Full text Cite

Reproducibility of environment-dependent software failures: An experience report

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 11, 2014 We investigate the dependence of software failure reproducibility on the environment in which the software is executed. The existence of such dependence is ascertained in literature, but so far it is not fully characterized. In this paper we pinpoint some ... Full text Cite

Computing defects per million in cloud caused by virtual machine failures with replication

Conference Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · December 3, 2014 Virtual machines (VM) are used in cloud computing systems to handle user requests for service. A typical user request goes through several cloud service provider specific processing steps from the instant it is submitted until the service is completed. In ... Full text Cite

Performance and reliability evaluation of BSM broadcasting in DSRC with multi-channel schemes

Journal Article IEEE Transactions on Computers · December 1, 2014 IEEE 1609.4 protocol defines a channel switching mechanism to enable a single radio operating efficiently on multiple channels to support both safety and non-safety services. Basic safety message (BSM) is transmitted only through the control channel at reg ... Full text Cite

Fast computation of bounds for two-terminal network reliability

Journal Article European Journal of Operational Research · November 1, 2014 In this paper, an algorithm for the fast computation of network reliability bounds is proposed. The evaluation of the network reliability is an intractable problem for very large networks, and hence approximate solutions based on reliability bounds have as ... Full text Cite

A Markov decision process approach for optimal data backup scheduling

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · September 18, 2014 The explosive growth of data generation and increasing reliance of business analysis on massive data make data loss more damaging than ever before. Nowadays many organizations start relying on cloud services for keeping their valuable data. It is a critica ... Full text Cite

Analysis of propagation dynamics in complex dynamical network based on disturbance propagation model

Journal Article International Journal of Modern Physics B · September 10, 2014 The paper regards the complex dynamical network (CDN) as a static network with temporal characteristics so as to consider its dynamic behavior. The influence factor and dynamics laws in CDN are explored by using the methods of simulation and statistical ph ... Full text Cite

Message from the chairs

Conference 3rd International Workshop on Software Engineering Challenges for the Smart Grid, SE4SG 2014 - Proceedings · June 1, 2014 Cite

Defects per Million (DPM): A user-oriented perspective of telecommunication systems

Conference 2014 IEEE Globecom Workshops, GC Wkshps 2014 · March 18, 2014 Defects Per Million (DPM), defined as the number of calls dropped out of a million calls due to failures, is used by the telecommunication systems community as a user-perceived dependability metric. As new standards evolve, with built-in mechanisms to hand ... Full text Cite

Fast computation of bounds for two-terminal network reliability

Journal Article European Journal of Operational Research · 2014 Cite

Performance and availability modeling of IT systems with data backup and restore

Journal Article IEEE Transactions on Dependable and Secure Computing · January 1, 2014 In modern IT systems, data backup and restore operations are essential for providing protection against data loss from both natural and man-made incidents. On the other hand, data backup and restore operations can be resource-intensive and lead to performa ... Full text Cite

Stochastic model driven capacity planning for an infrastructure-as-a-service cloud

Journal Article IEEE Transactions on Services Computing · January 1, 2014 From an enterprise perspective, one key motivation to transform the traditional IT management into Cloud is the cost reduction of the hosted services. In an Infrastructure-as-a-Service (IaaS) Cloud, virtual machine (VM) instances share the physical machine ... Full text Cite

Scalable analytics for IaaS cloud availability

Journal Article IEEE Transactions on Cloud Computing · January 1, 2014 In a large Infrastructure-as-a-Service (IaaS) cloud, component failures are quite common. Such failures may lead to occasional system downtime and eventual violation of Service Level Agreements (SLAs) on the cloud service availability. The availability ana ... Full text Cite

A systematic differential analysis for fast and robust detection of software aging

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2014 Software systems running continuously for a long time often confront software aging, which is the phenomenon of progressive degradation of execution environment caused by latent software faults. Removal of such faults in software development process is a c ... Full text Cite

Message from the chairs

Conference 3rd International Workshop on Software Engineering Challenges for the Smart Grid, SE4SG 2014 - Proceedings · January 1, 2014 Cite

A Markov Decision Process Approach for Optimal Data Backup Scheduling

Conference 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) · January 1, 2014 Full text Link to item Cite

Performance analysis for large IaaS clouds

Chapter · January 1, 2014 IaaS clouds are major enablers of data-intensive cloud applications because they provide necessary computing capacity for managing Big Data environments. In a typical IaaS cloud, virtual machine (VM) instances deployed on physical machines (PM) are provide ... Full text Cite

Foreword

Journal Article Ecosystem Services in Agricultural and Urban Landscapes · January 20, 2013 Full text Cite

Performance and Reliability Analysis of Computer Systems An Example-Based Approach Using the SHARPE Software Package

Book · December 6, 2012 In structuring the book, the authors have been careful to provide the reader with a methodological approach to analytical modeling techniques. ... Cite

Multi-state availability modeling in practice

Chapter · December 1, 2012 This chapter presents multi-state availability modeling in practice. We use three analytic modeling techniques; (1) continuous time Markov chains, (2) stochastic reward nets, and (3) multi-state fault trees. Two case studies are presented to show the usage ... Full text Cite

Combining cloud and sensors in a smart city environment

Journal Article Eurasip Journal on Wireless Communications and Networking · December 1, 2012 In the current worldwide ICT scenario, a constantly growing number of ever more powerful devices (smartphones, sensors, household appliances, RFID devices, etc.) join the Internet, significantly impacting the global traffic volume (data sharing, voice, mul ... Full text Cite

The nature of the times to flight software failure during space missions

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2012 The growing complexity of mission-critical space mission software makes it prone to suffer failures during operations. The success of space missions depends on the ability of the systems to deal with software failures, or to avoid them in the first place. ... Full text Cite

Dynamic aspects and behaviors of complex systems in performance and reliability assessment

Journal Article ACM SIGMETRICS Performance Evaluation Review · March 9, 2012 Reliability and performance evaluation are important, often mandatory, steps in designing and analyzing (critical) systems. In such cases, accurate models are required to adequately take into account interference or dependent behaviors affecting th ... Full text Cite

A robust broadcast scheme for VANET one-hop emergency services

Journal Article IEEE Vehicular Technology Conference · December 23, 2011 IEEE- and ASTM-adopted Dedicated Short Range Communications (DSRC) vehicle safety-related communication services, which require reliable and fast message delivery, usually demand broadcast communications in vehicular ad hoc networks (VANETs). In this paper ... Full text Cite

Candy: Component-based availability modeling framework for cloud service management using SysML

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · December 14, 2011 High-availability assurance of cloud service is a critical and challenging issue for cloud service providers. To quantify the availability of cloud services from both architectural and operational points of views, availability modeling and evaluation are e ... Full text Cite

Performance evaluation for DSRC vehicular safety communication: A semi-Markov process approach

Journal Article CTRQ 2011 - 4th International Conference on Communication Theory, Reliability, and Quality of Service · December 1, 2011 In this paper, an analytic model is proposed for the performance evaluation of vehicular safety related services in the dedicated short range communications (DSRC) system on highways. The generation and service of safety messages in each vehicle is modeled ... Cite

Sensitivity analysis of availability of redundancy in computer networks

Journal Article CTRQ 2011 - 4th International Conference on Communication Theory, Reliability, and Quality of Service · December 1, 2011 In this paper, we investigate the availability modeling of computer networks with redundancy mechanisms. Sensitivity analysis is applied in order to find the bottlenecks of system availability. We use Markov chains for the analytical evaluation of complex ... Cite

Performance modeling of Apache web server affected by aging

Journal Article Proceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011 A number of studies have reported the phenomenon of "software aging", characterized by progressive software performance degradation. Response time (RT) as a customer-affecting metric can be used to detect the onset of software aging. Alberto Avritzer et al ... Full text Cite

Recovery from failures due to Mandelbugs in IT systems

Journal Article Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · December 1, 2011 Several studies have been carried out on software bugs analysis and classification for life and mission critical systems, which include reproducible bugs called Bohrbugs, and hard to reproduce bugs called Mandelbugs. Although software reliability in IT sys ... Full text Cite

Software rejuvenation in Eucalyptus cloud computing infrastructure: A method based on time series forecasting and multiple thresholds

Journal Article Proceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011 The need for reliability and availability has increased in modern applications, in order to handle rapidly growing demands while providing uninterrupted service. Cloud computing systems fundamentally provide access to large pools of data and computational ... Full text Cite

Injecting memory leaks to accelerate software failures

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2011 A number of studies have reported the phenomenon of "Software aging", caused by resource exhaustion and characterized by progressive software performance degradation. We develop experiments that simulate an on-line bookstore application, following the stan ... Full text Cite

A comparative evaluation of software rejuvenation strategies

Journal Article Proceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011 In this paper we present an experimental comparative study of most of the rejuvenation techniques developed so far, divided into two groups: i) simple approaches: physical node reboot (switch off/on), VM reboot, OS reboot and standalone application restart ... Full text Cite

Multi-granularity software rejuvenation policy based on continuous time Markov chain

Journal Article Proceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011 In this paper, a multi-granularity software rejuvenation policy is studied. Four granularities of rejuvenation are proposed to mitigate the impact of four levels of software aging respectively. Continuous Time Markov Chain (CTMC) model is used to obtain the av ... Full text Cite

Uncertainty propagation through software dependability models

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2011 Stochastic models are often employed to study dependability of critical systems and assess various hardware and software fault-tolerance techniques. These models take into account the randomness in the events of interest (aleatory uncertainty) and are gene ... Full text Cite

Job completion time on a virtualized server subject to software aging and rejuvenation

Journal Article Proceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011 Virtual machine monitor (VMM) rejuvenation is a proactive recovery method against failures caused by software aging in VMM. Since the job running on a hosted virtual machine (VM) is interrupted at VMM rejuvenation, the preemption type of VMM rejuvenation i ... Full text Cite

Message from the DYADEM-FTS 2011 workshop organizers

Journal Article Proceedings of the 2011 6th International Conference on Availability, Reliability and Security, ARES 2011 · November 9, 2011 Full text Cite

Modeling and analyzing server system with rejuvenation through SysML and stochastic reward nets

Journal Article Proceedings of the 2011 6th International Conference on Availability, Reliability and Security, ARES 2011 · November 9, 2011 High-availability assurance of server systems is becoming an important issue, since many mission-critical applications are implemented on server systems. To achieve high-availability, software rejuvenation is a practical technique to reduce unexpected down ... Full text Cite

A refined EM algorithm for PH distributions

Journal Article Performance Evaluation · October 1, 2011 This paper proposes an improved computation method of maximum likelihood (ML) estimation for phase-type (PH) distributions with a number of phases. We focus on the EM (expectation-maximization) algorithm proposed by Asmussen et al. [27] and refine it in te ... Full text Cite

A hierarchical model to evaluate quality of experience of online services hosted by cloud computing

Journal Article Proceedings of the 12th IFIP/IEEE International Symposium on Integrated Network Management, IM 2011 · September 19, 2011 As online service providers utilize cloud computing to host their services, they are challenged by evaluating the quality of experience and designing redirection strategies in this complicated environment. We propose a hierarchical modeling approach that c ... Full text Cite

Power-performance trade-offs in IaaS cloud: A scalable analytic approach

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · September 2, 2011 Optimizing for performance is often associated with higher costs in terms of capacity, faster infrastructure, and power costs. In this paper, we quantify the power-performance trade-offs by developing a scalable analytic model for joint analysis of perform ... Full text Cite

Third workshop on proactive failure avoidance, recovery, and maintenance (PFARM)

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · September 2, 2011 Over the last decade, research on dependable computing has undergone a shift from reactive towards proactive methods: In classical fault tolerance a system reacts to errors or component failures in order to prevent them from turning into system failures, a ... Full text Cite

Third workshop on proactive failure avoidance, recovery, and maintenance (PFARM)

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · August 26, 2011 Over the last decade, research on dependable computing has undergone a shift from reactive towards proactive methods: In classical fault tolerance a system reacts to errors or component failures in order to prevent them from turning into system failures, a ... Full text Cite

A scalable availability model for Infrastructure-as-a-Service cloud

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · August 26, 2011 High availability is one of the key characteristics of Infrastructure-as-a-Service (IaaS) cloud. In this paper, we show a scalable method for availability analysis of large scale IaaS cloud using analytic models. To reduce the complexity of analysis and t ... Full text Cite

A review of the research on quantitative reliability prediction and assessment for electronic components

Journal Article 2011 Prognostics and System Health Management Conference, PHM-Shenzhen 2011 · August 3, 2011 A review is carried out on how quantitative approaches have been applied so far to the Reliability Prediction and Assessment (RPA) for computer and communication systems. A series of the reliability evaluation technology based on analytic models and comput ... Full text Cite

A stochastic model for beaconless IEEE 802.15.4 MAC operation

Journal Article Computer Communications · August 2, 2011 IEEE 802.15.4 is a popular choice for MAC/PHY protocols in low power and low data rate wireless sensor networks. In this paper, we develop a stochastic model for the beaconless operation of IEEE 802.15.4 MAC protocol. Given the number of nodes competing fo ... Full text Cite

Accelerated life tests and software aging

Chapter · June 30, 2011 Accelerated life test (ALT) methods are successfully applied in many industries to reduce the test period of highly dependable products. Software industry is not different, having the same demand to reduce the period of test for software products with very ... Full text Cite

Performance and availability analysis for infrastructure-as-a-service cloud

Journal Article Proceedings of International Conference on Software Engineering: Software Quality: The Road Ahead, CONSEG 2011 · January 1, 2011 In this paper, we describe a Markov chain based approach for the performance and availability analysis of cloud provided services. We use infrastructure-as-a-service as an example of a cloud based service, where service availability and provisioning respons ... Cite

Response time distributions in networks of queues

Chapter · January 1, 2011 This chapter addresses the issue of determining the response time distribution in networks of queues. Four different techniques are described and demonstrated. A two step numerical approach to compute the response time distribution for closed Markovian net ... Full text Cite

Guest Editorial: Performance and dependability modeling of dynamic systems

Journal Article International Journal of Performability Engineering · January 1, 2011 Cite

Dynamic aspects and behaviors in system reliability evaluation

Journal Article International Journal of Performability Engineering · January 1, 2011 Reliability is one of the key attributes of dependability and quality of service. Techniques and tools for reliability assessment are therefore required in order to evaluate and to predict system behavior. In many contexts, merely taking into account of st ... Cite

Markov Modeling Approach for survivability analysis of cellular networks

Journal Article International Journal of Performability Engineering · January 1, 2011 Survivability is the capability of a system to fulfill its mission in a timely manner in the presence of failures, attacks and accidents. In this paper, quantitative assessment of survivability of cellular networks is conducted by developing an analytical ... Cite

Uncertainty propagation in analytic availability models

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · December 30, 2010 In this paper, we discuss a Monte Carlo sampling based method for propagating the epistemic uncertainty in model parameters, through the system availability model. We also outline methods to compute the number of samples needed to obtain a desired confiden ... Full text Cite

Quantifying resiliency of IaaS cloud

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · December 30, 2010 Cloud based services may experience changes - internal, external, large, small - at any time. Predicting and quantifying the effects on the quality-of-service during and after a change are important in the resiliency assessment of a cloud based service. In ... Full text Cite

On-line adaptive algorithms in autonomic restart control

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 15, 2010 Restarts or retries are typical control schemes to meet a deadline in real-time systems, and are regarded as significant environmental diversity techniques in dependable computing. This paper reconsiders a restart control studied by van Moorsel and Wolter ... Full text Cite

Computing the number of calls dropped due to failures

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2010 Defects per million (DPM), defined as the number of calls out of a million dropped due to failures, is an important service (un)reliability measure for telecommunication systems. Most previous research derives the DPM from steady-state system availability ... Full text Cite

Using accelerated life tests to estimate time to software aging failure

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2010 Software aging is a phenomenon defined as the continuing degradation of software systems during runtime, being particularly noticeable in long-running applications. Aging-related failures are very difficult to observe, because the accumulation of aging eff ... Full text Cite

A hierarchical model for reliability analysis of sensor networks

Journal Article Proceedings - 16th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2010 · December 1, 2010 Prior to field deployment, mission critical sensor networks should be analyzed for high reliability assurance. Past research only focused on reliability models for sensor node or network in isolation. This paper presents a comprehensive approach for reliab ... Full text Cite

End-to-end performability analysis for Infrastructure-as-a-Service cloud: An interacting stochastic models approach

Journal Article Proceedings - 16th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2010 · December 1, 2010 Handling diverse client demands and managing unexpected failures without degrading performance are two key promises of a cloud delivered service. However, evaluation of a cloud service quality becomes difficult as the scale and complexity of a cloud system ... Full text Cite

Cyber security analysis using attack countermeasure trees

Journal Article ACM International Conference Proceeding Series · November 22, 2010 Attack tree (AT) is one of the widely used combinatorial models in cyber security analysis. The basic formalism of AT does not take into account defense mechanisms. Defense trees (DT) have been developed to investigate the effect of defense mechanisms usin ... Full text Cite

An empirical investigation of fault types in space mission system software

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · September 20, 2010 As space mission software becomes more complex, the ability to effectively deal with faults is increasingly important. The strategies that can be employed for fighting a software bug depend on its fault type. Bohrbugs are easily isolated and removed during ... Full text Cite

Second workshop on proactive failure avoidance, recovery, and maintenance (PFARM)

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · September 20, 2010 Proactive approaches to failure avoidance, recovery and maintenance have recently attracted increased interest among researchers and practitioners from various areas of dependable system design and operation. This first workshop provided a stimulating, and ... Full text Cite

In Memoriam: Dr. Chandra Kintala

Journal Article Journal of Systems and Software · September 2010 Full text Cite

Message from the organizers

Journal Article ACM International Conference Proceeding Series · July 20, 2010 Cite

Online monitoring of software system reliability

Journal Article EDCC-8 - Proceedings of the 8th European Dependable Computing Conference · July 12, 2010 Reliability is one of the major concerns for software engineers. The increasing size of software systems and their inherent complexity - which is essentially related to the intricate interdependencies among many heterogeneous components - pose serious diff ... Full text Cite

Software reliability and testing time allocation: An architecture-based approach

Journal Article IEEE Transactions on Software Engineering · March 29, 2010 With software systems increasingly being employed in critical contexts, assuring high reliability levels for large, complex systems can incur huge verification costs. Existing standards usually assign predefined risk levels to components in the design phas ... Full text Cite

Software fault mitigation and availability assurance techniques

Journal Article International Journal of System Assurance Engineering and Management · January 1, 2010 Companies are expected to keep their systems up and running and make data continuously available. Several recent studies have established that most system outages are due to software faults. In this paper, we discuss availability aspects of large software- ... Full text Cite

Evaluation of software performance affected by aging

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2010 A number of studies have reported the phenomenon of "Software aging", characterized by progressive software performance degradation. This is mainly caused by the exhaustion of the combination of system resources. Traditionally, modeling and analysis of sof ... Full text Cite

Modeling and analysis of software rejuvenation in a server virtualized system

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2010 As server virtualization is used as an essential software infrastructure of various software services such as cloud computing, availability management of server virtualized system is becoming more significant. Although time-based software rejuvenation is u ... Full text Cite

Performability analysis of multistate computing systems using multivalued decision diagrams

Journal Article IEEE Transactions on Computers · January 1, 2010 A distinct characteristic of multistate systems (MSS) is that the systems and/or their components may exhibit multiple performance levels (or states) varying from perfect operation to complete failure. MSS can model behaviors such as shared loads, performa ... Full text Cite

Accelerated degradation tests applied to software aging experiments

Journal Article IEEE Transactions on Reliability · January 1, 2010 In the past ten years, the software aging phenomenon has been systematically researched, and recognized by both academic, and industry communities as an important obstacle to achieving dependable software systems. One of its main effects is the depletion o ... Full text Cite

Dependability and security models

Journal Article Proceedings of the 2009 7th International Workshop on the Design of Reliable Communication Networks, DRCN 2009 · December 16, 2009 There is a need to quantify system properties methodically. Dependability and security models have evolved nearly independently. Therefore, it is crucial to develop a classification of dependability and security models which can meet the requirement of pro ... Full text Cite

Toward optimal virtual machine placement and rejuvenation scheduling in a virtualized data center

Journal Article 2008 IEEE International Conference on Software Reliability Engineering Workshops, ISSRE Wksp 2008 · December 15, 2009 Virtualization enables data centers to consolidate servers to improve resource utilization and power consumption. This paper presents the issues of performability management in a virtualized data center that hosts multiple services using virtualization. On ... Full text Cite

The fundamentals of software aging

Journal Article 2008 IEEE International Conference on Software Reliability Engineering Workshops, ISSRE Wksp 2008 · December 15, 2009 Since the notion of software aging was introduced thirteen years ago, the interest in this phenomenon has been increasing from both academia and industry. The majority of the research efforts in studying software aging have focused on understanding its eff ... Full text Cite

Availability modeling and analysis of a virtualized system

Journal Article 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2009 · December 1, 2009 This paper develops an availability model of a virtualized system. We construct non-virtualized and virtualized two hosts system models using a two-level hierarchical approach in which fault trees are used in the upper level and homogeneous continuous time ... Full text Cite

A stochastic model for beaconless IEEE 802.15.4 MAC operation

Journal Article International Symposium on Performance Evaluation of Computer and Telecommunication Systems 2009, SPECTS 2009, Part of the 2009 Summer Simulation Multiconference, SummerSim 2009 · December 1, 2009 IEEE 802.15.4 is a popular choice for MAC/PHY protocols in low power and low data rate wireless sensor networks. In this paper, we develop a stochastic model for the beaconless operation of IEEE 802.15.4 MAC protocol. Given the number of nodes competing fo ... Cite

Survivability modeling with stochastic reward nets

Journal Article Proceedings - Winter Simulation Conference · December 1, 2009 Critical services in a telecommunication network should survive and be continuously provided even when undesirable events like sabotage, natural disasters, or network failures happen. The network survivability is quantified as defined by the ANSI T1A1.2 co ... Full text Cite

Workshop on Proactive Failure Avoidance, Recovery and Maintenance (PFARM)

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · November 25, 2009 Proactive approaches to failure avoidance, recovery and maintenance have recently attracted increased interest among researchers and practitioners from various areas of dependable system design and operation. This first workshop aimed to provide a stimulat ... Full text Cite

A stochastic model for beaconless IEEE 802.15.4 MAC operation

Journal Article Proceedings of the 2009 International Symposium on Performance Evaluation of Computer and Telecommunication Systems, SPECTS 2009 · November 12, 2009 IEEE 802.15.4 is a popular choice for MAC/PHY protocols in low power and low data rate wireless sensor networks. In this paper, we develop a stochastic model for the beaconless operation of IEEE 802.15.4 MAC protocol. Given the number of nodes competing fo ... Cite

Analyzing the hold time schemes to limit the routing table calculations in OSPF protocol

Journal Article Proceedings - International Conference on Advanced Information Networking and Applications, AINA · October 5, 2009 OSPF is a popular interior gateway routing protocol. Commercial OSPF routers limit their processing load by using a hold time between successive routing table calculations as new link state advertisements (LSAs) arrive following a topology change. A large ... Full text Cite

Modeling user-perceived reliability based on user behavior graphs

Journal Article International Journal of Reliability, Quality and Safety Engineering · August 1, 2009 Service Reliability is an important consideration for new service deployment. Traditional system-oriented measures are no longer adequate to describe the reliability perceived by the user. In this paper we propose a general service reliability analysis app ... Full text Cite

Markovian arrival process parameter estimation with group data

Journal Article IEEE/ACM Transactions on Networking · July 3, 2009 This paper addresses a parameter estimation problem of Markovian arrival process (MAP). In network traffic measurement experiments, one often encounters the group data where arrival times for a group are collected as one bin. Although the group data are ob ... Full text Cite

Network survivability modeling

Journal Article Computer Networks · June 11, 2009 Critical services in a telecommunication network should be continuously provided even when undesirable events like sabotage, natural disasters, or network failures happen. It is essential to provide virtual connections between peering nodes with certain pe ... Full text Cite

SHARPE at the age of twenty two

Journal Article ACM SIGMETRICS Performance Evaluation Review · March 25, 2009 This paper discusses the modeling tool called SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator), a general hierarchical modeling tool that analyzes stochastic models of reliability, availability, performance, and perfor ... Full text Cite

Resilience in computer systems and networks

Journal Article IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD · January 1, 2009 The term resilience is used differently by different communities. In general engineering systems, fast recovery from a degraded system state is often termed as resilience. Computer networking community defines it as the combination of trustworthiness (depe ... Full text Cite

SURVIVABILITY MODELING WITH STOCHASTIC REWARD NETS

Conference PROCEEDINGS OF THE 2009 WINTER SIMULATION CONFERENCE (WSC 2009), VOL 1-4 · January 1, 2009 Link to item Cite

Availability Modeling of SIP Protocol on IBM© WebSphere©

Journal Article Proceedings of the 14th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2008 · December 1, 2008 We present the availability model of a high availability SIP Application Server configuration on WebSphere. Hardware, operating system and application server failures are considered. Different types of fault detectors, detection delays, failover delays, re ... Full text Cite

Availability analysis of blade server systems

Journal Article IBM Systems Journal · December 1, 2008 The successful development and marketing of commercial high-availability systems requires the ability to evaluate the availability of systems. Specifically, one should be able to demonstrate that projected customer requirements are met, to identify availab ... Full text Cite

Survivability quantification of real-sized networks including end-to-end delay distributions

Journal Article Proc. - The 3rd Int. Conf. Systems and Networks Communications, ICSNC 2008 - Includes I-CENTRIC 2008: Int. Conf. Advances in Human-Oriented and Personalized Mechanisms, Technologies, and Services · December 1, 2008 In a telecommunication network it is essential to provide virtual connections between peering nodes with performance guarantees such as minimum throughput, maximum delay or loss. Critical services in telecommunication network should be continuously provide ... Full text Cite

Reliable system design: Models, metrics and design techniques

Conference 2008 IEEE/ACM International Conference on Computer-Aided Design · November 2008 Full text Cite

Survivability quantification of communication services

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · October 13, 2008 Our society is heavily dependent on a wide variety of communication services. These services must be available even when undesirable events like sabotage, natural disasters, or network failures happen. The network survivability as defined by the ANSI T1A1. ... Full text Cite

Achieving and assuring high availability

Journal Article IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM · September 10, 2008 We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability assur ... Full text Cite

Achieving and assuring high availability

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · June 11, 2008 We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, and then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability a ... Full text Cite

Ten fallacies of availability and reliability analysis

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · June 9, 2008 As modern society becomes more and more dependent on computers and computer networks, vulnerability and downtime of these systems will significantly impact daily life from both social and economic point of view. Words like reliability and downtime are freq ... Full text Cite

Achieving and assuring high availability

Conference 2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 · January 1, 2008 Link to item Cite

Decompositional analysis of Kronecker structured Markov chains

Journal Article Electronic Transactions on Numerical Analysis · January 1, 2008 This contribution proposes a decompositional iterative method with low memory requirements for the steady-state analysis of Kronecker structured Markov chains. The Markovian system is formed by a composition of subsystems using the Kronecker sum operator for ... Cite

Combined guard channel and mobile-assisted handoff for cellular networks

Journal Article IEEE Transactions on Vehicular Technology · January 1, 2008 For cellular communication systems, mobility and limited radio coverage of a cell require calls to be handed over from one base station system (BSS) to another. Due to the limited bandwidth available in various cells, there is a finite probability that an ... Full text Cite

Software Aging and Rejuvenation

Chapter · December 14, 2007 Several recent studies have established that most system outages are due to software faults. Given the ever-increasing complexity of software and the well-developed techniques and analysis for hardware reliability, ... Full text Cite

A best practice guide to resource forecasting for computing systems

Journal Article IEEE Transactions on Reliability · December 1, 2007 Recently, measurement-based studies of software systems have proliferated, reflecting an increasingly empirical focus on system availability, reliability, aging, and fault tolerance. However, it is a nontrivial, error-prone, arduous, and time-consuming tas ... Full text Cite

Availability monitor for a software based system

Journal Article Proceedings of IEEE International Symposium on High Assurance Systems Engineering · December 1, 2007 Computer and communication systems are ubiquitous and are used extensively in safety critical, life critical, and finance critical applications. Due to the excessive cost of outages, downtime is not tolerated by the users. High availability applications ar ... Full text Cite

Variational Bayesian approach for interval estimation of NHPP-based software reliability models

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · November 16, 2007 In this paper, we present a variational Bayesian (VB) approach to computing the interval estimates for nonhomogeneous Poisson process (NHPP) software reliability models. This approach is an approximate method that can produce analytically tractable posteri ... Full text Cite

Availability Monitor for a Software Based System

Conference 10th IEEE High Assurance Systems Engineering Symposium (HASE'07) · November 2007 Full text Cite

Accurate and efficient stochastic reliability analysis of composite services using their compact Markov reward model representations

Journal Article Proceedings - 2007 IEEE International Conference on Services Computing, SCC 2007 · October 18, 2007 Stochastic reliability analysis of composite services is challenging, primarily since it needs us to carefully balance accuracy of analysis and its computational complexity: Given stochastic models of service components, we often combine them and define a ... Full text Cite

Reliability analysis of phased-mission system with independent component repairs

Journal Article IEEE Transactions on Reliability · September 1, 2007 This paper proposes a hierarchical modeling approach for the reliability analysis of phased-mission systems with repairable components. The components at the lower level are described by continuous time Markov chains which allow complex component failure/r ... Full text Cite

Performance and reliability of tree-structured grid services considering data dependence and failure correlation

Journal Article IEEE Transactions on Computers · July 1, 2007 Grid computing is a newly emerging technology aimed at large-scale resource sharing and global-area collaboration. It is the next step in the evolution of parallel and distributed computing. Due to the largeness and complexity of the grid system, its perfo ... Full text Cite

Quantifying software performance, reliability and security: An architecture-based approach

Journal Article Journal of Systems and Software · April 1, 2007 With component-based systems becoming popular and handling diverse and critical applications, the need for their thorough evaluation has become very important. In this paper we propose an architecture-based unified hierarchical model for software performan ... Full text Cite

Performability analysis of clustered systems with rejuvenation under varying workload

Journal Article Performance Evaluation · March 1, 2007 This paper develops time-based rejuvenation policies to improve the performability measures of a cluster system. Three rejuvenation policies, namely standard rejuvenation, delayed rejuvenation and mixed rejuvenation, are designed to improve the cluster's p ... Full text Cite

Fighting bugs: Remove, retry, replicate, and rejuvenate

Journal Article Computer · February 1, 2007 Combatting vastly different types of software bugs requires different strategies. ... Full text Cite

Simulation versus analytic-numeric methods: Illustrative examples

Conference VALUETOOLS 2007 - 2nd International ICST Conference on Performance Evaluation Methodologies and Tools · January 1, 2007 Performance along with dependability analysis is a tremendous challenge in the design or improvement of modern complex systems. Two different classes of solution methods are generally used: analytic-numeric methods and simulation methods. As most of the li ... Full text Cite

Stochastic modeling of composite Web services for closed-form analysis of their performance and reliability bottlenecks

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2007 Web services providers often commit service-level agreements (SLAs) with their customers for guaranteeing the quality of the services. These SLAs are related not just to functional attributes of the services but to performance and reliability attributes as ... Full text Cite

Survivability Quantification - Keynote.

Conference BROADNETS · 2007 Cite

Fighting bugs - Response

Journal Article COMPUTER · 2007 Cite

Performance assurance via software rejuvenation: Monitoring, statistics and algorithms

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · December 22, 2006 We present three algorithms for detecting the need for software rejuvenation by monitoring the changing values of a customer-affecting performance metric, such as response time. Applying these algorithms can improve the values of this customer-affecting me ... Full text Cite

Analytical models for architecture-based software reliability prediction: A unification framework

Journal Article IEEE Transactions on Reliability · December 1, 2006 Traditional approaches to software reliability modeling are black box-based; that is, the software system is considered as a whole, and only its interactions with the outside world are modeled without looking into its internal structure. The black box appr ... Full text Cite

Welcome message

Journal Article Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006 · December 1, 2006 Full text Cite

Welcome message

Journal Article ISAHUC' 06 - Proceedings of 2006 International Symposium on Ad Hoc and Ubiquitous Computing · December 1, 2006 Full text Cite

Analytical survivability model for fault tolerant cellular networks supporting multiple services

Journal Article International Symposium on Performance Evaluation of Computer and Telecommunication Systems 2006, SPECTS'06, Part of the 2006 Summer Simulation Multiconference, SummerSim'06 · December 1, 2006 Survivability analysis measures the degree of functionality remaining in a system after failures. It consists of evaluating metrics which quantify the system performance during failure scenarios as well as in normal operation. Existing research work in this ... Cite

Modeling high availability systems

Journal Article Proceedings - 12th Pacific Rim International Symposium on Dependable Computing, PRDC 2006 · December 1, 2006 Carrier grade high availability platforms are designed to enable the development and deployment of highly available services in the telecommunications industry. In order to build-in high availability and compare availabilities that differ in the sixth deci ... Full text Cite

A best practice guide to resource forecasting for the apache webserver

Journal Article Proceedings - 12th Pacific Rim International Symposium on Dependable Computing, PRDC 2006 · December 1, 2006 Recently, measurement based studies of software systems proliferated, reflecting an increasingly empirical focus on system availability, reliability, aging and fault tolerance. However, it is a non-trivial, error-prone, arduous, and time-consuming task eve ... Full text Cite

A performance engineering tool for tiered software systems

Journal Article Proceedings - International Computer Software and Applications Conference · December 1, 2006 Performance engineering is an important activity for software architects and designers. Assessment and tuning of performance can help to make key changes in the system, especially if done early in its development. In this paper, we present a tool for the p ... Full text Cite

Reliability and performance of component based software systems with restarts, retries, reboots and repairs

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2006 High reliability and performance are vital for software systems handling diverse mission critical applications. Such software systems are usually component based and may possess multiple levels of fault recovery. A number of parameters, including the softw ... Full text Cite

Analysis of software aging in a Web server

Journal Article IEEE Transactions on Reliability · September 1, 2006 Several recent studies have reported & examined the phenomenon that long-running software systems show an increasing failure rate and/or a progressive degradation of their performance. Causes of this phenomenon, which has been referred to as "software agin ... Full text Cite

Design and performance analysis of a new soft handoff scheme for CDMA cellular systems

Journal Article IEEE Transactions on Vehicular Technology · September 1, 2006 In this paper, a new soft handoff scheme for CDMA cellular systems is proposed and investigated. It is pointed out that some handoff calls unnecessarily occupy multiple channels with little contribution to the performance of handoffs in IS95/CDMA2000-based ... Full text Cite

Incorporating fault debugging activities into software reliability models: A simulation approach

Journal Article IEEE Transactions on Reliability · June 1, 2006 A large number of software reliability growth models have been proposed to analyse the reliability of a software application based on the failure data collected during the testing phase of the application. To ensure analytical tractability, most of these m ... Full text Cite

Queueing Networks and Markov Chains: Modeling and Performance Evaluation With Computer Science Applications: Second Edition

Book · April 21, 2006 Critically acclaimed text for computer performance analysis--now in its second edition The Second Edition of this now-classic text provides a current and thorough treatment of queueing systems, queueing networks, continuous and discrete-time Markov chains, ... Full text Cite

Modeling and performance analysis for soft handoff schemes in CDMA cellular systems

Journal Article IEEE Transactions on Vehicular Technology · March 1, 2006 This paper investigates the features of a cellular geometry in code-division multiple-access (CDMA) systems with soft handoff and distinguishes controlling area of a cell from coverage area of a cell. Some important characteristics of the cellular configur ... Full text Cite

Survivability quantification: The analytical modeling approach

Journal Article International Journal of Performability Engineering · January 1, 2006 In this paper, we present a general survivability quantification approach that is applicable to a wide range of system architectures, applications, failure/recovery behaviors, and metrics. We show how this approach can be applied to derive survivability me ... Cite

State space approach to security quantification

Journal Article Proceedings - International Computer Software and Applications Conference · December 1, 2005 In this paper, we describe three different state space models for analyzing the security of a software system. In the first part of this paper, we utilize a semi-Markov Process (SMP) to model the transitions between the security states of an abstract softw ... Full text Cite

Modeling and simulation of integrated voice/data cellular communication with generally distributed delay for end voice calls

Journal Article Proceedings - Winter Simulation Conference · December 1, 2005 Cellular networks are gradually shifting from voice only to voice and data due to increased demand for WWW, FTP and multi-media messaging. This has substantially increased the volume of cellular data traffic. Schemes have been proposed for co-existence and ... Full text Cite

On a method for mending time to failure distributions

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · November 9, 2005 Many software reliability growth models assume that the time to next failure may be infinite; i.e., there is a chance that no failure will occur at all. For most software products this is too good to be true even after the testing phase. Moreover, if a non ... Full text Cite

Optimization for condition-based maintenance with semi-Markov decision process

Journal Article Reliability Engineering and System Safety · October 1, 2005 The semi-Markov decision model is a powerful tool in analyzing sequential decision processes with random decision epochs. In this paper, we have built the semi-Markov decision process (SMDP) for the maintenance policy optimization of condition-based preven ... Full text Cite

A workload-based analysis of software aging, and rejuvenation

Journal Article IEEE Transactions on Reliability · September 1, 2005 We present a hierarchical model for the analysis of proactive fault management in the presence of system resource leaks. At the low level of the model hierarchy is a degradation model in which we use a nonhomogeneous Markov chain to establish an explicit c ... Full text Cite

Computing steady-state mean time to failure for non-coherent repairable systems

Journal Article IEEE Transactions on Reliability · September 1, 2005 Mean time to failure (MTTF) is an important reliability measure. Previous research is mainly concerned with the MTTF computation of coherent systems. In this paper, we derive equations to calculate the steady-state MTTF for noncoherent systems. Based on th ... Full text Cite

Analysis of a two-level software rejuvenation policy

Journal Article Reliability Engineering and System Safety · January 1, 2005 A two-level rejuvenation policy for software systems with degradation process is studied. Both full restarts and partial restarts are considered in this rejuvenation strategy. A semi-Markov process model is constructed, and based on its closed-form solutio ... Full text Cite

Truncated non-homogeneous Poisson process models - properties and performance

Journal Article Opsearch (India) · 2005 All non-homogeneous Poisson process (NHPP) software reliability growth models of the finite failures category share the property that every time to failure distribution is defective. The reason for this phenomenon is the fact that according to these models ... Cite

A proactive approach towards always-on availability in broadband cable networks

Journal Article Computer Communications · 2005 In this paper, we propose a high availability design of a Cable Modem Termination System (CMTS) cluster system based on the software rejuvenation technique. This proactive system maintenance technique is aimed to reduce system outages and the associated d ... Full text Link to item Cite

Architecture based analysis of performance, reliability and security of software systems

Journal Article Proceedings of the Fifth International Workshop on Software and Performance, WOSP'05 · January 1, 2005 With software systems becoming more complex, and handling diverse and critical applications, the need for their thorough evaluation has become ever more important at each phase of software development. With the prevalent use of component-based design, the ... Full text Cite

Modeling user-perceived service availability

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2005 Service availability is an important consideration when carriers deploy new, packet-based services. In this paper we define the service availability based on user behavior, and derive formulas to compute service availability starting with the user behavior ... Full text Cite

A comprehensive model for software rejuvenation

Journal Article IEEE Transactions on Dependable and Secure Computing · January 1, 2005 Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failure, is the result of exhaus ... Full text Cite

StackOFFence: A technique for defending against buffer overflow attacks

Journal Article International Conference on Information Technology: Coding and Computing, ITCC · January 1, 2005 Software coding practices, in the interest of efficiency, often neglect to enforce strict bounds checking on buffers, arrays and pointers. This results in software code that is more vulnerable to security intrusions exploiting buffer overflow vulnerabilities ... Full text Cite

Evaluating performance attributes of layered software architecture

Journal Article Lecture Notes in Computer Science · January 1, 2005 The architecture of a software system is the highest level of abstraction whereupon useful analysis of system properties is possible. Hence, performance analysis at this level can be useful for assessing whether a proposed architecture can meet the desired ... Full text Cite

Message from the RAMPDS-2005 chairs

Journal Article Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS · January 1, 2005 Full text Cite

Security modeling and quantification of intrusion tolerant systems using attack-response graph

Journal Article Journal of High Speed Networks · December 29, 2004 Increasing deployment of computer systems in critical applications has made study and quantifiable analysis of the security aspects of these systems an important issue. Security quantification analysis can either be done by logging large amounts of operati ... Cite

An analytical approach to architecture-based software performance and reliability prediction

Journal Article Performance Evaluation · December 1, 2004 Conventional approaches to analyze the behavior of software applications are black box based, that is, the software application is treated as a whole and only its interactions with the outside world are modeled. The black box approaches ignore information ... Full text Cite

Survivability analysis of telephone access network

Journal Article Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2004 The telecommunications industry has achieved high reliability and availability for telephone service over decades of development. However, the current design does not aim at providing service survivability when a local switching office fails due to catastr ... Cite

An infinite server queueing approach for describing software reliability growth: Unified modeling and estimation framework

Journal Article Proceedings - Asia-Pacific Software Engineering Conference, APSEC · December 1, 2004 In general, the software reliability models based on the non-homogeneous Poisson processes (NHPPs) are quite popular to assess quantitatively the software reliability and its related dependability measures. Nevertheless, it is not so easy to select the bes ... Full text Cite

Hierarchical computation of interval availability and related metrics

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · October 1, 2004 As the new generation high-availability commercial computer systems incorporate deferred repair service strategies, steady-state availability metrics may no longer reflect reality. Transient solution of availability models for such systems to calculate int ... Cite

Optimal estimation of training interval for channel equalization

Journal Article IEEE Transactions on Wireless Communications · September 1, 2004 In this paper, an optimal training equalization for wireless communication is proposed and analyzed. By our scheme, the training of the equalizer is carried out periodically, with the training interval optimized for a maximal channel utilization. A closed- ... Full text Cite

Software rejuvenation policies for cluster systems under varying workload

Journal Article Proceedings - IEEE Pacific Rim International Symposium on Dependable Computing · June 15, 2004 This paper analyzes two software rejuvenation policies of cluster server systems under varying workload, called fixed rejuvenation and delayed rejuvenation. In order to achieve a higher average throughput, we propose the delayed rejuvenation policy, which ... Cite

The effect of access delay in capacity-on-demand access over a wireless link under bursty packet-switched data

Journal Article Performance Evaluation · May 1, 2004 Capacity-on-demand is the key concept in multiplexing bursty mobile data traffic over wireless links featuring limited bandwidth. This scheme maintains a connection for a mobile only when it has data to transfer and allows quick release of radio resource w ... Full text Cite

A method for modeling and quantifying the security attributes of intrusion tolerant systems

Journal Article Performance Evaluation · March 1, 2004 Complex software and network based information server systems may exhibit failures. Quite often, such failures may not be accidental. Instead some failures may be caused by deliberate security intrusions with the intent ranging from simple mischief, theft ... Full text Cite

Comparing software rejuvenation policies under different dependability measures

Journal Article IEICE Transactions on Information and Systems · January 1, 2004 Software rejuvenation is a preventive and proactive solution that is particularly useful for counteracting the phenomenon of software aging. In this paper, we consider both the periodic and non-periodic software rejuvenation policies under different depend ... Cite

Analysis of software fault removal policies using a non-homogeneous continuous time Markov chain

Journal Article Software Quality Journal · January 1, 2004 Software reliability is an important metric that quantifies the quality of a software product and is inversely related to the residual number of faults in the system. Fault removal is a critical process in achieving the desired level of quality before software ... Full text Cite

Model-based evaluation: From dependability to security

Journal Article IEEE Transactions on Dependable and Secure Computing · January 1, 2004 The development of techniques for quantitative, model-based evaluation of computer system dependability has a long and rich history. A wide array of model-based evaluation techniques is now available, ranging from combinatorial methods, which are useful fo ... Full text Cite

Software rejuvenation - modeling and analysis

Conference IFIP Advances in Information and Communication Technology · January 1, 2004 Several recent studies have established that most system outages are due to software faults. Given the ever increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in the ... Full text Cite

A BDD-Based Algorithm for Analysis of Multistate Systems with Multistate Components

Journal Article IEEE Transactions on Computers · December 1, 2003 In this paper, a new algorithm based on Binary Decision Diagram (BDD) for the analysis of a system with multistate components is proposed. Each state of a multistate component is represented by a Boolean variable, and a multistate system is represented by ... Full text Cite

Adaptive Software Rejuvenation: Degradation Model and Rejuvenation Scheme

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · December 1, 2003 We present a framework of adaptive estimation and rejuvenation of software system performance in the presence of aging sources. The framework specifies that a degradation model not only describe an aging process but also enable the adaptation of model-base ... Cite

Dependability Enhancement for IEEE 802.11 Wireless LAN with Redundancy Techniques

Journal Article Proceedings of the International Conference on Dependable Systems and Networks · December 1, 2003 The presence of physical obstacles and radio interference results in the so called "shadow regions" in wireless networks. When a mobile station roams into a shadow region, it loses its network connectivity. In cellular networks, in order to minimize the co ... Cite

Software Performance Analysis Using a Language Measure

Journal Article Proceedings of the American Control Conference · November 6, 2003 This paper presents software performance analysis using a finite state automaton model. A signed real measure of formal languages has been used for quantitative evaluation of the software. This paper extends an earlier model based on a discrete time Markov ... Cite

Performance Analysis of Reservation Media-Access Protocol with Access and Serving Queues Under Bursty Traffic in GPRS/EGPRS

Journal Article IEEE Transactions on Vehicular Technology · November 1, 2003 Performance modeling of the contention-based reservation protocol in general packet radio service (GPRS)/enhanced general packet radio service (EGPRS) under bursty traffic is practically useful in system design. Instead of using discrete event simulation, ... Full text Cite

Performance modeling of wireless networks with generally distributed handoff interarrival times

Journal Article Computer Communications · September 22, 2003 Handoff is an important issue in cellular mobile telephone systems. Recently, studies that question the validity of the assumption of handoff arrivals being Poissonian have appeared in the literature. The reasoning behind this claim can be summarized as fo ... Full text Cite

Preventive maintenance of multi-state system with phase-type failure time distribution and non-zero inspection time

Journal Article International Journal of Reliability, Quality and Safety Engineering · September 1, 2003 Preventive maintenance is applied to improve the system availability or decrease the operational cost. This paper addresses the optimal preventive maintenance problem for multi-state deteriorating systems, where the system experiences multiple stages of pe ... Full text Cite

Performability modelling of wireless communication systems

Journal Article International Journal of Communication Systems · August 1, 2003 The high expectations of performance and availability for wireless mobile systems have presented great challenges in the modelling and design of fault tolerant wireless systems. The proper modelling methodology to study the degradation of such systems is so ... Full text Cite

Modeling of user perceived webserver availability

Journal Article IEEE International Conference on Communications · July 18, 2003 We propose to use Markov regenerative process (MRGP) models to study the availability of Internet-based services perceived by a Web user, which capture the interactions between the service facility and the user. The necessity of the sophisticated MRGP mode ... Cite

Hierarchical composition and aggregation of state-based availability and performability models

Journal Article IEEE Transactions on Reliability · March 1, 2003 Telecommunication systems are large and complex, consisting of multiple intelligent modules in shelves, multiple shelves in frames, and multiple frames to compose a single network element. In the availability and performability analysis of such a complex s ... Full text Cite

SITAR: A scalable intrusion-tolerant architecture for distributed services

Conference Foundations of Intrusion Tolerant Systems, OASIS 2003 · January 1, 2003 This paper presents an intrusion tolerant architecture for distributed services, especially COTS servers. An intrusion tolerant system assumes that attacks will happen, and some will be successful. However, a wide range of mission critical applications need ... Full text Cite

Recent advances in modeling response-time distributions in real-time systems

Journal Article Proceedings of the IEEE · January 1, 2003 Real-time systems are an important class of process control systems that need to respond to events under time constraints, or deadlines. Such systems may also be required to deliver service in spite of hardware or software faults in their components. This ... Full text Cite

Architecture-Based Approaches to Software Reliability Prediction

Journal Article Computers and Mathematics with Applications · January 1, 2003 With growing emphasis on reuse, the software development process moves toward component-based software design. As a result, there is a need for modeling approaches that are capable of considering the architecture of the software made out of components. Thi ... Full text Cite

Security analysis of SITAR intrusion tolerance system

Journal Article Proceedings of the ACM Workshop on Survivable and Self-Regenerative Systems · January 1, 2003 Security is an important QoS attribute for characterizing intrusion tolerant computing systems. Frequently, however, the security of computing systems is assessed in a qualitative manner based on the presence and absence of certain functional characteristic ... Full text Cite

Importance analysis with Markov chains

Journal Article Proceedings of the Annual Reliability and Maintainability Symposium · January 1, 2003 An overview is given of novel techniques for computing importance measures in state space dependability models. Specifically, reward functions in a Markov reward model (MRM) are utilized for this purpose, in contrast to the common method of computing impor ... Cite

Maximizing interval reliability in operational software system with rejuvenation

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2003 Software aging often affects the performance of a software system and eventually causes it to fail. A novel approach to handle transient software failures is called software rejuvenation which can be regarded as a preventive and proactive solution that is ... Full text Cite

Specification-level integration of simulation and dependability analysis

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2003 Software architectural choices have a profound influence on the quality attributes supported by a system. Architecture analysis can be used to evaluate the influence of design decisions on important quality attributes such as maintainability, performance a ... Full text Cite

Analysis of inspection-based preventive maintenance in operational software systems

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · December 17, 2002 Recently, the phenomenon of "software aging", one in which the state of a software system gradually degrades with time and eventually leads to performance degradation or crash/hang failure, has been reported. Preventive maintenance of operational software ... Full text Cite

Analytic modeling of handoffs in wireless cellular networks

Journal Article Information Sciences · December 1, 2002 In this paper, we report our recent work on closed form solutions to the blocking and dropping probability in wireless cellular networks with handoff. First, we develop a performance model of a cell in a wireless network where the effect of handoff arrival ... Full text Cite

Analytic Modeling of Handoffs in Wireless Cellular Networks

Journal Article Proceedings of the Joint Conference on Information Sciences · December 1, 2002 In this paper, we report our recent work on closed form solutions to the blocking and dropping probability in wireless cellular networks with handoff. First, we develop a performance model of a cell in a wireless network where the effect of handoff arrival ... Cite

Failure mitigation for quality of service of wireless networks

Journal Article Proceedings of the IEEE Conference on Decision and Control · December 1, 2002 This paper addresses the outage problem and its mitigation for quality of service (QoS) of wireless networks with fading channels. At first, we set up a continuous time Markov chain (CTMC) model that includes various source states and channel outage states ... Cite

Network survivability performance evaluation: A quantitative approach with applications in wireless Ad-hoc Networks

Journal Article Proceedings of the International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems · December 1, 2002 Network survivability reflects the ability of a network to continue to function during and after failures. Our purpose in this paper is to propose a quantitative approach to evaluate network survivability. We perceive the network survivability as a composi ... Cite

SREPT: A tool for software reliability estimation and prediction

Journal Article Proceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002 A tool called Software Reliability Estimation and Prediction Tool (SREPT) that seeks to address the limitation given in other tools is presented. Unlike most models that assume instantaneous and perfect debugging, SREPT allows the users to analyze the effe ... Full text Cite

Reliability and availability analysis for the JPL remote exploration and experimentation system

Journal Article Proceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002 The NASA Remote Exploration and Experimentation (REE) Project, managed by the Jet Propulsion Laboratory, has the vision of bringing commercial supercomputing technology into space, in a form which meets the demanding environmental requirements, to enable a ... Cite

Modeling and quantification of security attributes of software systems

Journal Article Proceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002 Quite often failures in network based services and server systems may not be accidental, but rather caused by deliberate security intrusions. We would like such systems to either completely preclude the possibility of a security intrusion or design them to ... Full text Cite

A simple characterization of provably efficient prefetching algorithms

Journal Article Proceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002 In this paper, we characterize a broad class C of prefetching algorithms and prove that, for any prefetching algorithm in this class, its total elapsed time is no more than twice the smallest possible total elapsed time. This result provides a performance ... Cite

SHARPE 2002: Symbolic hierarchical automated reliability and performance evaluator

Journal Article Proceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002 SHARPE is a well known package in the field of reliability and performability, used in universities as well as in companies. It is believed that SHARPE is a useful modeler's "toolchest" because it contains support for multiple model types and provides flex ... Full text Cite

System availability with non-exponentially distributed outages

Journal Article IEEE Transactions on Reliability · June 1, 2002 This paper studies the steady-state availability of systems with times to outages and recoveries that are generally distributed. Availability bounds are derived for systems with limited information about the distributions. Also investigated are the applica ... Full text Cite

Call admission control for reducing dropped calls in CDMA cellular systems

Journal Article Computer Communications · May 1, 2002 Call admission control (CAC) algorithms that reduce dropped calls in code-division multiple access (CDMA) cellular systems are discussed in this paper. The capacity of a CDMA system is confined by the interference of users from both inside and outside of t ... Full text Cite

Optimal webserver session timeout settings for web users

Conference 28th International Computer Measurement Group Conference, CMG 2002 · January 1, 2002 From an end user’s point of view, too short a Webserver timeout implies too many forced logouts, and too long a timeout duration poses a higher security risk to users’ sensitive data. We propose cost functions to select the timeout value, which are based o ... Cite

Second-order stochastic fluid models with fluid-dependent flow rates

Journal Article Performance Evaluation · 2002 In this paper, the analysis of second-order stochastic fluid models, where the fluid rate is dependent on the fluid level, is addressed. The boundary conditions are presented for the fluid models under consideration, which have extended previous work with ... Full text Link to item Cite

A methodology towards automatic implementation of N-body algorithms

Journal Article Applied Numerical Mathematics · January 1, 2002 We propose a methodology aimed at automating the software development of fast discrete transforms for N-body problems. The methodology starts with a representation of the transform matrix in compact form. Then, two translation phases are applied. One trans ... Full text Cite

Closed-form analytical results for condition-based maintenance

Journal Article Reliability Engineering and System Safety · 2002 Preventive maintenance is applied to improve the device availability or decrease the repair costs when the device failures are in deterioration (or aging) phase. Preventive maintenance can be made more efficient by periodic monitoring wherein the state of ... Cite

Availability models with age-dependent checkpointing

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2002 In this paper, we consider a new stochastic model for a file recovery action with checkpointing when the system failure occurs according to a homogeneous Poisson process. The present checkpoint model strongly depends on the system age and is quite differen ... Cite

Optimal estimation of training interval for channel equalizations

Journal Article IEEE International Conference on Communications · January 1, 2002 In this paper, an optimal training equalization for wireless communication is proposed and analyzed. By our scheme, the training of the equalizer is carried out periodically, with the training interval optimized for a maximal channel utilization. A closed- ... Cite

Application of semi-Markov process and CTMC to evaluation of UPS system availability

Journal Article Proceedings of the Annual Reliability and Maintainability Symposium · January 1, 2002 In this paper we develop analytical models for the study of the dependability characteristics of systems with uninterruptible power supply (UPS) units. Dependability of systems with UPS cannot be modeled exactly using the prevalent Markov modeling approach ... Cite

Reliability prediction and sensitivity analysis based on software architecture

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2002 Prevalent approaches to characterize the behavior of monolithic applications are inappropriate to model modern software systems which are heterogeneous, and are built using a combination of components picked off the shelf, those developed in-house and thos ... Full text Cite

Software reliability and rejuvenation: Modeling and analysis

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2002 Several recent studies have established that most system outages are due to software faults. Given the ever increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in the ... Full text Cite

A framework for performability modeling of messaging services in distributed systems

Conference Proceedings of the IEEE International Conference on Engineering of Complex Computer Systems, ICECCS · January 1, 2002 Messaging services are a useful component in distributed systems that require scalable dissemination of messages (events) from suppliers to consumers. These services decouple suppliers and consumers, and take care of client registration and message propaga ... Full text Cite

An approach for estimation of software aging in a Web server

Conference ISESE 2002 - Proceedings, 2002 International Symposium on Empirical Software Engineering · January 1, 2002 A number of recent studies have reported the phenomenon of "software aging", characterized by progressive performance degradation or a sudden hang/crash of a software system due to exhaustion of operating system resources, fragmentation and accumulation of ... Full text Cite

Modeling and analysis of software rejuvenation in cable modem termination systems

Conference Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2002 In order to reduce system outages and the associated downtime cost caused by the "software aging" phenomenon, we propose to use software rejuvenation as a proactive system maintenance technique deployed in a CMTS (Cable Modem Termination System) cluster sy ... Full text Cite

All-terminal reliability analysis of the SRP-ring: The effect of enhanced intelligent protection switching

Conference Proceedings - International Conference on Computer Communications and Networks, ICCCN · January 1, 2002 Spatial reuse protocol (SRP) is a media access control (MAC)-layer protocol that operates over a double counter-rotating ring network topology. SRP is designed to enhance the SONET network so that it can handle data traffic more efficiently. We study the a ... Full text Cite

Analysis of hypergeometric distribution software reliability model

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 2001 This article gives the detailed mathematical results on the hypergeometric distribution software reliability model (HGDSRM) proposed by Tohma et al. [IEEE Trans. Software Eng. (1989, 1991)]. In the above papers, Tohma et al. developed the HGDSRM as a discr ... Cite

Estimating software rejuvenation schedules in high-assurance systems

Journal Article Computer Journal · December 1, 2001 Software rejuvenation is a preventive maintenance technique that has been extensively studied in recent literature. In this paper, we extend the classical result by Huang et al. (1995), and in addition propose a modified stochastic model to generate the so ... Full text Cite

RED parameters and performance of TCP connections

Journal Article Electronics Letters · November 22, 2001 The problem of transmission control protocol (TCP) traffic with random early detection (RED) is addressed. With the formulation of stochastic differential equation (SDE), an explicit expression for the relation between RED parameters and network parameters ... Full text Cite

A performance model of partial packet discard and early packet discard schemes in ATM switches

Journal Article Computer Communications · October 1, 2001 In this paper, we develop a concise performance model of partial packet discard (PPD) and early packet discard (EPD) schemes in ATM switches. We study the performance of PPD and EPD with heterogeneous traffic sources. The sources included Poisson, and ON-O ... Full text Cite

Composite performance and availability analysis of wireless communication networks

Journal Article IEEE Transactions on Vehicular Technology · September 1, 2001 With the increasing popularity of wireless communication systems, customers are expecting the same level of service, availability, and performance from the wireless communication networks as the traditional wire-line networks. Traditional pure performance ... Full text Cite

A method for multiple channel recovery in TDMA wireless communications systems

Journal Article Computer Communications · July 15, 2001 A single base repeater failure in time division multiple access (TDMA) wireless systems causes all active calls on this base repeater to be dropped. In order to increase system end-to-end availability, a multiple channel recovery method for TDMA wireless s ... Full text Cite

Architecture-based approach to reliability assessment of software systems

Journal Article Performance Evaluation · July 1, 2001 With the growing emphasis on reuse, software development process moves toward component-based software design. As a result, there is a need for modeling approaches that are capable of considering the architecture of the software and estimating the reliabil ... Full text Cite

Performability modelling techniques and tools

Book · June 18, 2001 Performability modelling and evaluation brings together two disciplines that have long been treated separately in different communities: computer and communication system performance evaluation and system reliability and availability ... Cite

Performance of broadcast and unknown server (BUS) in ATM LAN emulation

Journal Article IEEE/ACM Transactions on Networking · June 1, 2001 In this paper, we develop performance models of the Broadcast and Unknown Server (BUS) in the LANE. The traffic on the BUS is divided into two classes: the broadcast and multicast traffic, and the unicast relay flow. The broadcast and multicast traffic is ... Full text Cite

Loss formulas and their application to optimization for cellular networks

Journal Article IEEE Transactions on Vehicular Technology · May 1, 2001 In this paper, we develop a performance model of a cell in a wireless communication network where the effect of handoff arrival and the use of guard channels is included. Fast recursive formulas for the loss probabilities of new calls and handoff calls are ... Full text Cite

A new handoff scheme for decreasing both dropped calls and blocked calls in CDMA system

Conference EUROCON 2001 - International Conference on Trends in Communications, Proceedings · January 1, 2001 Soft handoff in the CDMA cellular system is analyzed. To improve performance degradation due to channel resource shortage during soft handoff, we propose a new scheme which converts channels occupied by some pseudo-handoff calls to new handoff calls. Stoch ... Full text Cite

Characterizing intrusion tolerant systems using a state transition model

Conference Proceedings - DARPA Information Survivability Conference and Exposition II, DISCEX 2001 · January 1, 2001 Intrusion detection and response research has so far mostly concentrated on known and well-defined attacks. We believe that this narrow focus of attacks accounts for both the successes and limitation of commercial intrusion detection systems (IDS). Intrusi ... Full text Cite

Analysis and implementation of software rejuvenation in cluster systems

Journal Article Performance Evaluation Review · January 1, 2001 Several recent studies have reported the phenomenon of "software aging", one in which the state of a software system degrades with time. This may eventually lead to performance degradation of the software or crash/hang failure or both. "Software rejuvenati ... Full text Cite

Comparison of Hybrid Systems and Fluid Stochastic Petri Nets

Journal Article Discrete Event Dynamic Systems: Theory and Applications · 2001 Hybrid Systems are models of interacting digital and continuous devices with applications in the control of aircraft, computers, or modern cars for instance. Concurrently, Fluid Stochastic Petri Nets (FSPNs) have been introduced as an extension of stochast ... Full text Link to item Cite

Proactive management of software aging

Journal Article IBM Journal of Research and Development · January 1, 2001 In response to the strong desire of customers to be provided with advance notice of unplanned outages, techniques were developed that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reach ... Full text Cite

Comparison of architecture-based software reliability models

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · January 1, 2001 Many architecture-based software reliability models have been proposed in the past without any attempt to establish a relationship among them. The aim of this paper is to fill this gap. First, the unifying structural properties of the models are exhibited ... Full text Cite

Performance analysis of the CORBA Notification Service

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2001 As CORBA (Common Object Request Broker Architecture) gains popularity as a standard for portable, distributed, object-oriented computing, the need for a CORBA messaging solution is being increasingly felt. This led the Object Management Group (OMG) to spec ... Full text Cite

Uncertainty analysis in reliability modeling

Journal Article Proceedings of the Annual Reliability and Maintainability Symposium · January 1, 2001 In reliability analysis of computer systems, models such as fault trees, Markov chains, and stochastic Petri nets (SPN) are built to evaluate or predict the reliability of the system. In general, the parameters in these models are usually obtained from fiel ... Cite

Analysis of periodic preventive maintenance with general system failure distribution

Conference Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2001 Preventive maintenance is applied to improve the system availability or decrease the operational cost. In this paper the preventive maintenance with generally distributed parameters are discussed, and the steady-state solution is obtained by solving the un ... Full text Cite

Reliable messaging using the CORBA Notification Service

Conference Proceedings - 3rd International Symposium on Distributed Objects and Applications, DOA 2001 · January 1, 2001 With the growing popularity of the CORBA architecture as a distributed computing infrastructure standard, the need for a reliable CORBA messaging solution is being increasingly felt. The Event Service, which is the first such solution, provides inadequate ... Full text Cite

Stochastic Petri nets and their applications

Conference PERFORMANCE AND QOS OF NEXT GENERATION NETWORKING · January 1, 2001 Link to item Cite

Performability analysis of TDMA cellular systems based on composite and hierarchical Markov chain models

Conference PERFORMANCE AND QOS OF NEXT GENERATION NETWORKING · January 1, 2001 Link to item Cite

Failure correlation in software reliability models

Journal Article IEEE Transactions on Reliability · December 1, 2000 Perhaps the most stringent restriction in most software reliability models is the assumption of statistical independence among successive software failures. Our research was motivated by the fact that although there are practical situations in which this a ... Full text Cite

Heuristic self-organization algorithms for software reliability assessment and their application

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 2000 The GMDH (group method of data handling) network is an adaptive learning machine based on the principle of heuristic self-organization. In this paper, we apply the GMDH networks to predict software reliability in testing phase. Three kinds of networks: the ... Cite

Composite performance and availability analysis of communications networks: A comparison of exact and approximate approaches

Journal Article Conference Record / IEEE Global Telecommunications Conference · December 1, 2000 Traditional pure performance model that ignores failure and recovery but considers resource contention generally overestimates the system's ability to perform a certain job. On the other hand, pure availability analysis tends to be too conservative since p ... Cite

SREPT: Software Reliability Estimation and Prediction Tool

Journal Article Performance Evaluation · January 1, 2000 Several tools have been developed for the estimation of software reliability. However, they are highly specialized in the approaches they implement and the particular phase of the software life-cycle in which they are applicable. There is an increasing nee ... Full text Cite

Channel allocation with recovery strategy in wireless networks

Journal Article European Transactions on Telecommunications · January 1, 2000 With the increasing penetration of wireless communications systems, customers are expecting the same level of service, reliability and performance from the wireless communication systems as the traditional wire-line networks. Due to the dynamic environment ... Full text Cite

Performance analysis of the CORBA event service using stochastic reward nets

Journal Article Proceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2000 The Event service is the earliest CORBA solution to the message queue model of communication in distributed systems. Typical implementations however suffer from the lack of event delivery guarantees. The loss of messages is aggravated in the presence of bu ... Cite

Modeling and analysis of software aging and rejuvenation

Journal Article Proceedings of the IEEE Annual Simulation Symposium · January 1, 2000 Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of 'software aging', one in which the state of the software system degrades with time, has been reported. To counteract this phenomenon, a proactive approac ... Cite

Call admission control for reducing dropped calls in code division multiple access (CDMA) cellular systems

Journal Article Proceedings - IEEE INFOCOM · January 1, 2000 Call admission control algorithms that reduce dropped calls in CDMA cellular systems are discussed in this paper. The capacity of a CDMA system is confined by interference of users from both inside and outside of the target cell. Earlier algorithms for cal ... Cite

Effects of failure correlation on software in operation

Conference Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2000 Since the early 1970's a number of models have been proposed for estimating software reliability. However, the realism of many of the underlying assumptions and the applicability of these models continue to be questioned. Our research work was motivated by ... Full text Cite

Statistical non-parametric algorithms to estimate the optimal software rejuvenation schedule

Conference Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2000 In this paper, we extend the classical result by Huang, Kintala, Kolettis and Fulton (1995), and in addition propose a modified stochastic model to determine the software rejuvenation schedule. More precisely, the software rejuvenation models are formulate ... Full text Cite

Building a reliable message delivery system using the CORBA Event Service

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 In this paper we study the suitability of the CORBA Event Service as a reliable message delivery mechanism. We first show that products built to the CORBA Event Service specification will not guarantee against loss of messages or guarantee order. This is n ... Full text Cite

SREPT: Software reliability estimation and prediction tool

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 Although several tools have been developed for the estimation of software reliability, they are highly specialized in the approaches they implement and the particular phase of the software lifecycle in which they are applicable. Also the conventional tech ... Full text Cite

Reliability and performability modeling using SHARPE 2000

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 The SHARPE package, Symbolic Hierarchical Automated Reliability and Performance Evaluator, is now 13 years old. A well known package in the field of reliability and performability, SHARPE is used in universities as well as in companies. Many important chan ... Full text Cite

SPNP: Stochastic Petri nets. Version 6.0

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 Full text Cite

Implementation of importance splitting techniques in stochastic Petri net package

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 Stochastic Petri Net Package (SPNP) is a software package whose goal is to compute performance, availability or performability measures from Stochastic Petri Nets (SPN) and Fluid Stochastic Petri nets (FSPN). This software can use either analytic numeric m ... Full text Cite

Stochastic modeling formalisms for dependability, performance and performability

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 Full text Cite

The optimal preventive maintenance policy for a software system with multi server station

Conference 6TH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN, PROCEEDINGS · 2000 Cite

Analysis of software cost models with rejuvenation

Conference Proceedings of IEEE International Symposium on High Assurance Systems Engineering · January 1, 2000 Software rejuvenation is a preventive maintenance technique that has been extensively studied in the recent literature. In this paper we extend the classical result by Huang et al. (1995), and in addition propose a modified stochastic model to generate the ... Full text Cite

A BDD-based algorithm for reliability analysis of phased-mission systems

Journal Article IEEE Transactions on Reliability · December 1, 1999 This paper presents a new algorithm (PMS-BDD) based on the binary decision diagram (BDD) for reliability analysis of phased-mission systems (PMS). PMS-BDD uses phase algebra to deal with the dependence across the phases, and a new BDD operation to incorpor ... Full text Cite

Measurement-based model for estimation of resource exhaustion in operational software systems

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1999 Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of 'software aging', one in which the state of the software system degrades with time, has been reported. The primary causes of this degradation are the exh ... Cite

A channel recovery method in TDMA wireless systems

Journal Article IEEE Vehicular Technology Conference · December 1, 1999 A single base repeater failure in TDMA wireless systems causes all active calls on this base repeater to be dropped. In order to increase system end-to-end availability, an RF channel recovery method for TDMA wireless systems is proposed in this paper. By ... Full text Cite

Failure correlation in software reliability models

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1999 Perhaps the most stringent restriction that is present in most software reliability models is the assumption of independence among successive software failures. Our research was motivated by the fact that although there are practical situations in which th ... Cite

Confidence interval estimation of NHPP-based software reliability models

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1999 The software reliability growth models (such as NHPP models) are frequently used in software reliability prediction. Estimation of parameters in these models is often done by point estimation. However, some numerical problems arise and make the actual comp ... Cite

Effect of Web caching on network planning

Journal Article Computer Communications · September 15, 1999 In this paper, the effect of Web caching on network planning, in the sense of bandwidth computation for the access link interconnecting the ISP's subnet with the Internet, is studied by means of simulations. The latency of a browser retrieving files is stu ... Full text Cite

Stochastic reward net model for performance analysis of prioritized DQDB MAN

Journal Article Computer Communications · June 15, 1999 The performance of prioritized distributed queue dual bus (DQDB) metropolitan area network (MAN) under bursty traffic environment is studied in this article. The tagged node model is adopted to simplify the analysis. The processes of the packet arrivals to ... Full text Cite

Advanced Computer System Design

Book · January 18, 1999 This text focuses on the major issues involved in computer design and architectures. ... Cite

A channel recovery method for RF channel failure in wireless communications systems

Journal Article IEEE Wireless Communications and Networking Conference, WCNC · January 1, 1999 An RF channel at a cell is assigned to a call during the call set-up process. The channel is dedicated to the subscriber until the call is terminated (normal termination) or the subscriber leaves the cell (handoff). However, the RF channel may fail due to ... Full text Cite

Dependability modeling and evaluation of phased mission systems: A DSPN approach

Conference Dependable Computing for Critical Applications 7 · January 1, 1999 We focus on analytical modeling for the dependability evaluation of phased-mission systems. Because of their dynamic behavior, systems showing a phased behavior offer challenges in modeling. We propose the modeling and evaluation of phased-mission system d ... Full text Cite

A reliable CORBA-based network management system

Conference IEEE International Conference on Communications · January 1, 1999 Network management provides the central nervous system for the networks of telecommunications providers. A telco's network management system (NMS) needs to support uninterrupted management functionality of complex networks. The reliability of such systems ... Full text Cite

Algorithm for reliability analysis of phased-mission systems

Journal Article Reliability Engineering and System Safety · 1999 The purpose of this paper is to describe an efficient Boolean algebraic algorithm that provides exact solution to the unreliability of a multi-phase mission system where the configurations are described through fault trees. The algorithm extends and improv ... Full text Link to item Cite

Performance analysis of distributed real-time databases

Journal Article Performance Evaluation · January 1, 1999 In a distributed process control system, information about the behavior of physical processes is usually collected and stored in a real-time database which can be remotely accessed by human operators. In this paper we propose an analytic approach to comput ... Full text Cite

Integrated reliability modeling environment

Journal Article Reliability Engineering and System Safety · January 1, 1999 In this paper, we propose an integrated reliability/availability modeling and analysis environment suitable for heterogeneous hierarchical system analysis. A key component of this environment is a high level system specification and input language which ac ... Full text Cite

A time/structure based software reliability model

Journal Article Annals of Software Engineering · January 1, 1999 The past 20 years have seen the formulation of numerous analytical software reliability models for estimating the reliability growth of a software product. The predictions obtained by applying these models tend to be optimistic due to the inaccuracies in t ... Full text Cite

Discrete-event simulation of fluid stochastic Petri nets

Journal Article IEEE Transactions on Software Engineering · January 1, 1999 The purpose of this paper is to describe a method for the simulation of the recently introduced fluid stochastic Petri nets. Since such nets result in rather complex system of partial differential equations, numerical solution becomes a formidable task. Be ... Full text Cite

Transient analysis of minimum duration outage for RF channel in cellular systems

Journal Article IEEE VTS 50th Vehicular Technology Conference, VTC 1999-Fall · January 1, 1999 Following Mandayam et al., we define outage events as the channel being attenuated for at least a deterministic period of time, τm. Compared with continuous time Markov chain or discrete time Markov chain, a semi-Markov process (SMP) is general enough that ... Cite

Performance and reliability evaluation of passive replication schemes in application level fault tolerance

Journal Article Proceedings - Annual International Conference on Fault-Tolerant Computing · January 1, 1999 Process replication is provided as the central mechanism for application level software fault tolerance in SwiFT and DOORS. These technologies, implemented as reusable software modules, support cold and warm schemes of passive replication. The choice of a ... Cite

Dependability analysis of distributed computer systems with imperfect coverage

Journal Article Proceedings - Annual International Conference on Fault-Tolerant Computing · January 1, 1999 In this paper, a new algorithm based on Binary Decision Diagrams (BDD) for dependability analysis of distributed computer systems (DCS) with imperfect coverage is proposed. Minimum file spanning trees (MFST) are generated and stored via BDD manipulation. B ... Cite

Availability and performance evaluation for automatic protection switching in TDMA wireless system

Conference Proceedings - 1999 Pacific Rim International Symposium on Dependable Computing, PRDC 1999 · January 1, 1999 In this paper, we compare the availability and performance of a wireless TDMA system with and without automatic protection switching. Stochastic reward net models are constructed and solved by SPNP (Stochastic Petri Net Package). Hierarchical decomposition ... Full text Cite

Dependability modelling and sensitivity analysis of scheduled maintenance systems

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1999 In this paper we present a new modelling approach for dependability evaluation and sensitivity analysis of Scheduled Maintenance Systems, based on a Deterministic and Stochastic Petri Net approach. The DSPN approach offers significant advantages in terms o ... Full text Cite

Performability analysis of fault tolerant RF link design in wireless communications networks

Conference ESM'99 - MODELLING AND SIMULATION: A TOOL FOR THE NEXT MILLENNIUM, VOL 1 · January 1, 1999 Link to item Cite

Locating program features using execution slices

Conference Proceedings - 1999 IEEE Symposium on Application-Specific Systems and Software Engineering and Technology, ASSET 1999 · January 1, 1999 An important step towards effective software maintenance is to locate the code relevant to a particular feature. We report a study applying an execution slice-based technique to a reliability and performance evaluator to identify the code which is unique t ... Full text Cite

Analysis of preventive maintenance in transactions based software systems

Journal Article IEEE Transactions on Computers · December 1, 1998 Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software "aging." However, it incurs some overhead. The necessity to do preventive maintenance, not ... Full text Cite

Methodology for detection and estimation of software aging

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1998 The phenomenon of software aging refers to the accumulation of errors during the execution of the software which eventually results in its crash/hang failure. A gradual performance degradation may also accompany software aging. Pro-active fault management ... Cite

Software reliability analysis incorporating fault detection and debugging activities

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1998 Software reliability measurement problem can be approached by obtaining the estimates of the residual number of faults in the software. Traditional black-box based approaches to software reliability modeling assume that the debugging process is instantaneo ... Cite

Reliability simulation of component-based software systems

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1998 Two case studies, one of a terminating application, and the other of a real-time application with feedback control, are presented to illustrate the flexibility offered by discrete-event simulation to analyze complex systems. Data from these studies confirm ... Cite

An improved algorithm for coherent-system reliability

Journal Article IEEE Transactions on Reliability · December 1, 1998 This paper presents a simpler and more efficient algorithm (I_VT), based on the one proposed by Veeraraghavan & Trivedi (VT), to calculate system reliability using 'sum of disjoint products' and 'multiple variable inversion' (MVI) techniques. A proposition ... Full text Cite

Increasing application accessibility through Java

Journal Article IEEE Internet Computing · July 1, 1998 Java can be used to create a network computing platform that lets users share applications not specifically devised for the Web. The authors used one such platform to port an existing tool and develop a new application. ... Full text Cite

Fluid stochastic Petri nets: Theory, applications, and solution techniques

Journal Article European Journal of Operational Research · February 16, 1998 In this paper we introduce a new class of stochastic Petri nets in which one or more places can hold fluid rather than discrete tokens. We define a class of fluid stochastic Petri nets in such a way that the discrete and continuous portions may affect each ... Full text Cite

Applications of non-Markovian stochastic Petri nets

Journal Article Performance Evaluation Review · January 1, 1998 Petri nets represent a powerful paradigm for modeling parallel and distributed systems. Parallelism and resource contention can easily be captured and time can be included for the analysis of system dynamic behavior. Most popular stochastic Petri nets assu ... Full text Cite

Petri nets with k simultaneously enabled generally distributed timed transitions

Journal Article Performance Evaluation · January 1, 1998 Stochastic Petri nets have been used to analyze the performance and reliability of complex systems comprising concurrency and synchronization. Various extensions have been proposed in literature in order to broaden their field of application to an increasi ... Full text Cite

Recent developments in non-Markovian stochastic Petri nets

Journal Article Journal of Circuits, Systems and Computers · January 1, 1998 Analytical modeling plays a crucial role in the analysis and design of computer systems. Stochastic Petri Nets represent a powerful paradigm, widely used for such modeling in the context of dependability, performance and performability. Many structural and ... Full text Cite

Availability modeling of energy management systems

Journal Article Microelectronics Reliability · January 1, 1998 Energy management system (EMS) computer architectures have changed significantly over the recent past increasing the difficulty and the need for a priori assessment of system performance and dependability. The old practice based on measurements is no longe ... Full text Cite

Analysis of conditional MTTF of fault-tolerant systems

Journal Article Microelectronics Reliability · January 1, 1998 Mean time to failure (MTTF) is one of the most frequently used dependability measures in practice. By convention, MTTF is the expected time for a system to reach any one of the failure states. For some systems, however, the mean time to absorb to a subset ... Full text Cite

Log-logistic software reliability growth model

Conference Proceedings - 3rd IEEE International High-Assurance Systems Engineering Symposium, HASE 1998 · January 1, 1998 The finite-failure non-homogeneous Poisson process (NHPP) models proposed in the literature exhibit either constant, monotonic increasing or monotonic decreasing failure occurrence rates per fault, and are inadequate to describe the failure processes under ... Full text Cite

SREPT: Software reliability estimation and prediction tool

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1998 Several tools have been developed for the estimation of software reliability. However, they are highly specialized in the approaches they implement and the particular phase of the software life-cycle in which they are applicable. There is an increasing n ... Full text Cite

An improved multiple variable inversion algorithm for reliability calculation

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1998 An improved algorithm based on the one proposed by Veeraraghavan and Trivedi (VT) to calculate system reliability using sum of disjoint products (SDP) and multiple variable inversion (MVI) techniques is presented. We compare the improved algorithm with seve ... Full text Cite

Performability analysis of channel allocation with channel recovery strategy in cellular networks

Conference ICUPC 1998 - IEEE 1998 International Conference on Universal Personal Communications, Conference Proceedings · January 1, 1998 We propose and compare three channel recovery schemes for fixed channel assignment. In Scheme I, a failed channel is switched by an idle channel whenever it is available. In Scheme II, the switching strategy is employed only after an attempt to restore the ... Full text Cite

Model validation using simulated data

Conference Proceedings - 1998 IEEE Workshop on Application-Specific Software Engineering and Technology, ASSET 1998 · January 1, 1998 Effective and accurate reliability modeling requires the collection of comprehensive, homogeneous, and consistent data sets. Failure data required for software reliability modeling is difficult to collect, and even the available data tends to be noisy, dis ... Full text Cite

Reliability simulation of fault-tolerant software and systems

Journal Article Proceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS · December 1, 1997 Fault tolerance is a survival attribute of complex computer systems and software in their ability to deliver continuous service to their users in the presence of faults. Formulating an analytic model for dependability and performance evaluation of hardware ... Cite

Cache error propagation model

Journal Article Proceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS · December 1, 1997 Cache memory is a small, fast, memory system that holds frequently used data. With increasing processor speed, aggressive design practices increase the probability of fault occurrence and the presence of latent errors as processor allows a short duration f ... Cite

Performability analysis of handoff calls in personal communication networks

Journal Article Proceedings of the International Conference on Computer Communications and Networks, ICCCN · December 1, 1997 A combined performance and dependability (called performability) model for dealing with handoff calls is introduced. Stochastic reward nets (SRNs) are used for this purpose. An SRN model of channel assignment is developed and analyzed. The method of phase ... Cite

The Effect of Detection and Restoration Times for Error Recovery in Communication Networks

Journal Article Journal of Network and Systems Management · January 1, 1997 Detection and restoration times are often ignored when modeling network reliability. In this paper, we develop Markov Regenerative Reward Models (MRRM) to capture the effects of detection and restoration phases of network recovery. States of the MRRM repre ... Full text Cite

Combined performance and availability analysis of a switched network application

Journal Article IEEE International Conference on Communications · January 1, 1997 As switched networks providing services to end users become more commonplace, integral components of these networks must have an increasing level of dependability. One area of interest is determining the optimal number of network servers required for a swi ... Cite

Buffer losses vs. deadline violations for ABR traffic in an ATM switch: A computational approach

Journal Article Telecommunication Systems · January 1, 1997 The B-ISDN will carry a variety of traffic types: the Variable Bit Rate traffic (VBR), of which compressed video is an example, Continuous Bit Rate traffic (CBR), of which telemetry is an example, Data traffic, and Available Bit Rate traffic (ABR) that rep ... Full text Cite

Effect of repair policies on software reliability

Journal Article COMPASS - Proceedings of the Annual Conference on Computer Assurance · January 1, 1997 Software reliability is an important metric that quantifies the quality of the software product and is inversely related to the number of unrepaired faults in the system. Fault removal is a critical process in achieving desired level of quality before soft ... Cite

On the development of dependability-evaluation workbench for high-assurance system designers

Journal Article Proceedings of the High-Assurance Systems Engineering Workshop · January 1, 1997 High-assurance system engineering requires efficient computer-aided dependability evaluation. Although various dependability evaluation techniques and tools have been developed and studied in the last two decades, no adequate attention has been paid to all ... Cite

Discrete-event simulation of fluid stochastic Petri nets

Journal Article International Workshop on Petri Nets and Performance Models · January 1, 1997 The purpose of this paper is to describe a method for simulation of recently introduced fluid stochastic Petri nets. Since such nets result in rather complex set of partial differential equations, numerical solution becomes a formidable task. Because of a ... Cite

On the analysis of software rejuvenation policies

Journal Article COMPASS - Proceedings of the Annual Conference on Computer Assurance · January 1, 1997 Software rejuvenation is a technique for software fault tolerance which involves occasionally stopping the executing software, `cleaning' the `internal state' and restarting. This cleanup is done at desirable times during execution on a preventive basis so ... Cite

Toward accessibility enhancement of dependability modeling techniques and tools

Conference Digest of Papers - 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 · January 1, 1997 Although various dependability evaluation techniques and tools have been developed in the last two decades, no adequate attention has been paid to allow system designers not well versed in analytic modeling to easily employ these techniques and tools. In t ... Full text Cite

IDEA: Integrated design environment for assessment of ATM networks

Journal Article Proceedings of the IEEE International Conference on Engineering of Complex Computer Systems, ICECCS · December 1, 1996 With the increased attention ATM is receiving to meet the needs of a wide variety of applications, tools are needed to help a network designer focus on the design at hand, rather than to spend time exhaustively learning the tools themselves. This is the co ... Cite

Unification of finite failure non-homogeneous Poisson process models through test coverage

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1996 A number of analytical software reliability models have been proposed for estimating the reliability growth of a software product. In this paper we present an Enhanced non-homogeneous Poisson process (ENHPP) model and show that previously reported Non-Homo ... Cite

Comment/correction: dependability modeling using Petri nets

Journal Article IEEE Transactions on Reliability · December 1, 1996 Two arcs are missing in a figure of Malhotra & Trivedi (1995); these arcs are necessary for the proper functioning of the GSPN. Also, priorities of immediate transitions in that figure must be clearer. This note presents a correctly drawn GSPN and describe ... Full text Cite

Sufficient conditions for existence of a fixed point in stochastic reward net-based iterative models

Journal Article IEEE Transactions on Software Engineering · December 1, 1996 Stochastic Petri net models of large systems that are solved by generating the underlying Markov chain pose the problem of largeness of the state-space of the Markov chain. Hierarchical and iterative models of systems have been used extensively to solve th ... Full text Cite

Accelerating mean time to failure computations

Journal Article Performance Evaluation · October 1, 1996 In this paper we consider the problem of numerical computation of the mean time to failure (MTTF) in Markovian dependability and/or performance models. The problem can be cast as a system of linear equations which is solved using an iterative method preser ... Full text Cite

Minimizing completion time of a program by checkpointing and rejuvenation

Conference SIGMETRICS 1996 - Proceedings of the 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems · May 15, 1996 Checkpointing with rollback-recovery is a well known technique to reduce the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation refers to preventive maintenance of software aimed to reduce un ... Full text Cite

Accelerating mean time to failure computations

Journal Article Performance Evaluation · January 1, 1996 In this paper we consider the problem of numerical computation of the mean time to failure (MTTF) in Markovian dependability and/or performance models. The problem can be cast as a system of linear equations which is solved using an iterative method preser ... Full text Cite

Stochastic Petri nets for the reliability analysis of communication network applications with alternate-routing

Journal Article Reliability Engineering and System Safety · January 1, 1996 In this paper, we present a comparative reliability analysis of an application on a corporate B-ISDN network under various alternate-routing protocols. For simple cases, the reliability problem can be cast into fault-tree models and solved rapidly by means ... Full text Cite

Minimizing completion time of a program by checkpointing and rejuvenation

Journal Article Performance Evaluation Review · January 1, 1996 Checkpointing with rollback-recovery is a well known technique to reduce the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation refers to preventive maintenance of software aimed to reduce un ... Full text Cite

Optimal software rejuvenation for tolerating soft failures

Journal Article Performance Evaluation · January 1, 1996 In recent studies, the phenomenon of software "aging" has come to light which causes performance of a software to degrade with time. Software rejuvenation is a fault tolerance technique which counteracts aging. In this paper, we address the problem of dete ... Full text Cite

A Comparison of Approximate Interval Estimators for the Bernoulli Parameter

Journal Article American Statistician · January 1, 1996 We compare the accuracy of two approximate confidence interval estimators for the Bernoulli parameter p. The approximate confidence intervals are based on the normal and Poisson approximations to the binomial distribution. Charts are given to indicate whic ... Full text Cite

Transient behavior of ATM networks under overloads

Journal Article Proceedings - IEEE INFOCOM · January 1, 1996 In this paper we characterize the time-dependent behavior of typical queueing systems that arise in ATM networks under the presence of overloads. Transient queue length distribution and transient cell loss probability are obtained numerically and transient ... Cite

SHARPE: a modeler's toolkit

Journal Article Proceedings - IEEE International Computer Performance and Dependability Symposium, IPDS · January 1, 1996 SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) is a program that supports the specification and automated solution of reliability and performance models [1]. It contains support for fault trees, reliability block diagrams, r ... Cite

User-friendly dependability evaluation tool

Journal Article IEEE Proceedings of the National Aerospace and Electronics Conference · January 1, 1996 In order to permit users with little analytic background to evaluate dependability, modeling tools require a user-friendly front end. With this motivation, we have developed a software tool referred to as SDDS for 'Software Dependability for Distributed Sy ... Cite

Important milestones in software reliability modeling

Conference SEKE '96: THE 8TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, PROCEEDINGS · January 1, 1996 Link to item Cite

Fixed point iteration using stochastic reward nets

Journal Article International Workshop on Petri Nets and Performance Models · December 1, 1995 Stochastic Petri Net models of large systems that are solved by generating the underlying Markov chain pose the problem of largeness of the state-space. Hierarchical and iterative models of systems have been used extensively to solve this problem. A proble ... Cite

Performance evaluation of dynamic priority operating systems

Journal Article International Workshop on Petri Nets and Performance Models · December 1, 1995 Operating systems which implement a dynamic priority mechanism are very common. Nevertheless, it is very difficult to develop an accurate analytical model to evaluate their performance, mainly due to the different forms of dependency between the various co ... Cite

Transient analysis of Markov regenerative stochastic Petri nets: a comparison of approaches

Journal Article International Workshop on Petri Nets and Performance Models · December 1, 1995 In this paper we present and compare two different approaches for the transient solution of Markov regenerative stochastic Petri Nets: the method based on Markov regenerative theory and the method of supplementary variables. In both cases the equations tha ... Cite

Analysis of software rejuvenation using Markov regenerative stochastic Petri net

Journal Article Proceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1995 In a client-server type system, the server software is required to run continuously for very long periods. Due to repeated and potentially faulty usage by many clients, such software 'ages' with time and eventually fails. Huang et al. proposed a technique ... Cite

Preemptive repeat identical transitions in Markov regenerative stochastic Petri Nets

Journal Article International Workshop on Petri Nets and Performance Models · December 1, 1995 The recent literature on Markov Regenerative Stochastic Petri Nets (MRSPN) assumes that the random firing time associated to each transition is resampled each time the transition fires or is disabled by the firing of a competitive transition. This modeling ... Cite

Effect of detection and restoration times for error recovery in communication networks

Journal Article Proceedings - IEEE Military Communications Conference MILCOM · December 1, 1995 Detection and restoration times are often ignored when modeling network reliability. In this paper, we develop Markov Regenerative Reward Models (MRRM) to capture the effects of detection and restoration phases of network recovery. States of the MRRM repre ... Cite

Non-Markovian Petri Nets

Conference Proceedings of the 1995 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 1995/PERFORMANCE 1995 · May 1, 1995 Non-Markovian models allow us to capture a very wide range of circumstances in which it is necessary to model phenomena whose times to occurrence are not exponentially distributed. Events such as timeouts in a protocol, service times at a machine performing ... Full text Cite

Time-dependent behavior of redundant systems with deterministic repair

Conference COMPUTATIONS WITH MARKOV CHAINS · January 1, 1995 Link to item Cite

Dependability modeling of real-time systems using stochastic reward nets

Journal Article Microelectronics Reliability · January 1, 1995 Dependability modeling plays a major role in the design, validation and maintenance of real-time computing systems. Typical models provide measures such as mean time to failure, reliability and safety as functions of the component failure rates and fault/e ... Full text Cite

Data integrity analysis of disk array systems with analytic modeling of coverage

Journal Article Performance Evaluation · January 1, 1995 Detailed dependability models of various disk array organizations are developed taking into account both the hard disk failures and transient errors. Various error and failure modes of individual disks and the disk array are identified. A small proportion ... Full text Cite

Numerical Methods for Reliability Evaluation of Markov Closed Fault-Tolerant Systems

Journal Article IEEE Transactions on Reliability · January 1, 1995 This paper compares three numerical methods for reliability calculation of Markov, closed, fault-tolerant systems which give rise to continuous-time, time-homogeneous, finite-state, acyclic Markov chains. We consider a modified version of Jensen's method ( ... Full text Cite

Dependability Modeling Using Petri-Nets

Journal Article IEEE Transactions on Reliability · January 1, 1995 This paper describes a methodology to construct dependability models using generalized stochastic Petri nets (GSPN) and stochastic reward nets (SRN). Algorithms are provided to convert a fault tree (a commonly used combinatorial model type) model into equi ... Full text Cite

Semi-numerical transient analysis of Markov models

Journal Article Proceedings of the Annual Southeast Conference · January 1, 1995 We present a new O(n^3) algorithm for seminumerical transient analysis of continuous time Markov chains with n states. The algorithm is based on spectral decomposition of the transition rate matrix in combination with partial fraction expansion based on Lap ... Full text Cite

Buffer sizing for ABR traffic in an ATM switch

Journal Article IEEE International Conference on Communications · January 1, 1995 The B-ISDN will carry a variety of traffic types: the Variable Bit Rate traffic (VBR), of which compressed video is an example, Continuous Bit Rate traffic (CBR), of which telemetry is an example, Data traffic, and Available Bit Rate traffic (ABR) that rep ... Cite

Markov regenerative models

Journal Article Proceedings - International Computer Performance and Dependability Symposium · January 1, 1995 The Markov Regenerative Stochastic Process (MRGP) has been shown to capture the behavior of real systems with both deterministic and exponentially distributed event times. In this paper we survey the MRGP literature and focus on the different solution tech ... Cite

Componentwise decomposition for an efficient reliability computation of systems with repairable components

Journal Article Proceedings - Annual International Conference on Fault-Tolerant Computing · January 1, 1995 Fault trees and Markov chains are commonly used for dependability modeling. Markov chains are powerful in that various kinds of dependencies can be easily modeled that fault tree models have difficulty capturing, but the state space grows exponentially in ... Full text Cite

From stochastic Petri nets to Markov regenerative stochastic Petri nets

Conference Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS · January 1, 1995 In this paper we survey the Petri net literature and focus on Petri nets with generally distributed transition firing times. In the framework of Markov regenerative stochastic Petri nets (MRSPN) we develop and solve two examples to illustrate the modeling ... Full text Cite

A survey of efficient reliability computation using disjoint products approach

Journal Article Networks · January 1, 1995 Several algorithms have been developed to solve the reliability problem for nonseries‐parallel networks using the sum of disjoint products (SDP) approach. This paper provides a general framework for most of these techniques. It reviews methods that help im ... Full text Cite

Steady state analysis of Markov regenerative SPN with age memory policy

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1995 Non-Markovian Stochastic Petri Nets (SPN) have been developed as a tool to deal with systems characterized by non exponentially distributed timed events. Recently, some effort has been devoted to the study of SPN with generally distributed firing times, wh ... Full text Cite

Transient analysis of real-time systems using deterministic and stochastic Petri nets

Conference QUALITY OF COMMUNICATION-BASED SYSTEMS · January 1, 1995 Link to item Cite

Introduction to the special issue on fault-tolerant computing

Journal Article IEEE Transactions on Computers · 1995 Cite

Approximate computation of sojourn time distribution in open queueing networks

Conference COMPUTATIONS WITH MARKOV CHAINS · January 1, 1995 Link to item Cite

Transient analysis of the leaky bucket rate control scheme under Poisson and on-off sources

Journal Article Proceedings - IEEE INFOCOM · December 1, 1994 In this paper we derive expressions for the time-dependent state probabilities and the time-averaged state-probabilities for the leaky bucket rate control scheme. Our model is based on the theory of Markov regenerative processes. Our results specialize to ... Cite

Guarded repair of dependable systems

Journal Article Theoretical Computer Science · June 6, 1994 Imperfect coverage and nonnegligible reconfiguration delay are known to have a deleterious effect on the dependability and the performance of a multiprocessor system. In particular, increasing the number of processor elements does not always increase depen ... Full text Cite

Numerical computation of response time distributions using stochastic reward nets

Journal Article Annals of Operations Research · April 1, 1994 We consider the numerical computation of response time distributions for closed product form queueing networks using the tagged customer approach. We map this problem on to the computation of the time to absorption distribution of a finite-state continuous ... Full text Cite

Coverage Evaluation Through Fault Injection: Fault Sampling and Statistical Analysis

Conference 3rd IEEE International Workshop on Integrating Error Models with Fault Injection, WIEM 1994 · January 1, 1994 Full text Cite

A stochastic reward net model for dependability analysis of real-time computing systems

Conference Proceedings of 2nd IEEE Workshop on Real-Time Applications, RTA 1994 · January 1, 1994 Dependability assessment plays an important role in the design and validation of fault-tolerant real-time computer systems. Dependability models provide measures such as reliability, safety and mean time to failure as functions of the component failure rat ... Full text Cite

Reliability Analysis of the Double Counter-Rotating Ring with Concentrator Attachments

Journal Article IEEE/ACM Transactions on Networking · January 1, 1994 The inherently weak reliability behavior of the ring architecture has led network designers to consider various design choices to improve network reliability. In this paper, we assess the impact of provisions such as node bypass, secondary ring and concent ... Full text Cite

Reliability Modeling of Life-Critical, Real-Time Systems

Journal Article Proceedings of the IEEE · January 1, 1994 In this paper, we discuss the role of modeling in the design and validation of life-critical, real time systems. The basics of Markov, Markov reward, and stochastic reward net models are covered. An example of a nuclear power plant cooling system is develo ... Full text Cite

Power-Hierarchy of Dependability-Model Types

Journal Article IEEE Transactions on Reliability · January 1, 1994 This paper formally establishes a hierarchy, among the most commonly used types of dependability models, according to their modeling power. Among the combinatorial (non-state-space) model types, we show that fault trees with repeated events are the most po ... Full text Cite

Phased-mission system analysis using boolean algebraic methods

Journal Article Performance Evaluation Review · January 1, 1994 Most reliability analysis techniques and tools assume that a system is used for a mission consisting of a single phase. However, multiple phases are natural in many missions. The failure rates of components, system configuration, and success criteria may v ... Full text Cite

A Combinatorial Algorithm for Performance and Reliability Analysis Using Multistate Models

Journal Article IEEE Transactions on Computers · January 1, 1994 The need for the combined performance and reliability analysis of fault tolerant systems is increasing. The common approach to formulating and solving such problems is to use (semi-)Markov reward models. However, the large size of state spaces is a ... Full text Cite

Markov regenerative stochastic Petri nets

Journal Article Performance Evaluation · January 1, 1994 Stochastic Petri nets of various types (SPN, GSPN, ESPN, DSPN etc.) are recognized as useful modeling tools for analyzing the performance and reliability of systems. The analysis of such Petri nets proceeds by utilizing the underlying continuous-time stoch ... Full text Cite

Stiffness-tolerant methods for transient analysis of stiff Markov chains

Journal Article Microelectronics Reliability · January 1, 1994 Three methods for numerical transient analysis of Markov chains, the modified Jensen's method (Jensen's method with steady-state detection of the underlying DTMC and computation of Poisson probabilities using the method of Fox and Glynn [1]), a third-order ... Full text Cite

Markov reward approach to performability and reliability analysis

Journal Article Proceedings of the IEEE International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems · January 1, 1994 Performability and reliability modeling techniques and tools have been an area of intensive research activity in the last ten years. We present a unified mathematical framework for performability and reliability models in terms of Markov reward models. The ... Cite

Impact of fault expansion on the interval estimate for fault detection coverage

Journal Article Digest of Papers - International Symposium on Fault-Tolerant Computing · January 1, 1994 A high fault detection coverage is very critical for systems with ultra-safe requirements and fault injection is an effective technique for estimating the coverage. One difficulty of fault injection lies in the huge number of injections that need to be car ... Cite

Techniques and tools for reliability and performance evaluation: Problems and perspectives

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1994 Modelling techniques and tools of the future must meet the challenges presented by today's highly demanding and schedule-oriented developing environment. With the emergence of high performance and reliability systems the problem of how to analyze such syst ... Full text Cite

Analytic treatment of the reliability and performance of mirrored disk subsystems

Journal Article Digest of Papers - International Symposium on Fault-Tolerant Computing · December 1, 1993 An analytic model is developed for predicting the performance and reliability of mirrored disk subsystems. The model includes Markovian dependencies in the request stream, read and write traffic, unit failures, and individual request failures necessitating ... Cite

Approach for combinatorial performance and availability analysis

Journal Article · December 1, 1993 The common approach to formulating and solving combined reliability/availability and performance problems is to use Markov reward models. However, the large size of state spaces is a problem that plagues Markovian models. Combinatorial models have been use ... Cite

Specification techniques for Markov reward models

Journal Article Discrete Event Dynamic Systems: Theory and Applications · July 1, 1993 Markov reward models (MRMs) are commonly used for the performance, dependability, and performability analysis of computer and communication systems. Many papers have addressed solution techniques for MRMs. Far less attention has been paid to the specificat ... Full text Cite

On the sensitivity of transient solutions of Markov models

Journal Article Proceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 1993 · June 1, 1993 We consider the sensitivity of transient solutions of Markov models to perturbations in their generator matrices. The perturbations can either be of a certain structure or can be very general. We consider two different measures of sensitivity and derive up ... Full text Cite

Sensitivity analysis of Markov regenerative stochastic Petri nets

Conference Proceedings of 5th International Workshop on Petri Nets and Performance Models, PNPM 1993 · January 1, 1993 Sensitivity analysis, i.e., the analysis of the effect of small variations in system parameters on the output measures, can be studied by computing the derivatives of the output measures with respect to the parameter. An algorithm for parametric sensitivit ... Full text Cite

A methodology for formal expression of hierarchy in model solution

Conference Proceedings of 5th International Workshop on Petri Nets and Performance Models, PNPM 1993 · January 1, 1993 A methodology for formal specification of hierarchy both in model specification and model solution is presented. Hierarchy is allowed to exist among different model types used in performance and dependability modeling. This offers a lot of flexibility and ... Full text Cite

The Completion Time of Programs on Processors Subject to Failure and Repair

Journal Article IEEE Transactions on Computers · January 1, 1993 The objective of this paper is to describe a technique for computing the distribution of the completion time of a program on a server subject to failure and repair. Several realistic aspects of the system are included in the model. The server behavior is m ... Full text Cite

Modeling Correlation in Software Recovery Blocks

Journal Article IEEE Transactions on Software Engineering · January 1, 1993 This paper considers the problem of accurately modeling the software fault-tolerance technique based on recovery blocks. Models of such systems have been criticized for their assumptions of independence. Analyses of some systems have considered the correla ... Full text Cite

A Software Tool for Learning About Stochastic Models

Journal Article IEEE Transactions on Education · January 1, 1993 The study of stochastic modeling can be greatly enriched by the use of computer software. Such software should enable students to experiment with modeling techniques, check their understanding of algorithms for model analysis, develop the skills and “intui ... Full text Cite

Reliability analysis of various station attachment schemes in a FDDI token ring

Journal Article Proceedings - IEEE INFOCOM · January 1, 1993 Five different attachment schemes proposed for the FDDI token ring are compared in terms of reliability. For this purpose, the topologies are first studied in isolation (reliability of the path to the backbone) and subsequently end-to-end user reliabilitie ... Cite

Multiprocessor Performability Analysis

Journal Article IEEE Transactions on Reliability · January 1, 1993 Performability models of multiprocessor systems and their evaluation are presented. Two cases in which hierarchical modeling is applied are examined. 1. Models are developed to analyze the behavior of processor arrays of various sizes in the presence of pe ... Full text Cite

Performance Evaluation of Client-Server Systems

Journal Article IEEE Transactions on Parallel and Distributed Systems · January 1, 1993 A client-server system is a distributed system where a server station receives requests from its client stations, processes the requests and returns replies to the requesting stations. In this paper, client-server systems in which a set of workstations acc ... Full text Cite

A decomposition approach for stochastic reward net models

Journal Article Performance Evaluation · January 1, 1993 We present a decomposition approach for the solution of large stochastic reward nets (SRNs) based on the concept of near-independence. The overall model consists of a set of submodels whose interactions are described by an import graph. Each node of the gr ... Full text Cite

Reliability analysis of redundant arrays of inexpensive disks

Journal Article Journal of Parallel and Distributed Computing · January 1, 1993 A reliability analysis of various disk array architectures (different levels of RAID) is performed. The dependence of reliability and mean time to data loss on various parameters of a disk array is characterized. A study of these characteristics reveals th ... Full text Cite

Approximate analysis of priority scheduling systems using stochastic reward nets

Journal Article Proceedings - International Conference on Distributed Computing Systems · January 1, 1993 We present a performance analysis of a heterogeneous multiprocessor system where tasks may arrive from Poisson sources as well as by spawning and probabilistic branching of other tasks. Non-preemptive priority scheduling is used between different tasks. We ... Cite

Conditional MTTF and its computation in Markov reliability models

Journal Article Proceedings of the Annual Reliability and Maintainability Symposium · January 1, 1993 Mean time to failure (MTTF) is one of the most frequently used dependability measures in practice. MTTF is the expected time for a system to reach the predefined failure states due to any of the failure causes. If system failures are classified into differ ... Full text Cite

Transient analysis of deterministic and stochastic Petri nets

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993 Deterministic and stochastic Petri nets (DSPNs) are recognized as a useful modeling technique because of their capability to represent constant delays which appear in many practical systems. If at most one deterministic transition is allowed to be enabled ... Full text Cite

Integration of specification for modeling and specification for system design

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993 This paper presents a procedure of transforming an Estelle specification into Stochastic Reward Net (SRN) formalism. Estelle is an ISO standard formal specification language which can help avoid ambiguity, incompleteness and inconsistency in system develop ... Full text Cite

FSPNs: Fluid stochastic Petri nets

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993 In this paper we introduce a new class of stochastic Petri nets in which one or more places can hold fluid rather than discrete tokens. After defining the class of fluid stochastic Petri nets, we provide equations for their transient and steady-state behav ... Full text Cite

MODELING USING STOCHASTIC REWARD NETS

Conference MASCOTS '93 · 1993 Cite

Dependability and performability analysis

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993 In this tutorial, we discuss several practical issues regarding specification and solution of dependability and performability models. We compare model types with and without rewards. Continuous-time Markov chains (CTMCs) are compared with (continuous-time ... Full text Cite

Approximate performance models of polling systems using stochastic Petri nets

Journal Article Proceedings - IEEE INFOCOM · December 1, 1992 The performance of a polling system is modeled by stochastic Petri nets and its analysis is done by numerically solving the underlying Markov chain. One key problem in using stochastic Petri nets for real applications is that the size of underlying Markov ... Full text Cite

MEASUREMENT AND ANALYSIS OF PARALLEL AND DISTRIBUTED SYSTEMS - INTRODUCTION

Journal Article IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS · November 1, 1992 Link to item Cite

Analyzing concurrent and fault-tolerant software using stochastic reward nets

Journal Article Journal of Parallel and Distributed Computing · January 1, 1992 We present two software applications and develop models for them. The first application considers a producer-consumer tasking system with an intermediate buffer task and studies how the performance is affected by different selection policies when multiple ... Full text Cite

A unified performance reliability analysis of a system with a cumulative down time constraint

Journal Article Microelectronics Reliability · January 1, 1992 We discuss unified performance and reliability analysis of a system which operates in a critical environment, in the sense that a catastrophic condition is reached when the accumulated down time exceeds a given threshold. Assuming that the system must proc ... Full text Cite

Composite performance and dependability analysis

Journal Article Performance Evaluation · January 1, 1992 Composite performance and dependability analysis is gaining importance in the design of complex, fault-tolerant systems. Markov reward models are most commonly used for this purpose. In this paper, an introduction to Markov reward models including solution ... Full text Cite

Guest Editors' Introduction.

Journal Article IEEE Trans. Parallel Distributed Syst. · 1992 Full text Cite

A TOOLCHEST FOR STOCHASTIC-MODELS

Conference INTERNATIONAL CONFERENCE ON SIMULATION IN ENGINEERING EDUCATION · 1992 Cite

Stochastic Petri net analysis of finite-population vacation queueing systems

Journal Article Queueing Systems · December 1, 1991 We consider queueing systems in which the server occasionally takes a vacation of random duration. The vacation can be used to do additional work; it can also be a rest period. Several models of this problem have been analyzed in the past assuming that the ... Full text Cite

On the solution of GSPN reward models

Journal Article Performance Evaluation · January 1, 1991 We extend the basic GSPN (generalized stochastic Petri net) model to the GSPN-reward model. This allows the concise specification of both the underlying stochastic process and the rewards attached to the states and the transitions of the stochastic process ... Full text Cite

Dependability modeling for computer systems

Journal Article Proceedings of the Annual Reliability and Maintainability Symposium · January 1, 1991 A computer system dependability analysis that ties together concepts such as reliability, maintainability and availability is discussed. Three classes of dependability measures are described: system availability, system reliability, and task completion. Us ... Cite

Reliability modeling of the MARS system: A case study in the use of different tools and techniques

Conference Proceedings of the 4th International Workshop on Petri Nets and Performance Models, PNPM 1991 · January 1, 1991 Analytical reliability modeling is a promising method for predicting the reliability of different architectural variants and to perform trade-off studies at design time. However, generating a computationally tractable analytic model implies in general an a ... Full text Cite

A decomposition approach for stochastic Petri net models

Conference Proceedings of the 4th International Workshop on Petri Nets and Performance Models, PNPM 1991 · January 1, 1991 We present a decomposition approach for the solution of large stochastic Petri nets (SPNs). The overall model consists of a set of submodels whose interactions are described by an import graph. Each node of the graph corresponds to a parametrized SPN submo ... Full text Cite

Fixed Point Iteration in Availability Modeling.

Conference Fault-Tolerant Computing Systems · 1991 Cite

SPNP - THE STOCHASTIC PETRI NET PACKAGE

Conference NUMERICAL SOLUTION OF MARKOV CHAINS · 1991 Cite

SOLUTION OF LARGE GSPN MODELS

Conference NUMERICAL SOLUTION OF MARKOV CHAINS · 1991 Cite

SHARPE - SYMBOLIC HIERARCHICAL AUTOMATED RELIABILITY AND PERFORMANCE EVALUATOR

Conference NUMERICAL SOLUTION OF MARKOV CHAINS · 1991 Cite

DEPENDABILITY MODELING FOR COMPUTER-SYSTEMS

Conference PROCEEDINGS ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM · 1991 Cite

An Improved Algorithm for Symbolic Reliability Analysis

Journal Article IEEE Transactions on Reliability · January 1, 1991 The purpose of this paper is to describe an efficient Boolean algebraic algorithm to compute the probability of a union of non-disjoint sets as applied to symbolic reliability analysis. Coherent networks and fault-trees with statistically-independent compo ... Full text Cite

Reliability analysis of the FDDI token ring

Conference Proceedings - Conference on Local Computer Networks, LCN · January 1, 1991 In this paper we develop reliability models and derive closed-form results for network reliability and network mean time to failure, including both node and link failures, for a very popular high speed LAN, the FDDI (Fiber Distributed Data Interface). We t ... Full text Cite

An improved algorithm for the symbolic reliability analysis of networks

Journal Article Proceedings - Symposium on Reliability in Distributed Software and Database Systems · December 1, 1990 An efficient Boolean algebraic algorithm for the symbolic reliability and sensitivity analysis of coherent two-terminal networks with s-independent components is described. The algorithm is also applicable to a fault tree model without NOT gates. The algor ... Cite

Two queues with alternating service and server breakdown

Journal Article Queueing Systems · September 1, 1990 We consider a queueing system with two stations served by a single server in a cyclic manner. We assume that at most one customer can be served at a station when the server arrives at the station. The system is subject to service interruption that arises fr ... Full text Cite

GSPN Models: Sensitivity analysis and applications

Conference Proceedings - 28th Annual Southeast Regional Conference, ACM-SE 1990 · April 1, 1990 Sensitivity analysis of continuous time Markov chains has been considered recently by several researchers. This is very useful in performing bottleneck analysis and optimization on systems especially during the design stage. However the construction of the ... Cite

Computation of the distribution of the completion time when the work requirement is a PH random variable

Journal Article Communications in Statistics. Stochastic Models · January 1, 1990 In this paper we derive the distribution of the completion time of a job with a PH-distributed work requirement, on a server modeled by a homogeneous Markov reward process. The interactions between the job in progress and the server are allowed to be eithe ... Full text Cite

Effects of checkpointing and queueing on program performance

Journal Article Communications in Statistics. Stochastic Models · January 1, 1990 Checkpointing is a technique for reducing the completion (execution) time of long-running batch programs in the presence of failures. It consists of intermittently saving the current status of the program under execution so that if a failure occurs, the pr ... Full text Cite

Computing Cumulative Measures of Stiff Markov Chains Using Aggregation

Journal Article IEEE Transactions on Computers · January 1, 1990 We present an aggregation method for the computation of transient cumulative measures of large, stiff Markov models. The method is based on the classification of the states of the original problem into slow, fast transient, and fast recurrent states. We ag ... Full text Cite

Performability Analysis Using Semi-Markov Reward Processes

Journal Article IEEE Transactions on Computers · January 1, 1990 With the increasing complexity of multiprocessor and distributed processing systems, the need to develop efficient and accurate modeling methods is evident. Fault tolerance and degradable performance of such systems have given rise to considerable interest ... Full text Cite

Stochastic Petri Net Models of Polling Systems

Journal Article IEEE Journal on Selected Areas in Communications · January 1, 1990 We consider finite-population and finite-capacity polling systems. The behavior of these systems is described by means of generalized stochastic Petri nets. The exact results for the mean response times are obtained numerically by means of the stochastic P ... Full text Cite

System performance with user behavior graphs

Journal Article Performance Evaluation · January 1, 1990 Workload characterization is known to be a difficult and yet a very important facet of performance modeling. User behavior graphs have been advocated as a practical means of workload characterization. Performance modeling with user behavior graphs is for t ... Full text Cite

Should I add a processor?

Journal Article Proceedings of the Hawaii International Conference on System Science · January 1, 1990 A model is developed demonstrating that when availability is the only measure of system effectiveness of interest, even a small reconfiguration delay leads to a violation of the monotonic increase in availability with the number of processors. A measure of ... Cite

Availability and Reliability Modeling for Computer Systems

Journal Article Advances in Computers · January 1, 1990 Dependability calculates the capability of a product to deliver its intended level of service to the user, especially in light of failures or other incidents that impinge on its performance, and combines various underlying ideas, such as reliability, maint ... Full text Cite

GSPN models: sensitivity analysis and applications.

Conference ACM Southeast Regional Conference · 1990 Cite

Stochastic Petri net modeling of VAXcluster system availability

Journal Article · December 1, 1989 A VAXcluster is a closely coupled multicomputer system that consists of two or more VAX computers, one or more hierarchical storage controllers (HSCs), two or more disks, and a star coupler. The Markov model of VAXcluster system availability suffers from ... Cite

Completion time of programs on concurrent processors with failure and repair

Journal Article Proceedings of the International Conference on Parallel Processing · December 1, 1989 The authors address the problem of the completion time of a job structured as a directed acyclic graph processed on parallel processors, i.e., programs consisting of precedence-constrained tasks. The processors are allowed to be subject to failure and repa ... Cite

Transient overloads in fault-tolerant real-time systems

Journal Article Proceedings - Real-Time Systems Symposium · December 1, 1989 A novel technique that allows a single system to guarantee the execution of both periodic and aperiodic tasks within hard deadlines is presented. The approach is based on dynamically changing the replication factor of periodic tasks in response to aperiodi ... Cite

SPNP: Stochastic Petri Net Package

Journal Article · December 1, 1989 SPNP, a powerful GSPN package that allows the modeling of complex system behaviors, is presented. Advanced constructs are available in SPNP such as marking-dependent arc multiplicities, enabling functions, arrays of places or transitions, and subnets; the ... Cite

On reliability modelling of fault-tolerant distributed systems

Journal Article Proceedings - International Conference on Distributed Computing Systems · June 1, 1989 The problem of predicting the reliability of a distributed system based on the principles of Byzantine agreement is addressed. The system is considered inoperable or failed if Byzantine agreement cannot be guaranteed. The reliability models depend on a uni ... Cite

Markov and Markov reward model transient analysis: An overview of numerical approaches

Journal Article European Journal of Operational Research · May 25, 1989 The advent of fault-tolerant, distributed systems has led to increased interest in analytic techniques for the prediction of reliability, availability, and combined performance and reliability measures. Markov and Markov reward models are common tools for ... Full text Cite

Analysis of Stiff Markov Chains

Journal Article ORSA Journal on Computing · May 1989 Continuous-time Markov chains (CTMC) are widely used mathematical models. Reliability models, queueing networks, and inventory models all require transient solutions of CTMC. The cost of CTMC transient solution increases with size, stiffness, and ... Full text Cite

Message from the chair

Journal Article Applied Industrial Hygiene · January 1, 1989 Full text Cite

Transient analysis of cumulative measures of Markov model behavior

Journal Article Communications in Statistics. Stochastic Models · January 1, 1989 Markov chains and Markov reward models are useful for modeling fault-tolerant, distributed and multi-processor systems. In this paper, we consider the transient analysis of “cumulative” or “integral” measures of Markov and Markov reward model behav ... Full text Cite

Coverage Modeling for Dependability Analysis of Fault-Tolerant Systems

Journal Article IEEE Transactions on Computers · January 1, 1989 Several different models for predicting coverage in a fault-tolerant system are discussed, including models for permanent, intermittent, and transient errors. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing ... Full text Cite

Multistage Interconnection Network Reliability

Journal Article IEEE Transactions on Computers · January 1, 1989 In this paper, we examine the reliability of a unique-path multistage interconnection network (MIN) and a fault-tolerant scheme aimed at improving system reliability. We derive closed-form expressions for the time-dependent reliability of the 8×8 and 16 × ... Full text Cite

Dependability evaluation of a class of multi-loop topologies for local area networks

Journal Article IBM Journal of Research and Development · January 1, 1989 Local area networks have been developed using both ring and bus topologies. Multi-loop and multi-connected topologies have been proposed to improve the throughput and dependability of single-loop networks. We evaluate the dependability of a class of multi-con ... Full text Cite

Approximate Availability Analysis of VAXcluster Systems

Journal Article IEEE Transactions on Reliability · January 1, 1989 We solve for the availability of an n-processor VAXcluster system using a hierarchical approach that allows us to: 1) obtain a closed-form answer to an apparently difficult problem, and 2) determine the optimal number of processors in the cluster for a giv ... Full text Cite

Reliability Analysis of Interconnection Networks Using Hierarchical Composition

Journal Article IEEE Transactions on Reliability · January 1, 1989 Based on the nature of the upper- and lower-bound block diagram models of Multistage Interconnection Networks (MINs), we generalize and consider a series system consisting of independent subsystems. In order to model the reliability of such a system with On ... Full text Cite

Markov reliability models for digital flight control systems

Journal Article Journal of Guidance, Control, and Dynamics · January 1, 1989 The reliability of digital flight control systems can often be accurately predicted using Markov chain models. We begin our discussion of flight control system reliability models with definitions of key terms. We then construct a single-fault one-processo ... Full text Cite

Applications of the hybrid automated reliability predictor

Journal Article · December 1, 1988 The Hybrid Automated Reliability Predictor (HARP) is a software package that implements advanced reliability modeling techniques. We present an overview of some of the problems that arise in modeling highly reliable, fault tolerant systems, loosely divided ... Cite

Reliabilities of two fault-tolerant interconnection networks.

Journal Article Digest of Papers - FTCS (Fault-Tolerant Computing Symposium) · December 1, 1988 The authors examine the augmented-shuffle-exchange network (ASEN), a network with low switch and link complexity. Using exact reliability expressions for small networks and upper and lower bounds for larger networks, the reliability of the ASEN is compared ... Cite

Sensitivity analysis of reliability and performability measures for multiprocessor systems

Journal Article Perform. Eval. Rev. (USA) · 1988 Traditional evaluation techniques for multiprocessor systems use Markov chains and Markov reward models to compute measures such as mean time to failure, reliability, performance, and performability. In this paper, the authors discuss the extension of Mark ... Cite

Numerical transient analysis of Markov models

Journal Article Computers and Operations Research · January 1, 1988 We consider the numerical evaluation of Markov model transient behavior. Our research is motivated primarily by computer system dependability modeling. Other application areas include finite-capacity queueing models, closed queueing networks and inventory m ... Full text Cite

Performability Analysis: Measures, an Algorithm, and a Case Study

Journal Article IEEE Transactions on Computers · January 1, 1988 Multiprocessor systems can provide higher performance and higher reliability/availability than single-processor systems. In order to properly assess the effectiveness of multiprocessor systems, measures that combine performance and reliability are needed. ... Full text Cite

Performability Modeling Based on Real Data: A Case Study

Journal Article IEEE Transactions on Computers · January 1, 1988 This paper describes a measurement-based performability model based on error and resource usage data collected on a multiprocessor system. A method for identifying the model structure is introduced and the resulting model is validated against real data. Mo ... Full text Cite

The use of Weibull fault processes in modeling fault tolerant systems

Journal Article Journal of Guidance, Control, and Dynamics · January 1, 1988 Full text Cite

RELIABILITY OF THE SHUFFLE-EXCHANGE NETWORK AND ITS VARIANTS.

Journal Article Proceedings of the Hawaii International Conference on System Science · January 1, 1988 The authors consider the reliability of the shuffle-exchange multistage interconnection network (SEN) and two variations of this network aimed at improving reliability through fault tolerance. The two variations are the SEN with an extra stage and the redu ... Full text Cite

APPLICATIONS OF THE HYBRID AUTOMATED RELIABILITY PREDICTOR.

Journal Article · December 1, 1987 The Hybrid Automated Reliability Predictor (HARP) is a software package that implements advanced reliability modeling techniques. In this paper we present an overview of some of the problems that arise in modeling highly reliable, fault tolerant systems, lo ... Cite

Probabilistic modeling of computer system availability

Journal Article Annals of Operations Research · December 1, 1987 System availability is becoming an increasingly important factor in evaluating the behavior of commercial computer systems. This is due to the increased dependence of enterprises on continuously operating computer systems and to the emphasis on fault-toler ... Full text Cite

The completion time of a job on multimode systems

Journal Article Advances in Applied Probability · December 1987 In this paper we present a general model of the completion time of a single job on a computer system whose state changes according to a semi-Markov process with possibly infinite state-space. When the state of the system changes the job service is ... Full text Cite

Computer-aided reliability analysis of fault-tolerant systems

Journal Article Sadhana · October 1, 1987 We present an overview of the major problems inherent in reliability modelling of fault-tolerant systems. The problems faced while modelling such systems include the need to consider a very large state space, non-exponential distributions, error analysis, ... Full text Cite

A note on the effect of preemptive policies on the stability of a priority queue

Journal Article Information Processing Letters · April 6, 1987 We study the stability condition of an M/G/1 priority queue with two classes of jobs. Class 1 jobs have preemptive priority over class 2 jobs. We consider three different types of preemptions and the effects of possible work loss (due to preemption) on the ... Full text Cite

Queueing Analysis of Fault-Tolerant Computer Systems

Journal Article IEEE Transactions on Software Engineering · January 1, 1987 In this paper we consider the queueing analysis of a fault-tolerant computer system. The failure/repair behavior of the server is modeled by an irreducible continuous-time Markov chain. Jobs arrive in a Poisson fashion to the system and are serviced accordi ... Full text Cite

Reliability Modeling using Sharpe

Journal Article IEEE Transactions on Reliability · January 1, 1987 Conclusions-Combinatorial models such as fault trees and reliability block diagrams are efficient for model specification and often efficient in their evaluation. But it is difficult, if not impossible, to allow for dependencies (such as repair dependency ... Full text Cite

Analysis of Typical Fault-Tolerant Architectures using Harp

Journal Article IEEE Transactions on Reliability · January 1, 1987 Conclusions-HARP (the Hybrid Automated Reliability Predictor) is a software package that implements advanced reliability modeling techniques. We present an overview of some of the problems that arise in modeling highly reliable fault-tolerant systems; the ... Full text Cite

PERFORMABILITY ANALYSIS OF TWO MULTI-PROCESSOR SYSTEMS.

Journal Article Digest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1987 The authors describe the behavior of a multiprocessor system as a continuous-time Markov chain and associate a reward rate (performance measure) with each state. They evaluate the distribution of performability for analytical models of two multiprocessor s ... Cite

Transient analysis of acyclic Markov chains

Journal Article Performance Evaluation · January 1, 1987 Continuous-time Markov chains are commonly used in system reliability modeling. In this paper, we discuss a method for automatically deriving transient solutions that are symbolic in t for acyclic Markov chains. Our method also includes parametric sensitivi ... Full text Cite

Reliability of Systems with Limited Repairs

Journal Article IEEE Transactions on Reliability · January 1, 1987 Conclusions-Reliability is the probability that a system functions according to specifications over a given period of time. During this period, system specifications may allow failures and repairs to occur. This paper considers systems with specifications ... Full text Cite

Performance and Reliability Analysis Using Directed Acyclic Graphs

Journal Article IEEE Transactions on Software Engineering · January 1, 1987 A graph-based modeling technique has been developed for the stochastic analysis of systems containing concurrency. The basis of the technique is the use of directed acyclic graphs. These graphs represent event-precedence networks where activities may occur ... Full text Cite

Transient Analysis of Markov and Markov Reward Models.

Conference Computer Performance and Reliability · 1987 Cite

A Measurement-Based Performability Model for a Multiprocessor System.

Conference Computer Performance and Reliability · 1987 Cite

HIERARCHICAL, COMBINATORIAL-MARKOV METHOD OF SOLVING COMPLEX RELIABILITY MODELS.

Journal Article · December 1, 1986 Combinatorial models such as fault-trees and reliability block diagrams are efficient in both specification and evaluation of system models, but it is difficult if not impossible to allow for various types of dependency, transient and intermittent faults, ... Cite

The reliability of life-critical computer systems

Journal Article Acta Informatica · November 1, 1986 In order to aid the designers of life-critical, fault-tolerant computing systems, accurate and efficient methods for reliability prediction are needed. The accuracy requirement implies the need to model the system in great detail, and hence the need to add ... Full text Cite

The hybrid automated reliability predictor

Journal Article Journal of Guidance, Control, and Dynamics · January 1, 1986 In this paper, we present an overview of the hybrid automated reliability predictor (HARP), under development at Duke and Clemson Universities. The HARP approach to reliability prediction is characterized by a decomposition of the overall model into distin ... Full text Cite

AGGREGATION TECHNIQUE FOR THE TRANSIENT ANALYSIS OF STIFF MARKOV CHAINS.

Journal Article IEEE Transactions on Computers · January 1, 1986 An approximation algorithm for systematically converting a stiff Markov chain into a nonstiff chain with a smaller state space is described. After classifying the set of all states into fast and slow states, the algorithm proceeds by further classifying fa ... Full text Cite

On modelling the performance and reliability of multimode computer systems

Journal Article The Journal of Systems and Software · January 1, 1986 We present an effective technique for the combined performance and reliability analysis of multimode computer systems. A reward rate (or a performance level) is associated with each mode of operation. The switching between different modes is characterized ... Full text Cite

QUEUEING ANALYSIS OF FAULT-TOLERANT COMPUTER SYSTEMS.

Journal Article Performance Evaluation Review · January 1, 1986 Queueing models provide a useful tool for predicting the performance of many service systems including computer systems, telecommunication systems, computer/communication networks and flexible manufacturing systems. Traditional queueing models predict syst ... Full text Cite

PROVABLY CONSERVATIVE APPROXIMATIONS TO COMPLEX RELIABILITY MODELS.

Journal Article IEEE Transactions on Computers · January 1, 1986 Provably conservative (and optimistic) reliability models can be systematically derived from more complex models. These derived models incorporate a reduced state space and fewer transitions and, therefore, have solutions that are more cost-effective than ... Full text Cite

DEPENDABILITY PREDICTION: COMPARISON OF TOOLS AND TECHNIQUES.

Journal Article IFAC Proceedings Series · January 1, 1986 Dependability measures (such as reliability, mean time to failure, availability) are important criteria for the design of computer-based applications, as well as for their validation. In this paper important techniques for dependability modeling are discus ... Full text Cite

NUMERICAL EVALUATION OF PERFORMABILITY AND JOB COMPLETION TIME IN REPAIRABLE FAULT-TOLERANT SYSTEMS.

Journal Article Digest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1986 Fault-tolerant computer systems change their level of performance (e.g., mode of operation or service rate) in response to different events such as failure, degradation or repair. The authors present a unified model for the analysis of job (task) completio ... Cite

SYSTEM AVAILABILITY ESTIMATOR.

Journal Article Digest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1986 The system availability estimator (SAVE) program package is described that can be used for constructing and solving probabilistic models of computer system availability and reliability. SAVE is a state-of-the-art tool intended for use during system design ... Cite

PERFORMANCE ANALYSIS USING USER BEHAVIOR GRAPHS

Conference 12th International Computer Measurement Group Conference, CMG 1986 · January 1, 1986 The user behavior graph is a graphical model for describing the behavior of interactive users. The adequacy of user behavior graphs in several performance evaluation studies for workload characterization is investigated. In absence of memory constraint ... Cite

DESIGN OF A UNIFIED PACKAGE FOR THE SOLUTION OF STOCHASTIC PETRI NET MODELS.

Journal Article · December 1, 1985 A description is given of the philosophical differences between three current SPN models in an attempt to merge the most important (and noncontradictory) aspects into one. This work previews the design of a package for the solution of this unified model. ... Cite

EXTENDED STOCHASTIC PETRI NETS: APPLICATIONS AND ANALYSIS.

Journal Article · December 1, 1985 An Extended Stochastic Petri Net (ESPN) model, useful for modeling systems which exhibit concurrent, asynchronous, or nondeterministic behavior is developed. Applications demonstrating the flexibility of the model for a variety of system modeling applicati ... Cite

The Conservativeness of Reliability Estimates Based on Instantaneous Coverage

Journal Article IEEE Transactions on Computers · January 1, 1985 In order to remain tractable, many reliability models do not include the states and transitions necessary to represent fault/error-handling details. Instead, the effectiveness of fault/error-handling mechanisms is represented by the use of instantaneous c ... Full text Cite

A single server queue in a hard-real-time environment

Journal Article Operations Research Letters · January 1, 1985 We consider a single server first in first out queue in which each arriving task has to be completed within a certain period of time (its deadline). More precisely, each arriving task has its own deadline - a non-negative real number - and as soon as the r ... Full text Cite

HYBRID MODELING TECHNIQUES AND THEIR APPLICATION TO FAULT-TOLERANT COMPUTER SYSTEMS.

Journal Article Modeling and Simulation, Proceedings of the Annual Pittsburgh Conference · December 1, 1984 We detail the use of behavioral decomposition in modeling systems that contain state transitions having widely disparate time constants. We show that such decomposition leads naturally to hybrid system models, containing both analytic and simulative submod ... Cite

RELIABILITY EVALUATION FOR FAULT-TOLERANT SYSTEMS.

Journal Article · December 1, 1984 Important problems that arise in modeling highly-reliable fault-tolerant systems are discussed. First, reliability models of such systems possess a large number of states, making the solution computationally intractable. This leads to the need for decompos ... Cite

COMPUTER SYSTEMS ANALYSIS.

Journal Article Modeling and Simulation, Proceedings of the Annual Pittsburgh Conference · December 1, 1984 The need for increased reliability and computing power coupled with advances in technology has given rise to sizeable and complex computer systems. The users and designers of such systems need tools to evaluate the effectiveness of such systems. Current ap ... Cite

ON MODELLING THE PERFORMANCE AND RELIABILITY OF MULTIMODE COMPUTER SYSTEMS.

Journal Article Journal of Systems and Software · May 1, 1984 We present an effective technique for the combined performance and reliability analysis of multimode computer systems. A reward rate (or a performance level) is associated with each mode of operation. The switching between different modes is characterized ... Cite

Ergonomics in India: A case study on workspace design for an alphacomp phototypesetting machine

Journal Article Behaviour and Information Technology · January 1, 1984 Ergonomics in India is a newly emerging discipline-having made inroads to the people of India very recently. Most of the Indians are absolutely unaware of using ergonomics to achieve an efficient man-machine-environment system for better productivity with ... Full text Cite

LOAD DISTRIBUTION IN A STAR CONFIGURED SYSTEM WITH ERROR-PRONE CHANNELS

Journal Article MATEMATICA APLICADA E COMPUTACIONAL · January 1, 1984 Link to item Cite

Hybrid reliability modeling of fault-tolerant computer systems

Journal Article Computers and Electrical Engineering · January 1, 1984 Current technology allows sufficient redundancy in fault-tolerant computer systems to insure that the failure probability due to exhaustion of spares is low. Consequently, the major cause of failure is the inability to correctly detect, isolate, and reconf ... Full text Cite

MODELING IMPERFECT COVERAGE IN FAULT-TOLERANT SYSTEMS.

Journal Article Digest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1984 Cite

Issues in reliability modeling of fault-tolerant computers.

Conference Fehlertolerierende Rechensysteme · 1984 Cite

COMPUTER SYSTEMS ANALYSIS.

Journal Article · December 1, 1983 Cite

ANALYSIS OF COMPUTER PERFORMANCE AND RELIABILITY.

Journal Article · December 1, 1983 Cite

Task allocation in fault-tolerant distributed systems

Journal Article Acta Informatica · September 1, 1983 This paper examines task allocation in fault-tolerant distributed systems. The problem is formulated as a constrained sum of squares minimization problem. The computational complexity of this problem prompts us to consider an efficient approximation algori ... Full text Cite

Decomposition in Reliability Analysis of Fault-Tolerant Systems

Journal Article IEEE Transactions on Reliability · January 1, 1983 Summary & Conclusions: Two important problems which arise in modeling fault-tolerant systems with ultra-high reliability requirements are discussed. 1) Any analytic model of such a system has a large number of states, making the solution computationally in ... Full text Cite

Ultrahigh Reliability Prediction for Fault-Tolerant Computer Systems

Journal Article IEEE Transactions on Computers · January 1, 1983 A review and a critical evaluation of a representative class of state-of-the-art models for ultrahigh reliability prediction is presented. This evaluation naturally leads us to a new model for ultrahigh reliability prediction now under development. The new ... Full text Cite

Analytic Queueing Models for Programs with Internal Concurrency

Journal Article IEEE Transactions on Computers · January 1, 1983 Analytic queueing models of programs with internal concurrency are considered. The program behavior model allows a process to spawn two or more concurrent tasks at some point during its execution. Except for queueing effects, the tasks execute independentl ... Full text Cite

Computer Science and Applied Probability (abstract).

Conference Int. CMG Conference · 1983 Cite

TASK AND FILE ALLOCATION IN FAULT-TOLERANT DISTRIBUTED SYSTEMS.

Journal Article Proceedings - Symposium on Reliability in Distributed Software and Database Systems · December 1, 1982 Task and file allocation are examined in two classes of fault-tolerant distributed systems. The task allocation problem arises in software-implemented fault tolerance (SIFT)-like systems, while the file allocation problem arises in Ethernet-like systems. B ... Cite

Optimal Design of Multilevel Storage Hierarchies

Journal Article IEEE Transactions on Computers · January 1, 1982 An optimization model is developed for assigning a fixed set of files across an assemblage of storage devices so as to maximize system throughput. Multiple levels of executable memories and distinct record sizes for separate files are allowed. Through the ... Full text Cite

Queueing network models for parallel processing with asynchronous tasks

Journal Article IEEE Transactions on Computers · January 1, 1982 Computer performance models of parallel processing systems in which a job subdivides into two or more tasks at some point during its execution are considered. Except for queueing effects, the tasks execute independently of one another and do not require sy ... Full text Cite

COMPUTER CONFIGURATION DESIGN TO MINIMIZE RESPONSE TIME.

Journal Article Computer performance · January 1, 1982 A computer configuration design problem is considered. The computer system is modelled as a closed queueing network. The average response time to an interactive user request is minimized and the speeds of the devices are the decision variables. A broad cla ... Cite

OPTIMAL FILE ALLOCATION, DEVICE CAPACITY AND CPU SPEED SELECTION DURING THE DESIGN OF INTERACTIVE COMPUTER SYSTEMS

Conference 8th International Computer Measurement Group Conference, CMG 1982 · January 1, 1982 This paper considers a computer configuration design problem. The computer is modeled as a closed queueing network. The average response time to an interactive user request is to be minimized. The decision variables are CPU speed, capacities of I/O devices ... Cite

TUTORIAL ON THE CARE III APPROACH TO RELIABILITY MODELING.

Journal Article NASA Contractor Reports · December 1, 1981 CARE III is a major departure from conventional approaches to reliability modeling in that it purports to support nonexponential distributions, while avoiding the problem of large state spaces through the use of state aggregation. More specifically, CARE I ... Cite

Optimal Design of Linear Storage Hierarchies

Journal Article Journal of the ACM (JACM) · April 1, 1981 The performance-oriented design of linear storage hierarchies which are operating in multiprogramming environments is considered. An optimization model is superimposed upon an exponential queuing network model of the hierarchy, yielding a problem whose object ... Full text Cite

TUTORIAL ON THE CARE III APPROACH TO RELIABILITY MODELING.

Journal Article NASA Contractor Reports · January 1, 1981 CARE III is a major departure from conventional approaches to reliability modeling in that it purports to support nonexponential distributions, while avoiding the problem of large state spaces through the use of state aggregation. More specifically, CARE I ... Cite

Optimal Design of an Interactive System: File Allocation, Device Capacity Selection, and CPU Speed Selection

Conference 7th International Computer Measurement Group Conference, CMG 1981 · January 1, 1981 This paper considers a computer configuration design problem. The computer is modeled as a closed queueing network. The average response time to an interactive user request is to be minimized. The decision variables are CPU speed, capacities of I/O devices ... Cite

Optimal Selection of CPU Speed, Device Capacities, and File Assignments

Journal Article Journal of the ACM (JACM) · July 1, 1980 This paper presents a computer system configuration design problem in which the objective is to select the CPU speed, the capacities of secondary storage devices, and the allocation of a set of files across the secondary storage devices so as to maximize t ... Full text Cite

Hardware configuration selection through discretizing a continuous variable solution

Conference Proceedings of the 1980 International Symposium on Computer Performance Modelling, Measurement and Evaluation, PERFORMANCE 1980 · May 28, 1980 This paper extends a previous model for computer system configuration planning developed by the authors. The problem is to optimally select the CPU speed, the device capacities, and file assignments so as to maximize throughput subject to a fixed cost cons ... Full text Cite

Designing linear storage hierarchies so as to maximize reliability subject to cost and performance constraints

Conference Proceedings - International Symposium on Computer Architecture · May 6, 1980 A geometric programming model is proposed to determine the optimal design of the CPU and its matching storage hierarchy. The objective function is the maximization of system reliability subject to performance and budgetary limitations. Examples illustratin ... Full text Cite

HARDWARE CONFIGURATION SELECTION THROUGH DISCRETIZING A CONTINUOUS VARIABLE SOLUTION.

Journal Article Performance Evaluation Review · January 1, 1980 A previous model for computer system configuration planning is extended. The problem is to optimally select the CPU speed, the device capacities, and file assignments so as to maximize throughput subject to a fixed cost constraint. This essentially discret ... Full text Cite

A Model for Computer Configuration Design

Journal Article Computer · January 1, 1980 Full text Cite

RELIABILITY VALIDATION OF SYSTEMS FOR LIFE-CRITICAL APPLICATIONS.

Journal Article EASCON Record: IEEE Electronics and Aerospace Systems Convention · January 1, 1980 Design requirements of systems used in life-critical applications result in the specification of extremely high levels of reliability. A case in point is the digital flight control system to be used in the future generation of aircraft. Traditional reliabi ... Cite

OPTIMAL SELECTION OF CPU SPEED, DEVICE CAPACITIES, AND ALLOCATION OF FILES WITH VARIABLE RECORD SIZE.

Journal Article National Bureau of Standards, Special Publication · January 1, 1980 This work extends a previous model for computer system configuration planning developed by the authors. The problem is to optimally select CPU speed, device capacities, and file assignments so as to maximize system throughput subject to a fixed cost constr ... Cite

An analysis of prepaging

Journal Article Computing · September 1, 1979 Prepaging is advocated as a technique to reduce the excessive page traffic due to the changes in the phases of execution of a program. Common prepaging techniques are surveyed. It is advocated that the phase transition behavior cannot be adequately predict ... Full text Cite

A performance comparison of optimally designed computer systems with and without virtual memory

Conference Proceedings - International Symposium on Computer Architecture · April 23, 1979 In this paper, a comparison of the performance of optimally designed computer systems with and without virtual memory is made. The computer systems in question are modeled by closed queuing networks of the central server type. The design of the systems is ... Full text Cite

A Decision Model for Closed Queuing Networks

Journal Article IEEE Transactions on Software Engineering · January 1, 1979 This paper considers a computer configuration design problem. The computer system is modeled by a closed queuing network. The system throughput is the objective function to be maximized and the speed of the devices are the decision variables. A rich class ... Full text Cite

Corrections to “On the Use of Continued Fractions for Digital Computer Arithmetic”

Journal Article IEEE Transactions on Computers · January 1, 1978 Full text Cite

MATHEMATICAL MODEL FOR COMPUTER SYSTEM CONFIGURATION PLANNING.

Journal Article Transactions of the American Association of Cost Engineers · January 1, 1978 This paper determines the speed of the devices constituting a computer system which will maximize system throughput given a fixed budget. The system composed of a central processor and peripheral devices is modeled as a closed queueing network of exponenti ... Cite

Analytic Modeling of Computer Systems

Journal Article Computer · January 1, 1978 Full text Cite

DESIGN AND ANALYSIS OF A FUNCTIONALLY DISTRIBUTED COMPUTER SYSTEM.

Journal Article · January 1, 1978 A case study in the use of closed queuing network models for the design and analysis of an interactive distributed computer system as applicable to a military command and control system is presented. Such a proje ... Cite

HIGHER RADIX ON-LINE DIVISION.

Journal Article · January 1, 1978 A formal proof of correctness of the on-line division algorithm specified in an earlier paper is presented. Two radix 4 on-line division algorithms, with non-redundant and redundant operands respectively, are also derived. ... Cite

CORRECTION

Journal Article IEEE TRANSACTIONS ON COMPUTERS · 1978 Cite

Prepaging and applications to the STAR-100 computer

Journal Article High Speed Computer and Algorithm Organization · 1977 The Control Data Corporation (CDC) STAR (STring ARray) computer is a high performance vector machine capable of performing up to 100 million operations per second. Although the size of the main memory is limited to either 1/2 million or 1 million 64-bit wo ... Cite

On the Paging Performance of Array Algorithms

Journal Article IEEE Transactions on Computers · January 1, 1977 Data paging is of primary concern for problems with large data bases and for many types of array problems. We show that prepaging reduces the paging problems of array algorithms operating on large arrays. We also show that the use of a submatrix algorithm ... Full text Cite

On the Use of Continued Fractions for Digital Computer Arithmetic

Journal Article IEEE Transactions on Computers · January 1, 1977 Recently, there has been some interest in the use of continued fractions for digital hardware calculations. We require that the coefficients of the continued fractions be integral powers of 2 and, therefore, well-known continued fraction expansions of func ... Full text Cite

On-Line Algorithms for Division and Multiplication

Journal Article IEEE Transactions on Computers · January 1, 1977 In this paper, on-line algorithms for division and multiplication are developed. It is assumed that the operands as well as the result flow through the arithmetic unit in a digit-by-digit, most significant digit first fashion. The use of a redundant digit ... Full text Cite

ON THE PAGING PERFORMANCE OF ARRAY ALGORITHMS.

Journal Article IEEE Transactions on Computers · 1977 Data paging is of primary concern for problems with large data bases and for many types of array problems. It is shown that prepaging reduces the paging problems of array algorithms operating on large arrays. Also shown is that the use of a submatrix algor ... Cite

USE OF CONTINUED FRACTIONS FOR DIGITAL-COMPUTER ARITHMETIC

Journal Article IEEE TRANSACTIONS ON COMPUTERS · 1977 Cite

PAGING PERFORMANCE OF ARRAY ALGORITHMS

Journal Article IEEE TRANSACTIONS ON COMPUTERS · January 1, 1977 Link to item Cite

Prepaging and Applications to Array Algorithms

Journal Article IEEE Transactions on Computers · January 1, 1976 A demand prepaging algorithm DPMIN is defined and proved to be an optimal demand prepaging algorithm. However, it cannot be used in practice since it requires that the future reference string be completely known in advance. Several practical prepaging algo ... Full text Cite

On a semaphore anomaly

Journal Article Information Processing Letters · January 1, 1976 Full text Cite

On the use of continued fractions for digital computer arithmetic

Journal Article Proceedings - Symposium on Computer Arithmetic · January 1, 1975 Recently, there has been some interest in the use of continued fractions for digital hardware calculations. We require that the coefficients of the continued fractions be integral powers of two. As a result well known continued fraction expansions of f ... Full text Cite

On-line algorithms for division and multiplication

Journal Article Proceedings - Symposium on Computer Arithmetic · January 1, 1975 Full text Cite

The Status of Investigations into Computer Hardware Design Based on the Use of Continued Fractions

Journal Article IEEE Transactions on Computers · January 1, 1973 The purpose of this paper is to demonstrate that representations of numbers other than positional notation may lead to practical hardware realizations for digital calculation of classes of algorithms. This paper describes current research in the use of co ... Full text Cite

The status of investigations into the use of continued fractions for computer hardware

Conference Proceedings - Symposium on Computer Arithmetic · January 1, 1972 The purpose of this paper is to demonstrate that representations of numbers other than positional notation may lead to practical hardware realizations for the digital calculation of classes of algorithms. It is the authors' opinion that practicality of the ... Full text Cite

Proactive fault-management in software systems

Conference Proceedings 33rd Annual Simulation Symposium (SS 2000) Full text Cite

An approach for combinatorial performance and availability analysis

Conference Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems Full text Cite

Dependency characterization in path-based approaches to architecture-based software reliability prediction

Conference Proceedings. 1998 IEEE Workshop on Application-Specific Software Engineering and Technology. ASSET-98 (Cat. No.98EX183) Full text Cite

Performance analysis of distributed real-time databases

Conference Proceedings. IEEE International Computer Performance and Dependability Symposium. IPDS'98 (Cat. No.98TB100248) Full text Cite

An analytical approach to architecture-based software reliability prediction

Conference Proceedings. IEEE International Computer Performance and Dependability Symposium. IPDS'98 (Cat. No.98TB100248) Full text Cite