Journal Article · Journal of Systems and Software · August 1, 2024
In recent years, software bug prediction has shown to be effective in narrowing down the potential bug modules and boosting the efficiency and precision of existing testing and analysis tools. However, due to its non-deterministic nature and low presence, ...
Journal Article · Computer · April 1, 2024
This article discusses model-driven methods with analytic-numeric solutions. In addition to traditional non-state-space and state-space methods, multilevel methods are explored using real case studies. Challenges met while developing and solving dependabil ...
Conference · IEEE Transactions on Reliability · March 1, 2024
Given heavy dependence on man-made systems in our daily lives, reliability and availability of these systems clearly gain great importance. Together with methods of enhancing reliability and availability of systems, methods of quantitative assessment of th ...
Journal Article · IEEE Transactions on Reliability · March 1, 2024
Traditional software fault tolerance makes use of design-diversity-based redundancy. While proven to be effective, the independent development of multiple versions of a program or component is connected with high costs. This article shows that failures cau ...
Chapter · January 1, 2024
Our daily lives are dependent on various technological systems that may be mission-critical, safety-critical, or business-critical. Reliability and availability are crucial attributes and, thus, key requirements that should be considered during the entire ...
Journal Article · IEEE Transactions on Cloud Computing · October 1, 2023
This paper aims to use analytical modeling technique to quantitatively study the dependability of Vehicle Platooning Application, which consists of Multiple Sub-Services (VPP-MSS) to achieve its functionality. Each sub-service (SS), based on network functi ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · July 1, 2023
The Multi-access Edge Computing (MEC) and Network Function Virtualization (NFV) integrated architecture is a key enabling platform for 5G to run multiple customized services in the form of service function chain (SFC) configured as an ordered set of servic ...
Journal Article · IEEE Transactions on Services Computing · July 1, 2023
Multi-access edge computing (MEC)-enabled Internet of Things (IoT) is considered as a promising paradigm to deliver computation-intensive and delay-sensitive services to users. IoT service requests can be served by multiple microservices (MSs) that form a ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · July 1, 2023
As software plays an increasingly important role in our lives, it is essential to maintain its reliability, and generally dependability. Software bugs can cause huge financial losses and dangerous accidents; the safety risks from software are underscored t ...
Journal Article · IEEE Transactions on Vehicular Technology · April 1, 2023
Unmanned aerial vehicle (UAV) and network function virtualization (NFV) facilitate the deployment of multi-access edge computing (MEC). In the UAV-based MEC (UMEC) network, virtualized network function (VNF) can be implemented as a lightweight container ru ...
Chapter · January 1, 2023
This chapter introduces the moment-based epistemic uncertainty propagation in Markov models. The epistemic uncertainty in Markov models introduces the uncertainty of model parameters, and it can be propagated by regarding parameters as random variables. Th ...
Journal Article · IEEE Transactions on Reliability · December 1, 2022
Understanding and predicting types of bugs are of practical importance for developers to improve the testing efficiency and take appropriate steps to address bugs in software releases. However, due to the complex conditions under which faults manifest and ...
Journal Article · Peer-to-Peer Networking and Applications · July 1, 2022
Network function virtualization (NFV) has been explored to be integrated with multi-access edge computing (MEC) to facilitate the development of 5G (fifth-generation) network. Latency-sensitive applications can be deployed as serial-parallel hybrid service ...
Journal Article · Computer · May 1, 2022
Software can show symptoms of two different types of aging. Sometimes, it is even subject to both types. ...
Journal Article · IEEE Transactions on Services Computing · January 1, 2022
Migration-based Dynamic Platform (MDP) technique, a type of Moving Target Defense (MTD) techniques, defends against sophisticated cyber-attacks by randomly and dynamically selecting a platform for executing service/job. Security defense mechanisms protect ...
Journal Article · IEEE Transactions on Cloud Computing · January 1, 2022
With the rapid and wide development and deployment of system virtualization, service availability analysis has become increasingly important in a virtualized system (VS) which suffers from software aging. Software rejuvenation techniques can be applied to ...
Journal Article · IEEE Transactions on Reliability · September 1, 2021
Mandelbug-caused software failures are significant threats to system availability, especially in the context of mission-critical and safety-critical systems. However, there is still no systematic method for keeping the software free from Mandelbugs before ...
Journal Article · IEEE Transactions on Network and Service Management · September 1, 2021
The safety-critical applications of vehicular ad hoc networks (VANETs) require high reliability and low transmission latency. IEEE 802.11p and IEEE 802.11bd are two standards proposed for such vehicular communication systems. In this paper, we propose an e ...
Journal Article · IEEE Transactions on Vehicular Technology · June 1, 2021
Vehicle platooning can be applied to cooperative downloading and uploading (CDU) services through the cooperation between lead vehicle and non-lead vehicles. CDU service can be completed cooperatively by containers constructed in vehicles of the vehicle pl ...
Journal Article · IEEE Transactions on Network and Service Management · June 1, 2021
The recent trend of network softwarization suggests a radical shift in the implementation of traditional network intelligence. In Software Defined Networking (SDN), for instance, the control plane functions of forwarding devices are outsourced to the contr ...
Journal Article · IEEE Transactions on Reliability · June 1, 2021
Intrusion tolerance is an ability to keep the correct service by masking the intrusion based on fault-tolerant techniques. With the rapid development of virtualization, the virtual machine (VM)-based intrusion tolerance scheme has been developed according ...
Conference · Proceedings - 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume, DSN-S 2021 · June 1, 2021
ADA is a popular programming language that was named after Lady Ada Lovelace (1815-1852) and recommended by Department of Defense, USA, for development of large scale safety-critical software systems. In this Fast Abstract, ADA is reinterpreted as Autonomo ...
Journal Article · IEEE Transactions on Network Science and Engineering · January 1, 2021
As multi-hop wireless networks are attracting more attention, the need to evaluate their performance becomes essential. In order to evaluate the performance metrics of multi-hop wireless networks, including sending and receiving rates of a node as well as ...
Conference · Proceedings - Annual Reliability and Maintainability Symposium · January 1, 2021
A Multi-access Edge Computing (MEC) micro data center (MEDC) consists of multiple MEC hosts close to endpoint devices. MEC service is delivered by instantiating a virtualization system (e.g., Virtual Machines or Containers) on a MEC host. MEDC faces more n ...
Conference · Brazilian Symposium on Computing System Engineering, SBESC · January 1, 2021
A fundamental aspect of software reliability engineering is to understand how software failures manifest, identifying and comprehending their causes and effects. In this paper, we perform ex-post analyses of field software failure data, looking to characte ...
Conference · Brazilian Symposium on Computing System Engineering, SBESC · November 24, 2020
Empirical studies have shown robust evidence of OS failure patterns characterized by multiple combinations of failure events composed of the same or different failure types. In this paper, we present a statistical approach to predict OS failures based on m ...
Chapter · November 16, 2020
Reliability and availability are key attributes of technical systems. Methods of quantifying these attributes are thus essential during all phases of system lifecycle. Data (measurement)-driven methods are suitable for components or subsystems but, for the ...
Conference · Proceedings - 2020 IEEE 31st International Symposium on Software Reliability Engineering Workshops, ISSREW 2020 · October 1, 2020
This talk summarizes the genesis of software aging and rejuvenation as presented in the handbook of software aging and rejuvenation. It also lays out possible future directions to reflect the content of the concluding chapter of the handbook. ...
Journal Article · IEEE Transactions on Network and Service Management · June 1, 2020
In Software Defined Networking (SDN), network programmability is enabled through a logically centralized control plane. Production networks deploy multiple controllers for scalability and reliability reasons, which in turn rely on distributed consensus pro ...
Journal Article · IEEE Transactions on Reliability · March 1, 2020
Software failures caused by data race bugs have always been major concerns in parallel and distributed systems, despite significant efforts spent in software testing. Due to their nondeterministic and hard-to-reproduce features, when evaluating systems' op ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · January 1, 2020
The Internet world is moving toward a scenario where users and applications have very diverse service expectation, making the current best-effort model inadequate and limiting. To be able to design high-availability service systems, it is essential to cons ...
Journal Article · Future Generation Computer Systems · January 1, 2020
The extent of epistemic uncertainty in modeling and analysis of complex systems is ever growing, mainly due to increasing levels of the openness, heterogeneity and versatility in cloud-based applications that are being adopted in critical sectors, like ban ...
Journal Article · IEEE Access · January 1, 2020
Virtualization technology has promoted the fast development and deployment of cloud computing, and is now becoming an enabler of Internet of Everything. Virtual machine monitor (VMM), playing a critical role in a virtualized system, is software and hence i ...
Book · January 1, 2020
The Handbook of Software Aging and Rejuvenation provides a comprehensive overview of the subject, making it indispensable to graduate students as well as professionals in the field. It begins by introducing fundamental concepts, definitions, and the histor ...
Chapter · January 1, 2020
In this chapter we present a summary of some future directions for software aging and rejuvenation research. ...
Chapter · January 1, 2020
Software aging and rejuvenation originated at AT&T Bell Labs. A significant amount of research on the topic also took place at Duke University and Università degli Studi di Napoli Federico II. We present here a historical perspective on this topic as viewe ...
Journal Article · IEEE Transactions on Reliability · December 1, 2019
This paper presents an empirical study of 5741 bug reports for the Linux kernel from an evolutionary perspective, with the aim of obtaining a deep understanding of bug characteristics in the Linux operating system. Bug classification is performed based on ...
Conference · Brazilian Symposium on Computing System Engineering, SBESC · November 1, 2019
A fundamental need for software reliability engineering is to comprehend how software systems fail, which means understanding the dynamics that govern different types of failure manifestation. In this paper, we present an exploratory study on multiple-even ...
Journal Article · IEEE Transactions on Cloud Computing · October 1, 2019
Infrastructure as a Service (IaaS) is one of the most significant and fastest growing fields in cloud computing. To efficiently use the resources of an IaaS cloud, several important factors such as performance, availability, and power consumption need to b ...
Conference · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · October 1, 2019
Software aging, which is caused by Aging-Related Bugs (ARBs), tends to occur in long-running systems and may lead to performance degradation and increasing failure rate during software execution. ARB prediction can help developers discover and remove ARBs, ...
Conference · Proceedings - 2019 IEEE 30th International Symposium on Software Reliability Engineering Workshops, ISSREW 2019 · October 1, 2019
Two decades after the seminal paper on software aging and rejuvenation appeared in 1995, a new concept and metric referred to as the age of information (AoI) has been gaining attention from practitioners and the research community. In this vision paper, ou ...
Journal Article · IEEE Transactions on Reliability · September 1, 2019
In long running systems, software tends to encounter performance degradation and increasing failure rate during execution. This phenomenon has been named software aging, which is caused by aging-related bugs (ARBs). Testing resource allocation can be optim ...
Journal Article · IEEE Transactions on Reliability · June 1, 2019
The Android operating system (OS) is a sophisticated man-made system and is the dominant OS in the current smartphone market. Due to the accumulation of errors in the system internal state and the incremental consumption of resources, such as the Dalvik he ...
Journal Article · IEEE Transactions on Network and Service Management · June 1, 2019
In some applicable scenarios, such as community patrolling, mobile nodes are restricted to move only in their own communities. Exploiting the meetings of the nodes within the same community and the nodes within the neighboring communities, a delay tolerant ...
Journal Article · Reliability Engineering and System Safety · March 1, 2019
Malicious lateral movement-based attacks have become a potential risk for many systems, bringing highly likely threats to critical infrastructures and national security. When launching this kind of attacks, adversaries first compromise a fraction of the ta ...
Chapter · January 1, 2019
Modern systems implement multiple and complex operations to manage the user demand, thereby ensuring adequate quality levels. They are usually made of a collection of interconnected (autonomous) subsystems, with a common goal to be pursued, that are percei ...
Journal Article · Reliability Engineering and System Safety · December 1, 2018
Software aging often affects the performance of software systems and may eventually cause them to fail. A complementary approach to handle transient software failures due to the software aging is called software rejuvenation. It is a preventive and proacti ...
Conference · NCA 2018 - 2018 IEEE 17th International Symposium on Network Computing and Applications · November 26, 2018
Hyperledger Fabric (HLF) is an open-source implementation of a distributed ledger platform for running smart contracts in a modular architecture. In this paper, we present a performance model of Hyperledger Fabric v1.0+ using Stochastic Reward Nets (SRN). ...
Conference · Proceedings - International Conference on Computer Communications and Networks, ICCCN · October 9, 2018
This paper aims to analyze transient security and dependability of a vulnerable critical system, under vulnerability-related attack and two reactive defense strategies, from a severe vulnerability announcement until the vulnerability is fully removed from ...
Journal Article · Performance Evaluation · October 1, 2018
Due to the increasing need for computational power, the market has shifted towards big centralized data centers. Understanding the nature of the dynamics of these data centers from machine and job/task perspective is critical to design efficient data cente ...
Journal Article · Computer Journal · October 1, 2018
In this paper, the performance of a grid resource is modeled and evaluated using stochastic reward nets (SRNs), wherein the failure–repair behavior of its processors is taken into account. The proposed SRN is used to compute the blocking probability and se ...
Journal Article · IEEE Transactions on Cloud Computing · October 1, 2018
Heterogeneity prevails not only among physical machines but also among workloads in real IaaS Cloud data centers (CDCs). The heterogeneity makes performance modeling of large and complex IaaS CDCs even more challenging. This paper considers the scenario wh ...
Journal Article · IEEE Transactions on Network and Service Management · September 1, 2018
In software defined networking (SDN), critical control plane functions are offloaded to a software entity known as the SDN controller. Today's SDN controllers are complex software systems, owing to heterogeneity of networks and forwarding devices they supp ...
Conference · Proceedings - 8th Latin-American Symposium on Dependable Computing, LADC 2018 · July 2, 2018
The uncertainty propagation is to investigate the effect of errors in model input parameters on the system output measure in probability models. In this paper, we present a moment-based approach of the uncertainty propagation of model input parameters. The ...
Journal Article · Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference · July 2018
Outpatient centers comprised of many concurrent clinics increasingly see higher patient volumes. In these centers, decisions to improve clinic flow must account for the high degree of interdependence when critical personnel or equipment is shared between c ...
Journal Article · Future Generation Computer Systems · June 1, 2018
The increasing shift of various critical services towards Infrastructure-as-a-Service (IaaS) cloud data centers (CDCs) creates a need for analyzing CDCs’ availability, which is affected by various factors including repair policy and system parameters. This ...
Journal Article · Electronic Notes in Theoretical Computer Science · May 9, 2018
Data-centers have recently experienced a fast growth in energy demand, mainly due to cloud computing, a paradigm that lets the users access shared computing resources (e.g., servers, storage, etc.). Several techniques have been proposed in order to allevia ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · January 1, 2018
This paper presents a performability model for RAID storage systems using Markov regenerative process to compare different RAID architectures. While homogeneous Markov models are extensively used for reliability analysis of RAID storage systems, the memory ...
Journal Article · Information Sciences · January 1, 2018
Transient performance analysis of power distribution network (PDN) after a failure occurrence could facilitate the better design of smart grid. Researchers have proposed analytical models and the numerical solutions to analyze the PDN's transient behaviors ...
Conference · Proceedings - 2017 IEEE 28th International Symposium on Software Reliability Engineering Workshops, ISSREW 2017 · November 14, 2017
As enterprises continue to move their workloads from traditional server-room environments to private cloud-based systems, there is an increasing desire and ability for companies like IBM to centrally monitor the systems on behalf of their customers to proa ...
Conference · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · November 14, 2017
Datarace is a common problem on shared-memory parallel computers, including multicores. Due to its dependence on the thread scheduling scheme of its execution environment, the time to a datarace failure is usually very long. How to accelerate the occurrenc ...
Conference · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · November 14, 2017
Linux operating system is a complex system that is prone to suffer failures during usage, and increases difficulties of fixing bugs. Different testing strategies and fault mitigation methods can be developed and applied based on different types of bugs, wh ...
Conference · Proceedings of the IEEE Symposium on Reliable Distributed Systems · October 13, 2017
While Blockchain network brings tremendous benefits, there are concerns whether their performance would match up with the mainstream IT systems. This paper aims to investigate whether the consensus process using Practical Byzantine Fault Tolerance (PBFT) c ...
Conference · 2017 2nd International Conference on System Reliability and Safety, ICSRS 2017 · July 2, 2017
Epistemic uncertainty analysis accounts for inaccurate input parameters and evaluates how such uncertainty propagates to output measures. In this work we will focus on Weibull distributions, in particular the one related to the reliability of multi-core sy ...
Conference · 2017 13th International Conference on Network and Service Management, CNSM 2017 · July 1, 2017
Software Defined Networking (SDN) exposes critical networking decisions, such as traffic routing or enforcement of the critical security policies, to a software entity known as the SDN controller. Controller software, as written by humans, is intrinsically ...
Conference · Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · May 5, 2017
The growing popularity and complexity of Android operating system makes it prone to suffer failures during usage, which increases difficulties of fixing bugs. Different strategies and mitigation methods can be developed and applied based on different types ...
Conference · Proceedings - Annual Reliability and Maintainability Symposium · March 29, 2017
Software vulnerability analysis plays a critical role in the prevention and mitigation of software security attacks, and vulnerability classification constitutes a key part of this analysis. This paper proposes a new approach for software vulnerability cla ...
Conference · Proceedings - Annual Reliability and Maintainability Symposium · March 29, 2017
High volume outpatient clinics such as eye care centers cannot afford excessive delays, especially when due to limited resources, time, or overhead. Modeling tools from reliability & maintainability practice may provide the means to better assess where imp ...
Conference · Proceedings - Annual Reliability and Maintainability Symposium · March 29, 2017
Medical imaging systems from major modalities such as Magnetic Resonance Imaging or X-Ray Computed Tomography are complex devices subject to various types of maintenance. Medical device companies that develop these systems often monitor and maintain system ...
Conference · Proceedings - 2016 16th IEEE International Conference on Computer and Information Technology, CIT 2016, 2016 6th International Symposium on Cloud and Service Computing, IEEE SC2 2016 and 2016 International Symposium on Security and Privacy in Social Networks and Big Data, SocialSec 2016 · March 10, 2017
Availability is one of the key requirements for modern networked system. Availability of a virtualized system can be modelled and analyzed using stochastic models. In our previous work, availability of a virtualized system was modeled using a hierarchical ...
Conference · 2017 International Conference on Computing, Networking and Communications, ICNC 2017 · March 10, 2017
In this paper, we focus on the design and analysis of channel access in vehicular ad hoc networks (VANETs) for event-driven multi-hop safety services. First, a novel channel access scheme that incorporates an application-level distance (timer)-based rebroa ...
Journal Article · Journal of Grid Computing · March 1, 2017
Cloud computing infrastructures are designed to be accessible anywhere and anytime. This requires various fault tolerance mechanisms for coping with software and hardware failures. Hierarchical modeling approaches are often used to evaluate the availabilit ...
Journal Article · IEEE Transactions on Vehicular Technology · March 1, 2017
In a traffic jam or dense vehicle environment, vehicular ad hoc networks (VANETs) cannot meet the safety requirement due to serious packet collisions. The traditional cellular network solves packet collisions but suffers from long end-to-end delay. Third-G ...
ConferencePerformance Evaluation Review · March 1, 2017
We quantify the resiliency of large scale systems upon changes encountered beyond the normal system behavior. Formal definitions for resiliency and change are provided together with general steps for resiliency quantification and a set of resiliency metric ...
Full textCite
ConferenceValueTools 2016 - 10th EAI International Conference on Performance Evaluation Methodologies and Tools · January 1, 2017
In this paper, we present a computationally efficient technique for calculating the mean time to security failure (MTTSF) of a mobile cyber physical system (CPS). The CPS analyzed here has been comprehensively studied by other authors using stochastic rewa ...
Full textCite
ConferenceValueTools 2016 - 10th EAI International Conference on Performance Evaluation Methodologies and Tools · January 1, 2017
Input parameters of dependability models are often not known accurately. Two principal methods of dealing with such parametric uncertainty are: sensitivity analysis and uncertainty propagation. This paper is an initial attempt to link the two approaches. T ...
Full textCite
ConferenceValueTools 2016 - 10th EAI International Conference on Performance Evaluation Methodologies and Tools · January 1, 2017
We quantify the resiliency of large scale systems upon changes encountered beyond the normal system behavior. General steps for resiliency quantification are shown and resiliency metrics are defined to quantify the effects of changes. The proposed approach ...
Full textCite
ConferenceProceedings - Conference on Local Computer Networks, LCN · December 22, 2016
Transient survivability analysis of a virtualized system (VS) is critical to the wide deployment of cloud services. The existing research of VS availability and/or reliability focused on the steady-state analysis. This paper presents a model and the closed ...
Full textCite
ConferenceProceedings - 2016 IEEE 27th International Symposium on Software Reliability Engineering Workshops, ISSREW 2016 · December 16, 2016
Previous studies have defined different types of software bugs based on their complexity and reproducibility. Simple bugs, which involve only direct factors and are often easy to reproduce, have been called 'Bohrbugs', while complex bugs, with at least one ...
Full textCite
ConferenceProceedings - 2016 IEEE 27th International Symposium on Software Reliability Engineering Workshops, ISSREW 2016 · December 16, 2016
In this study we evaluate the applicability of the differential software analysis approach to detect memory leaks under a real workload. For this purpose, we used three different versions of a widely used software application, where one version was used as ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · December 1, 2016
Software rejuvenation is a proactive software control technique that is used to improve a computing system's performance when it suffers from software aging. In this paper, a two-granularity inspection-based software rejuvenation policy, which works as a clo ...
Full textOpen AccessCite
Journal ArticleReliability Engineering and System Safety · November 1, 2016
The reliability of power grids has been a subject of study for the past few decades. Traditionally, detailed models are used to assess how the system behaves after failures. Such models, based on power flow analysis and detailed simulations, yield accurate c ...
Full textCite
ConferenceProceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2016 · September 22, 2016
Full textCite
Journal ArticleReliability Engineering and System Safety · September 1, 2016
Vehicular ad hoc network (VANET) is a technology that facilitates communication between vehicles by creating a 'mobile Internet'. The system aims at ensuring road safety and achieving secured communication. For this reason, reliability and survivability of t ...
Full textCite
Book · September 1, 2016
This updated and revised edition of the popular classic relates fundamental concepts in probability and statistics to the computer sciences and engineering. The author uses Markov chains and other statistical tools to illustrate processes in reliability of ...
Full textCite
ConferenceProceedings - Annual Reliability and Maintainability Symposium · April 5, 2016
Operations of critical care departments in health systems are increasingly reliant on the availability of interoperable medical devices. Many large health care systems have fully transitioned in recent years to uniform electronic health record platforms, i ...
Full textCite
ConferenceProceedings - Annual Reliability and Maintainability Symposium · April 5, 2016
With the rise in quantifiable approaches to health care, lessons from reliability modeling provide new avenues for improving patient outcomes. Describing the development of conditions leading to organ system failure provides visceral motivation for quantif ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · March 1, 2016
Software failures are still a major concern in mission- and enterprise-critical contexts, despite significant efforts spent in software testing. In fact, while software testing is effective against easily-reproducible bugs (Bohrbugs), it is considerably le ...
Full textCite
ConferenceProceedings - 2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing, PRDC 2015 · January 4, 2016
Explosive growth of data generation and increasing reliance of business analysis on massive data make data loss more damaging than ever before. Thus it has also become a critical issue for businesses to protect important data effectively. In a system with ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2016
Computer systems are potentially targeted by cybercriminals by means of specially crafted malicious software called Advanced Persistent Threats (APTs). As a consequence, any security attribute of the computer system may be compromised: disruption of servic ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2016
Survivability is a critical attribute of modern computer and communication systems. The assessment of survivability is mostly performed in a qualitative manner and thus cannot meet the need for more precise and solid evaluation of service loss or degradati ...
Cite
ConferenceIEEE International Conference on Software Quality, Reliability and Security : proceedings. IEEE International Conference on Software Quality, Reliability and Security · January 2016
In this paper, we present the software reliability analysis of the flight software of a recently launched space mission. For our analysis, we use the defect reports collected during the flight software development. We find that this software was developed ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2016
This paper presents a new analytic model for the performance and reliability of safety-related message broadcast in vehicular ad hoc networks (VANETs) at intersections with non-homogeneous Poisson process (NHPP) for more general road traffic and node distr ...
Full textCite
Journal ArticleTelecommunication Systems · December 1, 2015
Featured Publication
Survivability is a concept that describes the capability of a system to achieve timely recovery after the occurrence of undesired events. It is more general and detailed than many terms, such as RTO and RPO, that have a similar goal. Survivability is capab ...
Full textCite
ConferenceProceedings of the International Conference on Dependable Systems and Networks · September 14, 2015
This paper provides a summary of the First International Workshop on Model Based Design for Cyber-Physical Systems (MB4CP 2015), held in conjunction with the DSN 2015 conference in Rio de Janeiro, Brazil. ...
Full textCite
ConferenceProceedings - 1st International Workshop on Complex Faults and Failures in Large Software Systems, COUFLESS 2015 · August 5, 2015
The interaction of software with its execution environment is an underestimated cause of complex faults activation and systems failure. This paper discusses a possible framework to emulate anomalous environment conditions in order to assess the impact of t ...
Full textCite
Conference2015 11th International Conference on the Design of Reliable Communication Networks, DRCN 2015 · July 2, 2015
Social infrastructure systems such as communication, transportation, power and water supply systems are now facing various types of threats including component failures, security attacks and natural disasters, etc. Whenever such undesirable events occur, i ...
Full textCite
Chapter · April 22, 2015
Modeling is a fundamental aspect of the design process of a complex system, as it allows the designer to compare different architectural choices as well as predict the behavior of the system under varying input traffic, service, fault and prevention parame ...
Full textCite
Journal ArticleTelecommunication Systems · March 27, 2015
Featured Publication
In this position paper on reliable networks, we discuss new trends in the design of reliable communication systems. We focus on a wide range of research directions including protection against software failures as well as failures of communication systems ...
Full textCite
Journal ArticleIEEE Transactions on Dependable and Secure Computing · March 1, 2015
In this paper, the performance of a grid computing environment is studied in the presence of failure-repair of the resources. To achieve this, in the first step, each grid resource is individually modeled using Stochastic Reward Nets (SRNs), and mean resp ...
Full textCite
Journal ArticleIEEE Transactions on Services Computing · January 1, 2015
Traditional system-oriented dependability metrics like reliability and availability do not fully reflect the impact of system failure-repair behavior in service-oriented environments. The telecommunication systems community prefers to use Defects Per Milli ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2015
Resiliency is often considered as a synonym for fault-tolerance and reliability/availability. We start from a different definition of resiliency as the ability to deliver services when encountering unexpected changes. Semantics of change is of extreme impor ...
Full textCite
Chapter · January 1, 2015
In this paper, on-line algorithms for division and multiplication are developed. It is assumed that the operands as well as the result flow through the arithmetic unit in a digit-by-digit, most significant digit first fashion. The use of a redundant digit ...
Full textCite
ConferenceProceedings - IEEE 25th International Symposium on Software Reliability Engineering Workshops, ISSREW 2014 · December 12, 2014
Application servers (AS) on a virtualized platform may suffer from software aging. In this paper, we first formulate the system model including three virtual machines. Two of them act as the main servers, and the third machine acts as the backup node ...
Full textCite
Journal ArticleProceedings - IEEE 25th International Symposium on Software Reliability Engineering Workshops, ISSREW 2014 · December 12, 2014
With its simple principles to achieve parallelism and fault tolerance, the Map-reduce framework has captured wide attention, from traditional high performance computing to marketing organizations. The most popular open source implementation of this framewo ...
Full textCite
ConferenceProceedings - International Symposium on Software Reliability Engineering, ISSRE · December 11, 2014
We investigate the dependence of software failure reproducibility on the environment in which the software is executed. The existence of such dependence is ascertained in literature, but so far it is not fully characterized. In this paper we pinpoint some ...
Full textCite
ConferenceProceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · December 3, 2014
Virtual machines (VM) are used in cloud computing systems to handle user requests for service. A typical user request goes through several cloud service provider specific processing steps from the instant it is submitted until the service is completed. In ...
Full textCite
Journal ArticleIEEE Transactions on Computers · December 1, 2014
IEEE 1609.4 protocol defines a channel switching mechanism to enable a single radio operating efficiently on multiple channels to support both safety and non-safety services. Basic safety message (BSM) is transmitted only through the control channel at reg ...
Full textCite
Journal ArticleEuropean Journal of Operational Research · November 1, 2014
In this paper, an algorithm for the fast computation of network reliability bounds is proposed. The evaluation of the network reliability is an intractable problem for very large networks, and hence approximate solutions based on reliability bounds have as ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · September 18, 2014
The explosive growth of data generation and increasing reliance of business analysis on massive data make data loss more damaging than ever before. Nowadays many organizations start relying on cloud services for keeping their valuable data. It is a critica ...
Full textCite
Journal ArticleInternational Journal of Modern Physics B · September 10, 2014
The paper regards the complex dynamical network (CDN) as a static network with temporal characteristics so as to consider its dynamic behavior. The influence factor and dynamics laws in CDN are explored by using the methods of simulation and statistical ph ...
Full textCite
Conference2014 IEEE Globecom Workshops, GC Wkshps 2014 · March 18, 2014
Defects Per Million (DPM), defined as the number of calls dropped out of a million calls due to failures, is used by the telecommunication systems community as a user-perceived dependability metric. As new standards evolve, with built-in mechanisms to hand ...
Full textCite
Journal ArticleIEEE Transactions on Dependable and Secure Computing · January 1, 2014
In modern IT systems, data backup and restore operations are essential for providing protection against data loss from both natural and man-made incidents. On the other hand, data backup and restore operations can be resource-intensive and lead to performa ...
Full textCite
Journal ArticleIEEE Transactions on Services Computing · January 1, 2014
From an enterprise perspective, one key motivation to transform the traditional IT management into Cloud is the cost reduction of the hosted services. In an Infrastructure-as-a-Service (IaaS) Cloud, virtual machine (VM) instances share the physical machine ...
Full textCite
Journal ArticleIEEE Transactions on Cloud Computing · January 1, 2014
In a large Infrastructure-as-a-Service (IaaS) cloud, component failures are quite common. Such failures may lead to occasional system downtime and eventual violation of Service Level Agreements (SLAs) on the cloud service availability. The availability ana ...
Full textCite
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2014
Software systems running continuously for a long time often confront software aging, which is the phenomenon of progressive degradation of execution environment caused by latent software faults. Removal of such faults in software development process is a c ...
Full textCite
Chapter · January 1, 2014
IaaS clouds are major enablers of data-intensive cloud applications because they provide necessary computing capacity for managing Big Data environments. In a typical IaaS cloud, virtual machine (VM) instances deployed on physical machines (PM) are provide ...
Full textCite
Book · December 6, 2012
In structuring the book, the authors have been careful to provide the reader with a methodological approach to analytical modeling techniques. ...
Cite
Chapter · December 1, 2012
This chapter presents multi-state availability modeling in practice. We use three analytic modeling techniques: (1) continuous time Markov chains, (2) stochastic reward nets, and (3) multi-state fault trees. Two case studies are presented to show the usage ...
Full textCite
Journal ArticleEurasip Journal on Wireless Communications and Networking · December 1, 2012
In the current worldwide ICT scenario, a constantly growing number of ever more powerful devices (smartphones, sensors, household appliances, RFID devices, etc.) join the Internet, significantly impacting the global traffic volume (data sharing, voice, mul ...
Full textCite
ConferenceProceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2012
The growing complexity of mission-critical space mission software makes it prone to suffer failures during operations. The success of space missions depends on the ability of the systems to deal with software failures, or to avoid them in the first place. ...
Full textCite
Journal ArticleACM SIGMETRICS Performance Evaluation Review · March 9, 2012
Reliability and performance evaluation are important, often mandatory, steps in designing and analyzing (critical) systems. In such cases, accurate models are required to adequately take into account interference or dependent behaviors affecting th ...
Full textCite
Journal ArticleIEEE Vehicular Technology Conference · December 23, 2011
IEEE- and ASTM-adopted Dedicated Short Range Communications (DSRC) vehicle safety-related communication services, which require reliable and fast message delivery, usually demand broadcast communications in vehicular ad hoc networks (VANETs). In this paper ...
Full textCite
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · December 14, 2011
High-availability assurance of cloud service is a critical and challenging issue for cloud service providers. To quantify the availability of cloud services from both architectural and operational points of views, availability modeling and evaluation are e ...
Full textCite
Journal ArticleCTRQ 2011 - 4th International Conference on Communication Theory, Reliability, and Quality of Service · December 1, 2011
In this paper, an analytic model is proposed for the performance evaluation of vehicular safety related services in the dedicated short range communications (DSRC) system on highways. The generation and service of safety messages in each vehicle is modeled ...
Cite
Journal ArticleCTRQ 2011 - 4th International Conference on Communication Theory, Reliability, and Quality of Service · December 1, 2011
In this paper, we investigate the availability modeling of computer networks with redundancy mechanisms. Sensitivity analysis is applied in order to find the bottlenecks of system availability. We use Markov chains for the analytical evaluation of complex ...
Cite
Journal ArticleProceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011
A number of studies have reported the phenomenon of "software aging", characterized by progressive software performance degradation. Response time (RT) as a customer-affecting metric can be used to detect the onset of software aging. Alberto Avritzer et al ...
Full textCite
Journal ArticleProceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · December 1, 2011
Several studies have been carried out on software bugs analysis and classification for life and mission critical systems, which include reproducible bugs called Bohrbugs, and hard to reproduce bugs called Mandelbugs. Although software reliability in IT sys ...
Full textCite
Journal ArticleProceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011
The need for reliability and availability has increased in modern applications, in order to handle rapidly growing demands while providing uninterrupted service. Cloud computing systems fundamentally provide access to large pools of data and computational ...
Full textCite
Journal ArticleProceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2011
A number of studies have reported the phenomenon of "Software aging", caused by resource exhaustion and characterized by progressive software performance degradation. We develop experiments that simulate an on-line bookstore application, following the stan ...
Full textCite
Journal ArticleProceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011
In this paper we present an experimental comparative study of most of the rejuvenation techniques developed so far, divided into two groups: i) simple approaches: physical node reboot (switch off/on), VM reboot, OS reboot and standalone application restart ...
Full textCite
Journal ArticleProceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011
In this paper, a multi-granularity software rejuvenation policy is studied. Four granularities of rejuvenation are proposed to mitigate the impact of four levels of software aging respectively. Continuous Time Markov Chain (CTMC) model is used to obtain the av ...
Full textCite
Journal ArticleProceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2011
Stochastic models are often employed to study dependability of critical systems and assess various hardware and software fault-tolerance techniques. These models take into account the randomness in the events of interest (aleatory uncertainty) and are gene ...
Full textCite
Journal ArticleProceedings - 2011 3rd International Workshop on Software Aging and Rejuvenation, WoSAR 2011 · December 1, 2011
Virtual machine monitor (VMM) rejuvenation is a proactive recovery method against failures caused by software aging in VMM. Since the job running on a hosted virtual machine (VM) is interrupted at VMM rejuvenation, the preemption type of VMM rejuvenation i ...
Full textCite
Journal ArticleProceedings of the 2011 6th International Conference on Availability, Reliability and Security, ARES 2011 · November 9, 2011
Full textCite
Journal ArticleProceedings of the 2011 6th International Conference on Availability, Reliability and Security, ARES 2011 · November 9, 2011
High-availability assurance of server systems is becoming an important issue, since many mission-critical applications are implemented on server systems. To achieve high-availability, software rejuvenation is a practical technique to reduce unexpected down ...
Full textCite
Journal ArticlePerformance Evaluation · October 1, 2011
This paper proposes an improved computation method of maximum likelihood (ML) estimation for phase-type (PH) distributions with a number of phases. We focus on the EM (expectation-maximization) algorithm proposed by Asmussen et al. [27] and refine it in te ...
Full textCite
Journal ArticleProceedings of the 12th IFIP/IEEE International Symposium on Integrated Network Management, IM 2011 · September 19, 2011
As online service providers utilize cloud computing to host their services, they are challenged by evaluating the quality of experience and designing redirection strategies in this complicated environment. We propose a hierarchical modeling approach that c ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · September 2, 2011
Optimizing for performance is often associated with higher costs in terms of capacity, faster infrastructure, and power costs. In this paper, we quantify the power-performance trade-offs by developing a scalable analytic model for joint analysis of perform ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · September 2, 2011
Over the last decade, research on dependable computing has undergone a shift from reactive towards proactive methods: In classical fault tolerance a system reacts to errors or component failures in order to prevent them from turning into system failures, a ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · August 26, 2011
Over the last decade, research on dependable computing has undergone a shift from reactive towards proactive methods: In classical fault tolerance a system reacts to errors or component failures in order to prevent them from turning into system failures, a ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · August 26, 2011
High availability is one of the key characteristics of Infrastructure-as-a-Service (IaaS) cloud. In this paper, we show a scalable method for availability analysis of large scale IaaS cloud using analytic models. To reduce the complexity of analysis and t ...
Full textCite
Journal Article2011 Prognostics and System Health Management Conference, PHM-Shenzhen 2011 · August 3, 2011
A review is carried out on how quantitative approaches have been applied so far to the Reliability Prediction and Assessment (RPA) for computer and communication systems. A series of the reliability evaluation technology based on analytic models and comput ...
Full textCite
Journal ArticleComputer Communications · August 2, 2011
IEEE 802.15.4 is a popular choice for MAC/PHY protocols in low power and low data rate wireless sensor networks. In this paper, we develop a stochastic model for the beaconless operation of IEEE 802.15.4 MAC protocol. Given the number of nodes competing fo ...
Full textCite
Chapter · June 30, 2011
Accelerated life test (ALT) methods are successfully applied in many industries to reduce the test period of highly dependable products. Software industry is not different, having the same demand to reduce the period of test for software products with very ...
Full textCite
Journal ArticleProceedings of International Conference on Software Engineering: Software Quality: The Road Ahead, CONSEG 2011 · January 1, 2011
In this paper, we describe a Markov chain based approach for the performance and availability analysis of cloud provided services. We use infrastructure-as-a-service as an example of a cloud based service, where service availability and provisioning respons ...
Cite
Chapter · January 1, 2011
This chapter addresses the issue of determining the response time distribution in networks of queues. Four different techniques are described and demonstrated. A two step numerical approach to compute the response time distribution for closed Markovian net ...
Full textCite
Journal ArticleInternational Journal of Performability Engineering · January 1, 2011
Reliability is one of the key attributes of dependability and quality of service. Techniques and tools for reliability assessment are therefore required in order to evaluate and to predict system behavior. In many contexts, merely taking into account of st ...
Cite
Journal ArticleInternational Journal of Performability Engineering · January 1, 2011
Survivability is the capability of a system to fulfill its mission in a timely manner in the presence of failures, attacks and accidents. In this paper, quantitative assessment of survivability of cellular networks is conducted by developing an analytical ...
Cite
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · December 30, 2010
In this paper, we discuss a Monte Carlo sampling based method for propagating the epistemic uncertainty in model parameters, through the system availability model. We also outline methods to compute the number of samples needed to obtain a desired confiden ...
Full textCite
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · December 30, 2010
Cloud based services may experience changes - internal, external, large, small - at any time. Predicting and quantifying the effects on the quality-of-service during and after a change are important in the resiliency assessment of a cloud based service. In ...
Full textCite
Journal ArticleLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 15, 2010
Restarts or retries are typical control schemes to meet a deadline in real-time systems, and are regarded as significant environmental diversity techniques in dependable computing. This paper reconsiders a restart control studied by van Moorsel and Wolter ...
Full textCite
Journal ArticleProceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2010
Defects per million (DPM), defined as the number of calls out of a million dropped due to failures, is an important service (un)reliability measure for telecommunication systems. Most previous research derives the DPM from steady-state system availability ...
Full textCite
Journal ArticleProceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2010
Software aging is a phenomenon defined as the continuing degradation of software systems during runtime, being particularly noticeable in long-running applications. Aging-related failures are very difficult to observe, because the accumulation of aging eff ...
Full textCite
Journal ArticleProceedings - 16th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2010 · December 1, 2010
Prior to field deployment, mission critical sensor networks should be analyzed for high reliability assurance. Past research only focused on reliability models for sensor node or network in isolation. This paper presents a comprehensive approach for reliab ...
Full textCite
Journal ArticleProceedings - 16th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2010 · December 1, 2010
Handling diverse client demands and managing unexpected failures without degrading performance are two key promises of a cloud delivered service. However, evaluation of a cloud service quality becomes difficult as the scale and complexity of a cloud system ...
Full textCite
Journal ArticleACM International Conference Proceeding Series · November 22, 2010
Attack tree (AT) is one of the widely used combinatorial models in cyber security analysis. The basic formalism of AT does not take into account defense mechanisms. Defense trees (DT) have been developed to investigate the effect of defense mechanisms usin ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · September 20, 2010
As space mission software becomes more complex, the ability to effectively deal with faults is increasingly important. The strategies that can be employed for fighting a software bug depend on its fault type. Bohrbugs are easily isolated and removed during ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · September 20, 2010
Proactive approaches to failure avoidance, recovery and maintenance have recently attracted increased interest among researchers and practitioners from various areas of dependable system design and operation. This first workshop provided a stimulating, and ...
Full textCite
Journal ArticleProceedings of the International Conference on Dependable Systems and Networks · September 20, 2010
Proactive approaches to failure avoidance, recovery and maintenance have recently attracted increased interest among researchers and practitioners from various areas of dependable system design and operation. This first workshop provided a stimulating, and ...
Journal Article · EDCC-8 - Proceedings of the 8th European Dependable Computing Conference · July 12, 2010
Reliability is one of the major concerns for software engineers. The increasing size of software systems and their inherent complexity - which is essentially related to the intricate interdependencies among many heterogeneous components - pose serious diff ...
Journal Article · IEEE Transactions on Software Engineering · March 29, 2010
With software systems increasingly being employed in critical contexts, assuring high reliability levels for large, complex systems can incur huge verification costs. Existing standards usually assign predefined risk levels to components in the design phas ...
Journal Article · International Journal of System Assurance Engineering and Management · January 1, 2010
Companies are expected to keep their systems up and running and make data continuously available. Several recent studies have established that most system outages are due to software faults. In this paper, we discuss availability aspects of large software- ...
Journal Article · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2010
A number of studies have reported the phenomenon of "Software aging", characterized by progressive software performance degradation. This is mainly caused by the exhaustion of the combination of system resources. Traditionally, modeling and analysis of sof ...
Journal Article · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2010
As server virtualization is used as an essential software infrastructure of various software services such as cloud computing, availability management of server virtualized system is becoming more significant. Although time-based software rejuvenation is u ...
Journal Article · IEEE Transactions on Computers · January 1, 2010
A distinct characteristic of multistate systems (MSS) is that the systems and/or their components may exhibit multiple performance levels (or states) varying from perfect operation to complete failure. MSS can model behaviors such as shared loads, performa ...
Journal Article · IEEE Transactions on Reliability · January 1, 2010
In the past ten years, the software aging phenomenon has been systematically researched, and recognized by both academic, and industry communities as an important obstacle to achieving dependable software systems. One of its main effects is the depletion o ...
Journal Article · Proceedings of the 2009 7th International Workshop on the Design of Reliable Communication Networks, DRCN 2009 · December 16, 2009
There is a need to quantify system properties methodically. Dependability and security models have evolved nearly independently. Therefore, it is crucial to develop a classification of dependability and security models which can meet the requirement of pro ...
Journal Article · 2008 IEEE International Conference on Software Reliability Engineering Workshops, ISSRE Wksp 2008 · December 15, 2009
Virtualization enables data centers to consolidate servers to improve resource utilization and power consumption. This paper presents the issues of performability management in a virtualized data center that hosts multiple services using virtualization. On ...
Journal Article · 2008 IEEE International Conference on Software Reliability Engineering Workshops, ISSRE Wksp 2008 · December 15, 2009
Since the notion of software aging was introduced thirteen years ago, the interest in this phenomenon has been increasing from both academia and industry. The majority of the research efforts in studying software aging have focused on understanding its eff ...
Journal Article · 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2009 · December 1, 2009
This paper develops an availability model of a virtualized system. We construct non-virtualized and virtualized two hosts system models using a two-level hierarchical approach in which fault trees are used in the upper level and homogeneous continuous time ...
Journal Article · International Symposium on Performance Evaluation of Computer and Telecommunication Systems 2009, SPECTS 2009, Part of the 2009 Summer Simulation Multiconference, SummerSim 2009 · December 1, 2009
IEEE 802.15.4 is a popular choice for MAC/PHY protocols in low power and low data rate wireless sensor networks. In this paper, we develop a stochastic model for the beaconless operation of IEEE 802.15.4 MAC protocol. Given the number of nodes competing fo ...
Journal Article · Proceedings - Winter Simulation Conference · December 1, 2009
Critical services in a telecommunication network should survive and be continuously provided even when undesirable events like sabotage, natural disasters, or network failures happen. The network survivability is quantified as defined by the ANSI T1A1.2 co ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · November 25, 2009
Proactive approaches to failure avoidance, recovery and maintenance have recently attracted increased interest among researchers and practitioners from various areas of dependable system design and operation. This first workshop aimed to provide a stimulat ...
Journal Article · Proceedings of the 2009 International Symposium on Performance Evaluation of Computer and Telecommunication Systems, SPECTS 2009 · November 12, 2009
IEEE 802.15.4 is a popular choice for MAC/PHY protocols in low power and low data rate wireless sensor networks. In this paper, we develop a stochastic model for the beaconless operation of IEEE 802.15.4 MAC protocol. Given the number of nodes competing fo ...
Journal Article · Proceedings - International Conference on Advanced Information Networking and Applications, AINA · October 5, 2009
OSPF is a popular interior gateway routing protocol. Commercial OSPF routers limit their processing load by using a hold time between successive routing table calculations as new link state advertisements (LSAs) arrive following a topology change. A large ...
Journal Article · International Journal of Reliability, Quality and Safety Engineering · August 1, 2009
Service Reliability is an important consideration for new service deployment. Traditional system-oriented measures are no longer adequate to describe the reliability perceived by the user. In this paper we propose a general service reliability analysis app ...
Journal Article · IEEE/ACM Transactions on Networking · July 3, 2009
This paper addresses a parameter estimation problem of Markovian arrival process (MAP). In network traffic measurement experiments, one often encounters the group data where arrival times for a group are collected as one bin. Although the group data are ob ...
Journal Article · Computer Networks · June 11, 2009
Critical services in a telecommunication network should be continuously provided even when undesirable events like sabotage, natural disasters, or network failures happen. It is essential to provide virtual connections between peering nodes with certain pe ...
Journal Article · ACM SIGMETRICS Performance Evaluation Review · March 25, 2009
This paper discusses the modeling tool called SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator), a general hierarchical modeling tool that analyzes stochastic models of reliability, availability, performance, and perfor ...
Journal Article · IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD · January 1, 2009
The term resilience is used differently by different communities. In general engineering systems, fast recovery from a degraded system state is often termed as resilience. Computer networking community defines it as the combination of trustworthiness (depe ...
Journal Article · Proceedings of the 14th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2008 · December 1, 2008
We present the availability model of a high availability SIP Application Server configuration on WebSphere. Hardware, operating system and application server failures are considered. Different types of fault detectors, detection delays, failover delays, re ...
Journal Article · IBM Systems Journal · December 1, 2008
The successful development and marketing of commercial high-availability systems requires the ability to evaluate the availability of systems. Specifically, one should be able to demonstrate that projected customer requirements are met, to identify availab ...
Journal Article · Proc. - The 3rd Int. Conf. Systems and Networks Communications, ICSNC 2008 - Includes I-CENTRIC 2008: Int. Conf. Advances in Human-Oriented and Personalized Mechanisms, Technologies, and Services · December 1, 2008
In a telecommunication network it is essential to provide virtual connections between peering nodes with performance guarantees such as minimum throughput, maximum delay or loss. Critical services in telecommunication network should be continuously provide ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · October 13, 2008
Our society is heavily dependent on a wide variety of communication services. These services must be available even when undesirable events like sabotage, natural disasters, or network failures happen. The network survivability as defined by the ANSI T1A1. ...
Journal Article · IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM · September 10, 2008
We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability assur ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · June 11, 2008
We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, and then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability a ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · June 9, 2008
As modern society becomes more and more dependent on computers and computer networks, vulnerability and downtime of these systems will significantly impact daily life from both social and economic points of view. Words like reliability and downtime are freq ...
Journal Article · Electronic Transactions on Numerical Analysis · January 1, 2008
This contribution proposes a decompositional iterative method with low memory requirements for the steady-state analysis of Kronecker structured Markov chains. The Markovian system is formed by a composition of subsystems using the Kronecker sum operator for ...
Journal Article · IEEE Transactions on Vehicular Technology · January 1, 2008
For cellular communication systems, mobility and limited radio coverage of a cell require calls to be handed over from one base station system (BSS) to another. Due to the limited bandwidth available in various cells, there is a finite probability that an ...
Chapter · December 14, 2007
Several recent studies have established that most system outages are due to software faults. Given the ever‐increasing complexity of software and the well‐developed techniques and analysis for hardware reliability, ...
Journal Article · IEEE Transactions on Reliability · December 1, 2007
Recently, measurement-based studies of software systems have proliferated, reflecting an increasingly empirical focus on system availability, reliability, aging, and fault tolerance. However, it is a nontrivial, error-prone, arduous, and time-consuming tas ...
Journal Article · Proceedings of IEEE International Symposium on High Assurance Systems Engineering · December 1, 2007
Computer and communication systems are ubiquitous and are used extensively in safety critical, life critical, and finance critical applications. Due to the excessive cost of outages, downtime is not tolerated by the users. High availability applications ar ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · November 16, 2007
In this paper, we present a variational Bayesian (VB) approach to computing the interval estimates for nonhomogeneous Poisson process (NHPP) software reliability models. This approach is an approximate method that can produce analytically tractable posteri ...
Journal Article · Proceedings - 2007 IEEE International Conference on Services Computing, SCC 2007 · October 18, 2007
Stochastic reliability analysis of composite services is challenging, primarily since it needs us to carefully balance accuracy of analysis and its computational complexity: Given stochastic models of service components, we often combine them and define a ...
Journal Article · IEEE Transactions on Reliability · September 1, 2007
This paper proposes a hierarchical modeling approach for the reliability analysis of phased-mission systems with repairable components. The components at the lower level are described by continuous time Markov chains which allow complex component failure/r ...
Journal Article · IEEE Transactions on Computers · July 1, 2007
Grid computing is a newly emerging technology aimed at large-scale resource sharing and global-area collaboration. It is the next step in the evolution of parallel and distributed computing. Due to the largeness and complexity of the grid system, its perfo ...
Journal Article · Journal of Systems and Software · April 1, 2007
With component-based systems becoming popular and handling diverse and critical applications, the need for their thorough evaluation has become very important. In this paper we propose an architecture-based unified hierarchical model for software performan ...
Journal Article · Performance Evaluation · March 1, 2007
This paper develops time-based rejuvenation policies to improve the performability measures of a cluster system. Three rejuvenation policies, namely standard rejuvenation, delayed rejuvenation and mixed rejuvenation, are designed to improve the cluster's p ...
Conference · VALUETOOLS 2007 - 2nd International ICST Conference on Performance Evaluation Methodologies and Tools · January 1, 2007
Performance along with dependability analysis is a tremendous challenge in the design or improvement of modern complex systems. Two different classes of solution methods are generally used: analytic-numeric methods and simulation methods. As most of the li ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2007
Web services providers often commit service-level agreements (SLAs) with their customers for guaranteeing the quality of the services. These SLAs are related not just to functional attributes of the services but to performance and reliability attributes as ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · December 22, 2006
We present three algorithms for detecting the need for software rejuvenation by monitoring the changing values of a customer-affecting performance metric, such as response time. Applying these algorithms can improve the values of this customer-affecting me ...
Journal Article · IEEE Transactions on Reliability · December 1, 2006
Traditional approaches to software reliability modeling are black box-based; that is, the software system is considered as a whole, and only its interactions with the outside world are modeled without looking into its internal structure. The black box appr ...
Journal Article · International Symposium on Performance Evaluation of Computer and Telecommunication Systems 2006, SPECTS'06, Part of the 2006 Summer Simulation Multiconference, SummerSim'06 · December 1, 2006
Survivability analysis measures the degree of functionality remaining in a system after failures. It consists of evaluating metrics which quantify the system performance during failure scenarios as well as in normal operation. Existing research work in this ...
Journal Article · Proceedings - 12th Pacific Rim International Symposium on Dependable Computing, PRDC 2006 · December 1, 2006
Carrier grade high availability platforms are designed to enable the development and deployment of highly available services in the telecommunications industry. In order to build-in high availability and compare availabilities that differ in the sixth deci ...
Journal Article · Proceedings - 12th Pacific Rim International Symposium on Dependable Computing, PRDC 2006 · December 1, 2006
Recently, measurement based studies of software systems proliferated, reflecting an increasingly empirical focus on system availability, reliability, aging and fault tolerance. However, it is a non-trivial, error-prone, arduous, and time-consuming task eve ...
Journal Article · Proceedings - International Computer Software and Applications Conference · December 1, 2006
Performance engineering is an important activity for software architects and designers. Assessment and tuning of performance can help to make key changes in the system, especially if done early in its development. In this paper, we present a tool for the p ...
Journal Article · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2006
High reliability and performance are vital for software systems handling diverse mission critical applications. Such software systems are usually component based and may possess multiple levels of fault recovery. A number of parameters, including the softw ...
Journal Article · IEEE Transactions on Reliability · September 1, 2006
Several recent studies have reported & examined the phenomenon that long-running software systems show an increasing failure rate and/or a progressive degradation of their performance. Causes of this phenomenon, which has been referred to as "software agin ...
Journal Article · IEEE Transactions on Vehicular Technology · September 1, 2006
In this paper, a new soft handoff scheme for CDMA cellular systems is proposed and investigated. It is pointed out that some handoff calls unnecessarily occupy multiple channels with little contribution to the performance of handoffs in IS95/CDMA2000-based ...
Journal Article · IEEE Transactions on Reliability · June 1, 2006
A large number of software reliability growth models have been proposed to analyse the reliability of a software application based on the failure data collected during the testing phase of the application. To ensure analytical tractability, most of these m ...
Book · April 21, 2006
Critically acclaimed text for computer performance analysis, now in its second edition. The Second Edition of this now-classic text provides a current and thorough treatment of queueing systems, queueing networks, continuous and discrete-time Markov chains, ...
Journal Article · IEEE Transactions on Vehicular Technology · March 1, 2006
This paper investigates the features of a cellular geometry in code-division multiple-access (CDMA) systems with soft handoff and distinguishes controlling area of a cell from coverage area of a cell. Some important characteristics of the cellular configur ...
Journal Article · International Journal of Performability Engineering · January 1, 2006
In this paper, we present a general survivability quantification approach that is applicable to a wide range of system architectures, applications, failure/recovery behaviors, and metrics. We show how this approach can be applied to derive survivability me ...
Journal Article · Proceedings - International Computer Software and Applications Conference · December 1, 2005
In this paper, we describe three different state space models for analyzing the security of a software system. In the first part of this paper, we utilize a semi-Markov Process (SMP) to model the transitions between the security states of an abstract softw ...
Journal Article · Proceedings - Winter Simulation Conference · December 1, 2005
Cellular networks are gradually shifting from voice only to voice and data due to increased demand for WWW, FTP and multi-media messaging. This has substantially increased the volume of cellular data traffic. Schemes have been proposed for co-existence and ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · November 9, 2005
Many software reliability growth models assume that the time to next failure may be infinite; i.e., there is a chance that no failure will occur at all. For most software products this is too good to be true even after the testing phase. Moreover, if a non ...
Journal Article · Reliability Engineering and System Safety · October 1, 2005
The semi-Markov decision model is a powerful tool in analyzing sequential decision processes with random decision epochs. In this paper, we have built the semi-Markov decision process (SMDP) for the maintenance policy optimization of condition-based preven ...
Journal Article · IEEE Transactions on Reliability · September 1, 2005
We present a hierarchical model for the analysis of proactive fault management in the presence of system resource leaks. At the low level of the model hierarchy is a degradation model in which we use a nonhomogeneous Markov chain to establish an explicit c ...
Journal Article · IEEE Transactions on Reliability · September 1, 2005
Mean time to failure (MTTF) is an important reliability measure. Previous research is mainly concerned with the MTTF computation of coherent systems. In this paper, we derive equations to calculate the steady-state MTTF for noncoherent systems. Based on th ...
Journal Article · Reliability Engineering and System Safety · January 1, 2005
A two-level rejuvenation policy for software systems with degradation process is studied. Both full restarts and partial restarts are considered in this rejuvenation strategy. A semi-Markov process model is constructed, and based on its closed-form solutio ...
Journal Article · Opsearch (India) · 2005
All non-homogeneous Poisson process (NHPP) software reliability growth models of the finite failures category share the property that every time to failure distribution is defective. The reason for this phenomenon is the fact that according to these models ...
Journal Article · Computer Communications · 2005
In this paper, we propose a high availability design of a Cable Modem Termination System (CMTS) clusters system based on the software rejuvenation technique. This proactive system maintenance technique is aimed to reduce system outages and the associated d ...
Journal Article · Proceedings of the Fifth International Workshop on Software and Performance, WOSP'05 · January 1, 2005
With software systems becoming more complex, and handling diverse and critical applications, the need for their thorough evaluation has become ever more important at each phase of software development. With the prevalent use of component-based design, the ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2005
Service availability is an important consideration when carriers deploy new, packet-based services. In this paper we define the service availability based on user behavior, and derive formulas to compute service availability starting with the user behavior ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · January 1, 2005
Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failure, is the result of exhaus ...
Journal Article · International Conference on Information Technology: Coding and Computing, ITCC · January 1, 2005
Software coding practices, in the interest of efficiency, often neglect to enforce strict bound checking on buffers, arrays and pointers. This results in software code that is more vulnerable to security intrusions exploiting buffer overflow vulnerabilities ...
Journal Article · Lecture Notes in Computer Science · January 1, 2005
The architecture of a software system is the highest level of abstraction whereupon useful analysis of system properties is possible. Hence, performance analysis at this level can be useful for assessing whether a proposed architecture can meet the desired ...
Journal Article · Journal of High Speed Networks · December 29, 2004
Increasing deployment of computer systems in critical applications has made study and quantifiable analysis of the security aspects of these systems an important issue. Security quantification analysis can either be done by logging large amounts of operati ...
Journal Article · Performance Evaluation · December 1, 2004
Conventional approaches to analyze the behavior of software applications are black box based, that is, the software application is treated as a whole and only its interactions with the outside world are modeled. The black box approaches ignore information ...
Journal Article · Proceedings - International Symposium on Software Reliability Engineering, ISSRE · December 1, 2004
The telecommunications industry has achieved high reliability and availability for telephone service over decades of development. However, the current design does not aim at providing service survivability when a local switching office fails due to catastr ...
Journal Article · Proceedings - Asia-Pacific Software Engineering Conference, APSEC · December 1, 2004
In general, the software reliability models based on the non-homogeneous Poisson processes (NHPPs) are quite popular to assess quantitatively the software reliability and its related dependability measures. Nevertheless, it is not so easy to select the bes ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · October 1, 2004
As the new generation high-availability commercial computer systems incorporate deferred repair service strategies, steady-state availability metrics may no longer reflect reality. Transient solution of availability models for such systems to calculate int ...
Journal Article · IEEE Transactions on Wireless Communications · September 1, 2004
In this paper, an optimal training equalization for wireless communication is proposed and analyzed. By our scheme, the training of the equalizer is carried out periodically, with the training interval optimized for a maximal channel utilization. A closed- ...
Journal Article · Proceedings - IEEE Pacific Rim International Symposium on Dependable Computing · June 15, 2004
This paper analyzes two software rejuvenation policies of cluster server systems under varying workload, called fixed rejuvenation and delayed rejuvenation. In order to achieve a higher average throughput, we propose the delayed rejuvenation policy, which ...
Journal Article · Performance Evaluation · May 1, 2004
Capacity-on-demand is the key concept in multiplexing bursty mobile data traffic over wireless links featuring limited bandwidth. This scheme maintains a connection for a mobile only when it has data to transfer and allows quick release of radio resource w ...
Journal Article · Performance Evaluation · March 1, 2004
Complex software and network based information server systems may exhibit failures. Quite often, such failures may not be accidental. Instead some failures may be caused by deliberate security intrusions with the intent ranging from simple mischief, theft ...
Journal Article · IEICE Transactions on Information and Systems · January 1, 2004
Software rejuvenation is a preventive and proactive solution that is particularly useful for counteracting the phenomenon of software aging. In this paper, we consider both the periodic and non-periodic software rejuvenation policies under different depend ...
Journal Article · Software Quality Journal · January 1, 2004
Software reliability is an important metric that quantifies the quality of a software product and is inversely related to the residual number of faults in the system. Fault removal is a critical process in achieving desired level of quality before software ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · January 1, 2004
The development of techniques for quantitative, model-based evaluation of computer system dependability has a long and rich history. A wide array of model-based evaluation techniques is now available, ranging from combinatorial methods, which are useful fo ...
Conference · IFIP Advances in Information and Communication Technology · January 1, 2004
Several recent studies have established that most system outages are due to software faults. Given the ever increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in the ...
Journal Article · IEEE Transactions on Computers · December 1, 2003
In this paper, a new algorithm based on Binary Decision Diagram (BDD) for the analysis of a system with multistate components is proposed. Each state of a multistate component is represented by a Boolean variable, and a multistate system is represented by ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · December 1, 2003
We present a framework of adaptive estimation and rejuvenation of software system performance in the presence of aging sources. The framework specifies that a degradation model not only describe an aging process but also enable the adaptation of model-base ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · December 1, 2003
The presence of physical obstacles and radio interference results in the so called "shadow regions" in wireless networks. When a mobile station roams into a shadow region, it loses its network connectivity. In cellular networks, in order to minimize the co ...
Journal Article · Proceedings of the American Control Conference · November 6, 2003
This paper presents software performance analysis using a finite state automaton model. A signed real measure of formal languages has been used for quantitative evaluation of the software. This paper extends an earlier model based on a discrete time Markov ...
Journal Article · IEEE Transactions on Vehicular Technology · November 1, 2003
Performance modeling of the contention-based reservation protocol in general packet radio service (GPRS)/enhanced general packet radio service (EGPRS) under bursty traffic is practically useful in system design. Instead of using discrete event simulation, ...
Journal Article · Computer Communications · September 22, 2003
Handoff is an important issue in cellular mobile telephone systems. Recently, studies that question the validity of the assumption of handoff arrivals being Poissonian have appeared in the literature. The reasoning behind this claim can be summarized as fo ...
Journal ArticleInternational Journal of Reliability, Quality and Safety Engineering · September 1, 2003
Preventive maintenance is applied to improve the system availability or decrease the operational cost. This paper addresses the optimal preventive maintenance problem for multi-state deteriorating systems, where the system experiences multiple stages of pe ...
Journal ArticleInternational Journal of Communication Systems · August 1, 2003
The high expectations of performance and availability for wireless mobile systems have presented great challenges in the modelling and design of fault tolerant wireless systems. The proper modelling methodology to study the degradation of such systems is so ...
Journal ArticleIEEE International Conference on Communications · July 18, 2003
We propose to use Markov regenerative process (MRGP) models to study the availability of Internet-based services perceived by a Web user, which capture the interactions between the service facility and the user. The necessity of the sophisticated MRGP mode ...
Journal ArticleIEEE Transactions on Reliability · March 1, 2003
Telecommunication systems are large and complex, consisting of multiple intelligent modules in shelves, multiple shelves in frames, and multiple frames to compose a single network element. In the availability and performability analysis of such a complex s ...
ConferenceFoundations of Intrusion Tolerant Systems, OASIS 2003 · January 1, 2003
This paper presents an intrusion tolerant architecture for distributed services, especially COTS servers. An intrusion tolerant system assumes that attacks will happen, and some will be successful. However, a wide range of mission critical applications need ...
Journal ArticleProceedings of the IEEE · January 1, 2003
Real-time systems are an important class of process control systems that need to respond to events under time constraints, or deadlines. Such systems may also be required to deliver service in spite of hardware or software faults in their components. This ...
Journal ArticleComputers and Mathematics with Applications · January 1, 2003
With growing emphasis on reuse, the software development process moves toward component-based software design. As a result, there is a need for modeling approaches that are capable of considering the architecture of the software made out of components. Thi ...
Journal ArticleProceedings of the ACM Workshop on Survivable and Self-Regenerative Systems · January 1, 2003
Security is an important QoS attribute for characterizing intrusion tolerant computing systems. Frequently however, the security of computing systems is assessed in a qualitative manner based on the presence and absence of certain functional characteristic ...
Journal ArticleProceedings of the Annual Reliability and Maintainability Symposium · January 1, 2003
An overview is given of novel techniques for computing importance measures in state space dependability models. Specifically, reward functions in a Markov reward model (MRM) are utilized for this purpose, in contrast to the common method of computing impor ...
ConferenceProceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2003
Software aging often affects the performance of a software system and eventually causes it to fail. A novel approach to handle transient software failures is called software rejuvenation which can be regarded as a preventive and proactive solution that is ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2003
Software architectural choices have a profound influence on the quality attributes supported by a system. Architecture analysis can be used to evaluate the influence of design decisions on important quality attributes such as maintainability, performance a ...
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · December 17, 2002
Recently, the phenomenon of "software aging", one in which the state of a software system gradually degrades with time and eventually leads to performance degradation or crash/hang failure, has been reported. Preventive maintenance of operational software ...
Journal ArticleInformation Sciences · December 1, 2002
In this paper, we report our recent work on closed form solutions to the blocking and dropping probability in wireless cellular networks with handoff. First, we develop a performance model of a cell in a wireless network where the effect of handoff arrival ...
Journal ArticleProceedings of the Joint Conference on Information Sciences · December 1, 2002
In this paper, we report our recent work on closed form solutions to the blocking and dropping probability in wireless cellular networks with handoff. First, we develop a performance model of a cell in a wireless network where the effect of handoff arrival ...
Journal ArticleProceedings of the IEEE Conference on Decision and Control · December 1, 2002
This paper addresses the outage problem and its mitigation for quality of service (QoS) of wireless networks with fading channels. At first, we set up a continuous time Markov chain (CTMC) model that includes various source states and channel outage states ...
Journal ArticleProceedings of the International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems · December 1, 2002
Network survivability reflects the ability of a network to continue to function during and after failures. Our purpose in this paper is to propose a quantitative approach to evaluate network survivability. We perceive the network survivability as a composi ...
Journal ArticleProceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002
A tool called Software Reliability Estimation and Prediction Tool (SREPT) that seeks to address the limitation given in other tools is presented. Unlike most models that assume instantaneous and perfect debugging, SREPT allows the users to analyze the effe ...
Journal ArticleProceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002
The NASA Remote Exploration and Experimentation (REE) Project, managed by the Jet Propulsion Laboratory, has the vision of bringing commercial supercomputing technology into space, in a form which meets the demanding environmental requirements, to enable a ...
Journal ArticleProceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002
Quite often failures in network based services and server systems may not be accidental, but rather caused by deliberate security intrusions. We would like such systems to either completely preclude the possibility of a security intrusion or design them to ...
Journal ArticleProceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002
In this paper, we characterize a broad class C of prefetching algorithms and prove that, for any prefetching algorithm in this class, its total elapsed time is no more than twice the smallest possible total elapsed time. This result provides a performance ...
Journal ArticleProceedings of the 2002 International Conference on Dependable Systems and Networks · December 1, 2002
SHARPE is a well known package in the field of reliability and performability, used in universities as well as in companies. It is believed that SHARPE is a useful modeler's "toolchest" because it contains support for multiple model types and provides flex ...
Journal ArticleIEEE Transactions on Reliability · June 1, 2002
This paper studies the steady-state availability of systems with times to outages and recoveries that are generally distributed. Availability bounds are derived for systems with limited information about the distributions. Also investigated are the applica ...
Journal ArticleComputer Communications · May 1, 2002
Call admission control (CAC) algorithms that reduce dropped calls in code-division multiple access (CDMA) cellular systems are discussed in this paper. The capacity of a CDMA system is confined by the interference of users from both inside and outside of t ...
Conference28th International Computer Measurement Group Conference, CMG 2002 · January 1, 2002
From an end user’s point of view, too short a Webserver timeout implies too many forced logouts, and too long a timeout duration poses a higher security risk to users’ sensitive data. We propose cost functions to select the timeout value, which are based o ...
Journal ArticlePerformance Evaluation · 2002
In this paper, the analysis of second-order stochastic fluid models, where the fluid rate is dependent on the fluid level, is addressed. The boundary conditions are presented for the fluid models under consideration, which have extended previous work with ...
Journal ArticleApplied Numerical Mathematics · January 1, 2002
We propose a methodology aimed at automating the software development of fast discrete transforms for N-body problems. The methodology starts with a representation of the transform matrix in compact form. Then, two translation phases are applied. One trans ...
Journal ArticleReliab. Eng. Syst. Saf. (UK) · 2002
Preventive maintenance is applied to improve the device availability or decrease the repair costs when the device failures are in deterioration (or aging) phase. Preventive maintenance can be made more efficient by periodic monitoring wherein the state of ...
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2002
In this paper, we consider a new stochastic model for a file recovery action with checkpointing when the system failure occurs according to a homogeneous Poisson process. The present checkpoint model strongly depends on the system age and is quite differen ...
Journal ArticleIEEE International Conference on Communications · January 1, 2002
In this paper, an optimal training equalization for wireless communication is proposed and analyzed. By our scheme, the training of the equalizer is carried out periodically, with the training interval optimized for a maximal channel utilization. A closed- ...
Journal ArticleProceedings of the Annual Reliability and Maintainability Symposium · January 1, 2002
In this paper we develop analytical models for the study of the dependability characteristics of systems with uninterruptible power supply (UPS) units. Dependability of systems with UPS cannot be modeled exactly using the prevalent Markov modeling approach ...
ConferenceProceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2002
Prevalent approaches to characterize the behavior of monolithic applications are inappropriate to model modern software systems which are heterogeneous, and are built using a combination of components picked off the shelf, those developed in-house and thos ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2002
Several recent studies have established that most system outages are due to software faults. Given the ever increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in the ...
ConferenceProceedings of the IEEE International Conference on Engineering of Complex Computer Systems, ICECCS · January 1, 2002
Messaging services are a useful component in distributed systems that require scalable dissemination of messages (events) from suppliers to consumers. These services decouple suppliers and consumers, and take care of client registration and message propaga ...
ConferenceISESE 2002 - Proceedings, 2002 International Symposium on Empirical Software Engineering · January 1, 2002
A number of recent studies have reported the phenomenon of "software aging", characterized by progressive performance degradation or a sudden hang/crash of a software system due to exhaustion of operating system resources, fragmentation and accumulation of ...
ConferenceProceedings - International Symposium on Software Reliability Engineering, ISSRE · January 1, 2002
In order to reduce system outages and the associated downtime cost caused by the "software aging" phenomenon, we propose to use software rejuvenation as a proactive system maintenance technique deployed in a CMTS (Cable Modem Termination System) cluster sy ...
ConferenceProceedings - International Conference on Computer Communications and Networks, ICCCN · January 1, 2002
Spatial reuse protocol (SRP) is a media access control (MAC)-layer protocol that operates over a double counter-rotating ring network topology. SRP is designed to enhance the SONET network so that it can handle data traffic more efficiently. We study the a ...
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 2001
This article gives the detailed mathematical results on the hypergeometric distribution software reliability model (HGDSRM) proposed by Tohma et al. [IEEE Trans. Software Eng. (1989, 1991)]. In the above papers, Tohma et al. developed the HGDSRM as a discr ...
Journal ArticleComputer Journal · December 1, 2001
Software rejuvenation is a preventive maintenance technique that has been extensively studied in recent literature. In this paper, we extend the classical result by Huang et al. (1995), and in addition propose a modified stochastic model to generate the so ...
Journal ArticleElectronics Letters · November 22, 2001
The problem of transmission control protocol (TCP) traffic with random early detection (RED) is addressed. With the formulation of stochastic differential equation (SDE), an explicit expression for the relation between RED parameters and network parameters ...
Journal ArticleComputer Communications · October 1, 2001
In this paper, we develop a concise performance model of partial packet discard (PPD) and early packet discard (EPD) schemes in ATM switches. We study the performance of PPD and EPD with heterogeneous traffic sources. The sources included Poisson, and ON-O ...
Journal ArticleIEEE Transactions on Vehicular Technology · September 1, 2001
With the increasing popularity of wireless communication systems, customers are expecting the same level of service, availability, and performance from the wireless communication networks as the traditional wire-line networks. Traditional pure performance ...
Journal ArticleComputer Communications · July 15, 2001
A single base repeater failure in time division multiple access (TDMA) wireless systems causes all active calls on this base repeater to be dropped. In order to increase system end-to-end availability, a multiple channel recovery method for TDMA wireless s ...
Journal ArticlePerformance Evaluation · July 1, 2001
With the growing emphasis on reuse, the software development process moves toward component-based software design. As a result, there is a need for modeling approaches that are capable of considering the architecture of the software and estimating the reliabil ...
Book · June 18, 2001
Performability modelling and evaluation brings together two disciplines that have long been treated separately in different communities: computer and communication system performance evaluation and system reliability and availability ... ...
Journal ArticleIEEE/ACM Transactions on Networking · June 1, 2001
In this paper, we develop performance models of the Broadcast and Unknown Server (BUS) in the LANE. The traffic on the BUS is divided into two classes: the broadcast and multicast traffic, and the unicast relay flow. The broadcast and multicast traffic is ...
Journal ArticleIEEE Transactions on Vehicular Technology · May 1, 2001
In this paper, we develop a performance model of a cell in a wireless communication network where the effect of handoff arrival and the use of guard channels is included. Fast recursive formulas for the loss probabilities of new calls and handoff calls are ...
ConferenceEUROCON 2001 - International Conference on Trends in Communications, Proceedings · January 1, 2001
Soft handoff in the CDMA cellular system is analyzed. To improve performance degradation due to channel resource shortage during soft handoff, we propose a new scheme which converts channels occupied by some pseudo-handoff calls to new handoff calls. Stoch ...
ConferenceProceedings - DARPA Information Survivability Conference and Exposition II, DISCEX 2001 · January 1, 2001
Intrusion detection and response research has so far mostly concentrated on known and well-defined attacks. We believe that this narrow focus of attacks accounts for both the successes and limitation of commercial intrusion detection systems (IDS). Intrusi ...
Journal ArticlePerformance Evaluation Review · January 1, 2001
Several recent studies have reported the phenomenon of "software aging", one in which the state of a software system degrades with time. This may eventually lead to performance degradation of the software or crash/hang failure or both. "Software rejuvenati ...
Journal ArticleDiscrete Event Dynamic Systems: Theory and Applications · 2001
Hybrid Systems are models of interacting digital and continuous devices with applications in the control of aircraft, computers, or modern cars for instance. Concurrently, Fluid Stochastic Petri Nets (FSPNs) have been introduced as an extension of stochast ...
Journal ArticleIBM Journal of Research and Development · January 1, 2001
In response to the strong desire of customers to be provided with advance notice of unplanned outages, techniques were developed that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reach ...
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · January 1, 2001
Many architecture-based software reliability models have been proposed in the past without any attempt to establish a relationship among them. The aim of this paper is to fill this gap. First, the unifying structural properties of the models are exhibited ...
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2001
As CORBA (Common Object Request Broker Architecture) gains popularity as a standard for portable, distributed, object-oriented computing, the need for a CORBA messaging solution is being increasingly felt. This led the Object Management Group (OMG) to spec ...
Journal ArticleProceedings of the Annual Reliability and Maintainability Symposium · January 1, 2001
In reliability analysis of computer systems, models such as fault trees, Markov chains, and stochastic Petri nets (SPN) are built to evaluate or predict the reliability of the system. In general, the parameters in these models are usually obtained from fiel ...
ConferenceProceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2001
Preventive maintenance is applied to improve the system availability or decrease the operational cost. In this paper, preventive maintenance with generally distributed parameters is discussed, and the steady-state solution is obtained by solving the un ...
ConferenceProceedings - 3rd International Symposium on Distributed Objects and Applications, DOA 2001 · January 1, 2001
With the growing popularity of the CORBA architecture as a distributed computing infrastructure standard, the need for a reliable CORBA messaging solution is being increasingly felt. The Event Service, which is the first such solution, provides inadequate ...
Journal ArticleIEEE Transactions on Reliability · December 1, 2000
Perhaps the most stringent restriction in most software reliability models is the assumption of statistical independence among successive software failures. Our research was motivated by the fact that although there are practical situations in which this a ...
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 2000
The GMDH (group method of data handling) network is an adaptive learning machine based on the principle of heuristic self-organization. In this paper, we apply the GMDH networks to predict software reliability in testing phase. Three kinds of networks: the ...
Journal ArticleConference Record / IEEE Global Telecommunications Conference · December 1, 2000
A traditional pure performance model that ignores failure and recovery but considers resource contention generally overestimates the system's ability to perform a certain job. On the other hand, pure availability analysis tends to be too conservative since p ...
Journal ArticlePerformance Evaluation · January 1, 2000
Several tools have been developed for the estimation of software reliability. However, they are highly specialized in the approaches they implement and the particular phase of the software life-cycle in which they are applicable. There is an increasing nee ...
Journal ArticleEuropean Transactions on Telecommunications · January 1, 2000
With the increasing penetration of wireless communications systems, customers are expecting the same level of service, reliability and performance from the wireless communication systems as the traditional wire-line networks. Due to the dynamic environment ...
Journal ArticleProceedings of the IEEE Symposium on Reliable Distributed Systems · January 1, 2000
The Event service is the earliest CORBA solution to the message queue model of communication in distributed systems. Typical implementations however suffer from the lack of event delivery guarantees. The loss of messages is aggravated in the presence of bu ...
Journal ArticleProceedings of the IEEE Annual Simulation Symposium · January 1, 2000
Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of 'software aging', one in which the state of the software system degrades with time, has been reported. To counteract this phenomenon, a proactive approac ...
Journal ArticleProceedings - IEEE INFOCOM · January 1, 2000
Call admission control algorithms that reduce dropped calls in CDMA cellular systems are discussed in this paper. The capacity of a CDMA system is confined by interference of users from both inside and outside of the target cell. Earlier algorithms for cal ...
ConferenceProceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2000
Since the early 1970's a number of models have been proposed for estimating software reliability. However, the realism of many of the underlying assumptions and the applicability of these models continue to be questioned. Our research work was motivated by ...
ConferenceProceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2000
In this paper, we extend the classical result by Huang, Kintala, Kolettis and Fulton (1995), and in addition propose a modified stochastic model to determine the software rejuvenation schedule. More precisely, the software rejuvenation models are formulate ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
In this paper we study the suitability of the CORBA Event Service as a reliable message delivery mechanism. We first show that products built to the CORBA Event Service specification will not guarantee against loss of messages or guarantee order. This is n ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
Although several tools have been developed for the estimation of software reliability, they are highly specialized in the approaches they implement and the particular phase of the software lifecycle in which they are applicable. Also the conventional tech ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
The SHARPE package, Symbolic Hierarchical Automated Reliability and Performance Evaluator, is now 13 years old. A well known package in the field of reliability and performability, SHARPE is used in universities as well as in companies. Many important chan ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
Stochastic Petri Net Package (SPNP) is a software package whose goal is to compute performance, availability or performability measures from Stochastic Petri Nets (SPN) and Fluid Stochastic Petri nets (FSPN). This software can use either analytic numeric m ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
ConferenceProceedings of IEEE International Symposium on High Assurance Systems Engineering · January 1, 2000
Software rejuvenation is a preventive maintenance technique that has been extensively studied in the recent literature. In this paper we extend the classical result by Huang et al. (1995), and in addition propose a modified stochastic model to generate the ...
Journal ArticleIEEE Transactions on Reliability · December 1, 1999
This paper presents a new algorithm (PMS-BDD) based on the binary decision diagram (BDD) for reliability analysis of phased-mission systems (PMS). PMS-BDD uses phase algebra to deal with the dependence across the phases, and a new BDD operation to incorpor ...
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1999
Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of 'software aging', one in which the state of the software system degrades with time, has been reported. The primary causes of this degradation are the exh ...
Journal ArticleIEEE Vehicular Technology Conference · December 1, 1999
A single base repeater failure in TDMA wireless systems causes all active calls on this base repeater to be dropped. In order to increase system end-to-end availability, an RF channel recovery method for TDMA wireless systems is proposed in this paper. By ...
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1999
Perhaps the most stringent restriction that is present in most software reliability models is the assumption of independence among successive software failures. Our research was motivated by the fact that although there are practical situations in which th ...
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1999
The software reliability growth models (such as NHPP models) are frequently used in software reliability prediction. Estimation of parameters in these models is often done by point estimation. However, some numerical problems arise and make the actual comp ...
Journal ArticleComputer Communications · September 15, 1999
In this paper, the effect of Web caching on network planning, in the sense of bandwidth computation for the access link interconnecting the ISP's subnet with the Internet, is studied by means of simulations. The latency of a browser retrieving files is stu ...
Journal ArticleComputer Communications · June 15, 1999
The performance of prioritized distributed queue dual bus (DQDB) metropolitan area network (MAN) under bursty traffic environment is studied in this article. The tagged node model is adopted to simplify the analysis. The processes of the packet arrivals to ...
Journal ArticleIEEE Wireless Communications and Networking Conference, WCNC · January 1, 1999
An RF channel at a cell is assigned to a call during the call set-up process. The channel is dedicated to the subscriber until the call is terminated (normal termination) or the subscriber leaves the cell (handoff). However, the RF channel may fail due to ...
ConferenceDependable Computing for Critical Applications 7 · January 1, 1999
We focus on analytical modeling for the dependability evaluation of phased-mission systems. Because of their dynamic behavior, systems showing a phased behavior offer challenges in modeling. We propose the modeling and evaluation of phased-mission system d ...
ConferenceIEEE International Conference on Communications · January 1, 1999
Network management provides the central nervous system for the networks of telecommunications providers. A telco's network management system (NMS) needs to support uninterrupted management functionality of complex networks. The reliability of such systems ...
Journal ArticleReliability Engineering and System Safety · 1999
The purpose of this paper is to describe an efficient Boolean algebraic algorithm that provides exact solution to the unreliability of a multi-phase mission system where the configurations are described through fault trees. The algorithm extends and improv ...
Journal ArticlePerformance Evaluation · January 1, 1999
In a distributed process control system, information about the behavior of physical processes is usually collected and stored in a real-time database which can be remotely accessed by human operators. In this paper we propose an analytic approach to comput ...
Journal ArticleReliability Engineering and System Safety · January 1, 1999
In this paper, we propose an integrated reliability/availability modeling and analysis environment suitable for heterogeneous hierarchical system analysis. A key component of this environment is a high level system specification and input language which ac ...
Journal ArticleAnnals of Software Engineering · January 1, 1999
The past 20 years have seen the formulation of numerous analytical software reliability models for estimating the reliability growth of a software product. The predictions obtained by applying these models tend to be optimistic due to the inaccuracies in t ...
Journal ArticleIEEE Transactions on Software Engineering · January 1, 1999
The purpose of this paper is to describe a method for the simulation of the recently introduced fluid stochastic Petri nets. Since such nets result in rather complex system of partial differential equations, numerical solution becomes a formidable task. Be ...
Journal ArticleIEEE VTS 50th Vehicular Technology Conference, VTC 1999-Fall · January 1, 1999
Following Mandayam et al., we define outage events as the channel being attenuated for at least a deterministic period of time, τm. Compared with continuous time Markov chain or discrete time Markov chain, a semi-Markov process (SMP) is general enough that ...
Cite
Journal ArticleProceedings - Annual International Conference on Fault-Tolerant Computing · January 1, 1999
Process replication is provided as the central mechanism for application level software fault tolerance in SwiFT and DOORS. These technologies, implemented as reusable software modules, support cold and warm schemes of passive replication. The choice of a ...
Cite
Journal ArticleProceedings - Annual International Conference on Fault-Tolerant Computing · January 1, 1999
In this paper, a new algorithm based on Binary Decision Diagrams (BDD) for dependability analysis of distributed computer systems (DCS) with imperfect coverage is proposed. Minimum file spanning trees (MFST) are generated and stored via BDD manipulation. B ...
Cite
ConferenceProceedings - 1999 Pacific Rim International Symposium on Dependable Computing, PRDC 1999 · January 1, 1999
In this paper, we compare the availability and performance of a wireless TDMA system with and without automatic protection switching. Stochastic reward net models are constructed and solved by SPNP (Stochastic Petri Net Package). Hierarchical decomposition ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1999
In this paper we present a new modelling approach for dependability evaluation and sensitivity analysis of Scheduled Maintenance Systems, based on a Deterministic and Stochastic Petri Net approach. The DSPN approach offers significant advantages in terms o ...
Full textCite
ConferenceProceedings - 1999 IEEE Symposium on Application-Specific Systems and Software Engineering and Technology, ASSET 1999 · January 1, 1999
An important step towards effective software maintenance is to locate the code relevant to a particular feature. We report a study applying an execution slice-based technique to a reliability and performance evaluator to identify the code which is unique t ...
Full textCite
Journal ArticleIEEE Transactions on Computers · December 1, 1998
Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software "aging." However, it incurs some overhead. The necessity to do preventive maintenance, not ...
Full textCite
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1998
The phenomenon of software aging refers to the accumulation of errors during the execution of the software which eventually results in its crash/hang failure. A gradual performance degradation may also accompany software aging. Pro-active fault management ...
Cite
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1998
The software reliability measurement problem can be approached by obtaining the estimates of the residual number of faults in the software. Traditional black-box based approaches to software reliability modeling assume that the debugging process is instantaneo ...
Cite
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1998
Two case studies, one of a terminating application, and the other of a real-time application with feedback control, are presented to illustrate the flexibility offered by discrete-event simulation to analyze complex systems. Data from these studies confirm ...
Cite
Journal ArticleIEEE Transactions on Reliability · December 1, 1998
This paper presents a simpler and more efficient algorithm (I_VT), based on the one proposed by Veeraraghavan & Trivedi (VT), to calculate system reliability using 'sum of disjoint products' and 'multiple variable inversion' (MVI) techniques. A proposition ...
Full textCite
Journal ArticleIEEE Internet Computing · July 1, 1998
Java can be used to create a network computing platform that lets users share applications not specifically devised for the Web. The authors used one such platform to port an existing tool and develop a new application. ...
Full textCite
Journal ArticleEuropean Journal of Operational Research · February 16, 1998
In this paper we introduce a new class of stochastic Petri nets in which one or more places can hold fluid rather than discrete tokens. We define a class of fluid stochastic Petri nets in such a way that the discrete and continuous portions may affect each ...
Full textCite
Journal ArticlePerformance Evaluation Review · January 1, 1998
Petri nets represent a powerful paradigm for modeling parallel and distributed systems. Parallelism and resource contention can easily be captured and time can be included for the analysis of system dynamic behavior. Most popular stochastic Petri nets assu ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1998
Stochastic Petri nets have been used to analyze the performance and reliability of complex systems comprising concurrency and synchronization. Various extensions have been proposed in literature in order to broaden their field of application to an increasi ...
Full textCite
Journal ArticleJournal of Circuits, Systems and Computers · January 1, 1998
Analytical modeling plays a crucial role in the analysis and design of computer systems. Stochastic Petri Nets represent a powerful paradigm, widely used for such modeling in the context of dependability, performance and performability. Many structural and ...
Full textCite
Journal ArticleMicroelectronics Reliability · January 1, 1998
Energy management system (EMS) computer architectures have changed significantly over the recent past increasing the difficulty and the need for a priori assessment of system performance and dependability. The old practice based on measurements is no longe ...
Full textCite
Journal ArticleMicroelectronics Reliability · January 1, 1998
Mean time to failure (MTTF) is one of the most frequently used dependability measures in practice. By convention, MTTF is the expected time for a system to reach any one of the failure states. For some systems, however, the mean time to absorb to a subset ...
Full textCite
ConferenceProceedings - 3rd IEEE International High-Assurance Systems Engineering Symposium, HASE 1998 · January 1, 1998
The finite-failure non-homogeneous Poisson process (NHPP) models proposed in the literature exhibit either constant, monotonic increasing or monotonic decreasing failure occurrence rates per fault, and are inadequate to describe the failure processes under ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1998
Several tools have been developed for the estimation of software reliability. However, they are highly specialized in the approaches they implement and the particular phase of the software life-cycle in which they are applicable. There is an increasing n ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1998
An improved algorithm based on the one proposed by Veeraraghavan and Trivedi (VT) to calculate system reliability using sum of disjoint products (SDP) and multiple variable inversion (MVI) techniques is presented. We compare the improved algorithm with seve ...
Full textCite
ConferenceICUPC 1998 - IEEE 1998 International Conference on Universal Personal Communications, Conference Proceedings · January 1, 1998
We propose and compare three channel recovery schemes for fixed channel assignment. In Scheme I, a failed channel is switched by an idle channel whenever it is available. In Scheme II, the switching strategy is employed only after an attempt to restore the ...
Full textCite
ConferenceProceedings - 1998 IEEE Workshop on Application-Specific Software Engineering and Technology, ASSET 1998 · January 1, 1998
Effective and accurate reliability modeling requires the collection of comprehensive, homogeneous, and consistent data sets. Failure data required for software reliability modeling is difficult to collect, and even the available data tends to be noisy, dis ...
Full textCite
Journal ArticleProceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS · December 1, 1997
Fault tolerance is a survival attribute of complex computer systems and software in their ability to deliver continuous service to their users in the presence of faults. Formulating an analytic model for dependability and performance evaluation of hardware ...
Cite
Journal ArticleProceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS · December 1, 1997
Cache memory is a small, fast, memory system that holds frequently used data. With increasing processor speed, aggressive design practices increase the probability of fault occurrence and the presence of latent errors as processor allows a short duration f ...
Cite
Journal ArticleProceedings of the International Conference on Computer Communications and Networks, ICCCN · December 1, 1997
A combined performance and dependability (called performability) model for dealing with handoff calls is introduced. Stochastic reward nets (SRNs) are used for this purpose. An SRN model of channel assignment is developed and analyzed. The method of phase ...
Cite
Journal ArticleJournal of Network and Systems Management · January 1, 1997
Detection and restoration times are often ignored when modeling network reliability. In this paper, we develop Markov Regenerative Reward Models (MRRM) to capture the effects of detection and restoration phases of network recovery. States of the MRRM repre ...
Full textCite
Journal ArticleIEEE International Conference on Communications · January 1, 1997
As switched networks providing services to end users become more commonplace, integral components of these networks must have an increasing level of dependability. One area of interest is determining the optimal number of network servers required for a swi ...
Cite
Journal ArticleTelecommunication Systems · January 1, 1997
The B-ISDN will carry a variety of traffic types: the Variable Bit Rate traffic (VBR), of which compressed video is an example, Continuous Bit Rate traffic (CBR), of which telemetry is an example, Data traffic, and Available Bit Rate traffic (ABR) that rep ...
Full textCite
Journal ArticleCOMPASS - Proceedings of the Annual Conference on Computer Assurance · January 1, 1997
Software reliability is an important metric that quantifies the quality of the software product and is inversely related to the number of unrepaired faults in the system. Fault removal is a critical process in achieving desired level of quality before soft ...
Cite
Journal ArticleProceedings of the High-Assurance Systems Engineering Workshop · January 1, 1997
High-assurance system engineering requires efficient computer-aided dependability evaluation. Although various dependability evaluation techniques and tools have been developed and studied in the last two decades, no adequate attention has been paid to all ...
Cite
Journal ArticleInternational Workshop on Petri Nets and Performance Models · January 1, 1997
The purpose of this paper is to describe a method for simulation of recently introduced fluid stochastic Petri nets. Since such nets result in a rather complex set of partial differential equations, numerical solution becomes a formidable task. Because of a ...
Cite
Journal ArticleCOMPASS - Proceedings of the Annual Conference on Computer Assurance · January 1, 1997
Software rejuvenation is a technique for software fault tolerance which involves occasionally stopping the executing software, 'cleaning' the 'internal state' and restarting. This cleanup is done at desirable times during execution on a preventive basis so ...
Cite
ConferenceDigest of Papers - 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 · January 1, 1997
Although various dependability evaluation techniques and tools have been developed in the last two decades, no adequate attention has been paid to allow system designers not well versed in analytic modeling to easily employ these techniques and tools. In t ...
Full textCite
Journal ArticleProceedings of the IEEE International Conference on Engineering of Complex Computer Systems, ICECCS · December 1, 1996
With the increased attention ATM is receiving to meet the needs of a wide variety of applications, tools are needed to help a network designer focus on the design at hand, rather than to spend time exhaustively learning the tools themselves. This is the co ...
Cite
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1996
A number of analytical software reliability models have been proposed for estimating the reliability growth of a software product. In this paper we present an Enhanced non-homogeneous Poisson process (ENHPP) model and show that previously reported Non-Homo ...
Cite
Journal ArticleIEEE Transactions on Reliability · December 1, 1996
Two arcs are missing in a figure of Malhotra & Trivedi (1995); these arcs are necessary for the proper functioning of the GSPN. Also, priorities of immediate transitions in that figure must be clearer. This note presents a correctly drawn GSPN and describe ...
Full textCite
Journal ArticleIEEE Transactions on Software Engineering · December 1, 1996
Stochastic Petri net models of large systems that are solved by generating the underlying Markov chain pose the problem of largeness of the state-space of the Markov chain. Hierarchical and iterative models of systems have been used extensively to solve th ...
Full textCite
Journal ArticlePerformance Evaluation · October 1, 1996
In this paper we consider the problem of numerical computation of the mean time to failure (MTTF) in Markovian dependability and/or performance models. The problem can be cast as a system of linear equations which is solved using an iterative method preser ...
Full textCite
ConferenceSIGMETRICS 1996 - Proceedings of the 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems · May 15, 1996
Checkpointing with rollback-recovery is a well known technique to reduce the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation refers to preventive maintenance of software aimed to reduce un ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1996
In this paper we consider the problem of numerical computation of the mean time to failure (MTTF) in Markovian dependability and/or performance models. The problem can be cast as a system of linear equations which is solved using an iterative method preser ...
Full textCite
Journal ArticleReliability Engineering and System Safety · January 1, 1996
In this paper, we present a comparative reliability analysis of an application on a corporate B-ISDN network under various alternate-routing protocols. For simple cases, the reliability problem can be cast into fault-tree models and solved rapidly by means ...
Full textCite
Journal ArticlePerformance Evaluation Review · January 1, 1996
Checkpointing with rollback-recovery is a well known technique to reduce the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation refers to preventive maintenance of software aimed to reduce un ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1996
In recent studies, the phenomenon of software "aging" has come to light which causes performance of a software to degrade with time. Software rejuvenation is a fault tolerance technique which counteracts aging. In this paper, we address the problem of dete ...
Full textCite
Journal ArticleAmerican Statistician · January 1, 1996
We compare the accuracy of two approximate confidence interval estimators for the Bernoulli parameter p. The approximate confidence intervals are based on the normal and Poisson approximations to the binomial distribution. Charts are given to indicate whic ...
Full textCite
Journal ArticleProceedings - IEEE INFOCOM · January 1, 1996
In this paper we characterize the time-dependent behavior of typical queueing systems that arise in ATM networks under the presence of overloads. Transient queue length distribution and transient cell loss probability are obtained numerically and transient ...
Cite
Journal ArticleProceedings - IEEE International Computer Performance and Dependability Symposium, IPDS · January 1, 1996
SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) is a program that supports the specification and automated solution of reliability and performance models [1]. It contains support for fault trees, reliability block diagrams, r ...
Cite
Journal ArticleIEEE Proceedings of the National Aerospace and Electronics Conference · January 1, 1996
In order to permit users with little analytic background to evaluate dependability, modeling tools require a user-friendly front end. With this motivation, we have developed a software tool referred to as SDDS for 'Software Dependability for Distributed Sy ...
Cite
Journal ArticleInternational Workshop on Petri Nets and Performance Models · December 1, 1995
Stochastic Petri Net models of large systems that are solved by generating the underlying Markov chain pose the problem of largeness of the state-space. Hierarchical and iterative models of systems have been used extensively to solve this problem. A proble ...
Cite
Journal ArticleInternational Workshop on Petri Nets and Performance Models · December 1, 1995
Operating systems which implement a dynamic priority mechanism are very common. Nevertheless, it is very difficult to develop an accurate analytical model to evaluate their performance, mainly due to the different forms of dependency between the various co ...
Cite
Journal ArticleInternational Workshop on Petri Nets and Performance Models · December 1, 1995
In this paper we present and compare two different approaches for the transient solution of Markov regenerative stochastic Petri Nets: the method based on Markov regenerative theory and the method of supplementary variables. In both cases the equations tha ...
Cite
Journal ArticleProceedings of the International Symposium on Software Reliability Engineering, ISSRE · December 1, 1995
In a client-server type system, the server software is required to run continuously for very long periods. Due to repeated and potentially faulty usage by many clients, such software 'ages' with time and eventually fails. Huang et al. proposed a technique ...
Cite
Journal ArticleInternational Workshop on Petri Nets and Performance Models · December 1, 1995
The recent literature on Markov Regenerative Stochastic Petri Nets (MRSPN) assumes that the random firing time associated to each transition is resampled each time the transition fires or is disabled by the firing of a competitive transition. This modeling ...
Cite
Journal ArticleProceedings - IEEE Military Communications Conference MILCOM · December 1, 1995
Detection and restoration times are often ignored when modeling network reliability. In this paper, we develop Markov Regenerative Reward Models (MRRM) to capture the effects of detection and restoration phases of network recovery. States of the MRRM repre ...
Cite
ConferenceProceedings of the 1995 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 1995/PERFORMANCE 1995 · May 1, 1995
Non-Markovian models allow us to capture a very wide range of circumstances in which it is necessary to model phenomena whose times to occurrence are not exponentially distributed. Events such as timeouts in a protocol, service times at a machine performing ...
Full textCite
Journal ArticleMicroelectronics Reliability · January 1, 1995
Dependability modeling plays a major role in the design, validation and maintenance of real-time computing systems. Typical models provide measures such as mean time to failure, reliability and safety as functions of the component failure rates and fault/e ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1995
Detailed dependability models of various disk array organizations are developed taking into account both the hard disk failures and transient errors. Various error and failure modes of individual disks and the disk array are identified. A small proportion ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1995
This paper compares three numerical methods for reliability calculation of Markov, closed, fault-tolerant systems which give rise to continuous-time, time-homogeneous, finite-state, acyclic Markov chains. We consider a modified version of Jensen's method ( ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1995
This paper describes a methodology to construct dependability models using generalized stochastic Petri nets (GSPN) and stochastic reward nets (SRN). Algorithms are provided to convert a fault tree (a commonly used combinatorial model type) model into equi ...
Full textCite
Journal ArticleProceedings of the Annual Southeast Conference · January 1, 1995
We present a new O(n^3) algorithm for seminumerical transient analysis of continuous time Markov chains with n states. The algorithm is based on spectral decomposition of the transition rate matrix in combination with partial fraction expansion based on Lap ...
Full textCite
Journal ArticleIEEE International Conference on Communications · January 1, 1995
The B-ISDN will carry a variety of traffic types: the Variable Bit Rate traffic (VBR), of which compressed video is an example, Continuous Bit Rate traffic (CBR), of which telemetry is an example, Data traffic, and Available Bit Rate traffic (ABR) that rep ...
Cite
Journal ArticleProceedings - International Computer Performance and Dependability Symposium · January 1, 1995
The Markov Regenerative Stochastic Process (MRGP) has been shown to capture the behavior of real systems with both deterministic and exponentially distributed event times. In this paper we survey the MRGP literature and focus on the different solution tech ...
Cite
Journal ArticleProceedings - Annual International Conference on Fault-Tolerant Computing · January 1, 1995
Fault trees and Markov chains are commonly used for dependability modeling. Markov chains are powerful in that various kinds of dependencies can be easily modeled that fault tree models have difficulty capturing, but the state space grows exponentially in ...
Full textCite
ConferenceProceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS · January 1, 1995
In this paper we survey the Petri net literature and focus on Petri nets with generally distributed transition firing times. In the framework of Markov regenerative stochastic Petri nets (MRSPN) we develop and solve two examples to illustrate the modeling ...
Full textCite
Journal ArticleNetworks · January 1, 1995
Several algorithms have been developed to solve the reliability problem for nonseries‐parallel networks using the sum of disjoint products (SDP) approach. This paper provides a general framework for most of these techniques. It reviews methods that help im ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1995
Non-Markovian Stochastic Petri Nets (SPN) have been developed as a tool to deal with systems characterized by non exponentially distributed timed events. Recently, some effort has been devoted to the study of SPN with generally distributed firing times, wh ...
Full textCite
Journal ArticleProceedings - IEEE INFOCOM · December 1, 1994
In this paper we derive expressions for the time-dependent state probabilities and the time-averaged state-probabilities for the leaky bucket rate control scheme. Our model is based on the theory of Markov regenerative processes. Our results specialize to ...
Cite
Journal ArticleTheoretical Computer Science · June 6, 1994
Imperfect coverage and nonnegligible reconfiguration delay are known to have a deleterious effect on the dependability and the performance of a multiprocessor system. In particular, increasing the number of processor elements does not always increase depen ...
Full textCite
Journal ArticleAnnals of Operations Research · April 1, 1994
We consider the numerical computation of response time distributions for closed product form queueing networks using the tagged customer approach. We map this problem on to the computation of the time to absorption distribution of a finite-state continuous ...
Full textCite
ConferenceProceedings of 2nd IEEE Workshop on Real-Time Applications, RTA 1994 · January 1, 1994
Dependability assessment plays an important role in the design and validation of fault-tolerant real-time computer systems. Dependability models provide measures such as reliability, safety and mean time to failure as functions of the component failure rat ...
Full textCite
Journal ArticleIEEE/ACM Transactions on Networking · January 1, 1994
The inherently weak reliability behavior of the ring architecture has led network designers to consider various design choices to improve network reliability. In this paper, we assess the impact of provisions such as node bypass, secondary ring and concent ...
Full textCite
Journal ArticleProceedings of the IEEE · January 1, 1994
In this paper, we discuss the role of modeling in the design and validation of life-critical, real time systems. The basics of Markov, Markov reward, and stochastic reward net models are covered. An example of a nuclear power plant cooling system is develo ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1994
This paper formally establishes a hierarchy, among the most commonly used types of dependability models, according to their modeling power. Among the combinatorial (non-state-space) model types, we show that fault trees with repeated events are the most po ...
Full textCite
Journal ArticlePerformance Evaluation Review · January 1, 1994
Most reliability analysis techniques and tools assume that a system is used for a mission consisting of a single phase. However, multiple phases are natural in many missions. The failure rates of components, system configuration, and success criteria may v ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1994
The need for the combined performance and reliability analysis of fault tolerant systems is increasing. The common approach to formulating and solving such problems is to use (semi-)Markov reward models. However, the large size of state spaces is a ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1994
Stochastic Petri nets of various types (SPN, GSPN, ESPN, DSPN etc.) are recognized as useful modeling tools for analyzing the performance and reliability of systems. The analysis of such Petri nets proceeds by utilizing the underlying continuous-time stoch ...
Full textCite
Journal ArticleMicroelectronics Reliability · January 1, 1994
Three methods for numerical transient analysis of Markov chains, the modified Jensen's method (Jensen's method with steady-state detection of the underlying DTMC and computation of Poisson probabilities using the method of Fox and Glynn [1]), a third-order ...
Full textCite
Journal ArticleProceedings of the IEEE International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems · January 1, 1994
Performability and reliability modeling techniques and tools have been an area of intensive research activity in the last ten years. We present a unified mathematical framework for performability and reliability models in terms of Markov reward models. The ...
Cite
Journal ArticleDigest of Papers - International Symposium on Fault-Tolerant Computing · January 1, 1994
A high fault detection coverage is very critical for systems with ultra-safe requirements and fault injection is an effective technique for estimating the coverage. One difficulty of fault injection lies in the huge number of injections that need to be car ...
Cite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1994
Modelling techniques and tools of the future must meet the challenges presented by today's highly demanding and schedule-oriented developing environment. With the emergence of high performance and reliability systems the problem of how to analyze such syst ...
Full textCite
Journal ArticleDigest of Papers - International Symposium on Fault-Tolerant Computing · December 1, 1993
An analytic model is developed for predicting the performance and reliability of mirrored disk subsystems. The model includes Markovian dependencies in the request stream, read and write traffic, unit failures, and individual request failures necessitating ...
Cite
Journal Article · December 1, 1993
The common approach to formulating and solving combined reliability/availability and performance problems is to use Markov reward models. However, the large size of state spaces is a problem that plagues Markovian models. Combinatorial models have been use ...
Cite
Journal ArticleDiscrete Event Dynamic Systems: Theory and Applications · July 1, 1993
Markov reward models (MRMs) are commonly used for the performance, dependability, and performability analysis of computer and communication systems. Many papers have addressed solution techniques for MRMs. Far less attention has been paid to the specificat ...
Full textCite
Journal ArticleProceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 1993 · June 1, 1993
We consider the sensitivity of transient solutions of Markov models to perturbations in their generator matrices. The perturbations can either be of a certain structure or can be very general. We consider two different measures of sensitivity and derive up ...
Full textCite
ConferenceProceedings of 5th International Workshop on Petri Nets and Performance Models, PNPM 1993 · January 1, 1993
Sensitivity analysis, i.e., the analysis of the effect of small variations in system parameters on the output measures, can be studied by computing the derivatives of the output measures with respect to the parameter. An algorithm for parametric sensitivit ...
Full textCite
ConferenceProceedings of 5th International Workshop on Petri Nets and Performance Models, PNPM 1993 · January 1, 1993
A methodology for formal specification of hierarchy both in model specification and model solution is presented. Hierarchy is allowed to exist among different model types used in performance and dependability modeling. This offers a lot of flexibility and ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1993
The objective of this paper is to describe a technique for computing the distribution of the completion time of a program on a server subject to failure and repair. Several realistic aspects of the system are included in the model. The server behavior is m ...
Full textCite
Journal ArticleIEEE Transactions on Software Engineering · January 1, 1993
This paper considers the problem of accurately modeling the software fault-tolerance technique based on recovery blocks. Models of such systems have been criticized for their assumptions of independence. Analysis of some systems has considered the correla ...
Full textCite
Journal ArticleIEEE Transactions on Education · January 1, 1993
The study of stochastic modeling can be greatly enriched by the use of computer software. Such software should enable students to experiment with modeling techniques, check their understanding of algorithms for model analysis, develop the skills and “intui ...
Full textCite
Journal ArticleProceedings - IEEE INFOCOM · January 1, 1993
Five different attachment schemes proposed for the FDDI token ring are compared in terms of reliability. For this purpose, the topologies are first studied in isolation (reliability of the path to the backbone) and subsequently end-to-end user reliabilitie ...
Cite
Journal ArticleIEEE Transactions on Reliability · January 1, 1993
Performability models of multiprocessor systems and their evaluation are presented. Two cases in which hierarchical modeling is applied are examined. 1. Models are developed to analyze the behavior of processor arrays of various sizes in the presence of pe ...
Full textCite
Journal ArticleIEEE Transactions on Parallel and Distributed Systems · January 1, 1993
A client-server system is a distributed system where a server station receives requests from its client stations, processes the requests and returns replies to the requesting stations. In this paper, client-server systems in which a set of workstations acc ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1993
We present a decomposition approach for the solution of large stochastic reward nets (SRNs) based on the concept of near-independence. The overall model consists of a set of submodels whose interactions are described by an import graph. Each node of the gr ...
Full textCite
Journal ArticleJournal of Parallel and Distributed Computing · January 1, 1993
A reliability analysis of various disk array architectures (different levels of RAID) is performed. The dependence of reliability and mean time to data loss on various parameters of a disk array is characterized. A study of these characteristics reveals th ...
Full textCite
Journal ArticleProceedings - International Conference on Distributed Computing Systems · January 1, 1993
We present a performance analysis of a heterogeneous multiprocessor system where tasks may arrive from Poisson sources as well as by spawning and probabilistic branching of other tasks. Non-preemptive priority scheduling is used between different tasks. We ...
Cite
Journal ArticleProceedings of the Annual Reliability and Maintainability Symposium · January 1, 1993
Mean time to failure (MTTF) is one of the most frequently used dependability measures in practice. MTTF is the expected time for a system to reach the predefined failure states due to any of the failure causes. If system failures are classified into differ ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993
Deterministic and stochastic Petri nets (DSPNs) are recognized as a useful modeling technique because of their capability to represent constant delays which appear in many practical systems. If at most one deterministic transition is allowed to be enabled ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993
This paper presents a procedure of transforming an Estelle specification into Stochastic Reward Net (SRN) formalism. Estelle is an ISO standard formal specification language which can help avoid ambiguity, incompleteness and inconsistency in system develop ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993
In this paper we introduce a new class of stochastic Petri nets in which one or more places can hold fluid rather than discrete tokens. After defining the class of fluid stochastic Petri nets, we provide equations for their transient and steady-state behav ...
Full textCite
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1993
In this tutorial, we discuss several practical issues regarding specification and solution of dependability and performability models. We compare model types with and without rewards. Continuous-time Markov chains (CTMCs) are compared with (continuous-time ...
Full textCite
Journal ArticleProceedings - IEEE INFOCOM · December 1, 1992
The performance of a polling system is modeled by stochastic Petri nets and its analysis is done by numerically solving the underlying Markov chain. One key problem in using stochastic Petri nets for real applications is that the size of underlying Markov ...
Full textCite
Journal ArticleJournal of Parallel and Distributed Computing · January 1, 1992
We present two software applications and develop models for them. The first application considers a producer-consumer tasking system with an intermediate buffer task and studies how the performance is affected by different selection policies when multiple ...
Full textCite
Journal ArticleMicroelectronics Reliability · January 1, 1992
We discuss unified performance and reliability analysis of a system which operates in a critical environment, in the sense that a catastrophic condition is reached when the accumulated down time exceeds a given threshold. Assuming that the system must proc ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1992
Composite performance and dependability analysis is gaining importance in the design of complex, fault-tolerant systems. Markov reward models are most commonly used for this purpose. In this paper, an introduction to Markov reward models including solution ...
Full textCite
Journal ArticleQueueing Systems · December 1, 1991
We consider queueing systems in which the server occasionally takes a vacation of random duration. The vacation can be used to do additional work; it can also be a rest period. Several models of this problem have been analyzed in the past assuming that the ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1991
We extend the basic GSPN (generalized stochastic Petri net) model to the GSPN-reward model. This allows the concise specification of both the underlying stochastic process and the rewards attached to the states and the transitions of the stochastic process ...
Full textCite
Journal ArticleProceedings of the Annual Reliability and Maintainability Symposium · January 1, 1991
A computer system dependability analysis that ties together concepts such as reliability, maintainability and availability is discussed. Three classes of dependability measures are described: system availability, system reliability, and task completion. Us ...
Cite
ConferenceProceedings of the 4th International Workshop on Petri Nets and Performance Models, PNPM 1991 · January 1, 1991
Analytical reliability modeling is a promising method for predicting the reliability of different architectural variants and to perform trade-off studies at design time. However, generating a computationally tractable analytic model implies in general an a ...
Full textCite
ConferenceProceedings of the 4th International Workshop on Petri Nets and Performance Models, PNPM 1991 · January 1, 1991
We present a decomposition approach for the solution of large stochastic Petri nets (SPNs). The overall model consists of a set of submodels whose interactions are described by an import graph. Each node of the graph corresponds to a parametrized SPN submo ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1991
The purpose of this paper is to describe an efficient Boolean algebraic algorithm to compute the probability of a union of non-disjoint sets as applied to symbolic reliability analysis. Coherent networks and fault-trees with statistically-independent compo ...
Full textCite
ConferenceProceedings - Conference on Local Computer Networks, LCN · January 1, 1991
In this paper we develop reliability models and derive closed-form results for network reliability and network mean time to failure, including both node and link failures, for a very popular high speed LAN, the FDDI (Fiber Distributed Data Interface). We t ...
Full textCite
Journal ArticleProceedings - Symposium on Reliability in Distributed Software and Database Systems · December 1, 1990
An efficient Boolean algebraic algorithm for the symbolic reliability and sensitivity analysis of coherent two-terminal networks with s-independent components is described. The algorithm is also applicable to a fault tree model without NOT gates. The algor ...
Cite
Journal ArticleQueueing Systems · September 1, 1990
We consider a queueing system with two stations served by a single server in a cyclic manner. We assume that at most one customer can be served at a station when the server arrives at the station. The system is subject to service interruption that arises fr ...
Full textCite
ConferenceProceedings - 28th Annual Southeast Regional Conference, ACM-SE 1990 · April 1, 1990
Sensitivity analysis of continuous time Markov chains has been considered recently by several researchers. This is very useful in performing bottleneck analysis and optimization on systems especially during the design stage. However the construction of the ...
Cite
Journal ArticleCommunications in Statistics. Stochastic Models · January 1, 1990
In this paper we derive the distribution of the completion time of a job with a PH-distributed work requirement, on a server modeled by a homogeneous Markov reward process. The interactions between the job in progress and the server are allowed to be eithe ...
Full textCite
Journal ArticleCommunications in Statistics. Stochastic Models · January 1, 1990
Checkpointing is a technique for reducing the completion (execution) time of long-running batch programs in the presence of failures. It consists of intermittently saving the current status of the program under execution so that if a failure occurs, the pr ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1990
We present an aggregation method for the computation of transient cumulative measures of large, stiff Markov models. The method is based on the classification of the states of the original problem into slow, fast transient, and fast recurrent states. We ag ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1990
With the increasing complexity of multiprocessor and distributed processing systems, the need to develop efficient and accurate modeling methods is evident. Fault tolerance and degradable performance of such systems has given rise to considerable interest ...
Full textCite
Journal ArticleIEEE Journal on Selected Areas in Communications · January 1, 1990
We consider finite-population and finite-capacity polling systems. The behavior of these systems is described by means of generalized stochastic Petri nets. The exact results for the mean response times are obtained numerically by means of the stochastic P ...
Full textCite
Journal ArticlePerformance Evaluation · January 1, 1990
Workload characterization is known to be a difficult and yet a very important facet of performance modeling. User behavior graphs have been advocated as a practical means of workload characterization. Performance modeling with user behavior graphs is for t ...
Full textCite
Journal ArticleProceedings of the Hawaii International Conference on System Science · January 1, 1990
A model is developed demonstrating that when availability is the only measure of system effectiveness of interest, even a small reconfiguration delay leads to a violation of the monotonic increase in availability with the number of processors. A measure of ...
Cite
Journal ArticleAdvances in Computers · January 1, 1990
Dependability calculates the capability of a product to deliver its intended level of service to the user, especially in light of failures or other incidents that impinge on its performance, and combines various underlying ideas, such as reliability, maint ...
Full textCite
Journal Article · December 1, 1989
A VAXcluster is a closely coupled multicomputer system that consists of two or more VAX computers, one or more hierarchical storage controllers (HSCs), two or more disks, and a star coupler. The Markov model of VAX cluster system availability suffers from ...
Cite
Journal ArticleProceedings of the International Conference on Parallel Processing · December 1, 1989
The authors address the problem of the completion time of a job structured as a directed acyclic graph processed on parallel processors, i.e., programs consisting of precedence-constrained tasks. The processors are allowed to be subject to failure and repa ...
Cite
Journal ArticleProceedings - Real-Time Systems Symposium · December 1, 1989
A novel technique that allows a single system to guarantee the execution of both periodic and aperiodic tasks within hard deadlines is presented. The approach is based on dynamically changing the replication factor of periodic tasks in response to aperiodi ...
Cite
Journal Article · December 1, 1989
SPNP, a powerful GSPN package that allows the modeling of complex system behaviors, is presented. Advanced constructs are available in SPNP such as marking-dependent arc multiplicities, enabling functions, arrays of places or transitions, and subnets; the ...
Cite
Journal ArticleProceedings - International Conference on Distributed Computing Systems · June 1, 1989
The problem of predicting the reliability of a distributed system based on the principles of Byzantine agreement is addressed. The system is considered inoperable or failed if Byzantine agreement cannot be guaranteed. The reliability models depend on a uni ...
Cite
Journal ArticleEuropean Journal of Operational Research · May 25, 1989
The advent of fault-tolerant, distributed systems has led to increased interest in analytic techniques for the prediction of reliability, availability, and combined performance and reliability measures. Markov and Markov reward models are common tools for ...
Full textCite
Journal ArticleORSA Journal on Computing · May 1989
Continuous-time Markov chains (CTMC) are widely used mathematical models. Reliability models, queueing networks, and inventory models all require transient solutions of CTMC. The cost of CTMC transient solution increases with size, stiffness, and ...
Full textCite
Journal ArticleCommunications in Statistics. Stochastic Models · January 1, 1989
Markov chains and Markov reward models are useful for modeling fault-tolerant, distributed and multi-processor systems. In this paper, we consider the transient analysis of “cumulative” or “integral” measures of Markov and Markov reward model behav ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1989
Several different models for predicting coverage in a fault-tolerant system are discussed, including models for permanent, intermittent, and transient errors. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1989
In this paper, we examine the reliability of a unique-path multistage interconnection network (MIN) and a fault-tolerant scheme aimed at improving system reliability. We derive closed-form expressions for the time-dependent reliability of the 8×8 and 16 × ...
Full textCite
Journal ArticleIBM Journal of Research and Development · January 1, 1989
Local area networks have been developed using both ring and bus topologies. Multi-loop and multi-connected topologies have been proposed to improve the throughput and dependability of single-loop networks. We evaluate the dependability of a class of multi-con ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1989
We solve for the availability of an n-processor VAXcluster system using a hierarchical approach that allows us to: 1) obtain a closed-form answer to an apparently difficult problem, and 2) determine the optimal number of processors in the cluster for a giv ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1989
Based on the nature of the upper- and lower-bound block diagram models of Multistage Interconnection Networks (MINs), we generalize and consider a series system consisting of independent subsystems. In order to model the reliability of such a system with On ...
Full textCite
Journal ArticleJournal of Guidance, Control, and Dynamics · January 1, 1989
The reliability of digital flight control systems can often be accurately predicted using Markov chain models. We begin our discussion of flight control system reliability models with definitions of key terms. We then construct a single-fault one-processo ...
Full textCite
Journal Article · December 1, 1988
The Hybrid Automated Reliability Predictor (HARP) is a software package that implements advanced reliability modeling techniques. We present an overview of some of the problems that arise in modeling highly reliable, fault tolerant systems, loosely divided ...
Cite
Journal ArticleDigest of Papers - FTCS (Fault-Tolerant Computing Symposium) · December 1, 1988
The authors examine the augmented-shuffle-exchange network (ASEN), a network with low switch and link complexity. Using exact reliability expressions for small networks and upper and lower bounds for larger networks, the reliability of the ASEN is compared ...
Cite
Journal ArticlePerform. Eval. Rev. (USA) · 1988
Traditional evaluation techniques for multiprocessor systems use Markov chains and Markov reward models to compute measures such as mean time to failure, reliability, performance, and performability. In this paper, the authors discuss the extension of Mark ...
Cite
Journal ArticleComputers and Operations Research · January 1, 1988
We consider the numerical evaluation of Markov model transient behavior. Our research is motivated primarily by computer system dependability modeling. Other application areas include finite-capacity queueing models, closed queueing networks and inventory m ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1988
Multiprocessor systems can provide higher performance and higher reliability/availability than single-processor systems. In order to properly assess the effectiveness of multiprocessor systems, measures that combine performance and reliability are needed. ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1988
This paper describes a measurement-based performability model based on error and resource usage data collected on a multiprocessor system. A method for identifying the model structure is introduced and the resulting model is validated against real data. Mo ...
Full textCite
Journal ArticleProceedings of the Hawaii International Conference on System Science · January 1, 1988
The authors consider the reliability of the shuffle-exchange multistage interconnection network (SEN) and two variations of this network aimed at improving reliability through fault tolerance. The two variations are the SEN with an extra stage and the redu ...
Full textCite
Journal Article · December 1, 1987
The Hybrid Automated Reliability Predictor (HARP) is a software package that implements advanced reliability modeling techniques. In this paper we present an overview of some of the problems that arise in modeling highly reliable, fault tolerant systems, lo ...
Cite
Journal ArticleAnnals of Operations Research · December 1, 1987
System availability is becoming an increasingly important factor in evaluating the behavior of commercial computer systems. This is due to the increased dependence of enterprises on continuously operating computer systems and to the emphasis on fault-toler ...
Full textCite
Journal ArticleAdvances in Applied Probability · December 1987
In this paper we present a general model of the completion time of a single job on a computer system whose state changes according to a semi-Markov process with possibly infinite state-space. When the state of the system changes the job service is ...
Full textCite
Journal ArticleSadhana · October 1, 1987
We present an overview of the major problems inherent in reliability modelling of fault-tolerant systems. The problems faced while modelling such systems include the need to consider a very large state space, non-exponential distributions, error analysis, ...
Full textCite
Journal ArticleInformation Processing Letters · April 6, 1987
We study the stability condition of an M/G/1 priority queue with two classes of jobs. Class 1 jobs have preemptive priority over class 2 jobs. We consider three different types of preemptions and the effects of possible work loss (due to preemption) on the ...
Full textCite
Journal ArticleIEEE Transactions on Software Engineering · January 1, 1987
In this paper we consider the queueing analysis of a fault-tolerant computer system. The failure/repair behavior of the server is modeled by an irreducible continuous-time Markov chain. Jobs arrive in a Poisson fashion to the system and are serviced accordi ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1987
Conclusions-Combinatorial models such as fault trees and reliability block diagrams are efficient for model specification and often efficient in their evaluation. But it is difficult, if not impossible, to allow for dependencies (such as repair dependency ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1987
Conclusions-HARP (the Hybrid Automated Reliability Predictor) is a software package that implements advanced reliability modeling techniques. We present an overview of some of the problems that arise in modeling highly reliable fault-tolerant systems; the ...
Full textCite
Journal ArticleDigest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1987
The authors describe the behavior of a multiprocessor system as a continuous-time Markov chain and associate a reward rate (performance measure) with each state. They evaluate the distribution of performability for analytical models of two multiprocessor s ...
Cite
Journal ArticlePerformance Evaluation · January 1, 1987
Continuous-time Markov chains are commonly used in system reliability modeling. In this paper, we discuss a method for automatically deriving transient solutions that are symbolic in t for acyclic Markov chains. Our method also includes parametric sensitivi ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1987
Conclusions-Reliability is the probability that a system functions according to specifications over a given period of time. During this period, system specifications may allow failures and repairs to occur. This paper considers systems with specifications ...
Full textCite
Journal ArticleIEEE Transactions on Software Engineering · January 1, 1987
A graph-based modeling technique has been developed for the stochastic analysis of systems containing concurrency. The basis of the technique is the use of directed acyclic graphs. These graphs represent event-precedence networks where activities may occur ...
Full textCite
Journal Article · December 1, 1986
Combinatorial models such as fault-trees and reliability block diagrams are efficient in both specification and evaluation of system models, but it is difficult if not impossible to allow for various types of dependency, transient and intermittent faults, ...
Cite
Journal ArticleActa Informatica · November 1, 1986
In order to aid the designers of life-critical, fault-tolerant computing systems, accurate and efficient methods for reliability prediction are needed. The accuracy requirement implies the need to model the system in great detail, and hence the need to add ...
Full textCite
Journal ArticleJournal of Guidance, Control, and Dynamics · January 1, 1986
In this paper, we present an overview of the hybrid automated reliability predictor (HARP), under development at Duke and Clemson Universities. The HARP approach to reliability prediction is characterized by a decomposition of the overall model into distin ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1986
An approximation algorithm for systematically converting a stiff Markov chain into a nonstiff chain with a smaller state space is described. After classifying the set of all states into fast and slow states, the algorithm proceeds by further classifying fa ...
Full textCite
Journal ArticleThe Journal of Systems and Software · January 1, 1986
We present an effective technique for the combined performance and reliability analysis of multimode computer systems. A reward rate (or a performance level) is associated with each mode of operation. The switching between different modes is characterized ...
Full textCite
Journal ArticlePerformance Evaluation Review · January 1, 1986
Queueing models provide a useful tool for predicting the performance of many service systems including computer systems, telecommunication systems, computer/communication networks and flexible manufacturing systems. Traditional queueing models predict syst ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1986
Provably conservative (and optimistic) reliability models can be systematically derived from more complex models. These derived models incorporate a reduced state space and fewer transitions and, therefore, have solutions that are more cost-effective than ...
Full textCite
Journal ArticleIFAC Proceedings Series · January 1, 1986
Dependability measures (such as reliability, mean time to failure, availability) are important criteria for the design of computer-based applications, as well as for their validation. In this paper important techniques for dependability modeling are discus ...
Full textCite
Journal ArticleDigest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1986
Fault-tolerant computer systems change their level of performance (e.g., mode of operation or service rate) in response to different events such as failure, degradation or repair. The authors present a unified model for the analysis of job (task) completio ...
Cite
Journal ArticleDigest of Papers - FTCS (Fault-Tolerant Computing Symposium) · January 1, 1986
The system availability estimator (SAVE) program package is described that can be used for constructing and solving probabilistic models of computer system availability and reliability. SAVE is a state-of-the-art tool intended for use during system design ...
Cite
Conference12th International Computer Measurement Group Conference, CMG 1986 · January 1, 1986
The user behavior graph is a graphical model for describing the behavior of interactive users. The adequacy of user behavior graphs in several performance evaluation studies for workload characterization is investigated. In the absence of memory constraint ...
Cite
Journal Article · December 1, 1985
A description is given of the philosophical differences between three current SPN models in an attempt to merge the most important (and noncontradictory) aspects into one. This work previews the design of a package for the solution of this unified model. ...
Cite
Journal Article · December 1, 1985
An Extended Stochastic Petri Net (ESPN) model, useful for modeling systems which exhibit concurrent, asynchronous, or nondeterministic behavior is developed. Applications demonstrating the flexibility of the model for a variety of system modeling applicati ...
Cite
Journal ArticleIEEE Transactions on Computers · January 1, 1985
In order to remain tractable, many reliability models do not include the states and transitions necessary to represent fault/error-handling details. Instead, the effectiveness of fault/error-handling mechanisms is represented by the use of instantaneous c ...
Full textCite
Journal ArticleOperations Research Letters · January 1, 1985
We consider a single server first in first out queue in which each arriving task has to be completed within a certain period of time (its deadline). More precisely, each arriving task has its own deadline - a non-negative real number - and as soon as the r ...
Full textCite
Journal ArticleModeling and Simulation, Proceedings of the Annual Pittsburgh Conference · December 1, 1984
We detail the use of behavioral decomposition in modeling systems that contain state transitions having widely disparate time constants. We show that such decomposition leads naturally to hybrid system models, containing both analytic and simulative submod ...
Cite
Journal Article · December 1, 1984
Important problems that arise in modeling highly-reliable fault-tolerant systems are discussed. First, reliability models of such systems possess a large number of states, making the solution computationally intractable. This leads to the need for decompos ...
Cite
Journal ArticleModeling and Simulation, Proceedings of the Annual Pittsburgh Conference · December 1, 1984
The need for increased reliability and computing power coupled with advances in technology has given rise to sizeable and complex computer systems. The users and designers of such systems need tools to evaluate the effectiveness of such systems. Current ap ...
Cite
Journal ArticleJournal of Systems and Software · May 1, 1984
We present an effective technique for the combined performance and reliability analysis of multimode computer systems. A reward rate (or a performance level) is associated with each mode of operation. The switching between different modes is characterized ...
Cite
Journal ArticleBehaviour and Information Technology · January 1, 1984
Ergonomics in India is a newly emerging discipline, having made inroads to the people of India very recently. Most of the Indians are absolutely unaware of using ergonomics to achieve an efficient man-machine-environment system for better productivity with ...
Full textCite
Journal ArticleComputers and Electrical Engineering · January 1, 1984
Current technology allows sufficient redundancy in fault-tolerant computer systems to insure that the failure probability due to exhaustion of spares is low. Consequently, the major cause of failure is the inability to correctly detect, isolate, and reconf ...
Full textCite
Journal ArticleActa Informatica · September 1, 1983
This paper examines task allocation in fault-tolerant distributed systems. The problem is formulated as a constrained sum of squares minimization problem. The computational complexity of this problem prompts us to consider an efficient approximation algori ...
Full textCite
Journal ArticleIEEE Transactions on Reliability · January 1, 1983
Summary & Conclusions:—Two important problems which arise in modeling fault-tolerant systems with ultra-high reliability requirements are discussed. 1) Any analytic model of such a system has a large number of states, making the solution computationally in ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1983
A review and a critical evaluation of a representative class of state-of-the-art models for ultrahigh reliability prediction is presented. This evaluation naturally leads us to a new model for ultrahigh reliability prediction now under development. The new ...
Full textCite
Journal ArticleIEEE Transactions on Computers · January 1, 1983
Analytic queueing models of programs with internal concurrency are considered. The program behavior model allows a process to spawn two or more concurrent tasks at some point during its execution. Except for queueing effects, the tasks execute independentl ...
Full textCite
Journal ArticleProceedings - Symposium on Reliability in Distributed Software and Database Systems · December 1, 1982
Task and file allocation are examined in two classes of fault-tolerant distributed systems. The task allocation problem arises in software-implemented fault tolerance (SIFT)-like systems, while the file allocation problem arises in Ethernet-like systems. B ...
Cite
Journal ArticleIEEE Transactions on Computers · January 1, 1982
An optimization model is developed for assigning a fixed set of files across an assemblage of storage devices so as to maximize system throughput. Multiple levels of executable memories and distinct record sizes for separate files are allowed. Through the ...
Journal ArticleIEEE Transactions on Computers · January 1, 1982
Computer performance models of parallel processing systems in which a job subdivides into two or more tasks at some point during its execution are considered. Except for queueing effects, the tasks execute independently of one another and do not require sy ...
Journal ArticleComputer performance · January 1, 1982
A computer configuration design problem is considered. The computer system is modelled as a closed queueing network. The average response time to an interactive user request is minimized and the speeds of the devices are the decision variables. A broad cla ...
Conference8th International Computer Measurement Group Conference, CMG 1982 · January 1, 1982
This paper considers a computer configuration design problem. The computer is modeled as a closed queueing network. The average response time to an interactive user request is to be minimized. The decision variables are CPU speed, capacities of I/O devices ...
Journal ArticleNASA Contractor Reports · December 1, 1981
CARE III is a major departure from conventional approaches to reliability modeling in that it purports to support nonexponential distributions, while avoiding the problem of large state spaces through the use of state aggregation. More specifically, CARE I ...
Journal ArticleJournal of the ACM (JACM) · April 1, 1981
The performance-oriented design of linear storage hierarchies which are operating in multiprogramming environments is considered. An optimization model is superimposed upon an exponential queuing network model of the hierarchy, yielding a problem whose object ...
Journal ArticleNASA Contractor Reports · January 1, 1981
CARE III is a major departure from conventional approaches to reliability modeling in that it purports to support nonexponential distributions, while avoiding the problem of large state spaces through the use of state aggregation. More specifically, CARE I ...
Conference7th International Computer Measurement Group Conference, CMG 1981 · January 1, 1981
This paper considers a computer configuration design problem. The computer is modeled as a closed queueing network. The average response time to an interactive user request is to be minimized. The decision variables are CPU speed, capacities of I/O devices ...
Journal ArticleJournal of the ACM (JACM) · July 1, 1980
This paper presents a computer system configuration design problem in which the objective is to select the CPU speed, the capacities of secondary storage devices, and the allocation of a set of files across the secondary storage devices so as to maximize t ...
ConferenceProceedings of the 1980 International Symposium on Computer Performance Modelling, Measurement and Evaluation, PERFORMANCE 1980 · May 28, 1980
This paper extends a previous model for computer system configuration planning developed by the authors. The problem is to optimally select the CPU speed, the device capacities, and file assignments so as to maximize throughput subject to a fixed cost cons ...
ConferenceProceedings - International Symposium on Computer Architecture · May 6, 1980
A geometric programming model is proposed to determine the optimal design of the CPU and its matching storage hierarchy. The objective function is the maximization of system reliability subject to performance and budgetary limitations. Examples illustratin ...
Journal ArticlePerformance Evaluation Review · January 1, 1980
A previous model for computer system configuration planning is extended. The problem is to optimally select the CPU speed, the device capacities, and file assignments so as to maximize throughput subject to a fixed cost constraint. This essentially discret ...
Journal ArticleEASCON Record: IEEE Electronics and Aerospace Systems Convention · January 1, 1980
Design requirements of systems used in life-critical applications result in the specification of extremely high levels of reliability. A case in point is the digital flight control system to be used in the future generation of aircraft. Traditional reliabi ...
Journal ArticleNational Bureau of Standards, Special Publication · January 1, 1980
This work extends a previous model for computer system configuration planning developed by the authors. The problem is to optimally select CPU speed, device capacities, and file assignments so as to maximize system throughput subject to a fixed cost constr ...
Journal ArticleComputing · September 1, 1979
Prepaging is advocated as a technique to reduce the excessive page traffic due to the changes in the phases of execution of a program. Common prepaging techniques are surveyed. It is advocated that the phase transition behavior cannot be adequately predict ...
ConferenceProceedings - International Symposium on Computer Architecture · April 23, 1979
In this paper, a comparison of the performance of optimally designed computer systems with and without virtual memory is made. The computer systems in question are modeled by closed queuing networks of the central server type. The design of the systems is ...
Journal ArticleIEEE Transactions on Software Engineering · January 1, 1979
This paper considers a computer configuration design problem. The computer system is modeled by a closed queuing network. The system throughput is the objective function to be maximized and the speeds of the devices are the decision variables. A rich class ...
Journal ArticleTransactions of the American Association of Cost Engineers · January 1, 1978
This paper determines the speed of the devices constituting a computer system which will maximize system throughput given a fixed budget. The system composed of a central processor and peripheral devices is modeled as a closed queueing network of exponenti ...
Journal Article · January 1, 1978
A case study in the use of closed queuing network models for the design and analysis of an interactive distributed computer system as applicable to a military command and control system is presented. Such a proje ...
Journal Article · January 1, 1978
A formal proof of correctness of the on-line division algorithm specified in an earlier paper is presented. Two radix 4 on-line division algorithms, with non-redundant and redundant operands respectively, are also derived. ...
Journal ArticleHigh Speed Computer and Algorithm Organization · 1977
The Control Data Corporation (CDC) STAR (STring ARray) computer is a high performance vector machine capable of performing up to 100 million operations per second. Although the size of the main memory is limited to either 1/2 million or 1 million 64-bit wo ...
Journal ArticleIEEE Transactions on Computers · January 1, 1977
Data paging is of primary concern for problems with large data bases and for many types of array problems. We show that prepaging reduces the paging problems of array algorithms operating on large arrays. We also show that the use of a submatrix algorithm ...
Journal ArticleIEEE Transactions on Computers · January 1, 1977
Recently, there has been some interest in the use of continued fractions for digital hardware calculations. We require that the coefficients of the continued fractions be integral powers of 2 and, therefore, well-known continued fraction expansions of func ...
Journal ArticleIEEE Transactions on Computers · January 1, 1977
In this paper, on-line algorithms for division and multiplication are developed. It is assumed that the operands as well as the result flow through the arithmetic unit in a digit-by-digit, most significant digit first fashion. The use of a redundant digit ...
Journal ArticleIEEE Transactions on Computers · 1977
Data paging is of primary concern for problems with large data bases and for many types of array problems. It is shown that prepaging reduces the paging problems of array algorithms operating on large arrays. Also shown is that the use of a submatrix algor ...
Journal ArticleIEEE Transactions on Computers · January 1, 1976
A demand prepaging algorithm DPMIN is defined and proved to be an optimal demand prepaging algorithm. However, it cannot be used in practice since it requires that the future reference string be completely known in advance. Several practical prepaging algo ...
Journal ArticleProceedings - Symposium on Computer Arithmetic · January 1, 1975
Recently, there has been some interest in the use of continued fractions for digital hardware calculations. We require that the coefficients of the continued fractions be integral powers of two. As a result well known continued fraction expansions of f ...
Journal ArticleIEEE Transactions on Computers · January 1, 1973
The purpose of this paper is to demonstrate that representations of numbers other than positional notation may lead to practical hardware realizations for digital calculation of classes of algorithms. This paper describes current research in the use of co ...
ConferenceProceedings - Symposium on Computer Arithmetic · January 1, 1972
The purpose of this paper is to demonstrate that representations of numbers other than positional notation may lead to practical hardware realizations for the digital calculation of classes of algorithms. It is the authors' opinion that practicality of the ...