Conference · Proceedings - International Symposium on Computer Architecture · January 1, 2024
We revisit the question of how many virtual networks (VNs) are required to provably avoid deadlock in a cache coherence protocol. The textbook way of reasoning about VNs says that the number of VNs depends on the longest chain of message dependencies in th ...
Conference · Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 · October 28, 2023
Experiments with computer processors must account for the inherent variability in executions. Prior work has shown that real systems exhibit variability, and random effects must be injected into simulators to account for it. Thus, we can run multiple execu ...
Journal Article · IEEE Micro · July 1, 2023
We address the two challenges architects face when designing heterogeneous processors with cache-coherent shared memory. First, we introduce HeteroGen, an automated tool for composing clusters of cores, each with its own coherence protocol. Second, we show ...
Conference · Proceedings - International Symposium on High-Performance Computer Architecture · January 1, 2022
We solve the two challenges architects face when designing heterogeneous processors with cache coherent shared memory. First, we develop an automated tool, called HeteroGen, for composing clusters of cores, each with its own coherence protocol. Second, we ...
Conference · Proceedings - 2022 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2022 · January 1, 2022
The deployment of increasingly large and capable FPGAs has motivated mechanisms for sharing them, but system support for FPGAs is not yet mature. Traditional scheduling algorithms do not account for the unique characteristics of FPGAs, leading to infeasibl ...
Journal Article · Computer · March 1, 2021
Computer architects want to design processors that are general purpose yet have the performance of special-purpose hardware tailored to each application. Recently, this goal has led to a proliferation of hardware accelerators for important tasks, including ...
Conference · Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021 · March 1, 2021
Achieving peak performance on sparse operations is challenging. The distribution of the non-zero elements and underlying hardware platform affect the execution efficiency. Given the diversity in workloads and architectures, no unique solution always wins. ...
Journal Article · ACM Transactions on Architecture and Code Optimization · January 1, 2021
Accelerator design is expensive due to the effort required to understand an algorithm and optimize the design. Architects have embraced two technologies to reduce costs. High-level synthesis automatically generates hardware from code. Reconfigurable fabric ...
Conference · IEEE International Conference on Intelligent Robots and Systems · October 24, 2020
Precomputed roadmaps can enable effective multi-query motion planning: a roadmap can be built for a robot as if no obstacles were present, and then after edges invalidated by obstacles observed at query time are deleted, path search through the remaining r ...
Conference · Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2020 · June 1, 2020
Racetrack memory is a promising new non-volatile memory technology, especially because of the density of its 3D implementation. However, for 3D racetrack to reach its potential, certain reliability issues must be overcome. Prior work used per-track encodin ...
Conference · Proceedings - International Symposium on Computer Architecture · May 1, 2020
We present HieraGen, a new tool for automatically generating hierarchical cache coherence protocols. HieraGen's inputs are the simple, atomic, stable state protocols for each level of the hierarchy. HieraGen's output is a highly concurrent hierarchical pro ...
Conference · Proceedings of the 2020 Design, Automation and Test in Europe Conference and Exhibition, DATE 2020 · March 1, 2020
Accelerator design is expensive due to the effort required to understand an algorithm and optimize the design. Architects have embraced two technologies to reduce costs. High-level synthesis automatically generates hardware from code. Reconfigurable fabric ...
Journal Article · Synthesis Lectures on Computer Architecture · January 1, 2020
Many modern computer systems, including homogeneous and heterogeneous architectures, support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, ...
Conference · Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors · July 1, 2019
We have designed a programmable architecture to accelerate collision detection and graph search, two of the principal components of robotic motion planning. The programmability enables the architecture to be applied to a wide range of different robots and ...
Conference · Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019 · June 1, 2019
Racetrack memory is an exciting emerging memory technology with the potential to offer far greater capacity and performance than other non-volatile memories. Racetrack memory has an unusual error model, though, which precludes the use of the typical error ...
Journal Article · IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems · November 1, 2018
We extend the lifetime of Flash memory in embedded processors by exploiting the fact that data from sensors is inherently analog. Prior work in the computer architecture community has assumed that all data is digital and has overlooked the opportunities av ...
Conference · Proceedings - International Symposium on Computer Architecture · July 19, 2018
Designing directory cache coherence protocols is complicated because coherence transactions are not atomic in modern multicore processors. A coherence transaction comprises multiple messages, and these messages can interleave with other conflicting coheren ...
Journal Article · Computer · March 1, 2018
This installment of Computer's series highlighting the work published in IEEE Computer Society journals comes from IEEE Computer Architecture Letters. ...
Conference · Proceedings - 35th IEEE International Conference on Computer Design, ICCD 2017 · November 22, 2017
In this paper, we introduce Jenga, a new scheme for protecting 3D DRAM, specifically high bandwidth memory (HBM), from failures in bits, rows, banks, channels, dies, and TSVs. By providing redundancy at the granularity of a cache block rather than across b ...
Conference · Proceedings of the Annual International Symposium on Microarchitecture, MICRO · October 14, 2017
Recent work in formal verification theory and verification-aware design has sought to bridge the divide between the class of protocols architects want to design and the class of protocols that are verifiable with state of the art tools. Particularly, the r ...
Conference · Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design, FMCAD 2016 · March 24, 2017
We present Neo, a framework for designing pre-verified protocol components that can be instantiated and connected in an arbitrarily large hierarchy (tree), with a guarantee that the whole system satisfies a given safety property. We employ the idea of netw ...
Journal Article · Computer · March 1, 2017
This installment of Computer's series highlighting the work published in IEEE Computer Society journals comes from Computer Architecture Letters. ...
Conference · Proceedings of the Annual International Symposium on Microarchitecture, MICRO · December 14, 2016
We have developed a hardware accelerator for motion planning, a critical operation in robotics. In this paper, we present the microarchitecture of our accelerator and describe a prototype implementation on an FPGA. We experimentally show that the accelerat ...
Conference · Proceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2016 · September 29, 2016
Motivated by embedded systems and datacenters that require long-life components, we extend the lifetime of Flash memory using rewriting codes that allow for multiple writes to a page before it needs to be erased. Although researchers have previously explor ...
Conference · Robotics: Science and Systems · January 1, 2016
We describe a process that constructs robot-specific circuitry for motion planning, capable of generating motion plans approximately three orders of magnitude faster than existing methods. Our method is based on building collision detection circuits for a ...
Conference · ACM International Conference Proceeding Series · October 5, 2015
We integrate coding techniques and layout design to eliminate write-disturb in phase change memories (PCMs), while enhancing lifetime and host-visible capacity. We first propose a checkerboard configuration for cell layout to eliminate write-disturb w ...
Journal Article · IEEE Micro · May 1, 2015
The goal of this work is to design cache coherence protocols with many cores such that they can be verified with existing verification methodologies. In particular, the authors focus on flat (nonhierarchical) coherence protocols using a mostly automated me ...
Conference · ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software · April 27, 2015
Although definition of single-program benchmarks is relatively straightforward (a benchmark is a program plus a specific input), definition of multi-program benchmarks is more complex. Each program may have a different runtime and they may have different int ...
Journal Article · IEEE Computer Architecture Letters · January 1, 2015
We have developed and evaluated Argus-G, an error detection scheme for general purpose GPU (GPGPU) cores. Argus-G is a natural extension of the Argus error detection scheme for CPU cores, and we demonstrate how to modify Argus such that it is compatible wi ...
Conference · International Conference for High Performance Computing, Networking, Storage and Analysis, SC · January 16, 2014
In this work, we provide energy-efficient architectural support for floating point accuracy. For each floating point addition performed, we "recycle" that operation's rounding error. We make this error architecturally visible such that it can be used, when ...
Journal Article · Proceedings - Design Automation Conference · January 1, 2014
Many computer systems employ dynamic power management (DPM) to maximize power efficiency. DPM offers great opportunities, but deploying it carries significant risks if the DPM scheme is not completely verified. We propose architecting the DPM scheme such t ...
Conference · Proceedings - Design, Automation and Test in Europe, DATE · January 1, 2014
We propose a new, low-cost, hardware-only scheme to detect errors in superscalar, out-of-order processor cores. For each instruction decoded, Nostradamus compares what the instruction is expected to do against what the instruction actually does. We impleme ...
Conference · Proceedings - International Symposium on High-Performance Computer Architecture · 2014
The goal of this work is to design cache coherence protocols with many cores that can be verified with state-of-the-art automated verification methodologies. In particular, we focus on flat (non-hierarchical) coherence protocols, and we use a mostly-automa ...
Conference · Proceedings - International Symposium on High-Performance Computer Architecture · January 1, 2014
Dynamic power management (DPM) is critical to maximizing the performance of systems ranging from multicore processors to datacenters. However, one formidable challenge with DPM schemes is verifying that the DPM schemes are correct as the number of computat ...
Conference · International Multidisciplinary Scientific GeoConference Surveying Geology and Mining Ecology Management, SGEM · December 1, 2013
At present, fly ash is deposited in huge dumps all over the world. Storing it causes many problems, especially environmental ones: it contaminates land and water (with heavy metals), occupies large areas of land, and is blown about by the wind (will contamin ...
Journal Article · Proceedings - International Symposium on Computer Architecture · August 12, 2013
We re-visit the issue of hardware consistency models in the new context of massively-threaded throughput-oriented processors (MTTOPs). A prominent example of an MTTOP is a GPGPU, but other examples include Intel's MIC architecture and some recent academic ...
Journal Article · Proceedings - International Symposium on High-Performance Computer Architecture · July 23, 2013
Some recent memory technologies, including phase change memory (PCM), have lifetime reliabilities that are affected by write operations. We propose the use of coset coding to extend the lifetimes of these memories. The key idea of coset coding is that it p ...
Journal Article · Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC · January 1, 2013
Prior work developed an efficient technique, called reduced precision checking, for detecting errors in floating point addition. In this work, we extend reduced precision checking (RPC) to multiplication. Our results show that RPC can successfully detect e ...
Journal Article · ISPASS 2013 - IEEE International Symposium on Performance Analysis of Systems and Software · January 1, 2013
Although current homogeneous chips tightly couple the cores with cache-coherent shared virtual memory (CCSVM), this is not the communication paradigm used by any current heterogeneous chip. In this paper, we present a CCSVM design for a CPU/GPU chip, as we ...
Conference · 12th International Multidisciplinary Scientific GeoConference and EXPO - Modern Management of Mine Producing, Geology and Environmental Protection, SGEM 2012 · December 1, 2012
This paper presents experimental research on producing building materials with a large quantity of fly ash from the Timisoara Power Plant, Romania. Classical binders (lime and cement) are used to activate the components of fly as ...
Journal Article · 2012 50th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2012 · December 1, 2012
The goal of this paper is to extend the lifetime of Flash memory by reducing the frequency with which a given page of memory is erased. This is accomplished by increasing the number of writes that are possible before erasure is necessary. Redundancy is int ...
Journal Article · Communications of the ACM · July 1, 2012
The article discusses how on-chip hardware coherence can scale gracefully as the number of cores increases. Cache coherence has come to dominate the market for technical, as well as for legacy, reasons. Technically, hardware cache coherence provides perfor ...
Conference · Society of Petroleum Engineers - SPE Russian Oil and Gas Exploration and Production Technical Conference and Exhibition 2012 · January 1, 2012
This paper describes an easy-to-use, fast-track roadmap for an Enhanced Oil Recovery (EOR) prefeasibility study, including (1) screening of suitable EOR methods, (2) estimating additional recovery with mechanistic 3D models, (3) evaluating preliminary econom ...
Journal Article · Proceedings - Design, Automation and Test in Europe, DATE · May 31, 2011
The huge investment in the design and production of multicore processors may be put at risk because the emerging highly miniaturized but unreliable fabrication technologies will impose significant barriers to the life-long reliable operation of future chip ...
Journal Article · IEEE Micro · January 1, 2011
Computer systems with virtual memory are susceptible to design bugs and runtime faults in their address translation systems. Detecting bugs and faults requires a clear specification of correct behavior. A new framework for address translation aware memory ...
Journal Article · Performance Evaluation Review · January 1, 2011
Recently, several researchers have proposed schemes for low-cost, low-power error detection in the processor core. In this work, we demonstrate that one particular scheme, an enhanced implementation of the Argus framework called Argus-2, is a viable option ...
Journal Article · Synthesis Lectures on Computer Architecture · January 1, 2011
Many modern computer systems and most multicore chips (chip multiprocessors) support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the mem ...
Conference · Proceedings of the Annual International Symposium on Microarchitecture, MICRO · December 1, 2010
We propose an architectural design methodology for designing formally verifiable cache coherence protocols, called Fractal Coherence. Properly designed to be fractal in behavior, the proposed family of cache coherence protocols can be formally verified cor ...
Journal Article · IEEE Computer Architecture Letters · July 1, 2010
One of the most challenging problems in developing a multicore processor is verifying that the design is correct, and one of the most difficult aspects of pre-silicon verification is verifying that the memory system obeys the architecture’s specified ...
Conference · International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS · May 19, 2010
Computer systems with virtual memory are susceptible to design bugs and runtime faults in their address translation (AT) systems. Detecting bugs and faults requires a clear specification of correct behavior. To address this need, we develop a framework for ...
Conference · Proceedings - International Symposium on High-Performance Computer Architecture · January 1, 2010
We propose UNITD, a unified hardware coherence framework that integrates translation coherence into the existing cache coherence protocol. In UNITD coherence protocols, the TLBs participate in the cache coherence protocol just like the instruction and data ...
Conference · ACM SIGPLAN Notices · January 1, 2010
Computer systems with virtual memory are susceptible to design bugs and runtime faults in their address translation (AT) systems. Detecting bugs and faults requires a clear specification of correct behavior. To address this need, we develop a framework for ...
Journal Article · Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems · December 1, 2009
We present an error detection technique for a floating point adder which uses a checker adder of reduced precision to determine if the result is correct within some error bound. Our analysis establishes a relationship between the width of the checker adder ...
Journal Article · Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems · December 1, 2009
Pre-fabrication design verification and post-fabrication chip testing are two important stages in the product realization process. These two stages consume a large part of resources in the form of time, money, and engineering effort during the process [1]. ...
Journal Article · Proceedings of the International Symposium on Low Power Electronics and Design · November 24, 2009
Power gating is usually driven by a predictive control, and frequent mispredictions can counter-productively lead to a large increase in energy consumption. This energy vulnerability could be exploited by malicious applications such as a power virus, or it ...
Journal Article · 2009 7th IEEE-ACM International Conference on Formal Methods and Models for Co-Design, MEMOCODE '09 · November 19, 2009
Dynamic power management (DPM) is important for multicore architectures. One important challenge for multicore DPM schemes is verifying that they are both safe (cannot lead to power or thermal catastrophes) and efficient (achieve as much performance as pos ...
Chapter · January 1, 2009
In this chapter, we introduce enough about cache coherence to understand how consistency models interact with caches. We start in Section 2.1 by presenting the system model that we consider throughout this primer. To simplify the exposition in this chapter ...
Chapter · January 1, 2009
The previous two chapters explored the memory consistency models sequential consistency (SC) and total store order (TSO). These chapters presented SC as intuitive and TSO as widely implemented (e.g., in x86). Both models are sometimes called strong because ...
Chapter · January 1, 2009
Many modern computer systems and most multicore chips (chip multiprocessors) support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. These designs seek various goodness ...
Chapter · January 1, 2009
In Chapters 7 and 8, we have presented snooping and directory coherence protocols in the context of the simplest system models that were sufficient for explaining the fundamental issues of these protocols. In this chapter, we extend our presentation of coh ...
Chapter · January 1, 2009
In this chapter, we present snooping coherence protocols. Snooping protocols were the first widely-deployed class of protocols and they continue to be used in a variety of systems. Snooping protocols offer many attractive features, including low-latency co ...
Chapter · January 1, 2009
A widely implemented memory consistency model is total store order (TSO). TSO is used in SPARC implementations and, more importantly, appears to match the memory consistency model of the widely used x86 architecture. This chapter presents this important co ...
Chapter · January 1, 2009
In this chapter, we present directory coherence protocols. Directory protocols were originally developed to address the lack of scalability of snooping protocols. Traditional snooping systems broadcast all requests on a totally ordered interconnection netw ...
Chapter · January 1, 2009
In this chapter, we return to the topic of cache coherence that we introduced in Chapter 2. We defined coherence in Chapter 2, in order to understand coherence’s role in supporting consistency, but we did not delve into how specific coherence protocols wor ...
Chapter · January 1, 2009
This chapter delves into memory consistency models (a.k.a. memory models) that define the behavior of shared memory systems for programmers and implementors. These models define correctness so that programmers know what to expect and implementors know what ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · January 1, 2009
Multithreaded servers with cache-coherent shared memory are the dominant type of machines used to run critical network services and database management systems. To achieve the high availability required for these tasks, it is necessary to incorporate mecha ...
Journal Article · Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT · December 1, 2008
To improve the lifetime performance of a multicore chip with simple cores, we propose the Core Cannibalization Architecture (CCA). A chip with CCA provisions a fraction of the cores as cannibalizable cores (CCs). In the absence of hard faults, the CCs func ...
Journal Article · Conference on Computing Frontiers - Proceedings of the 2008 Conference on Computing Frontiers, CF'08 · December 1, 2008
We develop architectural techniques for mitigating the impact of process variability. Our techniques hide the performance effects of slow components (including registers, functional units, and L1I and L1D cache frames) without slowing the clock frequency or ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · October 13, 2008
CMOS technology trends are leading to an increasing incidence of hard (permanent) faults in processors. These faults may be introduced at fabrication or occur in the field. Whereas high-performance processor cores have enough redundancy to tolerate many of ...
Journal Article · IEEE Micro · May 1, 2008
Although most current multicore processors are homogeneous, microarchitects are now proposing heterogeneous core implementations, including systems in which heterogeneity is introduced at runtime. This article shows that operating system schedulers must co ...
Journal Article · IEEE Micro · January 1, 2008
Argus, a novel approach for detecting errors in simple processor cores, dynamically verifies the correctness of the four tasks performed by a von Neumann core: control flow, data flow, computation, and memory access. Argus detects transient and permanent e ...
Journal Article · 2007 IEEE International Conference on Computer Design, ICCD 2007 · December 1, 2007
This paper addresses the run-time diagnosis of delay faults in functional units of microprocessors. Despite the popularity of the stuck-at fault model, it is no longer the only relevant fault model. The delay fault model - which assumes that the faulty cir ...
Journal Article · Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT · December 1, 2007
The process of verifying a new microprocessor is a major problem for the computer industry. Currently, architects design processors to be fast, power-efficient, and reliable. However, architects do not quantify the impact of these design decisions on the e ...
Journal Article · Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT · December 1, 2007
A significant fraction of the circuitry in a modern processor is dedicated to converting the linear instruction stream into a representation that allows the execution of instructions in data dependence order, rather than program order, to extract instructi ...
Journal Article · Proceedings of the Annual International Symposium on Microarchitecture, MICRO · December 1, 2007
We have developed Argus, a novel approach for providing low-cost, comprehensive error detection for simple cores. The key to Argus is that the operation of a von Neumann core consists of four fundamental tasks - control flow, dataflow, computation, and mem ...
Conference · Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems · December 1, 2007
We propose and evaluate the use of lazy error detection for a superscalar, out-of-order microprocessor's functional units. The key insight is that error detection is off the critical path, because an instruction's results are speculative for at least a cyc ...
Journal Article · 2007 Computing Frontiers, Conference Proceedings · October 22, 2007
The organization and management of microprocessor storage structures (e.g., L1 caches, TLBs, etc.) is critical to the performance and energy consumption of the microprocessor. We propose and develop the first microprocessor that can dynamically allocate st ...
Journal Article · Proceedings - International Symposium on High-Performance Computer Architecture · August 10, 2007
To provide high dependability in a multithreaded system despite hardware faults, the system must detect and correct errors in its shared memory system. Recent research has explored dynamic checking of cache coherence as a comprehensive approach to memory s ...
Journal Article · ACM Transactions on Architecture and Code Optimization · January 1, 2007
We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field deconfigur ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · December 22, 2006
Multithreaded servers with cache-coherent shared memory are the dominant type of machines used to run critical network services and database management systems. To achieve the high availability required for these tasks, it is necessary to incorporate mecha ...
Journal Article · IEEE International Conference on Computer Design, ICCD 2006 · December 1, 2006
We deconstruct and compare the two dominant existing approaches for L1 data cache (L1D) error protection, with respect to performance, L2 cache bandwidth, power, and area. The two approaches are: (1) parity on the L1D with write-through to an ECC-protected ...
Journal Article · Performance Evaluation Review · June 1, 2006
In this paper, we present a new metric, Hard-Fault Architectural Vulnerability Factor (H-AVF), to allow designers to more effectively compare alternate hard-fault tolerance schemes. In order to provide intuition on the use of H-AVF as a metric, we evaluate ...
Journal Article · IEEE Transactions on Parallel and Distributed Systems · June 1, 2006
Spinning is a synchronization mechanism commonly used in applications and operating systems. Excessive spinning, however, often indicates performance or correctness (e.g., livelock) problems. Detecting if applications and operating systems are spinning is ...
Journal Article · ACM Journal on Emerging Technologies in Computing Systems · January 1, 2006
This article explores the architectural challenges introduced by emerging bottom-up fabrication of nanoelectronic circuits. The specific nanotechnology we explore proposes patterned DNA nanostructures as a scaffold for the placement and interconnection of ...
Journal Article · Proceedings - International Test Conference · January 1, 2006
In this paper, we propose a low-cost fault tolerance technique for microprocessor multipliers, both non-pipelined (NP) and pipelined (P). Our fault tolerant multiplier designs are capable of detecting and correcting errors, diagnosing hard faults, and reco ...
Journal Article · Proceedings - Design, Automation and Test in Europe, DATE '05 · December 1, 2005
As device sizes shrink and current densities increase, the probability of device failures due to gate oxide breakdown (OBD) also increases. To provide designs that are tolerant to such failures, we must investigate and understand the manifestations of this ...
Journal Article · Proceedings of the Annual International Symposium on Microarchitecture, MICRO · December 1, 2005
We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field deconfigur ...
Journal Article · Proceedings - International Symposium on Computer Architecture · November 10, 2005
In this paper, we develop the first feasibly implementable scheme for end-to-end dynamic verification of multithreaded memory systems. For multithreaded (including multiprocessor) memory systems, end-to-end correctness is defined by its memory consistency ...
Journal Article · IEEE Transactions on Dependable and Secure Computing · January 1, 2005
To achieve high reliability despite hard faults that occur during operation and to achieve high yield despite defects introduced at fabrication, a microprocessor must be able to tolerate hard faults. In this paper, we present a framework for autonomic self ...
Journal Article · Computer · January 1, 2005
Despite the convenience of clean abstractions, technological trends are blurring the lines between design layers and creating new interactions between previously unrelated architecture layers. For example, virtual machines such as VMWare and Transmeta impl ...
Conference · USENIX 2005 Annual Technical Conference · January 1, 2005
Deadlock can occur wherever multiple processes interact. Most existing static and dynamic deadlock detection tools focus on simple types of deadlock, such as those caused by incorrect ordering of lock acquisitions. In this paper, we propose Pulse, a novel ...
Journal Article · 2004 4th IEEE Conference on Nanotechnology · December 1, 2004
To evaluate the potential of carbon nanotube field effect transistors (CNFETs) to replace silicon CMOS technology, we develop a SPICE model of CNFET nanoelectronics. Our model is parameterizable, and it enables composition of models of various aspects of n ...
Journal Article · Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM) · December 1, 2004
Modern multiprocessors are complex systems that often require years to design and verify. A significant factor is that engineers must allocate a disproportionate share of their effort to ensure that rare corner-case events behave correctly. This paper prop ...
Journal Article · Nanotechnology · September 1, 2004
The shift in technology away from silicon complementary metal-oxide semiconductors (CMOS) to novel nanoscale technologies requires new design tools. In this paper, we explore one particular nanotechnology: carbon nanotube transistors that are self-assemble ...
Conference · 2004 IEEE International Symposium on Performance Analysis of Systems and Software · June 14, 2004
There is increasing concern among developers that future web servers running commercial workloads may be limited by network processing overhead in the CPU as 10Gb Ethernet becomes prevalent. We analyze CPU usage of real hardware running popular commercial ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · January 1, 2004
In this paper, we present a hardware technique, called Self-Repairing Array Structures (SRAS), for masking hard faults in microprocessor array structures, such as the reorder buffer and branch history table. SRAS masks errors that could otherwise lead to s ...
Conference · Annual ACM Symposium on Parallel Algorithms and Architectures · December 1, 2003
A model was created for determining criticality in MP systems. An algorithm was devised for computing criticality and criticality of real MP workloads was evaluated. A directed acyclic graph (DAG) model for executing: critical path and slack; mapping DAGs ...
Journal Article · Proceedings of the International Conference on Dependable Systems and Networks · December 1, 2003
As implementations of shared memory multiprocessors become more complicated, hardware faults will increasingly cause errors that are difficult or impossible to detect with low-level, localized mechanisms. In this paper, we argue for dynamic verification (i ...
Journal Article · IEEE Transactions on Parallel and Distributed Systems · February 1, 2003
This paper develops and validates an efficient analytical model for evaluating the performance of shared memory architectures with ILP processors. First, we instrument the SimOS simulator to measure the parameters for such a model and we find a surprisingl ...
Journal Article · Computer (USA) · 2003
As dependence on database management systems and Web servers increases, so does the need for them to run reliably and efficiently, goals that rigorous simulations can help achieve. Execution-driven simulation models system hardware. These simulations captur ...
Conference · Annual ACM Symposium on Parallel Algorithms and Architectures · January 1, 2003
Recent research on processor microarchitecture suggests using instruction criticality as a metric to guide hardware control policies. Fields et al. [3, 4] have proposed a directed acyclic graph (DAG) model for characterizing program microexecutions on unip ...
Journal Article · Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA · January 1, 2003
Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory multiprocessors. The destination set is the collection of processors that receive a particular coherence request. Snooping protocols send requests to the maximal destina ...
Journal Article · IEEE Transactions on Parallel and Distributed Systems · June 1, 2002
In this paper, we develop a specification methodology that documents and specifies a cache coherence protocol in eight tables: the states, events, actions, and transitions of the cache and memory controllers. We then use this methodology to specify a detai ...
Journal Article · Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA · January 1, 2002
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At an abstract level, SafetyNet logically maintains multiple, globally consisten ...
Conference · Proceedings - International Symposium on High-Performance Computer Architecture · January 1, 2002
This paper advocates that cache coherence protocols use a bandwidth adaptive approach to adjust to varied system configurations (e.g., number of processors) and workload behaviors. We propose Bandwidth Adaptive Snooping Hybrid (BASH), a hybrid protocol tha ...
Journal Article · Proceedings of the Annual International Symposium on Microarchitecture · December 1, 2001
This paper explores the interaction of value prediction with thread-level parallelism techniques, including multithreading and multiprocessing, where correctness is defined by a memory consistency model. Value prediction subtly interacts with the memory co ...
Journal Article · International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS · December 1, 2000
Symmetric multiprocessor (SMP) servers provide superior performance for the commercial workloads that dominate the Internet. Our simulation results show that over one-third of cache misses by these applications result in cache-to-cache transfers, where the ...
Journal Article · Performance Evaluation Review · January 1, 2000
Motivated by experience gained during the validation of a recent Approximate Mean Value Analysis (AMVA) model of modern shared memory architectures, this paper re-examines the "standard" AMVA approximation for non-exponential FCFS queues. We find that this ...
Journal Article · SIGPLAN Notices (ACM Special Interest Group on Programming Languages) · January 1, 2000
Symmetric multiprocessor (SMP) servers provide superior performance for the commercial workloads that dominate the Internet. Our simulation results show that over one-third of cache misses by these applications result in cache-to-cache transfers, where the ...
Journal Article · Operating Systems Review (ACM) · January 1, 2000
Symmetric multiprocessor (SMP) servers provide superior performance for the commercial workloads that dominate the Internet. Our simulation results show that over one-third of cache misses by these applications result in cache-to-cache transfers, where the ...
Journal Article · IEEE High-Performance Computer Architecture Symposium Proceedings · January 1, 1999
Cache coherence protocols of current shared-memory multiprocessors are difficult to verify. Our previous work proposed an extension of Lamport's logical clocks for showing that multiprocessors can implement sequential consistency (SC) with an SGI Origin 20 ...
Journal Article · Annual ACM Symposium on Parallel Algorithms and Architectures · January 1, 1999
A computer system is useless unless it can interact with the outside world through input/output (I/O) devices. I/O systems are complex, including aspects such as memory-mapped operations, interrupts, and bus bridges. Often, I/O behavior is described for is ...
Journal Article · Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA · January 1, 1999
This paper proposes a new coherence method called "multicast snooping" that dynamically adapts between broadcast snooping and a directory protocol. Multicast snooping is unique because processors predict which caches should snoop each coherence transaction ...
Journal Article · Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA · January 1, 1998
This paper develops and validates an analytical model for evaluating various types of architectural alternatives for shared-memory systems with processors that aggressively exploit instruction-level parallelism. Compared to simulation, the analytical model ...
Journal Article · Annual ACM Symposium on Parallel Algorithms and Architectures · January 1, 1998
Modern shared-memory multiprocessors use complex memory system implementations that include a variety of non-trivial and interacting optimizations. More time is spent in verifying the correctness of such implementations than in designing the system. In par ...