Architectures for online error detection and recovery in multicore processors

Publication ,  Journal Article
Gizopoulos, D; Psarakis, M; Adve, SV; Ramachandran, P; Hari, SKS; Sorin, D; Meixner, A; Biswas, A; Vera, X
Published in: Proceedings -Design, Automation and Test in Europe, DATE
May 31, 2011

The huge investment in the design and production of multicore processors may be put at risk because the emerging highly miniaturized but unreliable fabrication technologies will impose significant barriers to the lifelong reliable operation of future chips. Extremely complex, massively parallel, multicore processor chips fabricated in these technologies will become more vulnerable to: (a) environmental disturbances that produce transient (or soft) errors, (b) latent manufacturing defects as well as aging/wearout phenomena that produce permanent (or hard) errors, and (c) verification inefficiencies that allow important design bugs to escape into the system. In an effort to cope with these reliability threats, several research teams have recently proposed multicore processor architectures that provide low-cost dependability guarantees against hardware errors and design bugs. This paper focuses on dependable multicore processor architectures that integrate solutions for online error detection, diagnosis, recovery, and repair during field operation. It discusses a taxonomy of representative approaches and presents a qualitative comparison based on hardware cost, performance overhead, types of faults detected, and detection latency. It also describes in more detail three recently proposed architectural approaches: a software-anomaly detection technique (SWAT), a dynamic verification technique (Argus), and a core salvaging methodology. © 2011 EDAA.

Published In

Proceedings -Design, Automation and Test in Europe, DATE

ISSN

1530-1591

Publication Date

May 31, 2011

Start / End Page

533 / 538
 

Citation

APA: Gizopoulos, D., Psarakis, M., Adve, S. V., Ramachandran, P., Hari, S. K. S., Sorin, D., … Vera, X. (2011). Architectures for online error detection and recovery in multicore processors. Proceedings -Design, Automation and Test in Europe, DATE, 533–538.

Chicago: Gizopoulos, D., M. Psarakis, S. V. Adve, P. Ramachandran, S. K. S. Hari, D. Sorin, A. Meixner, A. Biswas, and X. Vera. “Architectures for online error detection and recovery in multicore processors.” Proceedings -Design, Automation and Test in Europe, DATE, May 31, 2011, 533–38.

ICMJE: Gizopoulos D, Psarakis M, Adve SV, Ramachandran P, Hari SKS, Sorin D, et al. Architectures for online error detection and recovery in multicore processors. Proceedings -Design, Automation and Test in Europe, DATE. 2011 May 31;533–8.

MLA: Gizopoulos, D., et al. “Architectures for online error detection and recovery in multicore processors.” Proceedings -Design, Automation and Test in Europe, DATE, May 2011, pp. 533–38.

NLM: Gizopoulos D, Psarakis M, Adve SV, Ramachandran P, Hari SKS, Sorin D, Meixner A, Biswas A, Vera X. Architectures for online error detection and recovery in multicore processors. Proceedings -Design, Automation and Test in Europe, DATE. 2011 May 31;533–538.
