Online Diagnosis of Hard Faults in Microprocessors

Publication, Journal Article
Bower, FA; Sorin, DJ; Ozev, S
Published in: ACM Transactions on Architecture and Code Optimization
January 1, 2007

We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field deconfigurable unit (FDU) granularity, and deconfigure FDUs with hard faults. In our reliable microprocessor design, we use DIVA dynamic verification to detect and correct errors. Our new scheme for diagnosing hard faults tracks instructions’ core structure occupancy from decode until commit. If a DIVA checker detects an error in an instruction, it increments a small saturating error counter for every FDU used by that instruction, including that DIVA checker. A hard fault in an FDU quickly leads to an above-threshold error counter for that FDU and thus diagnoses the fault. For deconfiguration, we use previously developed schemes for functional units and buffers and present a scheme for deconfiguring DIVA checkers. Experimental results show that our reliable microprocessor quickly and accurately diagnoses each hard fault that is injected and continues to function, albeit with somewhat degraded performance. © 2007, ACM. All rights reserved.
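The diagnosis step described in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware implementation: the FDU names, the counter width (`COUNTER_MAX`), and the threshold value (`THRESHOLD`) are assumptions chosen for illustration, and the abstract does not state the actual values. The sketch only shows the idea that every FDU used by an erroneous instruction, including the checker that flagged it, has its saturating counter incremented, so an FDU with a hard fault crosses the threshold well before the healthy FDUs that merely shared instructions with it.

```python
from dataclasses import dataclass, field

# Assumed values for illustration; the abstract does not give them.
COUNTER_MAX = 15  # e.g. a 4-bit saturating counter
THRESHOLD = 8     # an FDU is diagnosed once its counter exceeds this


@dataclass
class FaultDiagnoser:
    """Per-FDU saturating error counters (hypothetical software model)."""
    counter_max: int = COUNTER_MAX
    threshold: int = THRESHOLD
    counters: dict = field(default_factory=dict)

    def record_error(self, fdus_used):
        """Call when a checker flags an instruction.

        fdus_used lists every FDU the instruction occupied between
        decode and commit, including the checker that found the error.
        """
        for fdu in set(fdus_used):
            current = self.counters.get(fdu, 0)
            # Saturate instead of wrapping around.
            self.counters[fdu] = min(current + 1, self.counter_max)

    def diagnosed(self):
        """Return the FDUs whose counters are above the threshold."""
        return {fdu for fdu, count in self.counters.items()
                if count > self.threshold}


def demo():
    """Nine erroneous instructions that all used the (faulty) 'alu1'."""
    diagnoser = FaultDiagnoser()
    for i in range(9):
        # Other resources vary from instruction to instruction,
        # so their counters grow more slowly than alu1's.
        diagnoser.record_error(["alu1", f"rob{i % 4}", f"checker{i % 2}"])
    return diagnoser


if __name__ == "__main__":
    print(demo().diagnosed())
```

In this toy run only `alu1` exceeds the threshold, because the reorder-buffer entries and checkers are spread across the erroneous instructions while the faulty unit appears in all of them.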

Published In

ACM Transactions on Architecture and Code Optimization

DOI

10.1145/1250727.1250728

EISSN

1544-3973

ISSN

1544-3566

Publication Date

January 1, 2007

Volume

4

Issue

2

Start / End Page

8

Related Subject Headings

  • 4606 Distributed computing and systems software
  • 4009 Electronics, sensors and digital hardware
  • 0906 Electrical and Electronic Engineering
  • 0803 Computer Software
 

Citation

APA: Bower, F. A., Sorin, D. J., & Ozev, S. (2007). Online Diagnosis of Hard Faults in Microprocessors. ACM Transactions on Architecture and Code Optimization, 4(2), 8. https://doi.org/10.1145/1250727.1250728
Chicago: Bower, F. A., D. J. Sorin, and S. Ozev. “Online Diagnosis of Hard Faults in Microprocessors.” ACM Transactions on Architecture and Code Optimization 4, no. 2 (January 1, 2007): 8. https://doi.org/10.1145/1250727.1250728.
ICMJE: Bower FA, Sorin DJ, Ozev S. Online Diagnosis of Hard Faults in Microprocessors. ACM Transactions on Architecture and Code Optimization. 2007 Jan 1;4(2):8.
MLA: Bower, F. A., et al. “Online Diagnosis of Hard Faults in Microprocessors.” ACM Transactions on Architecture and Code Optimization, vol. 4, no. 2, Jan. 2007, p. 8. Scopus, doi:10.1145/1250727.1250728.
NLM: Bower FA, Sorin DJ, Ozev S. Online Diagnosis of Hard Faults in Microprocessors. ACM Transactions on Architecture and Code Optimization. 2007 Jan 1;4(2):8.
