Stuck-at Fault Tolerance in RRAM Computing Systems

Publication, Journal Article
Xia, L; Huangfu, W; Tang, T; Yin, X; Chakrabarty, K; Xie, Y; Wang, Y; Yang, H
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
March 1, 2018

Emerging metal-oxide resistive switching random-access memory (RRAM) devices and RRAM crossbars have demonstrated their potential to boost the speed and energy efficiency of analog matrix-vector multiplication. However, because the fabrication technology is immature, commonly occurring stuck-at faults (SAFs) seriously degrade the computational accuracy of an RRAM-based computing system (RCS). In this paper, we present a fault-tolerant framework for RCS. A mapping algorithm with inner fault tolerance is proposed to convert matrix parameters into RRAM conductances in RCS and to tolerate SAFs by fully exploring the available mapping space. Two baseline redundancy schemes are proposed to ensure that RCS remains effective when the percentage of faulty RRAM cells is high. To reduce the number of redundant RRAM cells when the SAFs follow a non-uniform or an unknown distribution, a distribution-aware redundancy scheme and a re-configurable redundancy scheme are proposed to provide dynamic fault tolerance. Simulation results show that the baseline redundancy schemes can improve the recognition accuracy on the MNIST data set to almost the same level as the RRAM-fault-free case, with an energy overhead of approximately 30%. When SAFs follow a non-uniform or an unknown distribution, the distribution-aware and re-configurable schemes can reduce the number of redundant RRAM cells from more than 200% to less than 40% and 60%, respectively, without reducing the recognition accuracy.
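As a rough illustration of why SAFs degrade an analog matrix-vector multiplication, the sketch below injects stuck cells into an idealized crossbar and measures the resulting output error. It does not implement the mapping algorithm or the redundancy schemes from the paper. The function names, the conductance range, the linear weight-to-conductance mapping, the independent per-cell fault model, and the convention that SA0 means stuck at minimum conductance and SA1 means stuck at maximum conductance are all assumptions made for this example.

```python
import random

# Assumed conductance range in siemens (illustrative values, not from the paper).
G_MIN, G_MAX = 1e-6, 1e-4


def map_to_conductance(weights, w_max):
    """Linearly map non-negative weights in [0, w_max] onto [G_MIN, G_MAX]."""
    scale = (G_MAX - G_MIN) / w_max
    return [[G_MIN + w * scale for w in row] for row in weights]


def inject_safs(conductances, p_sa0, p_sa1, rng):
    """Return a copy in which each cell is independently stuck at G_MIN (SA0)
    with probability p_sa0, or stuck at G_MAX (SA1) with probability p_sa1."""
    faulty = []
    for row in conductances:
        new_row = []
        for g in row:
            r = rng.random()
            if r < p_sa0:
                new_row.append(G_MIN)
            elif r < p_sa0 + p_sa1:
                new_row.append(G_MAX)
            else:
                new_row.append(g)
        faulty.append(new_row)
    return faulty


def crossbar_mvm(conductances, voltages):
    """Idealized crossbar: the current on column j is sum_i V_i * G_ij."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def relative_error(ideal, actual):
    """Euclidean norm of the difference divided by the norm of the ideal output."""
    num = sum((a - b) ** 2 for a, b in zip(ideal, actual)) ** 0.5
    den = sum(a ** 2 for a in ideal) ** 0.5
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[rng.random() for _ in range(16)] for _ in range(16)]
    voltages = [rng.random() for _ in range(16)]
    g = map_to_conductance(weights, 1.0)
    ideal = crossbar_mvm(g, voltages)
    for rate in (0.0, 0.05, 0.10):
        faulty = crossbar_mvm(inject_safs(g, rate, rate, rng), voltages)
        print(f"SA0 = SA1 = {rate:.2f}: relative error {relative_error(ideal, faulty):.3f}")
```

In this toy model a stuck cell simply overwrites the programmed conductance, so the output error is zero when both fault rates are zero. A fault-tolerant mapping of the kind the abstract describes would try to choose the assignment of matrix entries to cells so that stuck values land where they disturb the result least.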

Published In

IEEE Journal on Emerging and Selected Topics in Circuits and Systems

DOI

10.1109/JETCAS.2017.2776980

ISSN

2156-3357

Publication Date

March 1, 2018

Volume

8

Issue

1

Start / End Page

102 / 115

Related Subject Headings

  • 4008 Electrical engineering
 

Citation

APA: Xia, L., Huangfu, W., Tang, T., Yin, X., Chakrabarty, K., Xie, Y., … Yang, H. (2018). Stuck-at Fault Tolerance in RRAM Computing Systems. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 8(1), 102–115. https://doi.org/10.1109/JETCAS.2017.2776980
Chicago: Xia, L., W. Huangfu, T. Tang, X. Yin, K. Chakrabarty, Y. Xie, Y. Wang, and H. Yang. “Stuck-at Fault Tolerance in RRAM Computing Systems.” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8, no. 1 (March 1, 2018): 102–15. https://doi.org/10.1109/JETCAS.2017.2776980.
ICMJE: Xia L, Huangfu W, Tang T, Yin X, Chakrabarty K, Xie Y, et al. Stuck-at Fault Tolerance in RRAM Computing Systems. IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2018 Mar 1;8(1):102–15.
MLA: Xia, L., et al. “Stuck-at Fault Tolerance in RRAM Computing Systems.” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, Mar. 2018, pp. 102–15. Scopus, doi:10.1109/JETCAS.2017.2776980.
NLM: Xia L, Huangfu W, Tang T, Yin X, Chakrabarty K, Xie Y, Wang Y, Yang H. Stuck-at Fault Tolerance in RRAM Computing Systems. IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2018 Mar 1;8(1):102–115.
