Fault Tolerance for RRAM-Based Matrix Operations



© 2018 IEEE. An RRAM-based computing system (RCS) provides an energy-efficient hardware implementation of vector-matrix multiplication for machine-learning hardware. However, it is vulnerable to faults because the RRAM fabrication process is still immature. We propose an efficient fault-tolerance method for RCS; the proposed method, referred to as extended-ABFT (X-ABFT), is inspired by algorithm-based fault tolerance (ABFT). We use row checksums and test-input vectors to extract signatures for fault detection and error correction. We also present a solution that alleviates the overflow problem caused by the limited number of voltage levels available for the test-input signals. Simulation results show that for a Hopfield classifier with faults in 5% of its RRAM cells, X-ABFT achieves nearly the same classification accuracy as in the fault-free case.
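The checksum idea mentioned in the abstract can be illustrated with a minimal sketch of classic ABFT applied to a vector-matrix product. This is not the X-ABFT method itself, whose details are not given here. The function names, the small weight matrix, the all-ones test input, and the modeling of a faulty cell as a weight stuck at 0 are all assumptions made for illustration.

```python
def vector_matrix_multiply(x, w):
    """Compute y = x * W for a row vector x and a matrix W stored as a list of rows."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]


def append_row_checksums(w):
    """Return a copy of W with one extra column holding the sum of each row."""
    return [row + [sum(row)] for row in w]


def checksum_mismatch(y_ext, tol=1e-9):
    """True if the data outputs do not add up to the checksum output."""
    *y, check = y_ext
    return abs(sum(y) - check) > tol


# Hypothetical 3x3 weight matrix, encoded with a checksum column.
weights = [[1, 2, 0],
           [0, 1, 3],
           [2, 0, 1]]
encoded = append_row_checksums(weights)

# Assumed test input: all ones, so every row of the array is exercised.
test_input = [1, 1, 1]

# Fault-free case: the checksum output equals the sum of the data outputs.
healthy_out = vector_matrix_multiply(test_input, encoded)

# Faulty case: one cell is modeled (as an assumption) as stuck at weight 0.
faulty = [row[:] for row in encoded]
faulty[1][2] = 0
faulty_out = vector_matrix_multiply(test_input, faulty)

print(checksum_mismatch(healthy_out))  # fault-free signature
print(checksum_mismatch(faulty_out))   # signature with one faulty cell
```

In this sketch the gap between the checksum output and the sum of the data outputs acts as the signature: it is zero for the fault-free array and nonzero once a cell deviates. A single checksum column only detects that something is wrong; locating and correcting the faulty cell needs additional signatures, which is the part the abstract attributes to X-ABFT.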

Cited Authors

  • Liu, M; Xia, L; Wang, Y; Chakrabarty, K

Published Date

  • January 23, 2019

International Standard Serial Number (ISSN)

  • 1089-3539

International Standard Book Number 13 (ISBN-13)

  • 9781538683828

Digital Object Identifier (DOI)

  • 10.1109/TEST.2018.8624687

Citation Source

  • Scopus