Fault-Tolerant Training with On-Line Fault Detection for RRAM-Based Neural Computing Systems

Conference Paper

© 2017 ACM. An RRAM-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, a high occurrence rate of hard faults, due to immature fabrication processes and limited write endurance, restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between a fault-detection phase and a fault-tolerant training phase. In the fault-detection phase, a quiescent-voltage comparison method is utilized. In the training phase, a threshold-training method and a re-mapping scheme are proposed. Our results show that, compared to neural computing without fault tolerance, the recognition accuracy for the CIFAR-10 dataset improves from 37% to 83% when using low-endurance RRAM cells, and from 63% to 76% when using RRAM cells with high endurance but a high percentage of initial faults.
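
The abstract does not give the exact threshold-training rule. As a hypothetical illustration only, assume that "threshold training" means skipping any weight write whose requested change is smaller than a threshold (to spare write cycles on low-endurance cells), and that cells flagged as faulty by the detection phase are excluded from updates. Under those assumptions, one training step over a crossbar's weight matrix might look like the Python sketch below. The function name `threshold_update`, its parameters, and the fault-set representation are illustrative choices, not taken from the paper.

```python
from typing import List, Set, Tuple

# A crossbar cell is identified by its (row, column) position (assumption).
Cell = Tuple[int, int]


def threshold_update(
    weights: List[List[float]],
    grads: List[List[float]],
    lr: float,
    threshold: float,
    faulty: Set[Cell],
) -> int:
    """Hypothetical threshold-gated gradient step, applied in place.

    A cell is written only if it is not in `faulty` and the magnitude of
    its requested change (-lr * grad) exceeds `threshold`. Returns the
    number of cell writes performed, a rough proxy for endurance wear.
    """
    writes = 0
    for i, row in enumerate(weights):
        for j in range(len(row)):
            if (i, j) in faulty:
                continue  # flagged as faulty: skip, a write would not take effect
            delta = -lr * grads[i][j]
            if abs(delta) > threshold:
                row[j] += delta  # large enough change: spend one write
                writes += 1
            # small changes are dropped, saving a write cycle
    return writes
```

For example, with `lr=0.1` and `threshold=0.05`, a gradient of `0.01` requests a change of only `0.001` and is skipped, while a gradient of `1.0` produces a write. In a fault-tolerant loop of this kind, the `faulty` set would be refreshed each time the detection phase runs, so cells that wear out during training stop receiving writes.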

Cited Authors

  • Xia, L; Liu, M; Ning, X; Chakrabarty, K; Wang, Y

Published Date

  • June 18, 2017

Volume / Issue

  • Part 128280

International Standard Serial Number (ISSN)

  • 0738-100X

International Standard Book Number 13 (ISBN-13)

  • 9781450349277

Digital Object Identifier (DOI)

  • 10.1145/3061639.3062248

Citation Source

  • Scopus