Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI

Publication, Conference

Hanson, E; Li, S; Zhou, G; Cheng, F; Wang, Y; Bose, R; Li, HH; Chen, Y

Published in: Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023

October 28, 2023

The growing demand for higher compute and memory capacity driven by artificial intelligence (AI) applications pushes higher core counts in modern systems. Many-core architectures exhibiting spatial interconnects with high on-chip bandwidth are ideal for these workloads due to their data movement flexibility and sheer parallelism. However, the size of such platforms makes them particularly susceptible to manufacturing defects, prompting a need for designs and mechanisms that improve yield. Even with these techniques, nonfunctional cores and links are unavoidable. Although prior works address defective cores by disabling them and scheduling work only on functional ones, communication latency through spatial interconnects is tightly associated with the locations of defective cores and cores with assigned work. Based on this observation, we present Si-Kintsugi, a defect-aware workload scheduling framework for spatial architectures with mesh topology. First, we design a novel and generalizable workload mapping representation and cost function that integrates defect pattern information. The mapping representation is formed into a 1D vector with simple constraints, making it an ideal candidate for open-source heuristic-based optimization algorithms. After a communication-latency-optimized workload mapping is found, dataflow between the mapped cores is automatically generated to balance communication and computation cost. Si-Kintsugi is extensively evaluated on various workloads (i.e., BERT, ResNet, GEMM) across a wide range of defect patterns and rates. Experimental results show that Si-Kintsugi generates a workload schedule that is on average 1.34× faster than the industry-standard layer-pipelined schedule on defective platforms.
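The idea of a 1D mapping vector scored by a defect-aware cost function can be illustrated with a small sketch. This is not the authors' implementation: the function names (`hop_table`, `cost`, `local_search`), the traffic-weighted hop-count cost, the assumption that defective cores cannot forward traffic, and the simple random local search are all hypothetical choices made for illustration. Here `mapping[i]` holds the index of the mesh core assigned to task `i`, and the only constraints are that cores are distinct and functional.

```python
import random
from collections import deque


def hop_table(width, height, defective):
    """Hop counts between functional cores on a width x height mesh.

    Assumption (not from the paper): a defective core cannot route traffic,
    so shortest paths are found by BFS around defective cores.
    """
    def neighbors(core):
        x, y = core % width, core // width
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < width and 0 <= ny < height:
                n = ny * width + nx
                if n not in defective:
                    yield n

    table = {}
    for src in range(width * height):
        if src in defective:
            continue
        dist = {src: 0}
        queue = deque([src])
        while queue:
            core = queue.popleft()
            for n in neighbors(core):
                if n not in dist:
                    dist[n] = dist[core] + 1
                    queue.append(n)
        table[src] = dist
    return table


def cost(mapping, traffic, table):
    """Traffic-weighted hop count of a 1D mapping vector.

    traffic maps (src_task, dst_task) -> data volume. Returns infinity when
    two tasks share a core or a communicating pair is unreachable, which
    includes any endpoint placed on a defective core.
    """
    if len(set(mapping)) != len(mapping):
        return float("inf")
    total = 0
    for (src, dst), volume in traffic.items():
        dist = table.get(mapping[src], {}).get(mapping[dst])
        if dist is None:
            return float("inf")
        total += volume * dist
    return total


def local_search(n_tasks, traffic, width, height, defective, iters=2000, seed=0):
    """Toy heuristic: move one task at a time, keep strictly better mappings."""
    rng = random.Random(seed)
    table = hop_table(width, height, defective)
    usable = sorted(table)  # indices of functional cores
    mapping = rng.sample(usable, n_tasks)
    best = cost(mapping, traffic, table)
    for _ in range(iters):
        candidate = mapping[:]
        candidate[rng.randrange(n_tasks)] = rng.choice(usable)
        c = cost(candidate, traffic, table)
        if c < best:
            mapping, best = candidate, c
    return mapping, best
```

Because the mapping is a flat integer vector with simple validity constraints, the `local_search` loop could be swapped for an off-the-shelf heuristic optimizer without changing `cost`. The sketch covers only the mapping step; it does not model the dataflow generation or the communication/computation balancing described in the abstract.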

Duke Scholars

Author Yiran Chen Electrical and Computer Engineering

Author Hai "Helen" Li Electrical and Computer Engineering

Published In

Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023

DOI

10.1145/3613424.3614278

Publication Date

October 28, 2023

Start / End Page

972 / 985

Citation

APA

Hanson, E., Li, S., Zhou, G., Cheng, F., Wang, Y., Bose, R., … Chen, Y. (2023). Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. In Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023 (pp. 972–985). https://doi.org/10.1145/3613424.3614278

Chicago

Hanson, E., S. Li, G. Zhou, F. Cheng, Y. Wang, R. Bose, H. H. Li, and Y. Chen. “Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI.” In Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023, 972–85, 2023. https://doi.org/10.1145/3613424.3614278.

ICMJE

Hanson E, Li S, Zhou G, Cheng F, Wang Y, Bose R, et al. Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. In: Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023. 2023. p. 972–85.

MLA

Hanson, E., et al. “Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI.” Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023, 2023, pp. 972–85. Scopus, doi:10.1145/3613424.3614278.

NLM

Hanson E, Li S, Zhou G, Cheng F, Wang Y, Bose R, Li HH, Chen Y. Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. Proceedings of the 56th Annual IEEE ACM International Symposium on Microarchitecture Micro 2023. 2023. p. 972–985.
