Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI

Publication, Conference
Hanson, E; Li, S; Zhou, G; Cheng, F; Wang, Y; Bose, R; Li, HH; Chen, Y
Published in: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
October 28, 2023

The growing demand for higher compute and memory capacity driven by artificial intelligence (AI) applications pushes higher core counts in modern systems. Many-core architectures exhibiting spatial interconnects with high on-chip bandwidth are ideal for these workloads due to their data movement flexibility and sheer parallelism. However, the size of such platforms makes them particularly susceptible to manufacturing defects, prompting a need for designs and mechanisms that improve yield. Despite these techniques, nonfunctional cores and links are unavoidable. Although prior works address defective cores by disabling them and scheduling work only on functional ones, communication latency through spatial interconnects is tightly associated with the locations of defective cores and cores with assigned work. Based on this observation, we present Si-Kintsugi, a defect-aware workload scheduling framework for spatial architectures with mesh topology. First, we design a novel and generalizable workload mapping representation and cost function that integrates defect pattern information. The mapping representation is formed into a 1D vector with simple constraints, making it an ideal candidate for open-source heuristic-based optimization algorithms. After a communication-latency-optimized workload mapping is found, dataflow between the mapped cores is automatically generated to balance communication and computation cost. Si-Kintsugi is extensively evaluated on various workloads (i.e., BERT, ResNet, GEMM) across a wide range of defect patterns and rates. Experimental results show that Si-Kintsugi generates a workload schedule that is on average 1.34× faster than the industry-standard layer-pipelined schedule on defective platforms.
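The abstract does not spell out the mapping representation or the cost function, so the following is only an illustrative sketch of the general idea, not the method from the paper. It assumes a 2D mesh in which defective cores are simply impassable, encodes a mapping as a 1D vector whose i-th entry is the functional core assigned to task i (with the single constraint that entries are distinct), and scores a mapping by traffic-weighted hop count along detours around defects. The names `hop_distances`, `mapping_cost`, and `optimize`, the `traffic` dictionary, and the hill-climbing search are all hypothetical stand-ins chosen for this example.

```python
import random
from collections import deque


def hop_distances(rows, cols, defective):
    """BFS hop counts between functional cores; defective cores are impassable (assumption)."""
    functional = [(r, c) for r in range(rows) for c in range(cols)
                  if (r, c) not in defective]
    dist = {}
    for src in functional:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            r, c = queue.popleft()
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                nr, nc = nxt
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and nxt not in defective and nxt not in seen:
                    seen[nxt] = seen[(r, c)] + 1
                    queue.append(nxt)
        dist[src] = seen
    return functional, dist


def mapping_cost(mapping, functional, dist, traffic):
    """Traffic-weighted hop count; mapping[i] indexes the functional core running task i."""
    total = 0
    for (a, b), volume in traffic.items():
        core_a, core_b = functional[mapping[a]], functional[mapping[b]]
        total += volume * dist[core_a].get(core_b, float("inf"))
    return total


def optimize(num_tasks, functional, dist, traffic, iters=2000, seed=0):
    """Simple hill climbing over the 1D mapping vector (a stand-in heuristic)."""
    rng = random.Random(seed)
    mapping = rng.sample(range(len(functional)), num_tasks)  # distinct cores
    best = mapping_cost(mapping, functional, dist, traffic)
    for _ in range(iters):
        cand = mapping[:]
        i = rng.randrange(num_tasks)
        new_core = rng.randrange(len(functional))
        if new_core in cand:
            # Swap two tasks so that the entries stay distinct.
            j = cand.index(new_core)
            cand[i], cand[j] = cand[j], cand[i]
        else:
            cand[i] = new_core
        cost = mapping_cost(cand, functional, dist, traffic)
        if cost <= best:
            mapping, best = cand, cost
    return mapping, best
```

Because the representation is a flat vector with one simple constraint, the hill climber here could be swapped for any off-the-shelf heuristic optimizer without changing `mapping_cost`. For example, `hop_distances(4, 4, {(1, 1), (2, 2)})` followed by `optimize(4, functional, dist, {(0, 1): 1, (1, 2): 1, (2, 3): 1})` searches for a placement of a four-task chain on a 4×4 mesh with two defective cores.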

Published In

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023

DOI

10.1145/3613424.3614278

Publication Date

October 28, 2023

Start / End Page

972 / 985
 

Citation

APA: Hanson, E., Li, S., Zhou, G., Cheng, F., Wang, Y., Bose, R., … Chen, Y. (2023). Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 (pp. 972–985). https://doi.org/10.1145/3613424.3614278

Chicago: Hanson, E., S. Li, G. Zhou, F. Cheng, Y. Wang, R. Bose, H. H. Li, and Y. Chen. “Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI.” In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, 972–85, 2023. https://doi.org/10.1145/3613424.3614278.

ICMJE: Hanson E, Li S, Zhou G, Cheng F, Wang Y, Bose R, et al. Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. In: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023. 2023. p. 972–85.

MLA: Hanson, E., et al. “Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI.” Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, 2023, pp. 972–85. Scopus, doi:10.1145/3613424.3614278.

NLM: Hanson E, Li S, Zhou G, Cheng F, Wang Y, Bose R, Li HH, Chen Y. Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023. 2023. p. 972–985.
