Understanding sources of inefficiency in general-purpose chips

Published

Journal Article

Scaling the performance of a power-limited processor requires decreasing the energy expended per instruction executed, since energy/op * op/second is power. To better understand what improvement in processor efficiency is possible, and what must be done to capture it, we quantify the sources of the performance and energy overheads of a 720p HD H.264 encoder running on a general-purpose four-processor CMP system. The initial overheads are large: the CMP was 500× less energy efficient than an Application Specific Integrated Circuit (ASIC) doing the same job. We explore methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. Broadly applicable optimizations like single instruction, multiple data (SIMD) units improve CMP performance by 14× and energy by 10×, which is still 50× worse than an ASIC. The problem is that the basic operation costs in H.264 are so small that even with a SIMD unit doing over 10 ops per cycle, 90% of the energy is still overhead. Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each subalgorithm of H.264, we create a large, specialized functional/storage unit capable of executing hundreds of operations per instruction. This improves energy efficiency by 160× (instead of 10×), and the final customized CMP reaches the same performance and within 3× of an ASIC solution's energy in comparable area. © 2011 ACM.
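
The energy figures quoted in the abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, not taken from the article: the function names are hypothetical, and it assumes that each reported improvement factor divides the initial 500× energy gap directly.

```python
# Illustrative arithmetic only; function and constant names are hypothetical.

def power_watts(energy_per_op_joules, ops_per_second):
    """Power is energy per operation times operations per second."""
    return energy_per_op_joules * ops_per_second


def residual_gap(initial_gap, improvement):
    """Energy gap versus the ASIC that remains after an improvement,
    assuming the improvement factor divides the initial gap directly."""
    return initial_gap / improvement


INITIAL_GAP = 500.0   # baseline CMP vs. ASIC energy, from the abstract
SIMD_GAIN = 10.0      # energy gain from broadly applicable optimizations
CUSTOM_GAIN = 160.0   # energy gain from algorithm-specific units

if __name__ == "__main__":
    print("Gap after SIMD-style optimizations:", residual_gap(INITIAL_GAP, SIMD_GAIN))
    print("Gap after customization:", residual_gap(INITIAL_GAP, CUSTOM_GAIN))
```

Under that assumption, the 10× gain leaves the 50× gap stated in the abstract, and the 160× gain leaves a gap of roughly 3×, consistent with "within 3× of an ASIC solution's energy" as an approximate figure.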

Cited Authors

  • Hameed, R; Qadeer, W; Wachs, M; Azizi, O; Solomatnikov, A; Lee, BC; Richardson, S; Kozyrakis, C; Horowitz, M

Published Date

  • October 1, 2011

Published In

Volume / Issue

  • 54 / 10

Start / End Page

  • 85 - 93

Electronic International Standard Serial Number (EISSN)

  • 1557-7317

International Standard Serial Number (ISSN)

  • 0001-0782

Digital Object Identifier (DOI)

  • 10.1145/2001269.2001291

Citation Source

  • Scopus