Understanding sources of inefficiency in general-purpose chips
Due to their high volume, general-purpose processors, and now chip multiprocessors (CMPs), are much more cost effective than ASICs, but lag significantly in terms of performance and energy efficiency. This paper explores the sources of these performance and energy overheads in general-purpose processing systems by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system. It then explores methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. We evaluate the gains from customizations useful to broad classes of algorithms, such as SIMD units, as well as those specific to a particular computation, such as customized storage and functional units. The ASIC is 500× more energy efficient than our original four-processor CMP. Broadly applicable optimizations improve performance by 10× and energy by 7×. However, the very low energy costs of actual core ops (100s of fJ in 90 nm) mean that over 90% of the energy used in these solutions is still "overhead". Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each sub-algorithm of H.264, we create a large, specialized functional unit that is capable of executing 100s of operations per instruction. This improves performance and energy by an additional 25×, and the final customized CMP matches an ASIC solution's performance within 3× of its energy and within comparable area. Copyright 2010 ACM.
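The energy figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the paper: it assumes that the 7× and 25× energy gains compose multiplicatively and that the "additional 25×" applies in full to energy. The constant and function names are hypothetical.

```python
# Energy figures quoted in the abstract.
ASIC_VS_BASELINE = 500.0  # ASIC is 500x more energy efficient than the baseline CMP
BROAD_GAIN = 7.0          # energy gain from broadly applicable optimizations (e.g. SIMD)
SPECIFIC_GAIN = 25.0      # additional gain from algorithm-specific functional units


def remaining_gap(asic_vs_baseline, *gains):
    """Return the energy gap to the ASIC left after applying the given gains.

    Assumption (not stated in the abstract): gains compose multiplicatively.
    """
    total = 1.0
    for gain in gains:
        total *= gain
    return asic_vs_baseline / total


gap_after_broad = remaining_gap(ASIC_VS_BASELINE, BROAD_GAIN)
gap_after_all = remaining_gap(ASIC_VS_BASELINE, BROAD_GAIN, SPECIFIC_GAIN)

print(f"Gap after broad optimizations: {gap_after_broad:.1f}x")
print(f"Gap after algorithm-specific units: {gap_after_all:.2f}x")
```

Under these assumptions the combined gain is 7 × 25 = 175×, which leaves a gap of 500 / 175 ≈ 2.9× to the ASIC. That is consistent with the abstract's statement that the customized CMP comes within 3× of the ASIC's energy.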
Hameed, R; Qadeer, W; Wachs, M; Azizi, O; Solomatnikov, A; Lee, BC; Richardson, S; Kozyrakis, C; Horowitz, M