iCFP: Tolerating all-level cache misses in in-order processors

Conference Paper

Growing concerns about power have revived interest in in-order pipelines. In-order pipelines sacrifice single-thread performance. Specifically, they do not allow execution to flow freely around data cache misses. As a result, they have difficulty overlapping independent misses with one another. Previously proposed techniques such as Runahead execution and Multipass pipelining have attacked this problem. In this paper, we go a step further and introduce iCFP (in-order Continual Flow Pipeline), an adaptation of the CFP concept to an in-order processor. When iCFP encounters a primary data cache or L2 miss, it checkpoints the register file and transitions into an "advance" execution mode. Miss-independent instructions execute as usual and even update register state. Miss-dependent instructions are diverted into a slice buffer, unblocking the pipeline latches. When the miss returns, iCFP "rallies" and executes the contents of the slice buffer, merging miss-dependent state with miss-independent state along the way. An enhanced register dependence tracking scheme and a novel store buffer design facilitate the merging process. Cycle-level simulations show that iCFP outperforms Runahead, Multipass, and SLTP, another non-blocking in-order pipeline design. © 2008 IEEE.
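As a rough, non-authoritative illustration of the advance/rally idea summarized in the abstract, the toy Python model below diverts miss-dependent instructions into a slice buffer while miss-independent ones keep executing, then re-executes the slice once the miss data is available. It is a simplification under assumptions of our own and not the paper's design: the names `Insn`, `SliceEntry`, `execute`, and `run_toy_icfp` are hypothetical; there is no timing, no register checkpoint or recovery, and no stores (so nothing resembling the paper's store buffer). Merging is approximated with a per-register "last writer" sequence number: a rallying slice instruction writes its result back only if no younger miss-independent instruction has already overwritten that register.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Insn:
    """Hypothetical toy instruction: dest <- op(*srcs), or dest <- memory[load_addr]."""
    dest: str
    srcs: Tuple[str, ...] = ()
    op: Optional[Callable[..., int]] = None
    load_addr: Optional[int] = None


@dataclass
class SliceEntry:
    seq: int                    # program-order sequence number
    insn: Insn
    captured: Dict[str, int]    # miss-independent source values, read when diverted
    producers: Dict[str, int]   # poisoned source -> seq of the slice entry producing it


def execute(insn: Insn, vals: Dict[str, int], memory: Dict[int, int]) -> int:
    if insn.load_addr is not None:
        return memory[insn.load_addr]
    return insn.op(*(vals[s] for s in insn.srcs))


def run_toy_icfp(program: List[Insn], regs: Dict[str, int],
                 memory: Dict[int, int], cache_hits: Set[int]) -> Dict[str, int]:
    regs = dict(regs)
    poison: Dict[str, int] = {}       # register -> seq of its pending (diverted) producer
    last_writer: Dict[str, int] = {}  # register -> seq of the youngest instruction writing it
    slice_buf: List[SliceEntry] = []

    # Advance: miss-dependent instructions go to the slice buffer, the rest execute.
    for seq, insn in enumerate(program):
        pending = {s: poison[s] for s in insn.srcs if s in poison}
        is_miss = insn.load_addr is not None and insn.load_addr not in cache_hits
        last_writer[insn.dest] = seq
        if is_miss or pending:
            captured = {s: regs[s] for s in insn.srcs if s not in pending}
            slice_buf.append(SliceEntry(seq, insn, captured, pending))
            poison[insn.dest] = seq
        else:
            regs[insn.dest] = execute(insn, regs, memory)
            poison.pop(insn.dest, None)   # an independent write clears the poison

    # Rally: with the miss data now available, re-execute the slice in program order.
    results: Dict[int, int] = {}
    for e in slice_buf:
        vals = dict(e.captured)
        for s, p in e.producers.items():
            vals[s] = results[p]
        results[e.seq] = execute(e.insn, vals, memory)
        # Merge: write back only if no younger instruction overwrote the register.
        if last_writer[e.insn.dest] == e.seq:
            regs[e.insn.dest] = results[e.seq]
    return regs


program = [
    Insn("r1", load_addr=100),                 # misses in the toy cache
    Insn("r2", ("r1",), lambda a: a + 1),      # miss-dependent
    Insn("r3", ("r4",), lambda a: a * 2),      # miss-independent, keeps flowing
    Insn("r1", ("r3",), lambda a: a),          # independent overwrite of r1
]
final = run_toy_icfp(program, {"r4": 5}, {100: 41}, cache_hits=set())
print(final)
```

In this run, `r3` is computed during advance mode even though the load to `r1` missed; after the rally, `r2` holds 42 (41 + 1), while `r1` keeps the value 10 from the younger miss-independent instruction rather than the stale load result. The final state matches what the same program produces when the load hits.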

Cited Authors

  • Hilton, A; Nagarakatte, S; Roth, A

Published Date

  • January 1, 2009

Published In

  • Proceedings - International Symposium on High-Performance Computer Architecture

Start / End Page

  • 431 - 442

International Standard Serial Number (ISSN)

  • 1530-0897

International Standard Book Number 13 (ISBN-13)

  • 9781424429325

Digital Object Identifier (DOI)

  • 10.1109/HPCA.2009.4798281

Citation Source

  • Scopus