Load latency tolerance in dynamically scheduled processors

Published

Journal Article

This paper provides a quantitative evaluation of load latency tolerance in a dynamically scheduled processor. To determine the latency tolerance of each memory load operation, our simulations use flexible load completion policies instead of a fixed memory hierarchy that dictates the latency. Although our policies delay load completion as long as possible, they produce performance, measured in instructions committed per cycle (IPC), comparable to that of a processor with an ideal memory system in which all loads complete in one cycle. Our simulations reveal that, to produce IPC values within 12% of those of a processor with an ideal memory system, between 1% and 71% of loads need to be satisfied within a single cycle, and that up to 74% can be satisfied in as many as 32 cycles, depending on the benchmark and processor configuration. Load latency tolerance is largely determined by two factors: whether a mispredicted branch is in the load's data dependence graph, and the depth of that dependence graph. Our results show that up to 36% of all loads miss in the level-one cache yet have latency demands lower than second-level cache access times. We also show that a similar percentage of loads hit in the level-one cache even though they possess enough latency tolerance to be satisfied by lower levels of the memory hierarchy.

Cited Authors

  • Srinivasan, ST; Lebeck, AR

Published Date

  • October 1, 1999

Published In

  • Journal of Instruction Level Parallelism

Volume / Issue

  • 1 /

Citation Source

  • Scopus