APOLLO: An automated power modeling framework for runtime power introspection in high-volume commercial microprocessors
Accurate power modeling is crucial for energy-efficient CPU design and runtime management. An ideal power modeling framework needs to be accurate yet fast, achieve high temporal resolution (ideally cycle-accurate) yet with low runtime computational overheads, and easily extensible to diverse designs through automation. Simultaneously satisfying such conflicting objectives is challenging and largely unattained despite significant prior research. In this paper, we propose APOLLO, an automated per-cycle power modeling framework that serves as the basis for both a design-time power estimator and a low-overhead runtime on-chip power meter (OPM). APOLLO uses the minimax concave penalty (MCP)-based feature selection algorithm to automatically select less than 0.05% of RTL signals as power proxies. The power estimation achieves R2 > 0.95 on Arm Neoverse N1 [3] and R2 > 0.94 on Arm Cortex-A77 [2] microprocessors, respectively. When integrated with an emulator-assisted flow, APOLLO finishes per-cycle power estimation on millions-of-cycles benchmark in minutes for million-gate industrial CPU designs. Furthermore, the power model is synthesized and integrated into the microprocessor implementation as a runtime OPM. APOLLO's accuracy further improves when coarse-grained temporal resolution is preferred. To our best knowledge, this is the first runtime OPM that simultaneously achieves percycle temporal resolution and < 1% area/power overhead without compromising accuracy, which is validated on high-performance, out-of-order industrial CPU designs.