Conference · Proceedings of the Annual International Symposium on Microarchitecture (MICRO) · October 12, 2019
Sprinting is a class of mechanisms that provides a short but significant performance boost while temporarily exceeding the thermal design point. We propose DynaSprint, a software runtime that manages sprints by dynamically predicting utility and modeling t ...
Conference · CompEd 2019: Proceedings of the ACM Conference on Global Computing Education · May 9, 2019
Students in introductory programming courses struggle with how to turn a problem statement into code. We introduce a teaching technique, "The Seven Steps," that provides structure and guidance on how to approach a problem. The first four steps focus on dev ...
Conference · Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE) · July 2, 2018
Students in introductory programming courses struggle with how to turn a problem statement into code. We introduce a technique, “The Seven Steps,” that provides structure and guidance on how to approach a problem. The first four steps focus on devising an ...
Conference · Proceedings 2018 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2018 · May 25, 2018
Secure memory increases both the latency and energy required for memory accesses. To reduce these overheads, computer architects have sought to cache metadata on the processor chip, but placing metadata in a simple cache has not been as effective as expect ...
Conference · ISPASS 2015: IEEE International Symposium on Performance Analysis of Systems and Software · April 27, 2015
Although definition of single-program benchmarks is relatively straightforward (a benchmark is a program plus a specific input), definition of multi-program benchmarks is more complex. Each program may have a different runtime and they may have different int ...
Conference · Proceedings International Symposium on High Performance Computer Architecture · May 3, 2012
Conventional out-of-order processors that use a unified physical register file allocate and reclaim registers explicitly using a free list that operates as a circular queue. We describe and evaluate a more flexible register management scheme - reference co ...
Journal Article · IEEE Micro · January 1, 2010
In-order continual flow pipeline (iCFP) is an in-order pipeline that allows execution to flow around data cache misses. When a cache miss occurs, iCFP executes and speculatively retires miss-independent instructions. It saves miss-dependent instructions in ...
Journal Article · IEEE Computer Architecture Letters · January 1, 2010
Memory models like SC, TSO, and PC enforce load-load ordering, requiring that loads from any thread appear to occur in program order to all other threads. Out-of-order execution can violate load-load ordering. Multi-processors with out-of-order cores detec ...
Conference · Proceedings International Symposium on High Performance Computer Architecture · January 1, 2010
LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP withou ...
Conference · Proceedings International Symposium on Computer Architecture · November 30, 2009
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the destination registers of many in-fli ...
Conference · Parallel Architectures and Compilation Techniques Conference Proceedings (PACT) · November 23, 2009
CPR (Checkpoint Processing and Recovery) is a physical register management scheme that supports a larger instruction window and higher average IPC than conventional ROB-style register management. It does so by restricting mis-speculation recovery to checkp ...
Conference · Proceedings International Symposium on High Performance Computer Architecture · January 1, 2009
Growing concerns about power have revived interest in in-order pipelines. In-order pipelines sacrifice single-thread performance. Specifically, they do not allow execution to flow freely around data cache misses. As a result, they have difficulties overlap ...
Conference · Proceedings International Symposium on Computer Architecture · October 22, 2007
The negative performance impact of branch mis-predictions can be reduced by exploiting control independence (CI). When a branch mis-predicts, the wrong-path instructions up to the point where control converges with the correct path are selectively squashed ...
Conference · Proceedings - IEEE International Conference on Cluster Computing, ICCC · 2004
Modern computational science applications are becoming increasingly multi-disciplinary, involving widely distributed research teams and their underlying computational platforms. A common problem for the grid applications used in these environments is the n ...
Other · Proceedings of the 43rd International Symposium on Computer Architecture
We propose an ISA extension that decouples the data access and register write operations in a load instruction. We describe system and hardware support for decoupled loads. Furthermore, we show how compilers can generate better static instruction sc ...
Other · Proceedings of the 49th International Symposium on Microarchitecture
Encryption and integrity trees guard against physical attacks, but harm performance. Prior academic work has speculated around the latency of integrity verification, but has done so in an insecure manner. No industrial implementations of secure processor ...