
Andrew Douglas Hilton

Professor of the Practice in the Department of Electrical and Computer Engineering
Electrical and Computer Engineering
PO Box 90291, Wilkinson 105, Durham, NC 27708

Selected Publications


DynaSprint: Microarchitectural sprints with dynamic utility and thermal management

Conference · Proceedings of the Annual International Symposium on Microarchitecture, MICRO · October 12, 2019
Sprinting is a class of mechanisms that provides a short but significant performance boost while temporarily exceeding the thermal design point. We propose DynaSprint, a software runtime that manages sprints by dynamically predicting utility and modeling t ...

Translation from Problem to Code in Seven Steps

Conference · CompEd 2019 - Proceedings of the ACM Conference on Global Computing Education · May 9, 2019
Students in introductory programming courses struggle with how to turn a problem statement into code. We introduce a teaching technique, "The Seven Steps," that provides structure and guidance on how to approach a problem. The first four steps focus on dev ...

A technique for translation from problem to code

Conference · Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE · July 2, 2018
Students in introductory programming courses struggle with how to turn a problem statement into code. We introduce a technique, “The Seven Steps,” that provides structure and guidance on how to approach a problem. The first four steps focus on devising an ...

MAPS: Understanding Metadata Access Patterns in Secure Memory

Conference · Proceedings - 2018 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2018 · May 25, 2018
Secure memory increases both the latency and energy required for memory accesses. To reduce these overheads, computer architects have sought to cache metadata on the processor chip, but placing metadata in a simple cache has not been as effective as expect ...

Multi-program benchmark definition

Conference · ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software · April 27, 2015 · Open Access
Although definition of single-program benchmarks is relatively straightforward (a benchmark is a program plus a specific input), definition of multi-program benchmarks is more complex. Each program may have a different runtime and they may have different int ...

Flexible register management using reference counting

Conference · Proceedings - International Symposium on High-Performance Computer Architecture · May 3, 2012 · Open Access
Conventional out-of-order processors that use a unified physical register file allocate and reclaim registers explicitly using a free list that operates as a circular queue. We describe and evaluate a more flexible register management scheme: reference co ...

iCFP: Tolerating all-level cache misses in in-order processors

Journal Article · IEEE Micro · January 1, 2010 · Open Access
In-order continual flow pipeline (iCFP) is an in-order pipeline that allows execution to flow around data cache misses. When a cache miss occurs, iCFP executes and speculatively retires miss-independent instructions. It saves miss-dependent instructions in ...

SMT-directory: Efficient load-load ordering for SMT

Journal Article · IEEE Computer Architecture Letters · January 1, 2010
Memory models like SC, TSO, and PC enforce load-load ordering, requiring that loads from any thread appear to occur in program order to all other threads. Out-of-order execution can violate load-load ordering. Multi-processors with out-of-order cores detec ...

BOLT: Energy-efficient out-of-order latency-tolerant execution

Conference · Proceedings - International Symposium on High-Performance Computer Architecture · January 1, 2010 · Open Access
LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP withou ...

Decoupled store completion/silent deterministic replay: Enabling scalable data memory for CPR/CFP processors

Conference · Proceedings - International Symposium on Computer Architecture · November 30, 2009 · Open Access
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the destination registers of many in-fli ...

CPROB: Checkpoint processing with opportunistic minimal recovery

Conference · Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT · November 23, 2009 · Open Access
CPR (Checkpoint Processing and Recovery) is a physical register management scheme that supports a larger instruction window and higher average IPC than conventional ROB-style register management. It does so by restricting mis-speculation recovery to checkp ...

iCFP: Tolerating all-level cache misses in in-order processors

Conference · Proceedings - International Symposium on High-Performance Computer Architecture · January 1, 2009 · Open Access
Growing concerns about power have revived interest in in-order pipelines. In-order pipelines sacrifice single-thread performance. Specifically, they do not allow execution to flow freely around data cache misses. As a result, they have difficulties overlap ...

Ginger: Control independence using tag rewriting

Conference · Proceedings - International Symposium on Computer Architecture · October 22, 2007 · Open Access
The negative performance impact of branch mis-predictions can be reduced by exploiting control independence (CI). When a branch mis-predicts, the wrong-path instructions up to the point where control converges with the correct path are selectively squashed ...

XChange: Coupling parallel applications in a dynamic environment

Conference · Proceedings - IEEE International Conference on Cluster Computing, ICCC · 2004
Modern computational science applications are becoming increasingly multi-disciplinary, involving widely distributed research teams and their underlying computational platforms. A common problem for the grid applications used in these environments is the n ...

Decoupling loads for nano-instruction set computers

Other · Proceedings of the 43rd International Symposium on Computer Architecture · Open Access
We propose an ISA extension that decouples the data access and register write operations in a load instruction. We describe system and hardware support for decoupled loads. Furthermore, we show how compilers can generate better static instruction sc ...

PoisonIvy: Safe speculation for secure memory

Other · Proceedings of the 49th International Symposium on Microarchitecture · Open Access
Encryption and integrity trees guard against physical attacks, but harm performance. Prior academic work has speculated around the latency of integrity verification, but has done so in an insecure manner. No industrial implementations of secure processor ...
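The "Flexible register management using reference counting" entry contrasts its approach with a conventional free list that operates as a circular queue. The Python sketch below illustrates only the general idea of reference-counted physical registers, in a simplified form, and is not the design evaluated in that paper. A register is treated as free whenever its count is zero. The class name, the method names, and the move-sharing and checkpoint operations are all hypothetical assumptions added for illustration.

```python
class RefCountedRegisterFile:
    """Hypothetical sketch: a physical register is free exactly when its count is 0."""

    def __init__(self, num_phys: int, num_arch: int) -> None:
        if num_phys <= num_arch:
            raise ValueError("need more physical than architectural registers")
        # Assumption: architectural register i initially maps to physical register i.
        self.map_table = list(range(num_arch))
        self.refs = [0] * num_phys
        for p in self.map_table:
            self.refs[p] += 1  # one reference held by the map table

    def free_count(self) -> int:
        return sum(1 for c in self.refs if c == 0)

    def _allocate(self) -> int:
        # Any zero-count register may be reused; no queue order is imposed.
        for p, count in enumerate(self.refs):
            if count == 0:
                return p
        raise RuntimeError("out of physical registers")

    def rename_dest(self, arch: int) -> tuple[int, int]:
        """Map `arch` to a fresh register and return (new, old).

        The map table's reference to `old` is handed to the in-flight
        instruction, which calls release(old) at commit, so the old value
        stays live for recovery until then."""
        new = self._allocate()
        self.refs[new] += 1
        old = self.map_table[arch]
        self.map_table[arch] = new
        return new, old

    def rename_move(self, dst: int, src: int) -> int:
        """Hypothetical move sharing: dst and src map to one physical register."""
        shared = self.map_table[src]
        self.refs[shared] += 1
        old = self.map_table[dst]
        self.map_table[dst] = shared
        return old  # released by the in-flight move at commit

    def release(self, p: int) -> None:
        if self.refs[p] == 0:
            raise ValueError(f"p{p} released too many times")
        self.refs[p] -= 1

    def checkpoint(self) -> list[int]:
        # A checkpoint holds one extra reference per mapped register.
        snap = list(self.map_table)
        for p in snap:
            self.refs[p] += 1
        return snap

    def release_checkpoint(self, snap: list[int]) -> None:
        for p in snap:
            self.release(p)


if __name__ == "__main__":
    rf = RefCountedRegisterFile(num_phys=8, num_arch=4)
    new, old = rf.rename_dest(0)  # r0 -> p4; p0 still held by the in-flight writer
    old_move = rf.rename_move(1, 0)  # r1 now shares p4
    rf.release(old)  # commit of the writer frees p0
    rf.release(old_move)  # commit of the move frees p1
    print(rf.map_table, rf.free_count())
```

Because freeing depends only on a count reaching zero, several architectural registers, in-flight instructions, or checkpoints can share one physical register. A strict circular-queue free list, which reclaims registers in commit order, does not naturally express this kind of sharing.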