Rhythm: Harnessing data parallel hardware for server workloads

Conference Paper

Growing web traffic demands higher server throughput while preserving energy efficiency and total cost of ownership. Present work on optimizing data center efficiency focuses primarily on the data center as a whole, using off-the-shelf hardware for individual servers. Server capacity is typically increased by adding more machines, which is cheap but inefficient in the long run in terms of energy and area. Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. We present a framework, called Rhythm, for high-throughput servers that exploits similarity across requests to improve server performance and power/energy efficiency by launching data-parallel executions for request cohorts. An implementation of the SPECWeb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluating both software and hardware for future cohort-based servers. Our evaluation of Rhythm on future server platforms shows that it achieves 4× the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex A9. A Rhythm implementation that generates transposed responses achieves 8× the i7 throughput while processing 2.5× more requests/Joule than the A9. Copyright © 2014 ACM.

Cited Authors

  • Agrawal, SR; Pistol, V; Pang, J; Tran, J; Tarjan, D; Lebeck, AR

Published Date

  • March 14, 2014

Published In

  • International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Start / End Page

  • 19 - 34

International Standard Book Number 13 (ISBN-13)

  • 9781450323055

Digital Object Identifier (DOI)

  • 10.1145/2541940.2541956

Citation Source

  • Scopus