Jeffrey S. Chase

Professor of Computer Science
Computer Science
Box 90129, Durham, NC 27708-0129
D306 Lev Sci Res Ctr, Durham, NC 27708

Selected Publications


ImPACT: A networked service architecture for safe sharing of restricted data

Journal Article Future Generation Computer Systems · April 1, 2022 In this paper we describe an architecture developed and prototyped in the course of the NSF-funded project called ImPACT—Infrastructure for Privacy-Assured CompuTations. This architecture addresses the common problems that arise from the need to securely s ... Full text Cite

NetHint: White-Box Networking for Multi-Tenant Data Centers

Conference Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022 · January 1, 2022 A cloud provider today provides its network resources to its tenants as a black box, such that cloud tenants have little knowledge of the underlying network characteristics. Meanwhile, data-intensive applications have increasingly migrated to the cloud, an ... Cite

Federated Authorization for Managed Data Sharing: Experiences from the ImPACT Project

Conference Proceedings - International Conference on Computer Communications and Networks, ICCCN · July 1, 2021 This paper presents the rationale and design of the trust plane for ImPACT, a federated platform for managed sharing of restricted data. Key elements of the architecture include Web-based notaries for credential establishment based on declarative templates ... Full text Cite

Interpreting write performance of supercomputer I/O systems with regression models

Conference Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021 · May 1, 2021 This work seeks to advance the state of the art in HPC I/O performance analysis and interpretation. In particular, we demonstrate effective techniques to: (1) model output performance in the presence of I/O interference from production loads; (2) build fea ... Full text Cite

WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows

Conference Proceedings - IEEE International Conference on Cluster Computing, ICCC · January 1, 2021 This paper introduces WIRE that manages resources for the DAG-based workflows on IaaS clouds. WIRE predicts and plans resources over the MAPE (Monitor-Analyze-Plan-Execute) loops to: 1) Estimate task performance with online data, 2) Conduct simulations to p ... Full text Cite

Logical peering for interdomain networking on testbeds

Conference IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2020 · July 1, 2020 Research testbed fabrics have potential to support long-lived, evolving, interdomain experiments, including opt-in application traffic across multiple campuses and edge sites. We propose abstractions and security infrastructure to facilitate multi-domain n ... Full text Cite

Characterizing output bottlenecks of a production supercomputer: Analysis and implications

Journal Article ACM Transactions on Storage · January 16, 2020 This article studies the I/O write behaviors of the Titan supercomputer and its Lustre parallel file stores under production load. The results can inform the design, deployment, and configuration of file systems along with the design of I/O software in the ... Full text Cite

Reflective control for an elastic cloud application: An automated experiment workbench

Conference Workshop on Hot Topics in Cloud Computing, HotCloud 2009 · January 1, 2020 This paper addresses “reflective” control for applications that use server resources from a shared cloud infrastructure opportunistically. In this approach, an external reflecti ... Cite

Applying machine learning to understand write performance of large-scale parallel filesystems

Conference Proceedings of PDSW 2019: IEEE/ACM 4th International Parallel Data Systems Workshop - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis · November 1, 2019 In high-performance computing (HPC), I/O performance prediction offers the potential to improve the efficiency of scientific computing. In particular, accurate prediction can make runtime estimates more precise, guide users toward optimal checkpoint strate ... Full text Cite

Cheating the I/O bottleneck: Network storage with Trapeze/Myrinet

Conference USENIX 1998 Annual Technical Conference · January 1, 2019 Recent advances in I/O bus structures (e.g., PCI), high-speed networks, and fast, cheap disks have significantly expanded the I/O capacity of desktop-class systems. This paper describes a messaging system designed to deliver the potential of these advances ... Cite

Automatic program transformation with JOIE

Conference USENIX 1998 Annual Technical Conference · January 1, 2019 While the availability of platform-independent code on the Internet is increasing, third-party code rarely exhibits all of the features desired by end users. Unfortunately, developers cannot foresee and provide for all possible extensions. In this paper, w ... Cite

The Future of Multi-Clouds: A Survey of Essential Architectural Elements

Conference 2018 International Scientific and Technical Conference Modern Computer Network Technologies, MoNeTeC 2018 - Proceedings · December 10, 2018 In this paper we present a vision of an environment composed of multiple independent cloud providers of various sizes, interconnected by programmable networks in which tenants may acquire resources from the providers and interconnect them together to serve ... Full text Cite

Toward live inter-domain network services on the ExoGENI testbed

Conference INFOCOM 2018 - IEEE Conference on Computer Communications Workshops · July 6, 2018 A key dimension of reproducibility in testbeds is stable performance that scales in regular and predictable ways in accordance with declarative specifications for virtual resources. We contend that reproducibility is crucial for elastic performance control ... Full text Cite

The future of distributed network research infrastructure

Journal Article Computer Communication Review · April 1, 2018 Shared research infrastructure that is globally distributed and widely accessible has been a hallmark of the networking community. We present a vision for a future mid-scale distributed research infrastructure aimed at enabling new types of discoveries. Th ... Full text Cite

Slice-based network transit service: Inter-domain L2 networking on ExoGENI

Conference 2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2017 · November 20, 2017 The GENI network testbed was designed to enable experimentation with network protocols by offering the capability to construct virtual networks at the link layer (L2). GENI users build virtual networks in their GENI slices that span resources on multiple G ... Full text Cite

Predicting output performance of a petascale supercomputer

Conference HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing · June 26, 2017 In this paper, we develop a predictive model useful for output performance prediction of supercomputer file systems under production load. Our target environment is Titan, the 3rd fastest supercomputer in the world, and its Lustre-based multi-stage write pat ... Full text Cite

Rethinking Security in the Era of Cloud Computing

Journal Article IEEE Security and Privacy · May 1, 2017 Cloud computing has emerged as a dominant computing platform for the foreseeable future, disrupting the way we build and deploy software. This disruption offers a rare opportunity to integrate new computer security approaches. ... Full text Cite

Enabling lightweight transactions with precision time

Journal Article ACM SIGPLAN Notices · April 4, 2017 Distributed transactional storage is an important service in today's data centers. Achieving high performance without high complexity is often a challenge for these systems due to sophisticated consistency protocols and multiple layers of abstraction. In t ... Full text Cite

Towards an experimental legoland: Slice modification and recovery in ExoGENI testbed

Conference Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST · January 1, 2017 This paper describes advanced capabilities that were deployed recently in the ExoGENI testbed to offer increased flexibility in provisioning, modifying, and recovering the topologies and the configuration settings of the virtual systems, or slices, in whic ... Full text Cite

Rethinking Security in the Era of Cloud Computing

Journal Article IEEE Security and Privacy · January 1, 2017 Cloud computing has emerged as a dominant computing platform for the foreseeable future, resulting in an ongoing disruption to the way we build and deploy software. This disruption offers a rare opportunity to integrate new approaches to computer security. ... Full text Cite

Output performance study on a production petascale filesystem

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2017 This paper reports our observations from a top-tier supercomputer Titan and its Lustre parallel file stores under production load. In summary, we find that supercomputer file systems are highly variable across the machine at fine time scales. This variabil ... Full text Cite

TapCoN: Practical third-party attestation for the cloud

Conference 9th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2017, co-located with USENIX ATC 2017 · January 1, 2017 One way to establish trust in a service is to know what code it is running. However, verified code identity is currently not possible for programs launched on a cloud by another party. We propose an approach ... Cite

CQSTR: Securing cross-tenant applications with Cloud containers

Conference Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC 2016 · October 5, 2016 Cloud providers are in a position to greatly improve the trust clients have in network services: IaaS platforms can isolate services so they cannot leak data, and can help verify that they are securely deployed. We describe a new system called CQSTR that a ... Full text Cite

Preface

Book · March 15, 2016 Full text Cite

A retrospective on ORCA: Open resource control architecture

Chapter · January 1, 2016 ORCA is an extensible platform for building infrastructure servers based on a foundational leasing abstraction. These servers include AggregateManagers for diverse resource providers and stateful controllers for dynamic slices. ORCA also defines a brokerin ... Full text Cite

ExoGENI: A multi-domain infrastructure-as-a-service testbed

Chapter · January 1, 2016 This chapter describes ExoGENI, a multi-domain testbed infrastructure built using the ORCA control framework. ExoGENI links GENI to two advances in virtual infrastructure (IaaS) services outside of GENI: open cloud computing (OpenStack) and dynamic circuit ... Full text Cite

Adapting Scientific Workflows on Networked Clouds Using Proactive Introspection

Conference Proceedings - 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing, UCC 2015 · January 1, 2015 Recent advances in cloud technologies and on-demand network circuits have created an unprecedented opportunity to enable complex data-intensive scientific applications to run on dynamic, networked cloud infrastructure. However, there is a lack of tools for ... Full text Cite

Trust as the foundation of resource exchange in GENI

Conference Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST · January 1, 2015 Researchers and educators in computer science and other domains are increasingly turning to distributed test beds that offer access to a variety of resources, including networking, computation, storage, sensing, and actuation. The provisioning of resources ... Full text Cite

GENI: A federated testbed for innovative network experiments

Journal Article Computer Networks · March 14, 2014 GENI, the Global Environment for Networking Innovation, is a distributed virtual laboratory for transformative, at-scale experiments in network science, services, and security. Designed in response to concerns over Internet ossification, GENI is enabling a ... Full text Cite

Thoughts on the state of cloud over the next five years

Journal Article IEEE Cloud Computing · January 1, 2014 In this issue of IEEE Cloud Computing, EIC Mazin Yousif talks with experts from US and European universities about the current state of cloud computing as well as where the technology is heading over next 5 to 10 years. They cover critical cloud topics, su ... Full text Cite

Preface

Other Journal of Thoracic Disease · January 1, 2014 Full text Cite

ExoGENI: A multi-domain infrastructure-as-a-service testbed

Journal Article Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering · December 1, 2012 NSF's GENI program seeks to enable experiments that run within virtual network topologies built-to-order from testbed infrastructure offered by multiple providers (domains). GENI is often viewed as a network testbed integration effort, but behind it is an ... Full text Cite

Dynamic network provisioning for data intensive applications in the cloud

Journal Article 2012 IEEE 8th International Conference on E-Science, e-Science 2012 · December 1, 2012 Advanced networks are an essential element of data-driven science enabled by next generation cyberinfrastructure environments. Computational activities increasingly incorporate widely dispersed resources with linkages among software components spanning mul ... Full text Cite

Characterizing output bottlenecks in a supercomputer

Journal Article International Conference for High Performance Computing, Networking, Storage and Analysis, SC · December 1, 2012 Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below th ... Full text Cite

Provisioning and evaluating multi-domain networked clouds for Hadoop-based applications

Journal Article Proceedings - 2011 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2011 · December 1, 2011 This paper presents the design, implementation, and evaluation of a new system for on-demand provisioning of Hadoop clusters across multiple cloud domains. The Hadoop clusters are created "on-demand" and are composed of virtual machines from multiple cloud ... Full text Cite

Virtual smart grid architecture and control framework

Journal Article 2011 IEEE International Conference on Smart Grid Communications, SmartGridComm 2011 · December 1, 2011 In this paper, we present a cloud based virtual Smart Grid (vSG) architecture and its concept design. This novel architecture extends the pervasive virtualization principle to the wide area smart grid sensory, communication, and control systems, and essent ... Full text Cite

Testbeds and research infrastructures: Development of Networks and Communities

Journal Article Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST · December 1, 2011 Cite

Trusted platform-as-a-service: A foundation for trustworthy cloud-hosted applications

Journal Article Proceedings of the ACM Conference on Computer and Communications Security · November 16, 2011 The applications we use are increasingly packaged as network services running in the cloud under the control of a service provider. Users of these services have no basis to determine if these services are trustworthy, beyond the assurances of the service p ... Full text Cite

Embedding virtual topologies in networked clouds

Journal Article Proceedings of the 6th International Conference on Future Internet Technologies, CFI11 · August 16, 2011 Embedding virtual topologies in physical network infrastructure has been an area of active research for the future Internet and network testbeds. Virtual network embedding is also useful for linking virtual compute clusters allocated from cloud providers. ... Full text Cite

Deadline-sensitive workflow orchestration without explicit resource control

Journal Article Journal of Parallel and Distributed Computing · March 1, 2011 Deadline-sensitive workflows require careful coordination of user constraints with resource availability. Current distributed resource access models provide varying degrees of resource control: from limited or none in grid batch systems to explicit in clou ... Full text Cite

Automated control for elastic storage

Journal Article Proceeding of the 7th International Conference on Autonomic Computing, ICAC '10 and Co-located Workshops · July 23, 2010 Elasticity - where systems acquire and release resources in response to dynamic workloads, while paying only for what they need - is a driving property of cloud computing. At the core of any elastic system is an automated controller. This paper addresses e ... Full text Cite

Networked cloud orchestration: A GENI perspective

Journal Article 2010 IEEE Globecom Workshops, GC'10 · January 1, 2010 This paper describes the experience of developing a system for creation of distributed linked configurations of heterogeneous resources (slices) in GENI. Our work leverages a number of unique architectural solutions (distributed architecture, declarative r ... Full text Cite

Automated control in cloud computing: Challenges and opportunities

Journal Article Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, ACDC '09 · November 30, 2009 With advances in virtualization technology, virtual machine services offered by cloud utility providers are becoming increasingly powerful, anchoring the ecosystem of cloud services. Virtual computing services are attractive in part because they enable cus ... Cite

Reflective control for an elastic cloud application: An automated experiment workbench

Conference Workshop on Hot Topics in Cloud Computing, HotCloud 2009 · January 1, 2009 This paper addresses “reflective” control for applications that use server resources from a shared cloud infrastructure opportunistically. In this approach, an external reflective controller launches application functions based on knowledge of what resourc ... Cite

Rethinking FTP: Aggressive block reordering for large file transfers

Journal Article ACM Transactions on Storage · January 1, 2009 Whole-file transfer is a basic primitive for Internet content dissemination. Content servers are increasingly limited by disk arm movement, given the rapid growth in disk density, disk transfer rates, server network bandwidth, and content size. Individual ... Full text Cite

Weighted fair sharing for dynamic virtual clusters

Journal Article SIGMETRICS'08: Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems · December 12, 2008 In a shared server infrastructure, a scheduler controls how quantities of resources are shared over time in a fair manner across multiple, competing consumers. It should support wide (parallel) requests for variable-sized pool of resources, provide assuran ... Full text Cite

Secure control of portable images in a virtual computing utility

Journal Article Proceedings of the ACM Conference on Computer and Communications Security · December 1, 2008 A virtual computing utility hosts guest virtual machines on server provider sites. Each VM is an instantiation of some image or virtual appliance, which might be supplied by the VM owner or a third-party image provider. This paper addresses the problem of ... Full text Cite

Cutting corners: Workbench automation for server benchmarking

Conference Proceedings of the 2008 USENIX Annual Technical Conference, USENIX 2008 · January 1, 2008 A common approach to benchmarking a server is to measure its behavior under load from a workload generator. Often a set of such experiments is required—perhaps with different server configurations or workload parameters—to obtain a statistically sound resu ... Cite

Automated and on-demand provisioning of virtual machines for database applications

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · October 30, 2007 Utility computing delivers compute and storage resources to applications as an 'on-demand utility', much like electricity, from a distributed collection of computing resources. There is great interest in running database applications on utility resources ( ... Full text Cite

Strong accountability for network storage

Journal Article ACM Transactions on Storage · October 1, 2007 This article presents the design, implementation, and evaluation of CATS, a network storage service with strong accountability properties. CATS offers a simple web services interface that allows clients to read and write opaque objects of variable size. Th ... Full text Cite

Strong accountability for network storage

Conference FAST 2007 - 5th USENIX Conference on File and Storage Technologies · January 1, 2007 This paper presents the design, implementation, and evaluation of CATS, a network storage service with strong accountability properties. A CATS server annotates read and write responses with evidence of correct execution, and offers audit and challenge int ... Cite

Towards an autonomic computing testbed

Conference Proceedings of the 2nd International Workshop on Hot Topics in Autonomic Computing, HotAC 2007, Held in conjunction with ICAC 2007 · January 1, 2007 This paper introduces Automat, a testbed architecture and prototype for research in autonomic services and hosting centers. Automat is an interactive web-based laboratory in which users allocate resources from an on-demand server cluster to experiment with ... Cite

Learning application models for utility resource planning

Journal Article Proceedings - 3rd International Conference on Autonomic Computing, ICAC 2006 · December 1, 2006 Shared computing utilities allocate compute, network, and storage resources to competing applications on demand. An awareness of the demands and behaviors of the hosted applications can help the system to manage its resources more effectively. This paper p ... Cite

Virtual machine hosting for networked clusters: Building the foundations for "autonomic" orchestration

Journal Article VTDC 2006 - 2nd International Workshop on Virtualization Technology in Distributed Computing; held in Conjunction with SC06 · December 1, 2006 Virtualization technology offers powerful resource management mechanisms, including performance-isolating resource schedulers, live migration, and suspend/resume. But how should networked virtual computing systems use these mechanisms? A grand challenge is ... Full text Cite

Toward a doctrine of containment: Grid hosting with adaptive resource control

Journal Article Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC'06 · December 1, 2006 Grid computing environments need secure resource control and predictable service quality in order to be sustainable. We propose a grid hosting model in which independent, self-contained grid deployments run within isolated containers on shared resource pro ... Full text Cite

Weatherman: Automated, online, and predictive thermal mapping and management for data centers

Journal Article Proceedings - 3rd International Conference on Autonomic Computing, ICAC 2006 · December 1, 2006 Recent advances have demonstrated the potential benefits of coordinated management of thermal load in data centers, including reduced cooling costs and improved resistance to cooling system failures. A key unresolved obstacle to the practical implementatio ... Cite

Ensemble-level power management for dense blade servers

Journal Article Proceedings - International Symposium on Computer Architecture · December 1, 2006 One of the key challenges for high-density servers (e.g., blades) is the increased costs in addressing the power and heat density associated with compaction. Prior approaches have mainly focused on reducing the heat generated at the level of an individual ... Full text Cite

Sharing networked resources with brokered leases

Conference USENIX 2006 Annual Technical Conference · January 1, 2006 This paper presents the design and implementation of Shirako, a system for on-demand leasing of shared networked resources. Shirako is a prototype of a service-oriented architecture for resource providers and consumers to negotiate access to resources over ... Cite

Virtual playgrounds: Managing virtual resources in the Grid

Journal Article 20th International Parallel and Distributed Processing Symposium, IPDPS 2006 · January 1, 2006 Large Grid deployments increasingly require abstractions and methods decoupling the work of resource providers and resource consumers to implement scalable management methods. We proposed the abstraction of a Virtual Workspace (VW) describing a virtual exe ... Full text Cite

Active and accelerated learning of cost models for optimizing scientific applications

Conference VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases · January 1, 2006 We present the NIMO system that automatically learns cost models for predicting the execution time of computational-science applications running on large-scale networked utilities such as computational grids. Accurate cost models are important for selectin ... Cite

Self-recharging virtual currency

Journal Article Proceedings of ACM SIGCOMM 2005 Workshops: Conference on Computer Communications · December 30, 2005 Market-based control is attractive for networked computing utilities in which consumers compete for shared resources (computers, storage, network bandwidth). This paper proposes a new self-recharging virtual currency model as a common medium of exchange in ... Cite

Self-recharging virtual currency

Journal Article Proceedings of ACM SIGCOMM 2005 3rd Workshop on the Economics of Peer-to-Peer Systems, P2PECON 2005 · December 1, 2005 Market-based control is attractive for networked computing utilities in which consumers compete for shared resources (computers, storage, network bandwidth). This paper proposes a new self-recharging virtual currency model as a common medium of exchange in ... Full text Cite

Model-driven placement of compute tasks and data in a networked utility

Journal Article Proceedings - Second International Conference on Autonomic Computing, ICAC 2005 · December 1, 2005 An important problem in resource management for networked resource-sharing systems is the simultaneous allocation of multiple resources to an application. Self-optimizing systems must co-allocate resources in a way that reconciles competing demands and max ... Full text Cite

Lerna: An active storage framework for flexible data access and management

Journal Article Proceedings of the IEEE International Symposium on High Performance Distributed Computing · November 10, 2005 In the present paper, we examine the problem of supporting application-specific computation within a network file server. Our objectives are (i) to introduce an easy to use yet powerful architecture for executing both custom-developed and legacy applicatio ... Full text Cite

Controllable fair queuing for meeting performance goals

Journal Article Performance Evaluation · October 1, 2005 Computing and storage utilities must control resource usage to meet contractual performance targets for hosted customers under dynamic conditions, including flash crowds and unexpected resource failures. This paper explores properties of proportional share ... Full text Cite

Scale and performance in semantic storage management of data grids

Journal Article International Journal on Digital Libraries · April 1, 2005 Data grids are middleware systems that offer secure shared storage of massive scientific datasets over wide area networks. The main challenge in their design is to provide reliable storage, search, and transfer of numerous or large files over geographicall ... Full text Cite

Making scheduling “cool”: Temperature-aware workload placement in data centers

Conference USENIX 2005 Annual Technical Conference · January 1, 2005 Trends towards consolidation and higher-density computing configurations make the problem of heat management one of the critical challenges in emerging data centers. Conventional approaches to addressing this problem have focused at the facilities level to ... Cite

Balance of power: Dynamic thermal management for internet data centers

Journal Article IEEE Internet Computing · January 1, 2005 Internet-based applications and their resulting multitier distributed architectures have changed the focus of design for large-scale Internet computing. Internet server applications execute in a horizontally scalable topology across hundreds or thousands o ... Full text Cite

Trust but verify: Accountability for network services

Journal Article Proceedings of the 11th Workshop on ACM SIGOPS European Workshop, EW 11 · December 1, 2004 This paper promotes accountability as a central design goal for dependable networked systems. We define three properties for accountable systems that extend beyond the basic security properties of authentication, privacy, and integrity. These accountabilit ... Full text Cite

Lessons and challenges in automating data dependability

Journal Article Proceedings of the 11th Workshop on ACM SIGOPS European Workshop, EW 11 · December 1, 2004 Designing and managing dependable systems is a difficult endeavor. In this paper, we describe challenges in this vast problem space, including provisioning and allocating shared resources, adaptively managing system dependability, expressing dependability ... Full text Cite

Globus and PlanetLab resource management solutions compared

Journal Article IEEE International Symposium on High Performance Distributed Computing, Proceedings · October 18, 2004 PlanetLab and Globus Toolkit are gaining widespread adoption in their respective communities. Although designed to solve different problems - PlanetLab is deploying a worldwide infrastructure testbed for experimenting with network services, while Globus is o ... Cite

Balancing risk and reward in a market-based task service

Journal Article IEEE International Symposium on High Performance Distributed Computing, Proceedings · October 18, 2004 This paper investigates the question of scheduling tasks according to a user-centric value metric - called yield or utility. User value is an attractive basis for allocating shared computing resources, and is fundamental to economic approaches to resource ... Full text Cite

Preface

Journal Article IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops · January 1, 2004 Full text Cite

Designing for disasters

Conference Proceedings of the 3rd USENIX Conference on File and Storage Technologies, FAST 2004 · January 1, 2004 Losing information when a storage device or data center fails can bring a company to its knees - or put it out of business altogether. Such catastrophic outcomes can readily be prevented with today’s storage technology, albeit with some difficulty: the desig ... Cite

Correlating instrumentation data to system states: A building block for automated diagnosis and control

Conference OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation · January 1, 2004 This paper studies the use of statistical induction techniques as a basis for automated performance diagnosis and performance management. The goal of the work is to develop and evaluate tools for offline and online analysis of system metrics gathered from ... Cite

Interposed proportional sharing for a storage service utility

Journal Article Performance Evaluation Review · January 1, 2004 This paper develops and evaluates new share-based scheduling algorithms for differentiated service quality in network services, such as network storage servers. This form of resource control makes it possible to share a server among multiple request flows ... Full text Cite

Circus: Opportunistic block reordering for scalable content servers

Conference Proceedings of the 3rd USENIX Conference on File and Storage Technologies, FAST 2004 · January 1, 2004 Whole-file transfer is a basic primitive for Internet content dissemination. Content servers are increasingly limited by disk arm movement given the rapid growth in disk density, disk transfer rates, se ... Cite

Efficient flow computation on massive grid terrain datasets

Journal Article GeoInformatica · December 1, 2003 As detailed terrain data becomes available, GIS terrain applications target larger geographic areas at finer resolutions. Processing the massive datasets involved in such applications presents significant challenges to GIS systems and demands algorithms th ... Full text Cite

On the elusive benefits of protocol offload

Conference Proceedings of the ACM SIGCOMM Workshop on Network-I/O Convergence: Experience, Lessons, Implications, NICELI 2003 · August 25, 2003 Periodic order-of-magnitude jumps in Ethernet bandwidth regularly reawaken interest in TCP/IP transport protocol offload. This time the jump to 10-Gigabit Ethernet coincides with the emergence of new network storage protocols (iSCSI and DAFS), and vendors ... Full text Cite

Toward scaling network emulation using topology partitioning

Conference Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS · January 1, 2003 Scalability is the primary challenge to studying large complex network systems with network emulation. This paper studies topology partitioning, assigning disjoint pieces of the network topology across processors, as a technique to increase emulation capac ... Full text Cite

Model-Based Resource Provisioning in a Web Service Utility

Conference 4th USENIX Symposium on Internet Technologies and Systems, USITS 2003 · January 1, 2003 Internet service utilities host multiple server applications on a shared server cluster. A key challenge for these systems is to provision shared resources on demand to meet service quality targets at least cost. This paper presents a new approach to utili ... Cite

SHARP: An architecture for secure resource peering

Journal Article Operating Systems Review (ACM) · January 1, 2003 This paper presents SHARP, a framework for secure distributed resource management in an Internet-scale computing infrastructure. The cornerstone of SHARP is a construct to represent cryptographically protected resource claims - promises or rights to control r ... Full text Cite

On the Elusive Benefits of Protocol Offload

Journal Article Proceedings of the ACM SIGCOMM Workshops · January 1, 2003 Periodic order-of-magnitude jumps in Ethernet bandwidth regularly reawaken interest in TCP/IP transport protocol offload. This time the jump to 10-Gigabit Ethernet coincides with the emergence of new network storage protocols (iSCSI and DAFS), and vendors ... Full text Cite

Dynamic virtual clusters in a grid site manager

Conference Proceedings of the IEEE International Symposium on High Performance Distributed Computing · January 1, 2003 This paper presents new mechanisms for dynamic resource management in a cluster manager called Cluster-on-Demand (COD). COD allocates servers from a common pool to multiple virtual clusters (vclusters), with independently configured software environments, ... Full text Cite

Anypoint: Extensible Transport Switching on the Edge

Conference 4th USENIX Symposium on Internet Technologies and Systems, USITS 2003 · January 1, 2003 Anypoint is a new model for one-to-many communication with ensemble sites—aggregations of end nodes that appear to the external Internet as a unified site. Policies for routing Anypoint traffic are defined by application-layer plugins residing in extensibl ... Cite

Scalability and Accuracy in a Large-Scale Network Emulator

Conference Operating Systems Review (ACM) · December 31, 2002 This paper presents ModelNet, a scalable Internet emulation environment that enables researchers to deploy unmodified software prototypes in a configurable Internet-like environment and subject them to faults and varying network conditions. Edge nodes runn ... Full text Cite

Back to the future: Dependable computing = dependable services

Journal Article Proceedings of the 10th Workshop on ACM SIGOPS European Workshop, EW 10 · December 1, 2002 Clients are coming to rely more and more on external services to meet the needs of their users, and the clients are increasingly simple caches of soft state - "truth" is maintained elsewhere. As a result, the user experience of dependability is better serv ... Full text Cite

Anypoint: Extensible transport switching on the edge

Journal Article Computer Communication Review · July 1, 2002 Full text Cite

Scalability and accuracy in a large-scale network emulator

Journal Article Computer Communication Review · July 1, 2002 Full text Cite

The trickle-down effect: Web caching and server request distribution

Journal Article Computer Communications · March 1, 2002 Web proxies and Content Delivery Networks (CDNs) are widely used to accelerate Web content delivery and to conserve Internet bandwidth. These caching agents are highly effective for static content, which is an important component of all Web-based services. ... Full text Cite

Interposed Request Routing for Scalable Network Storage

Journal Article ACM Transactions on Computer Systems · February 1, 2002 This paper explores interposed request routing in Slice, a new storage system architecture for high-speed networks incorporating network-attached block storage. Slice interposes a request switching filter - called a μproxy - along each client's network pat ... Full text Cite

Efficient sorting using registers and caches

Journal Article ACM Journal of Experimental Algorithmics · January 1, 2002 Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line ... Full text Cite

Opus: An overlay peer utility service

Conference 2002 IEEE Open Architectures and Network Programming Proceedings, OPENARCH 2002 · January 1, 2002 Today, an increasing number of important network services, such as content distribution, replicated services, and storage systems, are deploying overlays across multiple Internet sites to deliver better performance, reliability and adaptability. Currently ... Full text Cite

Self-organizing subsets: From each according to his abilities, to each according to his needs

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2002 The key principles behind current peer-to-peer research include fully distributing service functionality among all nodes participating in the system and routing individual requests based on a small amount of locally maintained state. The goals extend much ... Full text Cite

Structure and performance of the direct access file system

Conference Proceedings of the 2002 USENIX Annual Technical Conference · January 1, 2002 The Direct Access File System (DAFS) is an emerging industrial standard for network-attached storage. DAFS takes advantage of new user-level network interface standards. This enables a user-level file system structure in which client-side functionality for ... Cite

Distributed computing with load-managed active storage

Conference Proceedings of the IEEE International Symposium on High Performance Distributed Computing · January 1, 2002 One approach to high-performance processing of massive data sets is to incorporate computation into storage systems. Previous work has shown that this active storage model is effective for a variety of problems. This paper explores opportunities to use act ... Full text Cite

Managing energy and server resources in hosting centers

Journal Article Operating Systems Review (ACM) · December 1, 2001 Internet hosting centers serve multiple service sites from a common hardware base. This paper presents the design and implementation of an architecture for resource management in a hosting center operating system, with an emphasis on energy as a driving re ... Full text Cite

Anypoint communication protocol

Journal Article Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS · December 1, 2001 A new transport protocol called the Anypoint Communication Protocol (ACP) is presented. ACP clients establish connections to abstract services, represented at the network edge by Anypoint intermediaries. Potential applications of the protocol include scala ... Cite

Energy management for server clusters

Journal Article Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS · December 1, 2001 Energy should be viewed as an important element of resource management for Web sites, hosting centers, and other Internet server clusters. Energy-conscious service positioning is proposed. In this scheme, the system continuously monitors load and adaptivel ... Cite

FASTSLIM: Prefetch-Safe Trace Reduction for I/O Cache Simulation

Journal Article ACM Transactions on Modeling and Computer Simulation · April 1, 2001 Trace-driven simulation is a valuable tool for evaluating I/O systems. This article presents a new algorithm, called FASTSLIM, that reduces the size of I/O traces and improves simulation performance without compromising simulation accuracy. FASTSLIM is mor ... Full text Cite

End system optimizations for high-speed TCP

Journal Article IEEE Communications Magazine · April 1, 2001 Delivered TCP performance on high-speed networks is often limited by the sending and receiving hosts, rather than by the network hardware or the TCP protocol implementation itself. In this case, systems can achieve higher bandwidth by reducing host overhea ... Full text Cite

Web caching and content distribution: A view from the interior

Journal Article Computer Communications · February 1, 2001 Research in Web caching has yielded analytical tools to model the behavior of large-scale Web caches. Recently, Wolman et al. (Proceedings of the 17th ACM Symposium on Operating Systems Principles, December 1999) have proposed an analytical model and used ... Full text Cite

Efficient sorting using registers and caches

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2001 Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line ... Full text Cite

Payload caching: High-speed data forwarding for network intermediaries

Conference Proceedings of the 2001 USENIX Annual Technical Conference · January 1, 2001 Large-scale network services such as data delivery often incorporate new functions by interposing intermediaries on the network. Examples of forwarding intermediaries include firewalls, content routers, protocol converters, caching proxies, and multicast s ... Cite

Flow computation on massive grids

Journal Article Proceedings of the ACM Workshop on Advances in Geographic Information Systems · January 1, 2001 As detailed terrain data becomes available, GIS applications target larger geographic areas at finer resolutions. Processing the massive data presents significant challenges to GIS systems and demands algorithms that are optimized for both data movement an ... Full text Cite

Server switching: Yesterday and tomorrow

Conference Proceedings - 2nd IEEE Workshop on Internet Applications, WIAPP 2001 · January 1, 2001 Server switches distribute incoming request traffic across the nodes of Internet server clusters and Web proxy cache arrays. These switches are a standard building block for large-scale Internet services, with many commercial products on the market. As Int ... Full text Cite

Interposed request routing for scalable network storage

Conference Proceedings of the 4th Conference on Symposium on Operating System Design and Implementation, OSDI 2000 · October 22, 2000 This paper explores interposed request routing in Slice, a new storage system architecture for high-speed networks incorporating network-attached block storage. Slice interposes a request switching filter - called a μproxy - along each client's network pa ... Cite

Failure-atomic file access in an interposed network storage system

Journal Article Proceedings of the IEEE International Symposium on High Performance Distributed Computing · January 1, 2000 Presents a recovery protocol for block I/O operations in Slice, a storage system architecture for high-speed LANs incorporating network-attached block storage. The goal of the Slice architecture is to provide a network file service with scalable bandwidth ... Full text Cite

Potentials and limitations of fault-based Markov prefetching for virtual memory pages

Journal Article Performance Evaluation Review · January 1, 1999 Fault-based Markov prefetching for virtual memory pages (VMP) is examined. Markov prediction based only on the sequence of program-issued faults is shown to achieve reasonably high levels of accuracy for some scientific applications. Using the fault seque ... Full text Cite

Case for buffer servers

Journal Article Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS · January 1, 1999 Faster networks and cheaper storage have brought us to a point where I/O caching servers have an important role in the design of scalable, high-performance file systems. These intermediary I/O servers - or buffer servers - can be deployed at strategic poin ... Cite

Implementing cooperative prefetching and caching in a globally-managed memory system

Journal Article Performance Evaluation Review · January 1, 1998 This paper presents cooperative prefetching and caching - the use of network-wide global resources (memories, CPUs, and disks) to support prefetching and caching in the presence of hints of future demands. Cooperative prefetching and caching effectively un ... Full text Cite

Reduce, reuse, recycle: An approach to building large Internet caches

Journal Article Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS · January 1, 1997 New demands brought by the continuing growth of the Internet will be met in part by more effective use of caching in the Web and other services. We have developed CRISP, a distributed Internet object cache targeted to the needs of the organizations that ag ... Cite

Cut-through delivery in trapeze: an exercise in low-latency messaging

Conference IEEE International Symposium on High Performance Distributed Computing, Proceedings · January 1, 1997 New network technology continues to improve both the latency and bandwidth of communication in computer clusters. The fastest high-speed networks approach or exceed the I/O bus bandwidths of 'gigabit-ready' hosts. These advances introduce new consideration ... Cite

Using shared memory for read-mostly RPC services

Conference Proceedings of the Annual Hawaii International Conference on System Sciences · January 1, 1996 This paper describes object-based runtime support for efficient access to protected objects, i.e., objects belonging to server programs that export protected services to untrusted clients. Modern operating systems use hardware-based protection domains to pr ... Full text Cite

Integrating coherency and recoverability in distributed systems

Conference Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation, OSDI 1994 · November 14, 1994 We propose a technique for maintaining coherency of a transactional distributed shared memory, used by applications accessing a shared persistent store. Our goal is to improve support for fine-grained distributed data sharing in collaborative design applic ... Cite

Sharing and Protection in a Single-Address-Space Operating System

Journal Article ACM Transactions on Computer Systems (TOCS) · January 11, 1994 This article explores memory sharing and protection support in Opal, a single-address-space operating system designed for wide-address architectures. Opal threads execute within protection domains in a single shared virtual address space. Sharing is s ... Full text Cite

Some issues for single address space systems

Conference Proceedings of IEEE 4th Workshop on Workstation Operating Systems, WWOS 1993 · January 1, 1993 We previously described Opal, an OS environment that has a single virtual address space common to all protection domains, rather than the usual private virtual address space per protection domain (e.g., a Unix process). All threads on an Opal node see the ... Full text Cite

Lightweight shared objects in a 64-bit operating system

Conference Conference on Object-Oriented Programming Systems, Languages and Applications · December 1, 1992 Object-oriented models are a popular basis for supporting uniform sharing of data and services in operating systems, distributed programming systems, and database systems. We term systems that use objects for these purposes object sharing systems. Operatin ... Cite

Lightweight Shared Objects in a 64-bit Operating System

Journal Article ACM SIGPLAN Notices · October 31, 1992 Object-oriented models are a popular basis for supporting uniform sharing of data and services in operating systems, distributed programming systems, and database systems. We term systems that use objects for these purposes object sharing systems. Operatin ... Full text Cite

Distribution in a single address space operating system

Conference Proceedings of the 5th ACM SIGOPS European Workshop: Models and Paradigms for Distributed Systems Structuring, EW 1992 · September 21, 1992 The recent appearance of architectures with flat 64-bit virtual addressing opens an opportunity to reconsider the way our operating systems use virtual address spaces. We are building an operating system called Opal for these wide-address architectures. Th ... Full text Cite

Architectural support for single address space operating systems

Conference International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS · September 1, 1992 Recent microprocessor announcements show a trend toward wide-address computers: architectures that support 64 bits of virtual address space. Such architectures facilitate fundamentally new operating system organizations that promote efficient data sharing ... Cite

Architecture Support for Single Address Space Operating Systems

Journal Article ACM SIGPLAN Notices · January 9, 1992 Full text Cite

Using virtual addresses as object references

Conference Proceedings - 2nd International Workshop on Object Orientation in Operating Systems, IWOOOS 1992 · January 1, 1992 An alternative to surrogates is to use ordinary virtual addresses for inter-object referencing. Usually (but not always) this involves mapping distributed or persistent data into specified parts of the application's address space relying on page faults to ... Full text Cite

Opal: A single address space system for 64-bit architecture

Conference 3rd Workshop on Workstation Operating Systems, WWOS 1992 · January 1, 1992 The recent appearance of architectures with flat 64-bit virtual addressing opens an opportunity to reconsider the way in which operating systems use virtual address spaces. An operating system called Opal is being built for these wide-address architectures ... Full text Cite

Dynamic node reconfiguration in a parallel-distributed environment

Conference Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP · April 1, 1991 Idle workstations in a network represent a significant computing potential. In particular, their processing power can be used by parallel-distributed programs that treat the network as a loosely-coupled multiprocessor. But the set of machines free to parti ... Full text Cite

Dynamic Node Reconfiguration in a Parallel-Distributed Environment

Journal Article ACM SIGPLAN Notices · January 4, 1991 Full text Cite

Amber system. Parallel programming on a network of multiprocessors

Journal Article Operating Systems Review (ACM) · December 1, 1989 This paper describes a programming system called Amber that permits a single application program to use a homogeneous network of computers in a uniform way, making the network appear to the application as an integrated multiprocessor. Amber is specifically ... Full text Cite