Conference · ACM SIGCOMM 2024 - Proceedings of the 2024 ACM SIGCOMM 2024 Conference · August 4, 2024
Performance of collective communication is critical for distributed systems. Using libraries to implement collective communication algorithms is not a good fit for a multi-tenant cloud environment because the tenant is not aware of the underlying physical ...
Conference · EuroSys 2024 - Proceedings of the 2024 European Conference on Computer Systems · April 22, 2024
Kernel task scheduling is important for application performance, adaptability to new hardware, and complex user requirements. However, developing, testing, and debugging new scheduling algorithms in Linux, the most widely used cloud operating system, is sl ...
Conference · Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI 2024 · January 1, 2024
Performance isolation is essential for sharing resources in multi-tenant public clouds. Compared with traditional kernel-based networking, RDMA presents unique challenges especially because RDMA NIC’s complex microarchitecture resources are often hidden fr ...
Conference · Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024 · January 1, 2024
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have request rat ...
Conference · Proceedings - International Symposium on Computer Architecture · January 1, 2024
A typical SmartNIC (SNIC) integrates a processor comprising Arm CPU and accelerators with a conventional NIC. The processor is designed to energy-efficiently execute network functions frequently used by datacenter applications. With such a processor, the S ...
Conference · HotNets 2023 - Proceedings of the 22nd ACM Workshop on Hot Topics in Networks · November 28, 2023
With the rise of microservices, the execution environment of many cloud applications has become a set of virtual machines or containers connected by a flexible and feature-rich virtual network. We argue that the implementation of such virtual networks shou ...
Conference · SoCC 2023 - Proceedings of the 2023 ACM Symposium on Cloud Computing · October 30, 2023
Service meshes play a central role in the modern application ecosystem by providing an easy and flexible way to connect microservices of a distributed application. However, because of how they interpose on application traffic, they can substantially increa ...
Conference · HotOS 2023 - Proceedings of the 19th Workshop on Hot Topics in Operating Systems · June 22, 2023
Intra-host networks, including heterogeneous devices and interconnect fabrics, have become increasingly complex and crucial. However, intra-host networks today do not provide sufficient manageability. This prevents data center operators from running a reli ...
Journal Article · IEEE Micro · January 1, 2023
Remote direct memory access (RDMA) networks enable low latency and low central processing unit utilization, and their widespread adoption in datacenters enables improved application performance. However, there are performance isolation concerns for RDMA de ...
Conference · Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023 · January 1, 2023
Recent years have witnessed the wide adoption of RDMA in the cloud to accelerate first-party workloads and achieve cost savings by freeing up CPU cycles. Now cloud providers are working towards supporting RDMA in general-purpose guest VMs to benefit third- ...
Conference · Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023 · January 1, 2023
Remote Procedure Call (RPC) is a widely used abstraction for cloud computing. The programmer specifies type information for each remote procedure, and a compiler generates stub code linked into each application to marshal and unmarshal arguments into messa ...
Conference · Proceedings of Machine Learning Research · January 1, 2023
Online matrix vector multiplication is a fundamental step and bottleneck in many machine learning algorithms. It is defined as follows: given a matrix at the pre-processing phase, at each iteration one receives a query vector and needs to form the matrix-v ...
Conference · 32nd USENIX Security Symposium, USENIX Security 2023 · January 1, 2023
Hypervisors have played a critical role in cloud security, but they introduce a large trusted computing base (TCB) and incur a heavy performance tax. As of late, hypervisor offloading has become an emerging trend, where privileged functions are sunk into s ...
Conference · Advances in Neural Information Processing Systems · January 1, 2023
Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-the-art deep neural networks are becoming larger in size every year to deliver increasing model ...
Journal Article · Proceedings of the VLDB Endowment · November 1, 2022
With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneo ...
Conference · Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 · June 30, 2022
Many deep learning tasks have to deal with graphs (e.g., protein structures, social networks, source code abstract syntax trees). Due to the importance of these tasks, people turned to Graph Neural Networks (GNNs) as the de facto method for learning on gra ...
Conference · Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022 · January 1, 2022
A cloud provider today provides its network resources to its tenants as a black box, such that cloud tenants have little knowledge of the underlying network characteristics. Meanwhile, data-intensive applications have increasingly migrated to the cloud, an ...
Conference · Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022 · January 1, 2022
High-speed RDMA networks are getting rapidly adopted in the industry for their low latency and reduced CPU overheads. To verify that RDMA can be used in production, system administrators need to understand the set of application workloads that can potentia ...
Conference · Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022 · January 1, 2022
Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2022
There has been a recent effort in applying differential privacy on memory access patterns to enhance data privacy. This is called differential obliviousness. Differential obliviousness is a promising direction because it provides a principled trade-off bet ...
Conference · Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022 · January 1, 2022
In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation. Given a data-set X ⊂ ℝ^d, a binary function f : ℝ^d × ℝ^d → ℝ, and a point y ∈ ℝ^d, the Pairwise Summation Estimate PSE ...
Conference · SIGCOMM 2021 - Proceedings of the ACM SIGCOMM 2021 Conference · August 9, 2021
Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving. As ...
Conference · Leibniz International Proceedings in Informatics, LIPIcs · July 1, 2021
Numerous high-profile works have shown that access patterns to even encrypted databases can leak secret information and sometimes even lead to reconstruction of the entire database. To thwart access pattern leakage, the literature has focused on oblivious ...
Conference · HotOS 2021 - Proceedings of the 2021 Workshop on Hot Topics in Operating Systems · June 1, 2021
Linux has become the de-facto operating system of our age, but its vulnerabilities are a constant threat to service availability, user privacy, and data integrity. While one might scrap Linux and start over, the cost of that would be prohibitive due to Lin ...
Conference · Proceedings of the 19th USENIX Conference on File and Storage Technologies, FAST 2021 · January 1, 2021
High development velocity is critical for modern systems. This is especially true for Linux file systems which are seeing increased pressure from new storage devices and new demands on storage systems. However, high velocity Linux kernel development is cha ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
Model parallelism has become a necessity for training modern large-scale deep language models. In this work, we identify a new and orthogonal dimension from existing model parallel approaches: it is possible to perform pipeline parallelism within a single ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2021
Low latency is increasingly critical for modern workloads, to the extent that compute functions are explicitly scheduled to be co-located with their in-memory object stores for faster access. However, the traditional object store architecture mandates that ...
Conference · ICLR 2021 - 9th International Conference on Learning Representations · January 1, 2021
In this work, we examine the security of InstaHide, a scheme recently proposed by Huang et al. (2020b) for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the d ...
Conference · SIGCOMM 2020 - Proceedings of the 2020 Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication · July 30, 2020
Researchers have shown that offloading software middleboxes (e.g., NAT, firewall, load balancer) to programmable switches can yield orders-of-magnitude performance gains. However, it requires manually selecting the middlebox components to offload and rewr ...
Conference · Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020 · January 1, 2020
Building a formally-verified software middlebox is attractive for network reliability. In this paper, we explore the feasibility of verifying “almost unmodified” software middleboxes. Our key observation is that software middleboxes are already designed an ...
Conference · Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020 · January 1, 2020
High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging. Currently, deep learning ...