Cheating the I/O bottleneck: Network storage with Trapeze/Myrinet
© USENIX 1998 Annual Technical Conference. All rights reserved.

Recent advances in I/O bus structures (e.g., PCI), high-speed networks, and fast, cheap disks have significantly expanded the I/O capacity of desktop-class systems. This paper describes a messaging system designed to deliver the potential of these advances to network storage systems, including cluster file systems and network memory. We describe gms-net, an RPC-like kernel-to-kernel messaging system based on Trapeze, a new firmware program for Myrinet network interfaces. We show how the communication features of Trapeze and gms-net are used by the Global Memory Service (GMS), a kernel-based network memory system. The paper focuses on support for zero-copy page migration in GMS/Trapeze using two RPC variants important for peer-to-peer distributed services: (1) delegated RPC, in which a request is delegated to a third party, and (2) nonblocking RPC, in which replies are processed from the Trapeze receive interrupt handler. We present measurements of sequential file access from network memory in the GMS/Trapeze prototype on a Myrinet/Alpha cluster, showing how file system interfaces and communication choices affect bandwidth. GMS/Trapeze delivers a peak read bandwidth of 96 MB/s using memory-mapped file I/O.
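
The two RPC variants named above can be illustrated with a small user-level simulation. This is only a sketch under stated assumptions, not the gms-net interface: the `Network` and `Node` classes, the `getpage` call, the message fields, and the per-node `directory` table are all hypothetical names chosen for illustration. Delivery of a message stands in for a receive interrupt, a continuation stands in for reply processing in the interrupt handler, and no real zero-copy transfer takes place.

```python
from collections import deque


class Network:
    """Toy in-process message queue standing in for the interconnect."""

    def __init__(self):
        self.nodes = {}
        self.queue = deque()

    def send(self, dst, msg):
        self.queue.append((dst, msg))

    def run(self):
        # Deliver messages until the queue drains. Each delivery plays the
        # role of a receive interrupt on the destination node.
        while self.queue:
            dst, msg = self.queue.popleft()
            self.nodes[dst].on_receive(msg)


class Node:
    """Hypothetical cluster node; not modeled on the real GMS code."""

    def __init__(self, name, net):
        self.name = name
        self.net = net
        self.pages = {}      # page id -> contents cached on this node
        self.directory = {}  # page id -> name of the node caching that page
        self.pending = {}    # request id -> continuation awaiting a reply
        self.next_id = 0
        net.nodes[name] = self

    def getpage(self, directory_node, page, continuation):
        # Nonblocking RPC: record the continuation and return immediately.
        req_id = (self.name, self.next_id)
        self.next_id += 1
        self.pending[req_id] = continuation
        self.net.send(directory_node, {"type": "request", "id": req_id,
                                       "page": page, "reply_to": self.name})
        return req_id

    def on_receive(self, msg):
        if msg["type"] == "request":
            if msg["page"] in self.pages:
                # This node holds the page: reply directly to the requester.
                self.net.send(msg["reply_to"],
                              {"type": "reply", "id": msg["id"],
                               "data": self.pages[msg["page"]],
                               "sender": self.name})
            else:
                # Delegated RPC: forward the request unchanged to a third
                # party. A page missing from the directory raises KeyError.
                self.net.send(self.directory[msg["page"]], msg)
        elif msg["type"] == "reply":
            # Reply handling runs in the receive path, not in a blocked caller.
            continuation = self.pending.pop(msg["id"])
            continuation(msg["data"], msg["sender"])


net = Network()
a, b, c = Node("A", net), Node("B", net), Node("C", net)
c.pages[7] = b"page-7"   # C caches page 7
b.directory[7] = "C"     # B only knows where page 7 lives

results = []
a.getpage("B", 7, lambda data, sender: results.append((data, sender)))
net.run()
```

In this sketch, node A sends its request to B, B delegates it to C, and C replies to A directly, so B never handles the page contents. A's call returns before any reply arrives, and the continuation runs when the reply is delivered.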