Output performance study on a production petascale filesystem

Published

Conference Paper

© Springer International Publishing AG 2017. This paper reports our observations from Titan, a top-tier supercomputer, and its Lustre parallel file stores under production load. In summary, we find that the performance of supercomputer file systems is highly variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write-sharing of files across clients (compute nodes). I/O parallelism is most effective when the application (or its I/O middleware system) distributes the I/O load in a balanced way, so that each client writes separate files on multiple targets and each target stores files for multiple clients. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good spots” in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we do not observe regular diurnal load patterns.
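The balanced distribution described in the abstract can be illustrated with a small placement sketch. This is not the authors' method or any Lustre API: the function names `balanced_placement` and `target_load`, their parameters, and the round-robin rule are all assumptions chosen for illustration. The sketch assigns each client's unshared, unstriped files to storage targets so that every client spreads its files over several targets and every target receives files from several clients.

```python
from collections import Counter


def balanced_placement(num_clients, num_targets, files_per_client):
    """Map each (client, file_index) pair to a storage target index.

    Hypothetical round-robin rule: files are numbered globally and dealt
    out to targets in turn, so per-target file counts differ by at most one.
    """
    if num_clients <= 0 or num_targets <= 0 or files_per_client <= 0:
        raise ValueError("all arguments must be positive")
    placement = {}
    for client in range(num_clients):
        for k in range(files_per_client):
            # Global file number; consecutive files land on consecutive targets.
            global_index = client * files_per_client + k
            placement[(client, k)] = global_index % num_targets
    return placement


def target_load(placement):
    """Count how many files each target stores under a given placement."""
    return Counter(placement.values())


if __name__ == "__main__":
    plan = balanced_placement(num_clients=4, num_targets=4, files_per_client=2)
    for (client, k), target in sorted(plan.items()):
        print(f"client {client} file {k} -> target {target}")
    print(dict(target_load(plan)))
```

With 4 clients, 4 targets, and 2 files per client, each client writes to two different targets and each target holds two files from two different clients. A static rule like this is consistent with the abstract's second point: it does not try to predict which targets will be fast.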

Cited Authors

  • Xie, B; Chase, JS; Dillow, D; Klasky, S; Lofstead, J; Oral, S; Podhorszki, N

Published Date

  • January 1, 2017

Volume / Issue

  • 10524 LNCS

Start / End Page

  • 187 - 200

Electronic International Standard Serial Number (EISSN)

  • 1611-3349

International Standard Serial Number (ISSN)

  • 0302-9743

International Standard Book Number 13 (ISBN-13)

  • 9783319676296

Digital Object Identifier (DOI)

  • 10.1007/978-3-319-67630-2_16

Citation Source

  • Scopus