Correlating instrumentation data to system states: A building block for automated diagnosis and control

Publication, Conference
Cohen, I; Goldszmidt, M; Kelly, T; Symons, J; Chase, JS
Published in: OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation
January 1, 2004

This paper studies the use of statistical induction techniques as a basis for automated performance diagnosis and performance management. The goal of the work is to develop and evaluate tools for offline and online analysis of system metrics gathered from instrumentation in Internet server platforms. We use a promising class of probabilistic models (Tree-Augmented Bayesian Networks or TANs) to identify combinations of system-level metrics and threshold values that correlate with high-level performance states—compliance with Service Level Objectives (SLOs) for average-case response time—in a three-tier Web service under a variety of conditions. Experimental results from a testbed show that TAN models involving small subsets of metrics capture patterns of performance behavior in a way that is accurate and yields insights into the causes of observed performance effects. TANs are extremely efficient to represent and evaluate, and they have interpretability properties that make them excellent candidates for automated diagnosis and control. We explore the use of TAN models for offline forensic diagnosis, and in a limited online setting for performance forecasting with stable workloads.
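As a rough illustration of the kind of model the abstract refers to, the sketch below shows the textbook structure-learning step for a Tree-Augmented Bayesian Network: score every pair of metrics by their conditional mutual information given the class (here, the SLO state), then keep a maximum-weight spanning tree over the metrics. It is not the authors' implementation, and it leaves out the metric-subset selection and threshold identification that the paper describes. The metric names, the binarized toy samples, and the function names are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three equal-length discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x, y, c) * log[ p(x, y | c) / (p(x | c) * p(y | c)) ]
        total += (k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def tan_tree(metrics, slo_state):
    """Return (parent, child) edges of a maximum-weight spanning tree over the
    metrics, weighted by I(Xi; Xj | C) and grown from the first metric (Prim's
    algorithm). In a full TAN the class node is also a parent of every metric.
    """
    names = list(metrics)
    weight = {
        frozenset((a, b)): cond_mutual_info(metrics[a], metrics[b], slo_state)
        for a, b in combinations(names, 2)
    }
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Pick the heaviest edge that connects the tree to a new metric.
        parent, child = max(
            ((p, c) for p in in_tree for c in names if c not in in_tree),
            key=lambda e: weight[frozenset(e)],
        )
        edges.append((parent, child))
        in_tree.append(child)
    return edges


# Hypothetical toy data: each metric is already binarized against some
# threshold (1 = above threshold), and slo_violated is the class label.
slo_violated = [0, 0, 0, 0, 1, 1, 1, 1]
toy_metrics = {
    "cpu_util": [0, 1, 0, 1, 0, 1, 0, 1],
    "disk_io":  [0, 1, 0, 1, 0, 1, 0, 1],  # moves in lockstep with cpu_util
    "net_rx":   [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the others given the class
}
edges = tan_tree(toy_metrics, slo_violated)
```

In this toy data the cpu_util and disk_io columns are identical, so that pair carries the highest conditional mutual information and becomes the first tree edge. With real instrumentation data, the continuous metrics would first need to be discretized, which this sketch assumes has already been done.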

Published In

OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation

Publication Date

January 1, 2004

Start / End Page

231 / 244
 

Citation

APA: Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., & Chase, J. S. (2004). Correlating instrumentation data to system states: A building block for automated diagnosis and control. In OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation (pp. 231–244).
Chicago: Cohen, I., M. Goldszmidt, T. Kelly, J. Symons, and J. S. Chase. “Correlating instrumentation data to system states: A building block for automated diagnosis and control.” In OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation, 231–44, 2004.
ICMJE: Cohen I, Goldszmidt M, Kelly T, Symons J, Chase JS. Correlating instrumentation data to system states: A building block for automated diagnosis and control. In: OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation. 2004. p. 231–44.
MLA: Cohen, I., et al. “Correlating instrumentation data to system states: A building block for automated diagnosis and control.” OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation, 2004, pp. 231–44.
NLM: Cohen I, Goldszmidt M, Kelly T, Symons J, Chase JS. Correlating instrumentation data to system states: A building block for automated diagnosis and control. OSDI 2004 - 6th Symposium on Operating Systems Design and Implementation. 2004. p. 231–244.
