Cross-Institutional Evaluation of SuperAlarm Algorithm for Predicting In-Hospital Code Blue Events
Bedside patient monitors are essential tools in acute care settings that provide timely information about patients’ physiologic condition. However, current patient monitors are known to produce excessive alarms, the majority of which are either false or non-actionable, leading to the well-known problem of alarm fatigue. This creates an unsafe environment for patients, as true alarms signaling adverse events might be overlooked, causing delayed or even missed opportunities for intervention. To tackle this issue, our previous studies proposed the “SuperAlarm” algorithm, which uses data mining and machine learning to find frequent combinations of monitor alarms that are predictive of in-hospital code blue (CB) events (Bai et al., 2015; 2017). The algorithm demonstrated high sensitivity in detecting code blue events while significantly reducing alarm frequency on an independent test set from the same healthcare institution. The present pilot study is our first attempt to evaluate the generalizability of SuperAlarm across institutions, with the aim of informing the implementation strategy for the SuperAlarm algorithm.
Alarm data from bedside patient monitors located in critical care units were obtained from two healthcare institutions. Alarm data from institution A (InsA) include 412 code blue encounters and 4020 control encounters without code blue calls (matched by time, diagnosis-related group, age, gender and medical unit) between 2013 and 2018. Alarm data from institution B (InsB) include 254 code blue and 2213 matched control encounters between 2010 and 2012. InsA data were divided into training and test sets with an 80%-20% split. The training set was used to derive the SuperAlarm algorithm following the same framework as our previous studies (Bai et al., 2015; 2017). In brief, the Maximal Frequent Itemset Algorithm (MAFIA) was adopted to find frequent combinations of alarms (SuperAlarm patterns) that precede code blue events but seldom occur in control encounters. The weighted average occurrence representation (WAOR) was used to integrate the cumulative effect of SuperAlarm patterns over time; the resulting representation served as features to train a classifier (logistic regression with lasso regularization) to predict code blue events. All hyperparameters were determined through cross-validation during the training process. Three metrics were calculated to evaluate SuperAlarm performance: sensitivity as a function of lead time before code blue (Sen@L), alarm frequency reduction ratio (AFRR) and work-up to detection ratio (WDR). The derived algorithm was first tested on the InsA test set to evaluate internal-institution performance, followed by external-institution performance using all data from InsB as the test set. Cross-institution performance was evaluated by randomly selecting the same number of encounters from InsB as in the InsA test set 100 times via bootstrapping and statistically comparing the performance between the two institutions.
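The bootstrap comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the encounter record layout (`code_blue`, `flagged`), the function names, the use of sampling with replacement, the approximate percentile interval and the toy data are all assumptions made for illustration, and plain sensitivity stands in for the three reported metrics.

```python
import random
from statistics import mean


def sensitivity(encounters):
    """Fraction of code blue encounters that the algorithm flagged."""
    code_blue = [e for e in encounters if e["code_blue"]]
    # Assumes the sample contains at least one code blue encounter.
    return sum(e["flagged"] for e in code_blue) / len(code_blue)


def bootstrap_metric(encounters, n_draw, metric, n_rounds=100, seed=0):
    """Resample n_draw encounters n_rounds times; return mean and ~95% interval."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rounds):
        # Sampling with replacement is an assumption of this sketch.
        sample = rng.choices(encounters, k=n_draw)
        values.append(metric(sample))
    values.sort()
    # Approximate percentile interval taken from the sorted bootstrap values.
    lo = values[int(0.025 * (n_rounds - 1))]
    hi = values[int(0.975 * (n_rounds - 1))]
    return mean(values), (lo, hi)


# Hypothetical external-institution pool: 80 code blue encounters
# (60 flagged by the algorithm) and 120 controls. Not study data.
toy_external = (
    [{"code_blue": True, "flagged": True}] * 60
    + [{"code_blue": True, "flagged": False}] * 20
    + [{"code_blue": False, "flagged": False}] * 120
)

# n_draw plays the role of the internal test set size.
boot_mean, (ci_low, ci_high) = bootstrap_metric(toy_external, n_draw=50, metric=sensitivity)
```

Drawing subsets of the same size as the internal test set keeps the variance of the external estimates comparable to that of the internal estimate, which is presumably why the two sample sizes were matched before statistical comparison.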
The final set of SuperAlarm patterns comprises 798 combinations in total, with cardinality (number of alarms in a pattern) ranging from 2 to 7. Fig. 1A shows example SuperAlarm patterns randomly selected from each cardinality. Fig. 1B presents differences in Sen@L between the internal- (black circle) and external- (red circle) institution test sets, along with bootstrap-based performance (red cross: mean; shaded areas: 95% CI). Performance on the InsA and InsB test sets shows significant differences in AFRR (0.931±0.161 vs 0.950±0.085, p<0.05) and WDR (7.064±0.215 vs 8.614±0.146, p<0.05).
Discussion
SuperAlarm patterns mined from InsA are structured similarly to those in previous studies, such as the coupling of arrhythmia alarms with hemodynamic-related alarms and combinations of parametric alarms that capture trending information. Test performance at both institutions shows a large reduction in alarm frequency (over 90%) while maintaining sensitivity above 60% within a one-hour lead time. It is worth noting that laboratory test data, which have been found to improve sensitivity (Bai et al., 2015), were not included given the focus of this study. SuperAlarm performs significantly differently on the internal- and external-institution test sets, as shown in Fig. 1B. One plausible reason is a disparity in the distribution of CB subtypes between the two institutions. Our next steps are to investigate drill-down performance by CB subtype and to conduct a case-by-case clinical review of acuteness, providing further insight to guide the implementation strategy of SuperAlarm.
Machine Learning for Healthcare