
Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health

Publication, Conference
Shen, Y; Dunn, J; Zavlanos, MM
Published in: Proceedings of the IEEE Conference on Decision and Control
January 1, 2022

In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual risk-neutral approach to MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are observable by the expert but not by the learner. Such contexts are therefore unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal of the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding biased decisions due to the presence of UCs in the expert's data. To achieve this, we first formulate a mixed-integer linear program that uses the expert data to obtain causal bounds on the Conditional Value at Risk (CVaR) of the true return for all possible UCs. We then transfer these causal bounds to the learner by formulating a causal-bound-constrained Upper Confidence Bound (UCB) algorithm that reduces the variance of online exploration and, as a result, identifies the true minimum-risk arm faster, with fewer new samples. We provide a regret analysis of our proposed method and show that it can achieve zero or constant regret. Finally, we use an example of emotion regulation in mobile health to show that our proposed method outperforms risk-averse MAB methods without causal bounds.
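The two-stage idea described in the abstract (offline causal bounds on each arm's CVaR from expert data, then an online UCB whose indices are truncated to those bounds) can be sketched roughly as follows. This is an illustrative assumption-laden sketch, not the paper's exact algorithm: the function names, the confidence radius, and the `(lo, hi)` bound format are all hypothetical, and the sketch treats higher CVaR of the return's lower tail as lower risk.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha of the return's lower tail:
    the mean of the worst alpha-fraction of observed returns."""
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))  # size of the lower tail
    return r[:k].mean()

def select_arm(arm_returns, causal_bounds, t, alpha=0.1):
    """Pick an arm by a UCB index on its empirical CVaR, truncated to
    expert-derived causal bounds (lo, hi) for each arm. Truncation keeps
    exploration from chasing arms the offline bounds already rule out."""
    indices = []
    for samples, (lo, hi) in zip(arm_returns, causal_bounds):
        n = len(samples)
        ucb = empirical_cvar(samples, alpha) + np.sqrt(2.0 * np.log(t) / n)
        indices.append(np.clip(ucb, lo, hi))  # enforce the causal bounds
    return int(np.argmax(indices))
```

Note how a tight upper causal bound on one arm can redirect exploration to another arm even when the first arm's raw UCB index is larger; this is the mechanism by which the expert's (confounded but bounded) data reduces the learner's online sample complexity.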


Published In

Proceedings of the IEEE Conference on Decision and Control

DOI

10.1109/CDC51059.2022.9992917

EISSN

2576-2370

ISSN

0743-1546

Publication Date

January 1, 2022

Volume

2022-December

Start / End Page

144 / 149

Citation

APA
Shen, Y., Dunn, J., & Zavlanos, M. M. (2022). Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health. In Proceedings of the IEEE Conference on Decision and Control (Vol. 2022-December, pp. 144–149). https://doi.org/10.1109/CDC51059.2022.9992917

Chicago
Shen, Y., J. Dunn, and M. M. Zavlanos. “Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health.” In Proceedings of the IEEE Conference on Decision and Control, 2022-December:144–49, 2022. https://doi.org/10.1109/CDC51059.2022.9992917.

ICMJE
Shen Y, Dunn J, Zavlanos MM. Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health. In: Proceedings of the IEEE Conference on Decision and Control. 2022. p. 144–9.

MLA
Shen, Y., et al. “Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health.” Proceedings of the IEEE Conference on Decision and Control, vol. 2022-December, 2022, pp. 144–49. Scopus, doi:10.1109/CDC51059.2022.9992917.

NLM
Shen Y, Dunn J, Zavlanos MM. Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health. Proceedings of the IEEE Conference on Decision and Control. 2022. p. 144–149.
