Offline Policy Evaluation for Learning-based Deep Brain Stimulation Controllers
Deep brain stimulation (DBS) is an effective procedure to treat motor symptoms caused by nervous system disorders such as Parkinson's disease (PD). Although existing implantable DBS devices can suppress PD symptoms by delivering fixed periodic stimuli to the Basal Ganglia (BG) region of the brain, they are considered inefficient in terms of energy and could cause side-effects. Recently, reinforcement learning (RL)-based DBS controllers have been developed to achieve both stimulation efficacy and energy efficiency, by adapting stimulation parameters (e.g., pattern and frequency of stimulation pulses) to the changes in neuronal activity. However, RL methods usually provide limited safety and performance guarantees, and directly deploying them on patients may be hindered due to clinical regulations. Thus, in this work, we introduce a model-based offline policy evaluation (OPE) methodology to estimate the performance of RL policies using historical data. As a first step, the BG region of the brain is modeled as a Markov decision process (MDP). Then, a deep latent MDP (DL-MDP) model is learned using variational inference and previously collected control trajectories. The performance of RL controllers is then evaluated on the DL-MDP models instead of patients directly, ensuring safety of the evaluation process. Further, we show that our method can be integrated into offline RL frameworks, improving control performance when limited training data are available. We illustrate the use of our methodology on a computational Basal Ganglia model (BGM); we show that it accurately estimates the expected returns of controllers trained following state-of-the-art RL frameworks, outperforming existing OPE methods designed for general applications.