Scholars@Duke publication: Neural Contextual Bandits with Deep Representation and Shallow Exploration


Publication, Conference

Xu, P; Wen, Z; Zhao, H; Gu, Q

We study neural contextual bandits, a general class of contextual bandits in which each context-action pair is associated with a raw feature vector, but the specific reward-generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning) and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that under standard assumptions, our proposed algorithm achieves O(√T) finite-time regret, where T is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient, since it only needs to explore in the last layer of the deep neural network.
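To illustrate the split between deep representation and shallow exploration, here is a minimal sketch that assumes NumPy. A fixed, randomly initialized two-layer ReLU network stands in for the learned representation. The algorithm described above also trains that network; here it is frozen for brevity. A LinUCB-style confidence bonus is then computed only over the last-hidden-layer features. The class name `ShallowExplorationUCB`, its parameters (`hidden_dim`, `reg`, `alpha`), and the simulated linear reward are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


class ShallowExplorationUCB:
    """Hypothetical sketch: UCB exploration on last-layer ReLU features only.

    Assumption: the ReLU feature extractor is frozen at random initialization;
    the algorithm in the paper additionally learns the representation.
    """

    def __init__(self, input_dim, hidden_dim=32, reg=1.0, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical two-layer ReLU network (the "deep representation").
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(input_dim), (hidden_dim, input_dim))
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, hidden_dim))
        self.alpha = alpha  # width of the confidence bonus
        # Ridge-regression statistics for the last linear layer only,
        # so exploration happens in a hidden_dim-dimensional space.
        self.A = reg * np.eye(hidden_dim)
        self.b = np.zeros(hidden_dim)

    def features(self, x):
        # Last hidden layer: phi(x) = relu(W2 @ relu(W1 @ x)).
        h = np.maximum(self.W1 @ x, 0.0)
        return np.maximum(self.W2 @ h, 0.0)

    def select(self, contexts):
        # contexts: shape (num_actions, input_dim), one raw feature vector per action.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # ridge estimate of the last-layer weights
        scores = []
        for x in contexts:
            phi = self.features(x)
            # Optimistic score: estimated reward plus UCB exploration bonus.
            bonus = self.alpha * np.sqrt(max(phi @ A_inv @ phi, 0.0))
            scores.append(phi @ theta + bonus)
        return int(np.argmax(scores))

    def update(self, x, reward):
        # Rank-one update of the last-layer design matrix and response vector.
        phi = self.features(x)
        self.A += np.outer(phi, phi)
        self.b += reward * phi


# Usage on a simulated problem (hypothetical linear reward plus Gaussian noise).
rng = np.random.default_rng(1)
agent = ShallowExplorationUCB(input_dim=5)
true_w = rng.normal(size=5)
for t in range(200):
    contexts = rng.normal(size=(4, 5))
    a = agent.select(contexts)
    reward = float(contexts[a] @ true_w) + 0.1 * rng.normal()
    agent.update(contexts[a], reward)
```

Because only the last layer is explored, each step inverts a `hidden_dim` × `hidden_dim` matrix rather than one whose size grows with the total number of network parameters. This is the intuition behind the efficiency claim above.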

Duke Scholars

Author: Pan Xu, Biostatistics & Bioinformatics, Division of Integrative Geno ...

Conference Name

International Conference on Learning Representations

Citation


Xu, P., Wen, Z., Zhao, H., & Gu, Q. (n.d.). Neural Contextual Bandits with Deep Representation and Shallow Exploration. Presented at the International Conference on Learning Representations.
