Multi-armed bandits with metric switching costs

Conference Paper

In this paper we consider the stochastic multi-armed bandit problem with metric switching costs. Given a set of locations (arms) in a metric space, prior information about the reward available at each location, the cost of obtaining a sample/play at each location, and rules for updating the prior based on samples/plays, the task is to maximize a certain objective function subject to a total distance (switching) budget L and a total play budget C. This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, etc. In this paper we develop a general duality-based framework that yields the first O(1) approximation for metric switching costs; the constants involved are quite small. Since these problems are Max-SNP hard, this result is the best possible. The overall technique and the ensuing structural results are of independent interest in the context of bandit problems with complicated side-constraints. Our techniques also improve the approximation ratio of the budgeted learning problem from 4 to 3 + ε. © 2009 Springer Berlin Heidelberg.
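To make the constraint structure concrete, here is a minimal toy sketch of the model the abstract describes: arms placed in a metric space (the real line, for simplicity), a travel budget L charged by distance moved, and a play budget C charged per pull. The naive nearest-first policy below is purely illustrative and is not the paper's duality-based approximation algorithm; all names and the policy itself are hypothetical.

```python
import random

def greedy_tour(positions, success_probs, L, C, seed=0):
    """Toy bandit with metric switching costs on the real line.

    Moving from x to y costs |x - y| against the travel budget L;
    each pull costs 1 against the play budget C. This nearest-first,
    one-pull-per-arm policy only illustrates the two budgets -- it is
    NOT the O(1)-approximation from the paper.
    """
    rng = random.Random(seed)
    pos, travel, plays, reward = 0.0, L, C, 0
    # Visit arms in order of distance from the starting point.
    for arm, p in sorted(zip(positions, success_probs),
                         key=lambda t: abs(t[0])):
        dist = abs(arm - pos)
        if dist > travel or plays < 1:
            break  # one of the two budgets is exhausted
        travel -= dist
        plays -= 1
        pos = arm
        reward += rng.random() < p  # Bernoulli reward at this arm
    return reward
```

With `L = 5` and `C = 3`, the policy above can reach the arms at positions 1 and 2 but not the one at 10, since the remaining travel budget is too small; with `L = 0` no arm at positive distance is reachable at all.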

Cited Authors

  • Guha, S; Munagala, K

Published Date

  • November 12, 2009

Published In

  • Lecture Notes in Computer Science

Volume / Issue

  • 5556 LNCS / PART 2

Start / End Page

  • 496 - 507

Electronic International Standard Serial Number (EISSN)

  • 1611-3349

International Standard Serial Number (ISSN)

  • 0302-9743

International Standard Book Number 10 (ISBN-10)

  • 3642029299

International Standard Book Number 13 (ISBN-13)

  • 9783642029295

Digital Object Identifier (DOI)

  • 10.1007/978-3-642-02930-1_41

Citation Source

  • Scopus