Scholars@Duke publication: Information relaxation bounds for infinite horizon markov decision processes

Information relaxation bounds for infinite horizon markov decision processes

Publication , Journal Article

Brown, DB; Haugh, MB

Published in: Operations Research

September 1, 2017

We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown et al. [Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4, Part 1):785-801]. This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs and consider applying information relaxations to reformulations of the DP. These reformulations use different state transition functions and correct for the change in state transition probabilities by multiplying by likelihood ratio factors. These reformulations can greatly simplify solutions of the information relaxations, both in leading to finite horizon subproblems and by reducing the number of states that need to be considered in these subproblems.We show that any reformulation leads to a lower bound on the optimal cost of the DP when used with an information relaxation and a penalty built from a broad class of approximate value functions. We refer to this class of approximate value functions as subsolutions, and this includes approximate value functions based on Lagrangian relaxations as well as those based on approximate linear programs. We show that the information relaxation approach, in theory, recovers a tight lower bound using any reformulation and is guaranteed to improve on the lower bounds from subsolutions. Finally, we apply information relaxations to an inventory control application with an autoregressive demand process, as well as dynamic service allocation in a multiclass queue. In our examples, we find that the information relaxation lower bounds are easy to calculate and are very close to the expected cost using simple heuristic policies, thereby showing that these heuristic policies are nearly optimal.

Duke Scholars

Author David B. Brown Fuqua School of Business

Published In

Operations Research

DOI

10.1287/opre.2017.1631

EISSN

1526-5463

ISSN

0030-364X

Publication Date

September 1, 2017

Volume

Issue

Start / End Page

1355 / 1379

Related Subject Headings

Operations Research
3507 Strategy, management and organisational behaviour
1503 Business and Management
0802 Computation Theory and Mathematics
0102 Applied Mathematics

Citation

APA

Chicago

ICMJE

MLA

NLM

Brown, D. B., & Haugh, M. B. (2017). Information relaxation bounds for infinite horizon markov decision processes. Operations Research, 65(5), 1355–1379. https://doi.org/10.1287/opre.2017.1631

Brown, D. B., and M. B. Haugh. “Information relaxation bounds for infinite horizon markov decision processes.” Operations Research 65, no. 5 (September 1, 2017): 1355–79. https://doi.org/10.1287/opre.2017.1631.

Brown DB, Haugh MB. Information relaxation bounds for infinite horizon markov decision processes. Operations Research. 2017 Sep 1;65(5):1355–79.

Brown, D. B., and M. B. Haugh. “Information relaxation bounds for infinite horizon markov decision processes.” Operations Research, vol. 65, no. 5, Sept. 2017, pp. 1355–79. Scopus, doi:10.1287/opre.2017.1631.

Brown DB, Haugh MB. Information relaxation bounds for infinite horizon markov decision processes. Operations Research. 2017 Sep 1;65(5):1355–1379.

Published In

Operations Research

DOI

10.1287/opre.2017.1631

EISSN

1526-5463

ISSN

0030-364X

Publication Date

September 1, 2017

Volume

Issue

Start / End Page

1355 / 1379

Related Subject Headings

Operations Research
3507 Strategy, management and organisational behaviour
1503 Business and Management
0802 Computation Theory and Mathematics
0102 Applied Mathematics