Scholars@Duke publication: Distance minimization for reward learning from scored trajectories

Distance minimization for reward learning from scored trajectories

Publication, Conference

Burchfiel, B; Tomasi, C; Parr, R

Published in: 30th AAAI Conference on Artificial Intelligence, AAAI 2016

January 1, 2016

Many planning methods rely on the use of an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert's role from a demonstrator to a judge, DM-IRL relaxes some of the assumptions present in inverse reinforcement learning (IRL), enabling learning from the scoring of arbitrary demonstration trajectories with unknown transition functions. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns from an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and we achieve successful learning with limited available training data.
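To make the "expert as judge" idea concrete, here is a minimal sketch of fitting a reward from scored trajectories by minimizing a squared distance between predicted and expert scores. It is not the authors' implementation and rests on assumptions not stated in the abstract: the reward is linear in per-state features, r(s) = w · f(s), and an expert's score equals a trajectory's undiscounted cumulative reward. The helper names (`trajectory_features`, `fit_reward_weights`, `one_hot`) and the toy five-state problem are hypothetical. Note that fitting uses only the visited states and their scores, so no transition model is needed, and the trajectories need not be near-optimal.

```python
import numpy as np

# Hypothetical sketch, NOT the DM-IRL reference implementation.
# Assumptions: reward is linear in state features, r(s) = w . f(s), and an
# expert's score for a trajectory equals its undiscounted cumulative reward.

def trajectory_features(trajectory, feature_fn):
    """Phi(tau): sum of per-state feature vectors along one trajectory."""
    return np.sum([feature_fn(s) for s in trajectory], axis=0)

def fit_reward_weights(trajectories, scores, feature_fn):
    """Choose w minimizing sum_i (w . Phi(tau_i) - score_i)^2 via least squares."""
    phi = np.array([trajectory_features(t, feature_fn) for t in trajectories], dtype=float)
    w, *_ = np.linalg.lstsq(phi, np.asarray(scores, dtype=float), rcond=None)
    return w

# Toy setting (hypothetical): 5 discrete states with one-hot features.
N_STATES = 5

def one_hot(state):
    f = np.zeros(N_STATES)
    f[state] = 1.0
    return f

rng = np.random.default_rng(0)
# Hidden reward, used only to simulate the expert's scores.
true_w = np.array([-1.0, 0.0, 0.5, 0.0, 5.0])

# Arbitrary (not necessarily near-optimal) trajectories; no transition model is used.
trajectories = [rng.integers(0, N_STATES, size=10) for _ in range(50)]
exact_scores = [float(true_w @ trajectory_features(t, one_hot)) for t in trajectories]
# Simulated expert scoring error: small Gaussian noise on each score.
noisy_scores = [s + rng.normal(0.0, 0.1) for s in exact_scores]

w_exact = fit_reward_weights(trajectories, exact_scores, one_hot)
w_noisy = fit_reward_weights(trajectories, noisy_scores, one_hot)
print("recovered (exact scores):", np.round(w_exact, 3))
print("recovered (noisy scores):", np.round(w_noisy, 3))
```

With noiseless scores the least-squares fit recovers the hidden weights up to floating-point error; with noisy scores it should still land close to them, which loosely mirrors the robustness-to-scoring-error claim under these simplified assumptions. The paper's actual distance measure, features, and handling of discounting may differ.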

Duke Scholars

Carlo Tomasi, Computer Science

Published In

30th AAAI Conference on Artificial Intelligence, AAAI 2016

ISBN

9781577357605

Publication Date

January 1, 2016

Start / End Page

3330 / 3336

Citation

APA: Burchfiel, B., Tomasi, C., & Parr, R. (2016). Distance minimization for reward learning from scored trajectories. In 30th AAAI Conference on Artificial Intelligence, AAAI 2016 (pp. 3330–3336).

Chicago: Burchfiel, B., C. Tomasi, and R. Parr. “Distance minimization for reward learning from scored trajectories.” In 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 3330–36, 2016.

ICMJE: Burchfiel B, Tomasi C, Parr R. Distance minimization for reward learning from scored trajectories. In: 30th AAAI Conference on Artificial Intelligence, AAAI 2016. 2016. p. 3330–6.

MLA: Burchfiel, B., et al. “Distance minimization for reward learning from scored trajectories.” 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 2016, pp. 3330–36.

NLM: Burchfiel B, Tomasi C, Parr R. Distance minimization for reward learning from scored trajectories. 30th AAAI Conference on Artificial Intelligence, AAAI 2016. 2016. p. 3330–3336.
