Distance minimization for reward learning from scored trajectories

Publication, Conference
Burchfiel, B; Tomasi, C; Parr, R
Published in: 30th AAAI Conference on Artificial Intelligence, AAAI 2016
January 1, 2016

Many planning methods rely on an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert's role from demonstrator to judge, DM-IRL relaxes some of the assumptions present in IRL, enabling learning from scored, arbitrary demonstration trajectories even when the transition function is unknown. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns from an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and that learning succeeds with limited training data.
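The core idea described in the abstract is to regress the expert's trajectory scores onto a representation of each trajectory. Below is a minimal sketch of that setting, assuming (as is common in the IRL literature) a reward that is linear in discounted state-feature counts, with the weights fit by ridge-regularized least squares; the function names, discount factor, and regularizer here are illustrative assumptions rather than the paper's exact formulation.

    import numpy as np

    def discounted_feature_counts(state_features, gamma=0.95):
        """Sum gamma^t * phi(s_t) over a trajectory's per-state feature vectors."""
        phi = np.asarray(state_features, dtype=float)
        discounts = gamma ** np.arange(phi.shape[0])
        return discounts @ phi

    def fit_reward_weights(trajectory_features, expert_scores, ridge=1e-3):
        """Fit weights w so that w . phi(trajectory) approximates each expert score.

        Illustrative ridge-regularized least squares, not the paper's exact method.
        trajectory_features: (num_trajectories, num_features) discounted feature counts.
        expert_scores: (num_trajectories,) scalar scores assigned by the judge.
        """
        X = np.asarray(trajectory_features, dtype=float)
        y = np.asarray(expert_scores, dtype=float)
        A = X.T @ X + ridge * np.eye(X.shape[1])   # regularized normal equations
        return np.linalg.solve(A, X.T @ y)

    # Example: three scored trajectories, each 5 steps long over 2 state features.
    rng = np.random.default_rng(0)
    trajectories = [rng.random((5, 2)) for _ in range(3)]
    features = np.stack([discounted_feature_counts(t) for t in trajectories])
    scores = np.array([0.2, 0.7, 0.5])              # scores from the expert judge
    weights = fit_reward_weights(features, scores)  # estimated linear reward weights

Once weights are estimated, the learned linear reward can be handed to a planner in a new environment, which is the transfer setting the abstract describes.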

Published In

30th AAAI Conference on Artificial Intelligence, AAAI 2016

ISBN

9781577357605

Publication Date

January 1, 2016

Start / End Page

3330 / 3336
 

Citation

APA: Burchfiel, B., Tomasi, C., & Parr, R. (2016). Distance minimization for reward learning from scored trajectories. In 30th AAAI Conference on Artificial Intelligence, AAAI 2016 (pp. 3330–3336).
Chicago: Burchfiel, B., C. Tomasi, and R. Parr. “Distance minimization for reward learning from scored trajectories.” In 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 3330–36, 2016.
ICMJE: Burchfiel B, Tomasi C, Parr R. Distance minimization for reward learning from scored trajectories. In: 30th AAAI Conference on Artificial Intelligence, AAAI 2016. 2016. p. 3330–6.
MLA: Burchfiel, B., et al. “Distance minimization for reward learning from scored trajectories.” 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 2016, pp. 3330–36.
NLM: Burchfiel B, Tomasi C, Parr R. Distance minimization for reward learning from scored trajectories. 30th AAAI Conference on Artificial Intelligence, AAAI 2016. 2016. p. 3330–3336.
