Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning

Publication, Conference
Liu, Z; Zhang, Z; Li, X; Wu, X; Xue, M
Published in: MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025
October 26, 2025

Preference-based reinforcement learning (PBRL) algorithms train intelligent agents efficiently by learning reward functions from human preferences, bypassing the need for costly pre-existing reward functions. However, prior PBRL research has predominantly relied on simulated teachers to stand in for human preferences, overlooking the fact that no simulated teacher exists for unresolved real-world problems. To apply PBRL effectively to real-world problems, it is essential to investigate how human teachers and simulated teachers differ in their preference selection patterns and in the behaviors exhibited by the resulting agents. We therefore propose HPBRL, a novel Human Preference-Based Reinforcement Learning Collaboration prototype, in which the agent learns a flexible reward function from real human preferences. To support a comprehensive comparison between human and simulated teachers, we conduct an in-depth analysis through a between-subjects study involving 18 users.

Published In

MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025

DOI

10.1145/3746278.3759381

Publication Date

October 26, 2025

Start / End Page

45 / 53
 

Citation

APA

Liu, Z., Zhang, Z., Li, X., Wu, X., & Xue, M. (2025). Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning. In MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025 (pp. 45–53). https://doi.org/10.1145/3746278.3759381

Chicago

Liu, Z., Z. Zhang, X. Li, X. Wu, and M. Xue. “Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning.” In MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025, 45–53, 2025. https://doi.org/10.1145/3746278.3759381.

ICMJE

Liu Z, Zhang Z, Li X, Wu X, Xue M. Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning. In: MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025. 2025. p. 45–53.

MLA

Liu, Z., et al. “Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning.” MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025, 2025, pp. 45–53. Scopus, doi:10.1145/3746278.3759381.

NLM

Liu Z, Zhang Z, Li X, Wu X, Xue M. Decomposition and Foresight: Comparing Human and Simulated Teacher in Preference-Based Reinforcement Learning. MCGE 2025: Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Co-located with MM 2025. 2025. p. 45–53.
