Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation
Conversational recommendation system (CRS) is able to obtain fine-grained and dynamic user preferences based on interactive dialogue. Previous CRS assumes that the user has a clear target item, which often deviates from the real scenario, that is for many users who resort to CRS, they might not have a clear idea about what they really like. Specifically, the user may have a clear single preference for some attribute types (e.g. brand) of items, while for other attribute types (e.g. color), the user may have multiple preferences or even no clear preferences, which leads to multiple acceptable attribute instances (e.g. black and red) of one attribute type. Therefore, the users could show their preferences over items under multiple combinations of attribute instances rather than a single item with unique combination of all attribute instances. As a result, we first propose a more realistic conversational recommendation learning setting, namely Multi-Interest Multi-round Conversational Recommendation (MIMCR), where users may have multiple interests in attribute instance combinations and accept multiple items with partially overlapped combinations of attribute instances. To effectively cope with the new CRS learning setting, in this paper, we propose a novel learning framework, namely Multiple Choice questions based Multi-Interest Policy Learning (MCMIPL). In order to obtain user preferences more efficiently, the agent generates multiple choice questions rather than binary yes/no ones on specific attribute instance. Furthermore, we propose a union set strategy to select candidate items instead of existing intersection set strategy in order to overcome over-filtering items during the conversation. Finally, we design a Multi-Interest Policy Learning (MIPL) module, which utilizes captured multiple interests of the user to decide next action, either asking attribute instances or recommending items. Extensive experimental results on four datasets demonstrate the superiority of our method for the proposed MIMCR setting.