PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation
To explore the limit of dialogue generation pre-training, we present PLATO-XL, with models of up to 11 billion parameters trained on both Chinese and English social media conversations. To train such large models, we adopt the unified transformer architecture for its high computation and parameter efficiency. In addition, we carry out multi-party aware pre-training to better distinguish the characteristic information in social media conversations. With these designs, PLATO-XL achieves superior performance over other approaches in both Chinese and English chitchat. We further explore the capacity of PLATO-XL on other conversational tasks, such as knowledge-grounded dialogue and task-oriented conversation. The experimental results indicate that PLATO-XL obtains state-of-the-art results across multiple conversational tasks, verifying its potential as a foundation model for conversational AI.
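To make the multi-party aware pre-training idea concrete, the following is a minimal sketch, assuming a PyTorch-style implementation, of how role embeddings can be added on top of token and position embeddings so that a unified transformer can distinguish different speakers in a social media thread. The class name, embedding sizes, and role vocabulary below are illustrative assumptions, not the released PLATO-XL code.

```python
import torch
import torch.nn as nn


class MultiPartyInputEmbedding(nn.Module):
    """Hypothetical input layer for a unified transformer: the final embedding
    of each token is the sum of its token, position, and role embeddings, where
    role ids identify which participant of the conversation produced the token."""

    def __init__(self, vocab_size=8000, hidden_size=768, max_len=512, num_roles=8):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_size)
        self.pos_emb = nn.Embedding(max_len, hidden_size)
        self.role_emb = nn.Embedding(num_roles, hidden_size)  # one id per speaker

    def forward(self, token_ids, role_ids):
        # token_ids, role_ids: [batch, seq_len]
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.token_emb(token_ids)
                + self.pos_emb(positions).unsqueeze(0)
                + self.role_emb(role_ids))


# Usage sketch: a three-turn, two-speaker exchange where each token carries
# the role id of its speaker (0 and 1 here are arbitrary speaker labels).
embed = MultiPartyInputEmbedding()
token_ids = torch.tensor([[12, 57, 903, 6, 88, 410, 2]])
role_ids = torch.tensor([[0, 0, 0, 1, 1, 0, 0]])
hidden = embed(token_ids, role_ids)  # [1, 7, 768]
```

Under this sketch, the transformer stack on top stays unchanged; only the input representation is enriched so the model can tell which utterances belong to which participant during pre-training on multi-party conversations.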