TY - JOUR
AU - Chen, Yen-Chun
AU - Gan, Zhe
AU - Cheng, Yu
AU - Liu, Jingzhou
AU - Liu, Jingjing
AD - Microsoft Dynamics 365 AI Research
AD - Carnegie Mellon University
TI - Distilling Knowledge Learned in BERT for Text Generation
JF - Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
DO - 10.18653/v1/2020.acl-main.705
DA - 2020-01-01
UR - https://www.deepdyve.com/lp/unpaywall/distilling-knowledge-learned-in-bert-for-text-generation-UL0t0BSGop
DP - DeepDyve
AB - Large-scale pre-trained language models such as BERT have achieved great success in language understanding tasks. However, it remains an open question how to utilize BERT for language generation. In this paper, we present a novel approach, Conditional Masked Language Modeling (C-MLM), to enable the finetuning of BERT on target generation tasks. The finetuned BERT (teacher) is exploited as extra supervision to improve conventional Seq2Seq models (student) for better text generation [...]
N1 - First-page excerpt: However, beyond common practice of finetuning BERT for language understanding (Wang et al., 2019), applying BERT to language generation still remains an open question. Text generation aims to generate natural language sentences conditioned on certain input, with applications ranging from machine translation (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015), text summarization (Nallapati et al., 2016; Gehring et al., 2017; Chen and Bansal, 2018), to image captioning (Vinyals et al., 2015; Xu et al., 2015; Gan et al., 2017). In this work, we study how to use BERT [...]
ER -
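
The approach named in the abstract, a C-MLM-finetuned BERT teacher supplying soft token-level supervision to a conventional Seq2Seq student, amounts to a knowledge-distillation objective. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the function name, the alpha/temperature hyperparameters, and the toy tensors are illustrative assumptions.

# Minimal sketch (not the authors' code): train a Seq2Seq student with the
# usual cross-entropy loss plus a soft-label term that follows the teacher's
# per-token distributions. All names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      pad_id=0, alpha=0.5, temperature=1.0):
    """student_logits, teacher_logits: (batch, seq_len, vocab)
    target_ids: (batch, seq_len) ground-truth target tokens."""
    vocab = student_logits.size(-1)
    # Hard-label term: standard maximum-likelihood training of the student.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         target_ids.view(-1), ignore_index=pad_id)
    # Soft-label term: KL(teacher || student) over the vocabulary at each
    # non-padding position, with optional temperature smoothing.
    log_q = F.log_softmax(student_logits / temperature, dim=-1)
    p = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(log_q, p, reduction="none").sum(-1)   # (batch, seq_len)
    mask = (target_ids != pad_id).float()
    kl = (kl * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * ce + (1.0 - alpha) * kl

if __name__ == "__main__":
    # Toy usage: random logits stand in for the frozen teacher and the student.
    B, T, V = 2, 7, 100
    student_logits = torch.randn(B, T, V, requires_grad=True)
    teacher_logits = torch.randn(B, T, V)   # teacher outputs carry no gradient
    targets = torch.randint(1, V, (B, T))   # avoid pad_id=0 in the toy targets
    loss = distillation_loss(student_logits, teacher_logits, targets)
    loss.backward()
    print(float(loss))

In the paper's setting, the teacher distributions would come from the C-MLM-finetuned BERT predicting masked target positions; here random logits stand in for both models.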