TY - JOUR
AU - Zhao, Zhe
AU - Chen, Hui
AU - Zhang, Jinbin
AU - Zhao, Xin
AU - Liu, Tao
AU - Lu, Wei
AU - Chen, Xi
AU - Deng, Haotang
AU - Ju, Qi
AU - Du, Xiaoyong
TI - UER: An Open-Source Toolkit for Pre-training Models
AB - Existing works, including ELMO and BERT, have revealed the importance of pre-training for NLP tasks. While there does not exist a single pre-training model that works best in all cases, it is of necessity to develop a framework that is able to deploy various pre-training models efficiently. For this purpose, we propose an assemble-on-demand pre-training toolkit, namely UER.
N1 - Affiliations: School of Information and DEKE, MOE, Renmin University of China, Beijing, China; Tencent AI Lab; School of Electronics Engineering and Computer Science, Peking University, Beijing, China.
JF - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
DO - 10.18653/v1/d19-3041
DA - 2019-01-01
UR - https://www.deepdyve.com/lp/unpaywall/uer-an-open-source-toolkit-for-pre-training-models-FDe2Y4gY4t
DP - DeepDyve
ER -