TY - JOUR
AU - Ghazvininejad, Marjan
AU - Levy, Omer
AU - Liu, Yinhan
AU - Zettlemoyer, Luke
AD - Facebook AI Research, Seattle, WA
TI - Mask-Predict: Parallel Decoding of Conditional Masked Language Models
AB - Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
JF - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
DO - 10.18653/v1/d19-1633
DA - 2019-01-01
UR - https://www.deepdyve.com/lp/unpaywall/mask-predict-parallel-decoding-of-conditional-masked-language-models-90mkZBjvyr
DP - DeepDyve
ER -