TY - JOUR
AU - Liu, Nelson F.
AU - Gardner, Matt
AU - Belinkov, Yonatan
AU - Peters, Matthew E.
AU - Smith, Noah A.
AB - Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of sixteen diverse probing tasks. We find that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge (e.g., conjunct identification). To investigate the transferability of contextual word representations, we quantify differences in the transferability of individual layers within contextualizers, especially between recurrent neural networks (RNNs) and transformers.
TI - Linguistic Knowledge and Transferability of Contextual Representations
JF - Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
DO - 10.18653/v1/n19-1112
DA - 2019-01-01
UR - https://www.deepdyve.com/lp/unpaywall/linguistic-knowledge-and-transferability-of-contextual-representations-5PPOFusida
DP - DeepDyve
ER -