TY - JOUR
AU - Andor, Daniel
AU - Alberti, Chris
AU - Weiss, David
AU - Severyn, Aliaksei
AU - Presta, Alessandro
AU - Ganchev, Kuzman
AU - Petrov, Slav
AU - Collins, Michael
AD - Google Inc., New York, NY
TI - Globally Normalized Transition-Based Neural Networks
AB - We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.
JF - Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
DO - 10.18653/v1/p16-1231
DA - 2016-01-01
UR - https://www.deepdyve.com/lp/unpaywall/globally-normalized-transition-based-neural-networks-fw9GEmNSmw
DP - DeepDyve
ER -