TY - JOUR
AU - Shen, Sheng
AU - Dong, Zhen
AU - Ye, Jiayu
AU - Ma, Linjian
AU - Yao, Zhewei
AU - Gholami, Amir
AU - Mahoney, Michael W.
AU - Keutzer, Kurt
AD - University of California at Berkeley
AB - Transformer based architectures have become de-facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT based models have a prohibitive memory footprint and latency. As a result, deploying BERT based models in resource constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra low precision.
TI - Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
JF - Proceedings of the AAAI Conference on Artificial Intelligence
DO - 10.1609/aaai.v34i05.6409
DA - 2020-04-03
UR - https://www.deepdyve.com/lp/unpaywall/q-bert-hessian-based-ultra-low-precision-quantization-of-bert-PU38nRygjD
DP - DeepDyve
ER -