TY - JOUR
AU - Shen, Sheng
AU - Dong, Zhen
AU - Ye, Jiayu
AU - Ma, Linjian
AU - Yao, Zhewei
AU - Gholami, Amir
AU - Mahoney, Michael W.
AU - Keutzer, Kurt
AD - University of California at Berkeley
AB - Transformer based architectures have become de-facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT based models have a prohibitive memory footprint and latency. As a result, deploying BERT based models in resource constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra low precision.
TI - Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
JF - Proceedings of the AAAI Conference on Artificial Intelligence
DO - 10.1609/aaai.v34i05.6409
DA - 2020-04-03
UR - https://www.deepdyve.com/lp/unpaywall/q-bert-hessian-based-ultra-low-precision-quantization-of-bert-PU38nRygjD
DP - DeepDyve
ER -