TY - JOUR
AU - Fan, Yin
AU - Lu, Xiangju
AU - Li, Dian
AU - Liu, Yuanliu
AB - In this paper, we present a video-based emotion recognition system submitted to the EmotiW 2016 Challenge. The core module of this system is a hybrid network that combines a recurrent neural network (RNN) and 3D convolutional networks (C3D) in a late-fusion fashion. RNN and C3D encode appearance and motion information in different ways. Specifically, the RNN takes appearance features extracted by a convolutional neural network (CNN) over individual video frames as input and encodes motion later, while C3D models the appearance and motion of video simultaneously. Combined with an audio module, our system achieved a recognition accuracy of 59.02% without using any additional emotion-labeled video clips in the training set, compared to 53.8% for the winner of EmotiW 2015. Extensive experiments show that combining RNN and C3D together can improve video-based emotion recognition noticeably.
TI - Video-based emotion recognition using CNN-RNN and C3D hybrid networks
DA - 2016-10-31
UR - https://www.deepdyve.com/lp/association-for-computing-machinery/video-based-emotion-recognition-using-cnn-rnn-and-c3d-hybrid-networks-Sbk2lNGjoK
DP - DeepDyve
ER - 