TY - JOUR
AU - Liu, Jinglin
AU - Li, Chengxi
AU - Ren, Yi
AU - Chen, Feiyang
AU - Zhao, Zhou
AB - The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) * * * Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Zhou Zhao. Zhejiang University. {jinglinliu,chengxili,rayeren,zhaozhou}@zju.edu.cn, chenfeiyangai@gmail.com * * * Abstract: Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive singing voice, in which the acoustic model generates the acoustic features (e.g., mel-spectrogram) given a music score. Previous singing acoustic models adopt a simple loss (e.g., L1 and L2) or generative adversarial network (GAN) to reconstruct the acoustic features, while they suffer from over-smoothing and unstable training issues respectively, which hinder the naturalness of synthesized singing. In this work, we propose DiffSinger, an acoustic model * * * 2019; Lee et al. 2019; Blaauw and Bonada 2020; Ren et al. 2020; Chen et al. 2020). Previous singing acoustic models mainly utilize simple loss (e.g., L1 or L2) to reconstruct the acoustic features. However, this optimization is based on the incorrect unimodal distribution assumptions, leading to blurry and over-smoothing outputs. Although existing methods endeavor to solve this problem by generative adversarial network (GAN) (Lee et al. 2019; Chen et al. 2020), training an effective GAN may occasionally fail due to the unstable dis-
TI - DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
JF - Proceedings of the AAAI Conference on Artificial Intelligence
DO - 10.1609/aaai.v36i10.21350
DA - 2022-06-28
UR - https://www.deepdyve.com/lp/unpaywall/diffsinger-singing-voice-synthesis-via-shallow-diffusion-mechanism-NIUlXunOPH
DP - DeepDyve
ER -