Difference between revisions of "Deep Generative Factorization For Speech Signal (ICASSP21)"
From cslt Wiki
Revision as of 07:31, 23 October 2020 (Friday)
Introduction
- This paper presents a speech information factorization method based on a novel deep generative model that we call factorial discriminative normalization flow (factorial DNF).
Qualitative and quantitative experimental results show that, compared with the other models tested, factorial DNF retains the class structure corresponding to multiple information factors, and that changing one factor causes little distortion to the other factors. This indicates that factorial DNF can factorize a speech signal into distinct information factors.
Members
- Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang
Publications
- Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang, "Deep Generative Factorization For Speech Signal", 2020. pdf
Source Code
- xxx
Factorial DNF
- xxx
Experiments
Data
- xx
Encoding
- xx
Factor manipulation
Phone Manipulation
Model | p(q_2|x)  | bap(dim=5) | mgc(dim=60)
VAE   | 100000    | 130000     | 160000
NF    | 130000    | 500000     | 6200000
DNF   | 60000     | 300000     | 3580000
f-DNF | 1:1.3:0.6 | 1:4:2+     | 1:40:20+
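The idea behind factor manipulation can be illustrated with a toy sketch. This is not the factorial DNF implementation from the paper. It assumes a latent code split into two sub-vectors (a "phone" part and a hypothetical "speaker" part), each with its own class means, and it uses a fixed orthogonal linear map as a stand-in for a trained invertible flow. All names and values (`phone_means`, `speaker_means`, `manipulate_phone`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained invertible flow (assumption): x = Q @ z + b,
# with Q orthogonal so the inverse is exact and well conditioned.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
b = rng.normal(size=4)

def encode(x):
    """Map an observation x back to its latent code z."""
    return Q.T @ (x - b)

def decode(z):
    """Map a latent code z to an observation x."""
    return Q @ z + b

# Hypothetical class means, one row per class, for each factor subspace.
phone_means = np.array([[0.0, 0.0], [3.0, -3.0]])    # occupies z[:2]
speaker_means = np.array([[1.0, 1.0], [-2.0, 2.0]])  # occupies z[2:]

def manipulate_phone(x, old_phone, new_phone):
    """Shift only the phone sub-vector from one class mean to another."""
    z = encode(x).copy()
    z[:2] += phone_means[new_phone] - phone_means[old_phone]
    return decode(z)

# Build one sample: phone class 0, speaker class 1, small within-class noise.
z = np.concatenate([
    phone_means[0] + 0.1 * rng.normal(size=2),
    speaker_means[1] + 0.1 * rng.normal(size=2),
])
x = decode(z)

# Change the phone factor and re-encode to inspect both sub-vectors.
x_swapped = manipulate_phone(x, old_phone=0, new_phone=1)
z_swapped = encode(x_swapped)

phone_shift = z_swapped[:2] - z[:2]
speaker_drift = float(np.max(np.abs(z_swapped[2:] - z[2:])))
```

In this toy setting the speaker sub-vector is untouched by construction, so `speaker_drift` is zero up to rounding. With a real model, the distortion on the other factors has to be measured empirically, which is what the manipulation experiments report.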
Future Work
- Test factorial DNF on larger datasets.
- Establish general theories for deep generative factorization.