Difference between revisions of “TTS-project-synthesis”
From cslt Wiki
Line 18: | Line 18: | ||
*Child[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/huilian/child01.neutral/child01-neutral_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | *Child[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/huilian/child01.neutral/child01-neutral_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
− | ==Multi-speaker mix- | + | ==Multi-speaker mix-trainingr== |
+ | ===Without Speaker-vector=== | ||
*Female & Male[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/female01-male01/female01-male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | *Female & Male[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/female01-male01/female01-male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
Line 26: | Line 27: | ||
− | == | + | ===With speaker-vector=== |
When synthesis, we just replace the speaker-vector for specific person. | When synthesis, we just replace the speaker-vector for specific person. | ||
*Specific person=== | *Specific person=== | ||
Line 57: | Line 58: | ||
::*(11) 1.0:0.0[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/iterpolation/female01_male01/iterpolation_10_female01_male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ::*(11) 1.0:0.0[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/iterpolation/female01_male01/iterpolation_10_female01_male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | |||
+ | ==Mono-speaker Emotion TTS== | ||
+ | *Specific emotion | ||
+ | :* Neutral emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-neutral_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | :* Happy emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-happy_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | :* Sorrow emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-sorrow_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | :* Angry emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-angry_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | |||
+ | *Interpolation emotion | ||
+ | :* Angry & neutral with different ratio | ||
+ | ::*(1) 0.0:1.0 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_0_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(2) 0.1:0.9 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_1_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(3) 0.2:0.8 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_2_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(4) 0.3:0.7 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_3_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(5) 0.4:0.6 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_4_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(6) 0.5:0.5 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(7) 0.6:0.4 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_6_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(8) 0.7:0.3 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_7_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(9) 0.8:0.2 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_8_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(10) 0.9:0.1 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_9_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] | ||
+ | ::*(11) 1.0:0.0 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-angry_1_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav] |
Revision as of 03:41, 1 December 2017 (Friday)
Project name
Text To Speech
Project members
Dong Wang, Zhiyong Zhang
Introduction
Text To Speech
Sample waves
Synthesis text:好雨知时节,当春乃发声,随风潜入夜,润物细无声
Mono-speaker TTS
- Female[1]
- Male[2]
- Child[3]
Multi-speaker mix-training
Without Speaker-vector
- Female & Male[4]
- Female & Child[5]
- Male & Child[6]
With speaker-vector
At synthesis time, we simply swap in the speaker-vector of the target person.
- Specific person
  - Female[7]
  - Male[8]
- Interpolate the speaker-vectors of different persons
  - Female & Male with different ratios
    - (1) 0.0:1.0[9]
    - (2) 0.1:0.9[10]
    - (3) 0.2:0.8[11]
    - (4) 0.3:0.7[12]
    - (5) 0.4:0.6[13]
    - (6) 0.5:0.5[14]
    - (7) 0.6:0.4[15]
    - (8) 0.7:0.3[16]
    - (9) 0.8:0.2[17]
    - (10) 0.9:0.1[18]
    - (11) 1.0:0.0[19]
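The ratio sweep above can be illustrated with a minimal sketch of linear interpolation between two speaker-vectors. This is not the project's actual code. It assumes that a ratio `a:b` weights the first-named and second-named speaker respectively, and that interpolation is a plain weighted sum. The names `interpolate`, `ratio_sweep`, `female_vec` and `male_vec`, as well as the 4-dimensional toy vectors, are hypothetical.

```python
from typing import List, Sequence


def interpolate(vec_a: Sequence[float], vec_b: Sequence[float], weight_a: float) -> List[float]:
    """Return weight_a * vec_a + (1 - weight_a) * vec_b, element by element."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same dimension")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(vec_a, vec_b)]


def ratio_sweep(steps: int = 10) -> List[float]:
    """Weights 0.0, 0.1, ..., 1.0 for the first vector (11 points when steps=10)."""
    return [i / steps for i in range(steps + 1)]


# Hypothetical toy speaker-vectors; real ones would come from the trained model.
female_vec = [1.0, 0.0, 2.0, -1.0]
male_vec = [0.0, 1.0, -2.0, 3.0]

# One interpolated vector per ratio, from 0.0:1.0 up to 1.0:0.0 (assumed female:male).
mixed = [interpolate(female_vec, male_vec, w) for w in ratio_sweep()]
```

Under these assumptions, the first and last entries of `mixed` are the two unmodified speaker-vectors, and each entry in between would be fed to the synthesizer in place of a single speaker's vector.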
Mono-speaker Emotion TTS
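- Specific emotion
  - Neutral emotion
  - Happy emotion
  - Sorrow emotion
  - Angry emotion
- Interpolation emotion
  - Angry & neutral with different ratios
    - (1) 0.0:1.0
    - (2) 0.1:0.9
    - (3) 0.2:0.8
    - (4) 0.3:0.7
    - (5) 0.4:0.6
    - (6) 0.5:0.5
    - (7) 0.6:0.4
    - (8) 0.7:0.3
    - (9) 0.8:0.2
    - (10) 0.9:0.1
    - (11) 1.0:0.0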