Difference between revisions of “DataBase”
From cslt Wiki
Revision as of 05:45, 19 February 2014 (Wed)
name | type | size | dir | description |
---|---|---|---|---|
863 | speech | 51h, 76spk | corpora/863 | 863 reading speech database. 16 kHz, 16-bit |
emotion | speech | 22h | corpora/emotion | emotional speech for SID, recorded at CSLT. 16 kHz, 16-bit |
callhome | text | 9.02Mb | corpora/callhome | transcriptions of the CallHome Chinese speech database |
tcmsd | speech | 34h,60spk | corpora/tcmsd | speech database recorded at Tsinghua, 2002. 16 kHz, 16-bit |
timit | speech | 5.4h | corpora/timit | English TIMIT database |
gigaword | text | 668MW | corpora/chinese_gigaword | Gigaword text for Chinese |
ulgur | speech&text | xju: 141h (tr. 136h); xjnu: 8.54h | corpora/ulgur | Uyghur speech and text data |
tvboard | speech | - | corpora/tvboard | untranscribed TV and broadcast archive |
 | text | 10Gb | corpora/weibo | English weibo text data |
qa | text | 124Gb | corpora/qa | QA text data |
pvad | speech | 5.4h | corpora/puqiang/VAD | speech data for VAD, from Pachira |
ppoi | speech | 208h | corpora/puqiang/poi | 8 kHz telephone speech in the POI domain, from Pachira |
T400 | speech | 400h | corpora/tencent | speech data from Tencent |
dt700 | speech | 700h | corpora/tencent/dt700 | 700 hour reading speech data |
legend-vod | speech | - | corpora/legend-vod | some test speech and vod |
mobil-eng | speech | 26h | corpora/lenvxx/data/wav/mobil-eng | English speech from Chinese speakers |
legend-online | speech | 54h | corpora/lenvxx/data/wav/real-online | online speech data |
legend-wakeup | speech | 1h | corpora/lenvxx/data/wav/wake-up | wake-up test speech |
legend-reading | speech | 21h | corpora/lenvxx/data/wav/haitian | reading speech |
legend-sel-for-test | speech | 21h | corpora/lenvxx/data/wav/sel_for_test | reading speech |
POI-lexicon | lexicon | - | corpora/lenvxx/data/lexicon | lexicon for POI applications |
NLPR | lexicon,categories | - | corpora/lenvxx/data/text/nlpcorpus | resources for NLP tasks |
serviceT | text | - | corpora/lenvxx/data/text/service_text | text recorded from online service |
sougouText | text | - | corpora/sogou | sogouQ and sogouT |
wsj | speech | 100h | corpora/wsj | Wall Street Journal speech database |
hownet | lexicon | - | corpora/hownet | HowNet relation database |
casia | speech | 4000 u | corpora/tts/casia | male TTS speech |
huilan-tts | speech | 2000 u | corpora/tts/huilan | male/female TTS speech from Huilan |
tts-novel | speech | 20h | corpora/tts/novel | speech data downloaded from the internet for TTS |
Sinovoice-tel | speech | 470h+300h | corpora/sinovoice/tel | telephone speech data from Sinovoice |
Sinovoice-16k | speech | 6000h | corpora/sinovoice/16k | mobile 16k speech data from Sinovoice |
name | type | size | dir | description |
---|---|---|---|---|