| name | type | size | dir | description |
| 863 | speech | 51h, 76spk | corpora/863 | 863 reading speech database. 16 kHz, 16-bit |
| emotion | speech | 22h | corpora/emotion | emotional speech for SID, recorded at CSLT. 16 kHz, 16-bit |
| callhome | text | 9.02Mb | corpora/callhome | CallHome Chinese speech database transcriptions |
| tcmsd | speech | 34h, 60spk | corpora/tcmsd | speech database recorded at Tsinghua, 2002. 16 kHz, 16-bit |
| timit | speech | 5.4h | corpora/timit | English TIMIT database |
| gigaword | text | 668MW | corpora/chinese_gigaword | Gigaword text for Chinese |
| ulgur | speech&text | xju: 141h (tr. 136h); xjnu: 8.54h | corpora/ulgur | Uyghur speech and text data |
| tvboard | speech | - | corpora/tvboard | untranscribed TV and broadcast archive |
| weibo | text | 10Gb | corpora/weibo | English weibo text data |
| qa | text | 124Gb | corpora/qa | QA text data |
| pvad | speech | 5.4h | corpora/puqiang/VAD | speech data for VAD, from Pachira |
| ppoi | speech | 208h | corpora/puqiang/poi | 8 kHz telephone speech in the POI domain, from Pachira |
| T400 | speech | 400h | corpora/tencent | speech data from Tencent |
| dt700 | speech | 700h | corpora/tencent/dt700 | 700-hour reading speech data |
| legend-vod | speech | - | corpora/legend-vod | some test speech and VOD |
| mobil-eng | speech | 26h | corpora/lenvxx/data/wav/mobil-eng | English speech by Chinese speakers |
| legend-online | speech | 54h | corpora/lenvxx/data/wav/real-online | online speech data |
| legend-wakeup | speech | 1h | corpora/lenvxx/data/wav/wake-up | wake-up test speech |
| legend-reading | speech | 21h | corpora/lenvxx/data/wav/haitian | reading speech |
| legend-sel-for-test | speech | 21h | corpora/lenvxx/data/wav/sel_for_test | reading speech |
| POI-lexicon | lexicon | - | corpora/lenvxx/data/lexicon | lexicon for POI applications |
| NLPR | lexicon,categories | - | corpora/lenvxx/data/text/nlpcorpus | resources for NLP tasks |
| serviceT | text | - | corpora/lenvxx/data/text/service_text | text recorded from online service |
| sougouText | text | - | corpora/sogou | sogouQ and sogouT |
| wsj | speech | 100h | corpora/wsj | Wall Street Journal speech database |
| hownet | lexicon | - | corpora/hownet | HowNet relation database |
| casia | speech | 4000 u | corpora/tts/casia | male TTS speech |
| huilan-tts | speech | 2000 u | corpora/tts/huilan | male/female TTS speech from Huilan |
| tts-novel | speech | 20h | corpora/tts/novel | speech data downloaded from the internet for TTS |
| Sinovoice-tel | speech | 470h+300h | corpora/sinovoice/tel | telephone speech data from Sinovoice |
| Sinovoice-16k | speech | 6000h | corpora/sinovoice/16k | mobile 16 kHz speech data from Sinovoice |