Difference between revisions of "Public data"
Revision as of 11:55, 27 December 2017 (Wednesday)
CCC data resource
CSLT collaborates closely with the Chinese Corpus Consortium (CCC) to collect and publish databases in China. The aim of the CCC is to provide corpora for Chinese ASR, TTS, NLP, perception analysis, phonetics analysis, linguistic analysis, and other related tasks. The corpora can be speech- or text-based; read or spontaneous; wideband or narrowband; standard or dialectal Chinese; clean or noisy; or of any other kind deemed helpful for the aforesaid purposes.
Trivial events database
A free database covering seven types of trivial human events: cough, laugh, "wei", "hmm", "tsk-tsk", "ahem", and sniff. The data were collected using an Android recording app.
Disguise database
A free database containing both normal and disguised human speech. The data were collected using an Android recording app.
Uyghur text database
CSLT has collaborated with Xinjiang University (XJU) on a wide range of research, including speech recognition, information retrieval, and text processing. We have published a number of resources to support research on Uyghur. The text data published here is intended for Uyghur text classification tasks and comprises 500 health documents and 500 non-health documents. It was collected by Mahpirat from XJU during her visit to CSLT from 2012 to 2013.
Sheik Cantonese lexicon
A free Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.
THUYG-20 database
A free speech database for constructing a full-fledged Uyghur ASR system.
THUYG-20 SRE database
A free speech database for constructing a full-fledged Uyghur speaker recognition system.
SUD-12 database
A speech database used for short-utterance speaker recognition.
THCHS-30 database
A speech database used for Chinese LVCSR. Recorded by Dong Wang many years ago.
Kazak ASR database
A speech database used for Kazak LVCSR.
The package includes the full set of speech and language resources required to build a Kazak speech recognition system.
To request the share password, send an e-mail to shiying@cslt.riit.tsinghua.edu.cn.
Tibetan ASR database
A speech database used for Tibetan LVCSR.
The package includes the full set of speech and language resources required to build a Tibetan speech recognition system.
To request the share password, send an e-mail to shiying@cslt.riit.tsinghua.edu.cn.