Difference between revisions of "Public data"

Revision as of 16:17, 31 October 2017 (Tue)

CCC data resource

CSLT collaborates closely with the Chinese Corpus Consortium (CCC) to collect and publish databases in China. The CCC aims to provide corpora for Chinese ASR, TTS, NLP, perception analysis, phonetic analysis, linguistic analysis, and other related tasks. The corpora may be speech- or text-based; read or spontaneous; wideband or narrowband; standard or dialectal Chinese; clean or noisy; or of any other kind deemed helpful for the aforementioned purposes.

visit CCC

Uyghur text database

CSLT has collaborated with Xinjiang University (XJU) on a wide range of research, including speech recognition, information retrieval, and text processing. We have published a number of resources to support research on Uyghur. The text data published here is intended for Uyghur text classification and contains 500 health documents and 500 non-health documents. It was collected by Mahpirat from XJU during her visit to CSLT from 2012 to 2013.

download | download from Baidu

Sheik Cantonese lexicon

A free Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.

check details

THUYG-20 database

A free speech database for constructing a full-fledged Uyghur ASR system.

check details

THUYG-20 SRE database

A free speech database for constructing a full-fledged Uyghur speaker recognition system.

check details

SUD-12 database

A speech database used for short-utterance speaker recognition.

check details

THUCH30 database

A speech database used for Chinese LVCSR, recorded by Dong Wang many years ago.

check details

Kazak ASR database

A speech database used for Kazak LVCSR.

QQ weiyun share link

Tibetan ASR database

A speech database used for Tibetan LVCSR.

QQ weiyun share link (https://share.weiyun.com/da691bff0f7c641646ae9fb1154ffdce)
