Public data
Revision as of 08:40, 30 September 2014 (Tuesday)
CCC data resource
CSLT collaborates closely with the Chinese Corpus Consortium (CCC) to collect and publish databases in China. The CCC aims to provide corpora for Chinese ASR, TTS, NLP, perception analysis, phonetic analysis, linguistic analysis, and other related tasks. The corpora can be speech- or text-based; read or spontaneous; wideband or narrowband; standard or dialectal Chinese; clean or noisy; or of any other kind deemed helpful for the aforementioned purposes.
Uyghur text database
CSLT has collaborated with Xinjiang University (XJU) on a wide range of research, including speech recognition, information retrieval and text processing. We have published a number of resources to support research on Uyghur. The text data published here is intended for Uyghur text classification and consists of 500 health documents and 500 non-health documents. It was collected by Mahpirat of XJU during her visit to CSLT from 2012 to 2013.
Download: http://cslt.riit.tsinghua.edu.cn:8081/download/uygh/zip/data.tar.gz