140428-Xiaoxi Wang

From cslt Wiki
Revision as of 09:56, 28 April 2014 (Monday) by Wxx

This week:

Preprocessed the Baidu Zhidao data and part of the Weibo data.

Wrote a Hanzi2Num tool.

Sampled corpora from Weibo and Baidu Zhidao (4.4 GB) and extracted keywords from them.

Classified the corpora according to the keywords.
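
The report does not describe how the Hanzi2Num tool works. As a rough illustration only, and assuming the tool converts Chinese-character numerals (Hanzi) into Arabic numbers during text normalization, a minimal sketch might look like the following. The function name `hanzi_to_num` and the character tables are hypothetical, and only well-formed numerals are handled (colloquial forms such as 一百五 for 150 are not).

```python
# Hypothetical sketch of a Hanzi-to-number converter; not the actual Hanzi2Num tool.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10 ** 4, "亿": 10 ** 8}


def hanzi_to_num(text):
    """Convert a well-formed Chinese numeral string to an int."""
    if not text:
        raise ValueError("empty numeral")

    # Case 1: no unit characters, so read the string digit by digit (二〇一四 -> 2014).
    if not any(ch in SMALL_UNITS or ch in BIG_UNITS for ch in text):
        value = 0
        for ch in text:
            if ch not in DIGITS:
                raise ValueError("unexpected character: %r" % ch)
            value = value * 10 + DIGITS[ch]
        return value

    # Case 2: numeral with units.
    total = 0    # value already closed off by 万 or 亿
    section = 0  # value of the current group below 万
    digit = 0    # most recent digit, waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit with no pending digit counts as one unit (十 -> 10).
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * BIG_UNITS[ch]
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything read so far (一万亿 -> 10**12).
            total = (total + section + digit) * BIG_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError("unexpected character: %r" % ch)
    return total + section + digit
```

For example, `hanzi_to_num("三千二百一十四")` gives 3214 and `hanzi_to_num("两万五千")` gives 25000. A real preprocessing tool would also need to locate numerals inside running text before converting them, which this sketch leaves out.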


Next week:

Train and evaluate language models (LMs) on the classified corpora.

Improve the algorithms.
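
The plan to train and evaluate LMs on the classified corpora does not name a toolkit or a metric. One common way to compare such models is perplexity on held-out text, so the sketch below shows that idea with an add-one-smoothed unigram model. This is an assumption for illustration: the function names `train_unigram` and `perplexity` are hypothetical, and a real experiment would more likely use higher-order n-gram models from a dedicated toolkit.

```python
# Hypothetical sketch: unigram LM with add-one smoothing, evaluated by perplexity.
import math
from collections import Counter


def train_unigram(tokens):
    """Count token frequencies in the training corpus."""
    return Counter(tokens)


def perplexity(counts, test_tokens):
    """Perplexity of test_tokens under the add-one-smoothed unigram model."""
    if not test_tokens:
        raise ValueError("test set is empty")
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens
    log_prob = 0.0
    for tok in test_tokens:
        # Counter returns 0 for unseen tokens, so smoothing keeps p > 0.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(test_tokens))
```

A model trained on one keyword class should give lower perplexity on held-out text from the same class than on text from another class, which is one way to check whether the classification is useful for language modeling.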