Difference between revisions of "Dongxu Zhang 14-11-03"

From cslt Wiki

Latest revision as of 16:52, 2 November 2014 (Sunday)

Accomplished this week

  • Created 100k, 200k, and 150576-word vocabularies. Using the 150576-word vocabulary to build language models on baiduhi and baiduzhidao (still running; currently in preprocessing).
  • Using the 166k vocabulary to train LMs on baiduhi and baiduzhidao separately (still running; currently pruning).
  • Extracted sentences that contain English and numbers from the weibo corpus.
  • Running BPTT using rwthlm. Results are still not normal: high ppl, low WER. However, within rwthlm itself, LSTM does seem to be better than standard BPTT.
  • Found a tool called Shenlan that can parse Sogou cell vocabulary. Using its code together with a crawler, we can update our vocabulary with new words.
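
The sentence extraction step above can be sketched roughly as follows. This is only a minimal illustration, not the script actually used: the function names are hypothetical, and it assumes that sentences end at Chinese or Western terminal punctuation and that "contains English and numbers" means at least one ASCII letter and at least one ASCII digit in the same sentence.

```python
import re

# Assumption: a sentence is a run of text ending at 。！？!? or a newline.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]?")
# Assumption: "English" means ASCII letters and "numbers" means ASCII digits;
# full-width letters and digits are deliberately not matched here.
_HAS_LATIN = re.compile(r"[A-Za-z]")
_HAS_DIGIT = re.compile(r"[0-9]")


def split_sentences(text):
    """Split text into sentences, keeping the terminal punctuation mark."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in pieces if p]


def has_english_and_numbers(sentence):
    """True if the sentence contains both an ASCII letter and an ASCII digit."""
    return bool(_HAS_LATIN.search(sentence)) and bool(_HAS_DIGIT.search(sentence))


def extract_mixed_sentences(lines):
    """Yield every sentence from the input lines that mixes English and digits."""
    for line in lines:
        for sentence in split_sentences(line):
            if has_english_and_numbers(sentence):
                yield sentence


if __name__ == "__main__":
    sample = ["我买了iPhone 6。今天天气很好！", "2014年去看NBA比赛？only english here."]
    for s in extract_mixed_sentences(sample):
        print(s)
```

If the intended filter is "English or numbers" rather than both, changing `and` to `or` in `has_english_and_numbers` is enough.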

Planned for next week

  • Continue building the LMs and comparing the vocabularies.
  • Continue working on rwthlm.