140512-Xi Ma
From cslt Wiki
Last Week:
1. Extracted domain-related text from the original corpus by keyword matching.
2. Annotated the keyword list with pinyin.
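The two steps above could look roughly like the following minimal sketch. The report only says "by keyword," so plain substring matching is an assumption. The pinyin table `PINYIN` and the helpers `extract_domain_sentences` and `annotate_pinyin` are hypothetical; a real setup would use a full pronunciation lexicon, and a table keyed by single characters ignores polyphonic characters.

```python
from typing import Dict, Iterable, List


def extract_domain_sentences(corpus: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep every sentence containing at least one keyword (plain substring match, an assumption)."""
    kws = [k for k in keywords if k]
    return [s for s in corpus if any(k in s for k in kws)]


# Hypothetical character-to-pinyin table with tone numbers; a real lexicon is far larger.
PINYIN: Dict[str, str] = {"语": "yu3", "音": "yin1", "识": "shi2", "别": "bie2"}


def annotate_pinyin(word: str, table: Dict[str, str]) -> str:
    """Return space-separated pinyin for each character; raises KeyError for unknown characters."""
    return " ".join(table[ch] for ch in word)


corpus = ["今天天气很好", "语音识别系统的错误率下降了", "我们讨论了识别结果"]
keywords = ["语音", "识别"]
print(extract_domain_sentences(corpus, keywords))
print(annotate_pinyin("语音识别", PINYIN))
```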
This Week:
1. Compute the perplexity (ppl) of each sentence in the original corpus, and extract the sentences whose ppl is below a chosen threshold to form a new training set.
2. Train a language model on the new training set and test its ppl.
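The plan above can be sketched as follows, under stated assumptions. The report does not say which model scores the sentences or what threshold is used. Here a hypothetical add-one smoothed bigram model (`BigramLM`) stands in for the scoring LM and is trained on a small in-domain seed set. The threshold `7.0` and all data are illustrative only. Per-sentence ppl is computed as $\exp(-\frac{1}{N}\sum \log P(w_i \mid w_{i-1}))$ over the sentence's bigrams, including the end-of-sentence token.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramLM:
    """Add-one (Laplace) smoothed bigram LM over whitespace tokens (a stand-in, not the report's model)."""

    def __init__(self, sentences: Sequence[str]) -> None:
        self.context: Counter = Counter()
        self.bigrams: Counter = Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.context.update(toks[:-1])  # how often each token serves as a left context
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        vocab = {w for s in sentences for w in s.split()} | {"</s>"}
        self.V = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def logprob(self, prev: str, word: str) -> float:
        return math.log((self.bigrams[(prev, word)] + 1) / (self.context[prev] + self.V))

    def ppl(self, sentences: Sequence[str]) -> float:
        """Perplexity over all bigrams in the given (non-empty) list of sentences."""
        total, n = 0.0, 0
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            for prev, word in zip(toks[:-1], toks[1:]):
                total += self.logprob(prev, word)
                n += 1
        return math.exp(-total / n)


def select_by_ppl(lm: BigramLM, corpus: Sequence[str], threshold: float) -> List[str]:
    """Step 1: keep sentences whose individual ppl is below the threshold."""
    return [s for s in corpus if lm.ppl([s]) < threshold]


# Illustrative data only.
seed = ["speech recognition error rate", "speech recognition model"]
corpus = ["speech recognition model training", "the weather is nice today", "recognition error analysis"]

scorer = BigramLM(seed)
selected = select_by_ppl(scorer, corpus, threshold=7.0)
print(selected)

# Step 2: train a new LM on the selected set and measure ppl on held-out text.
new_lm = BigramLM(selected)
print(new_lm.ppl(["speech recognition error analysis"]))
```

With this toy data the out-of-domain sentence scores above the threshold and is dropped. In practice the threshold would be tuned, for example by checking the new model's ppl on a held-out in-domain set.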