Differences between revisions of "ASR-howto"

Revision as of 10:33, 26 May 2013 (Sunday)

1. how to build kaldi on linux?

Building Kaldi on Windows with Visual Studio is tedious, so we strongly recommend building it within Cygwin instead. The process is simple:

  1. Install Cygwin and select the following components: a. make b. gcc c. automake d. perl e. python f. clapack g. wget h. gfortran+g77+f77 i. zlib
  2. Download Kaldi from the CSLT server at /nfs/disk/perm/tool/kaldi
  3. Install the tools: go to kaldi/tools and run install.sh once all the required components are installed.
  4. Install the core: go to kaldi/src and run ./configure; make
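Before running the tools and core steps, it can help to confirm that the command-line components from the list are actually on the PATH. The following is a minimal sketch, not part of the original instructions: the helper names `missing_tools` and `build_kaldi` are hypothetical, only `gfortran` is checked among the Fortran compilers, and the build function assumes Kaldi has been copied into `./kaldi`.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# (Hypothetical helper, introduced only for this sketch.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Command-line tools from the component list. clapack and zlib are
# libraries rather than commands, so they are not checked here.
missing=$(missing_tools make gcc automake perl python wget gfortran)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All listed command-line tools were found."
fi

# The build steps from the list, wrapped in a function that is defined
# but not called here. Assumes Kaldi was copied into ./kaldi.
build_kaldi() {
    (cd kaldi/tools && ./install.sh) &&
    (cd kaldi/src && ./configure && make)
}
```

Calling `build_kaldi` from the directory that contains `kaldi` runs the tools step first and only continues to the core if it succeeds.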

2. how to create dictionary

Given a list of words, the lexicon can be built as follows:

  1. awk '{print $1}' word.list | sort -u | /nfs/disk/work/asr/toolkit/lex/gen_word_lexicon_from_big_lexicon.py
  2. Check the lexicon manually and remove incorrect pronunciations.
  3. Check for words that failed to get a pronunciation and write those entries yourself.

The command above uses the Tencent 110k lexicon by default. To produce a dictionary based on a different phone system, pass the -w argument to gen_word_lexicon_from_big_lexicon.py; to supply an additional background dictionary, set the -e option.
I configured the current word segmentation system (IKAnalyzer3.2.5Stable) to use the Tencent 110k lexicon, for consistency with pronunciation generation. If you use a different background dictionary, it is better to replace the IKAnalyzer lexicon as well: put your dictionary in /nfs/disk/work/asr/toolkit/lex/wordseg/IKAnalyzer3.2.5Stable_src, and then specify it in /nfs/disk/work/asr/toolkit/lex/wordseg/IKAnalyzer3.2.5Stable_src/IKAnalyzer.cfg.xml
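Finding the words that failed to get a pronunciation can be done mechanically by comparing the word list against the generated lexicon. The sketch below is only an illustration: the output format of gen_word_lexicon_from_big_lexicon.py is not described here, so it assumes one entry per line with the word in the first column, and the helper name `words_without_pron` and the demo data are made up.

```shell
#!/bin/sh
# List words from a word list that have no entry in a lexicon.
# $1 = lexicon (ASSUMED format: word followed by its phones, one entry per line)
# $2 = word list (first column is the word, as in the awk command above)
words_without_pron() {
    awk 'FILENAME == ARGV[1] { if (NF) seen[$1] = 1; next }
         NF && !($1 in seen) && !reported[$1]++ { print $1 }' "$1" "$2"
}

# Tiny made-up demo data, written to a temporary directory.
demo=$(mktemp -d)
printf 'hello 3\nworld 5\nkaldi 1\nhello 2\n' > "$demo/word.list"
printf 'hello HH AH L OW\nworld W ER L D\n' > "$demo/lexicon.txt"

# Each missing word is printed once, in word-list order.
words_without_pron "$demo/lexicon.txt" "$demo/word.list"   # prints: kaldi
rm -rf "$demo"
```

The first awk rule records every word that has a lexicon entry; the second prints each word from the list that was never recorded, skipping blank lines and repeats.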

@@ Line 1: / Line 1: @@
 ==1. how to build kaldi on linux?==
 Building Kaldi on windows with VS is pretty annoying. We therefore highly recommend to build the stuff within cygwin. The process is simple:
-#. install cygwin. Select the following components:
+#. install cygwin. Select the following components:  a. make b. gcc c. automake d. perl e. python f. clapack g. wget h. gfortrain+g77+f77 i. zlib
-:  a. make b. gcc c. automake d. perl e. python f. clapack g. wget h. gfortrain+g77+f77 i. zlib
 #. download kaldi from CSLT server at /nfs/disk/perm/tool/kaldi
 #. install tools: go to kaldi/tools, run install.sh if you have all the required components installed.
 #. install the core: go to kaldi/src, ./configure; make
-==2. how to create lexicon==
+==2. how to create dictionary==
+Given a list of words, the lexicon can be build as follows:
+#awk '{print $1}' word.list |sort -u  |/nfs/disk/work/asr/toolkit/lex/gen_word_lexicon_from_big_lexicon.py
+#check the lexicon maunally to remove incorrect pronunciations
+#check the words that fail to generate pronunciations, create it by yourself.
+*The above default uses the Tencent 110k lexicon. If you want to produce dictionaries based on other phone system, you need set argument for gen_word_lexicon_from_big_lexicon.py by -w , and if you want provide additional background dictionary, set the -e option.
+* I set the current word segment system (IKAnalyzer3.2.5Stable) to use the Tencent 110k lexicon for consistency with pronunciation generation. If you use a different background dictionary, then better to replace the lexicon for IKAnalyzer as well. It is simple to put your dic in /nfs/disk/work/asr/toolkit/lex/wordseg/IKAnalyzer3.2.5Stable_src, and then specify it in /nfs/disk/work/asr/toolkit/lex/wordseg/IKAnalyzer3.2.5Stable_src/IKAnalyzer.cfg.xml
