<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Deal_with_numbers_in_LM_training</id>
		<title>Deal with numbers in LM training - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Deal_with_numbers_in_LM_training"/>
		<link rel="alternate" type="text/html" href="http://cslt.org/mediawiki/index.php?title=Deal_with_numbers_in_LM_training&amp;action=history"/>
		<updated>2026-04-17T01:30:43Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://cslt.org/mediawiki/index.php?title=Deal_with_numbers_in_LM_training&amp;diff=187&amp;oldid=prev</id>
		<title>166.111.134.19: Created page with “Numbers are not simple to handle, for all languages. The basic problem is that numbers are open, and therefore the context of numbers are not simple to model.  Our appr...”</title>
		<link rel="alternate" type="text/html" href="http://cslt.org/mediawiki/index.php?title=Deal_with_numbers_in_LM_training&amp;diff=187&amp;oldid=prev"/>
				<updated>2012-09-13T01:39:21Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with “Numbers are not simple to handle, for all languages. The basic problem is that numbers are open, and therefore the context of numbers are not simple to model.  Our appr...”&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Numbers are not simple to handle in any language. The basic problem is that numbers form an open set, so the contexts in which they appear are hard to model. Our approach is to replace every number with a single token &amp;quot;NUM&amp;quot;. By building an LM that contains the NUM token and a graph for NUM, and then composing these two graphs, we hope to train a robust model.&lt;br /&gt;
&lt;br /&gt;
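As a rough illustration of this substitution, the sketch below replaces number-like tokens with NUM. It assumes the text is already segmented into words; the function name replace_numbers, the exclude parameter and the CHINESE_DIGITS set are hypothetical names chosen for this example and are not part of the original page.&lt;br /&gt;
&lt;br /&gt;
```python
import re

# Characters treated as Chinese numerals here. This exact set is an
# assumption; the page only names the range from '零' to '九'.
CHINESE_DIGITS = set("零一二三四五六七八九")
ARABIC_DIGIT = re.compile(r"[0-9]")


def replace_numbers(tokens, exclude=frozenset()):
    """Return a copy of tokens with number-like words replaced by 'NUM'.

    exclude is a hypothetical set of words that contain Chinese numerals
    but are not numbers (for example the idiom '三纲五常').
    """
    result = []
    for token in tokens:
        if ARABIC_DIGIT.search(token):
            # Any word containing an Arabic digit is replaced directly.
            result.append("NUM")
        elif token not in exclude and any(ch in CHINESE_DIGITS for ch in token):
            # Words with Chinese numerals are replaced unless excluded.
            result.append("NUM")
        else:
            result.append(token)
    return result
```
&lt;br /&gt;
For example, replace_numbers(["他", "有", "三", "本", "书"]) returns ["他", "有", "NUM", "本", "书"], while a word passed in exclude, such as '三纲五常', is left unchanged.&lt;br /&gt;
&lt;br /&gt;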
The first step, therefore, is to replace numbers with NUM. The following steps are taken:&lt;br /&gt;
&lt;br /&gt;
1. Find all words containing the digits 0-9 and replace them with NUM directly.&lt;br /&gt;
2. Find all words containing the Chinese numerals '零'-'九' and collect them into a number word list L0.&lt;br /&gt;
3. Since some of these words are not actually numbers, such as '三纲五常', remove the words of a pre-defined lexicon V from L0 to get L = L0 - V.&lt;br /&gt;
4. The pre-defined lexicon V is derived from a general lexicon V0 by removing pure numbers, such as '一', '二', '一九一九'.&lt;br /&gt;
5. Define the mapping M: L -&amp;gt; NUM.&lt;br /&gt;
6. Use M to replace the numbers in the training text with 'NUM'.&lt;/div&gt;</summary>
		<author><name>166.111.134.19</name></author>	</entry>

	</feed>