"DataBase": difference between revisions
From cslt Wiki
Latest revision as of 06:24, 26 February 2014 (Wednesday)
lm
name | size | dir | description |
---|---|---|---|
SogouQ.full.train.3gram.gz | 132M | /work/lxs/nlphome/lm/SogouQ-500M | trainData=SogouQ(800M);dict=11w-tencent |
SogouT-11w-merge2-1.3gram.gz | 4.1G | /work/lxs/nlphome/lm/SogouT-140G | trainData=SogouT(140G);dict=11w-tencent |
SogouT-11w-merge2-2.3gram.gz | 3.9G | /work/lxs/nlphome/lm/SogouT-140G | |
8w8.3gram.tencent.gz | 452M | /work/lxs/nlphome/lm/Tencent | |
musicQuery-ltc.3gram.gz | 28M | /work/lxs/nlphome/lm/TencentQ/musicQuery | use qa15w-singer-songs.wordlist |
TencentQ.3gram.gz | 1.4G | /work/lxs/nlphome/lm/TencentQ/qa15w | use qa15w.lexicion |
mix-corp1-corp2.3gram.gz | 1.3G | /work/lxs/nlphome/lm/TencentQ/qa15w-nosinger-song | use qa15w-nosinger-song.wordlist |
mix-corp1_0.5-corp2_0.5.3gram.gz | 1.4G | /work/lxs/nlphome/lm/TencentQ/qa15w-singer-song | use qa15w-singer-song.wordlist |
11w_merge6_kn.3gram.gz | 4.3G | /work/lxs/nlphome/lm/TencentQA-100G | trainData=qa(100G),dict=11w-tencent |
8w8_new_merge6_kn.3gram0.gz | 4.5G | /work/lxs/nlphome/lm/TencentQA-100G | trainData=qa(100G),dict=8w8-tencent |
Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-5.3gram.gz | 1.4M | /work/lxs/nlphome/lm/jietong | |
Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-9.5gram.gz | 389M | /work/lxs/nlphome/lm/jietong | |
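The `.3gram.gz` and `.5gram.gz` names suggest gzipped n-gram language models. Assuming they are stored in the common ARPA text format (this page does not state the format), the sketch below reads only the `\data\` header to report how many n-grams of each order a model holds, without decompressing the whole multi-gigabyte file. The function name `read_arpa_counts` and the toy file are illustrative, not part of the original setup.

```python
import gzip
import os
import re
import tempfile


def read_arpa_counts(path):
    """Return {order: count} parsed from the \\data\\ header of a gzipped ARPA LM."""
    counts = {}
    in_header = False
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if not in_header:
                continue
            match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                break  # reached the first "\1-grams:" section, header is over
    return counts


# Demo on a toy model written to a temporary directory (hypothetical data).
toy_path = os.path.join(tempfile.mkdtemp(), "toy.3gram.gz")
with gzip.open(toy_path, "wt", encoding="utf-8") as fh:
    fh.write("\n\\data\\\nngram 1=3\nngram 2=5\nngram 3=2\n\n\\1-grams:\n-1.0\ta\t-0.5\n")

print(read_arpa_counts(toy_path))  # prints {1: 3, 2: 5, 3: 2}
```

To inspect a real model, pass its full path instead, for example the directory and file name from one row of the table joined together.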
lexicion wordlist
name | size | dir | description |
---|---|---|---|
singer.lexicion | 2060 | /work/lxs/nlphome/dict/lex-wordlist/music/lr | |
singer.low.lexicion | 2060 | /work/lxs/nlphome/dict/lex-wordlist/music/lr | |
singer.pinyin | 2104 | /work/lxs/nlphome/dict/lex-wordlist/music/lr | |
song.lexicion | 4639 | /work/lxs/nlphome/dict/lex-wordlist/music/lr | |
song.low.lexicion | 4639 | /work/lxs/nlphome/dict/lex-wordlist/music/lr | |
song.pinyin | 4644 | /work/lxs/nlphome/dict/lex-wordlist/music/lr | |
qa15w-ch-sinovoice.lexicion | 92469 | /work/lxs/nlphome/dict/lex-wordlist/qa-check | |
qa15w-ch.pinyin | 92469 | /work/lxs/nlphome/dict/lex-wordlist/qa-check | |
qa15w.lexicion | 158404 | /work/lxs/nlphome/dict/lex-wordlist/qa-check | |
11w.lexicion | 122172 | /work/lxs/nlphome/dict/lex-wordlist/tencent | |
8w8.lexicion | 90795 | /work/lxs/nlphome/dict/lex-wordlist/tencent | |
nolexicion wordlist
name | size | dir | description |
---|---|---|---|
singer.wordlist | 2060 | /work/lxs/nlphome/dict/nolex-wordlist/music/lr | |
song.wordlist | 4639 | /work/lxs/nlphome/dict/nolex-wordlist/music/lr | |
album.txt | 11736 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
area.txt | 4 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
chart.txt | 28 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
drama.txt | 517 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
language.txt | 35 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
singer.txt | 4456 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
stopwords.txt | 894 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
song.txt | 26153 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
style.txt | 562 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
type.txt | 3 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | |
entity.txt | 36198 | /work/lxs/nlphome/dict/nolex-wordlist/music/ltc | merge album area chart drama language singer song stopwords style type |
qa15w.wordlist | 147996 | /work/lxs/nlphome/dict/nolex-wordlist/qa-check | |
11w.wordlist | 111895 | /work/lxs/nlphome/dict/nolex-wordlist/tencent | |
8w8.wordlist | 88055 | /work/lxs/nlphome/dict/nolex-wordlist/tencent | |
scws20w-utf8.wordlist | 284646 | /work/lxs/nlphome/dict/nolex-wordlist | |
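The table describes entity.txt as a merge of the ten other lists in music/ltc. Their listed sizes add up to 44388, while entity.txt is listed at 36198; if the size column counts entries, that gap would be consistent with duplicates being dropped during the merge. The sketch below shows one way such a merge could be done, assuming one entry per line in UTF-8 (the file format and the original merge procedure are not documented here, so `merge_wordlists` is purely illustrative).

```python
import os
import tempfile


def merge_wordlists(paths):
    """Merge wordlists in order, keeping the first occurrence of each entry."""
    seen = set()
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                word = line.strip()
                if word and word not in seen:  # skip blank lines and duplicates
                    seen.add(word)
                    merged.append(word)
    return merged


# Demo with two toy lists (hypothetical contents).
tmp_dir = tempfile.mkdtemp()
list_a = os.path.join(tmp_dir, "a.txt")
list_b = os.path.join(tmp_dir, "b.txt")
with open(list_a, "w", encoding="utf-8") as fh:
    fh.write("x\ny\n")
with open(list_b, "w", encoding="utf-8") as fh:
    fh.write("y\nz\n\n")

print(merge_wordlists([list_a, list_b]))  # prints ['x', 'y', 'z']
```

Keeping insertion order rather than sorting makes it easy to see which source list contributed each entry first.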
lenvxx
path: /nfs/corpus/data/corpora/lenvxx
description: The data is stored in /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus
(this directory contains 4 subdirectories: ChinaDivision, dict, dict4VOD, document Resource)
- Directory 1: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict
  - Subdirectory 1: sogou-dict
    - 城市信息 (city information): city and place names for many provinces, some local dialect words, and, for some cities, bus station and street names.
    - 电子游戏 (video games)
      - 单机游戏 (single-player games): names of console games from 2001 to 2011, and wordlists for some individual games.
      - 网游 (online games): names of online games from 2008 to 2011, and wordlists for some individual games.
    - 工程与应用科学 (engineering and applied science): specialized vocabulary wordlists for engineering fields.
    - 计算机 (computers): specialized vocabulary wordlists for the computer field, and Alibaba product vocabulary in many fields.
    - 农林鱼畜 (agriculture, forestry, fishery, and animal husbandry): wordlists about livestock and agriculture.
    - 人文科学 (humanities)
      - 文学 (literature): wordlists about ancient Chinese literature and masterworks, and wordlists for some novels.
      - 语言 (language): wordlists of idioms, folklore, and Internet buzzwords.
      - 哲学 (philosophy): wordlists about philosophy, for instance Hegel and Marxism.
      - 宗教 (religion): wordlists about Taoism, Buddhism, and Islam.
      - 历史 (history): wordlists about Chinese history, Japan's Warring States period, and diplomacy.
      - 其他 (other): a wordlist about ancient Chinese numerology.
    - 社会科学 (social sciences)
      - 法律 (law): wordlists about law.
      - 教育 (education): wordlists about the buildings of some universities, wordlists about textbooks, and lists of Chinese universities and famous American universities.
      - 金融 (finance): wordlists about finance.
      - 军事 (military): wordlists about the military.
      - 政治 (politics): wordlists about Party and government offices, politics, and the official institutions of ancient China.
      - 其他 (other): wordlists about public relations, ethics, and anthropology.
    - 生活 (daily life): wordlists about many areas of daily life.
    - 医学 (medicine): wordlists about medical science.
    - 艺术 (arts)
      - 书法篆刻 (calligraphy and seal carving): wordlists about sculpture and calligraphy.
      - 舞蹈 (dance): wordlists about dance and rhythmic gymnastics.
      - 戏剧 (drama): wordlists about drama.
      - 音乐 (music): wordlists about Chinese and Western music.
      - 其他 (other): wordlists of tea, sculpture, er ren zhuan, world heritage, and artists.
    - 娱乐 (entertainment)
      - 电影电视 (film and television): wordlists about science fiction films.
      - 动漫 (animation and comics): wordlists about some cartoons.
      - 流行音乐 (pop music): wordlists about the novel series A Song of Ice and Fire and fashionable words or phrases.
      - 明星 (celebrities): wordlists about some famous people.
      - 汽车 (cars): wordlists about cars.
      - 收藏 (collecting): wordlists about advertising.
      - 时尚品牌 (fashion brands): the directory is empty.
    - 运动休闲 (sports and leisure)
      - F1赛车 (F1 racing): the directory is empty.
      - 奥运 (Olympics): wordlists about the Olympics.
      - 垂钓 (fishing): wordlists about fishing.
      - 轮滑 (roller skating): a wordlist about roller skating.
      - 棋牌 (board and card games): wordlists about mahjong, Go, Chinese chess, and San Guo Sha.
      - 气功 (qigong): wordlists about qigong.
      - 球类 (ball games): wordlists about football, basketball, table tennis (ping-pong), golf, and badminton.
      - 杀人游戏 (Killer game): the directory is empty.
      - 跆拳道 (taekwondo): wordlists about taekwondo.
      - 太极拳 (tai chi): wordlists about ba gua and tai ji quan.
      - 武术 (martial arts): wordlists about wu shu.
      - 自行车 (cycling): the directory is empty.
      - 其他 (other): wordlists about fencing, judo, wrestling, and yoga.
    - 自然科学 (natural sciences)
      - 化学 (chemistry): wordlists about chemistry.
      - 生物 (biology): wordlists about biology.
      - 数学 (mathematics): wordlists about mathematics.
      - 天文学 (astronomy): wordlists about astronomy.
      - 物理 (physics): wordlists about physics.
      - 其他 (other): wordlists about stones.
  - Subdirectory 2: movie (many wordlists about movies)
    - 电影 (movies): movie wordlists for mainland China, Hong Kong and Taiwan, Europe and America, and Asia.
    - 明星 (stars): movie star wordlists for mainland China, Hong Kong and Taiwan, Europe and America, and Asia.
  - Subdirectory 3: movie-dict (wordlists of actors, directors, movie names, roles, and styles)
  - Subdirectory 4: name (wordlists of famous people in mainland China, Hong Kong and Taiwan, Europe and America, and Asia)
  - Subdirectory 5: NER (wordlists of English, Japanese, Korean, and Russian personal names)
  - Subdirectory 6: Pinyin (a wordlist of polyphonic characters, duo yin zi)
  - Subdirectory 7: VOD
    - 电视剧 (TV series): a wordlist of TV series.
    - 电影 (movies): a wordlist of movies.
    - 微电影 (micro films): a wordlist of micro films.
    - 音乐 (music): wordlists of famous songs from mainland China, Hong Kong and Taiwan, Europe and America, and Japan and South Korea.
    - 综艺 (variety shows): a wordlist of variety shows.
  - Subdirectory 8: 领域术语 (domain terminology; wordlists about computers, economics, travel, sports, and medicine)
  - Subdirectory 9: 语言学词库 (linguistics lexicon)
    - 基础名词 (basic nouns): includes persons, abstract nouns, nature, man-made things, and fashion nouns.
    - 语言学词汇类别 (linguistic vocabulary categories): includes all grammatical vocabulary.
- Directory 2: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict4VOD
  - This directory contains wordlists of movie distribution companies, film awards, film festivals, and actors' names, plus a Chinese-English comparison table.
- Directory 3: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/ChinaDivision
  - This directory contains 4 wordlists, one per administrative level (province names, city names, region names, street names).
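Because the listing above notes that some subdirectories are empty, a quick survey of the tree can help confirm what is actually on disk. The following is a minimal sketch, assuming only a readable directory tree; the `survey` helper and the toy directory names are hypothetical and not part of the corpus.

```python
import tempfile
from pathlib import Path


def survey(root):
    """Map each subdirectory (path relative to root) to its file count and emptiness."""
    root = Path(root)
    report = {}
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        children = list(directory.iterdir())
        report[directory.relative_to(root).as_posix()] = {
            "files": sum(child.is_file() for child in children),
            "empty": not children,  # True when the directory has no entries at all
        }
    return report


# Demo on a toy tree (hypothetical names standing in for real subdirectories).
toy_root = Path(tempfile.mkdtemp())
(toy_root / "sports" / "fishing").mkdir(parents=True)
(toy_root / "sports" / "f1").mkdir()
(toy_root / "sports" / "fishing" / "terms.txt").write_text("rod\nreel\n", encoding="utf-8")

for name, info in survey(toy_root).items():
    print(name, info)
```

On the real corpus, the root would be the nlp_corpus path given in the description; entries reported with `"empty": True` should correspond to the directories marked as empty in the listing.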