Difference between revisions of "AP17:OLR-special session"

 
Latest revision as of 12:23, 20 June 2017 (Tuesday)

Title

Multilingual speech and language processing for minority languages

Organizers

Dong Wang: Tsinghua University (wangdong99@mails.tsinghua.edu.cn)

Dr. Dong Wang received his PhD from the University of Edinburgh and has worked at Oracle, IBM, and Nuance. He is now an assistant professor at the Center for Speech and Language Technologies (CSLT) at Tsinghua University. Dr. Wang's research interests cover speech processing, language processing, and financial processing. He has published more than 80 academic papers in related areas and has received three best paper awards. Dr. Wang plays an active role in the speech research community: he serves as the secretary of the National Conference on Man-Machine Speech Communication (NCMMSC) and as a country representative of mainland China in Oriental COCOSDA. He was the local chair of ChinaSIP 2013, special session co-chair of ISCSLP 14, and plenary talk co-chair of ISCSLP 16. Dr. Wang is now serving as the vice chair of the SLA track of APSIPA.


Guanyu Li: Northwest University for Nationalities (guanyu-li@163.com)

Dr. Guanyu Li received his PhD from the Northwest University for Nationalities, Gansu Province, China. He worked at several ERP software development companies as a development engineer, and is now an associate professor at the Northwest University for Nationalities and the Key Laboratory of National Language Intelligent Processing, Gansu Province. His research interests include speech processing for minority languages in China, especially speech recognition and speech synthesis. In recent years, he has published more than ten papers in related areas.


Mijit Ablimit: Xinjiang University (mijit@xju.edu.cn)

Dr. Mijit Ablimit received his PhD from Kyoto University, Japan. He is now an associate professor at the Information Technology and Engineering College of Xinjiang University. His research interests cover speech, language, and multilingual information processing for minority languages of China.

Target track

Speech and Language processing

Introduction

Minority-language and multilingual phenomena are important for modern international societies. This special session focuses on minority-language and multilingual speech and language processing, including but not limited to the following topics:

  • Minority-language and multilingual phonetic and phonological analysis
  • Minority-language and multilingual speech recognition
  • Minority-language and multilingual speaker recognition
  • Minority-language and multilingual speech synthesis
  • Minority-language and multilingual language understanding
  • Resource construction for minority languages

Potential Papers

Title: AP17-OLR Challenge: Data, Plan, and Baseline

  • Author: Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen
  • Abstract: We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge, AP17-OLR. Compared with last year's event (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data are offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines were constructed to assist the participants: one based on the i-vector model and the other based on various neural networks. We report the baseline results evaluated with the metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data are free for participants, and the Kaldi recipes for the baselines have been published online.


Title: Memory-augmented Chinese-Uyghur Neural Machine Translation

  • Author: Shiyue Zhang, Gulnigar Mahmut, Dong Wang, Askar Hamdulla
  • Abstract: Neural machine translation (NMT) has recently achieved notable performance. However, this approach has not been widely applied to translation between Chinese and Uyghur, partly because of the limited parallel data and the large proportion of rare words caused by the agglutinative nature of Uyghur. In this paper, we collect ~200,000 sentence pairs and show that with this medium-scale database, an attention-based NMT can perform very well on Chinese-Uyghur/Uyghur-Chinese translation. To tackle rare words, we propose a novel memory structure to assist the NMT inference. Our experiments demonstrate that the memory-augmented NMT (M-NMT) outperforms both the vanilla NMT and phrase-based statistical machine translation (SMT). Interestingly, the memory structure provides an elegant way of dealing with out-of-vocabulary words.


Title: Language Resource Construction for Mongolian

  • Author: Shipeng Xu, Hongzhi Yu, Thomas Fang Zheng, Jinghao Yan
  • Abstract: Mongolian is a typical low-resource language. The resource limitation spans several aspects, from acoustic analysis and phonetic rules to the lexicon and speech and text data. This paper describes our recent progress on Mongolian resource construction supported by the NSFC project.

Title: Free Linguistic and Speech Resources for Tibetan

  • Author: Guanyu Li, Hongzhi Yu, Thomas Fang Zheng, Jinghao Yan
  • Abstract: Tibetan is an important low-resource language in China. A key factor that hinders speech and language research for Tibetan is the lack of resources, particularly free ones. This paper describes our recent progress on Tibetan resource construction supported by the NSFC M2ASR project, including the phone set, the lexicon, and the transcription of a large-scale speech corpus. Following the M2ASR free data program, all the resources are publicly available and free for researchers. We also release a small Tibetan speech database that can be used to build a prototype Tibetan speech recognition system.


Title: A Free Kazak Speech Database and a Speech Recognition Baseline

  • Author: Ying Shi, Askar Hamdulla, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng
  • Abstract: Automatic speech recognition (ASR) has improved significantly for major languages such as English and Chinese, partly due to the emergence of deep neural networks (DNN) and large amounts of training data. For minority languages, however, progress lags far behind the mainstream. A particular obstacle is that there are almost no large-scale speech databases for minority languages; the few that exist are held privately by institutes, are far from open or standard, and are rarely free. Besides speech databases, phonetic and linguistic resources are also scarce, including phone sets, lexicons, and language models. In this paper, we publish a speech database in Kazak, a major minority language in western China. Accompanying this database, a full set of phonetic and linguistic resources is also published, with which a full-fledged Kazak ASR system can be constructed. We describe the recipe for constructing a baseline system and report our present results. The resources are free for research institutes and can be obtained on request. This release is supported by the NSFC-funded M2ASR project, which aims to build multilingual ASR systems for minority languages in China.


Title: A Multilingual Language Processing Tool for Uyghur, Kazak and Kirghiz

  • Author: Mijit Ablimit, Sardar Parhat, Askar Hamdulla, Thomas Fang Zheng
  • Abstract: Natural language processing for less popular languages is difficult, partly due to the high variation in their written forms. On the other hand, many minority languages in the same region share similar properties and can be processed in a similar way. This paper presents an integrated multilingual language processing tool. Our aim is to provide an open, free, and standard toolkit for minority language processing tasks, with a uniform user interface that supports multiple languages. The present implementation supports Uyghur, Kazak, and Kirghiz, three major minority languages in western China, and focuses on phonetic and morphological analysis. For the phonetic analysis, we build a multilingual parallel phoneme list, with similar phonemes grouped and character codes standardized. A multilingual syllable analyzer is also developed to detect spelling mistakes and extract irregular spellings. For the morphological analysis, we build a multilingual morpheme segmentation tool that extracts morphemes by statistical analysis. The toolkit is extensible in terms of both functions and languages.