Difference between revisions of "AP17:OLR-special session"

Latest revision as of 12:23, 20 June 2017 (Tuesday)

Title

Multilingual speech and language processing for minority languages

Organizers

Dong Wang: Tsinghua University (wangdong99@mails.tsinghua.edu.cn)

Dr. Dong Wang received his PhD from the University of Edinburgh and has worked at Oracle, IBM, and Nuance. He is now an assistant professor at the Center for Speech and Language Technologies (CSLT) at Tsinghua University. Dr. Wang's research interests cover speech processing, language processing, and financial processing. He has published more than 80 academic papers in related areas and has received three best paper awards. Dr. Wang plays an active role in the speech research community: he serves as the secretary of the National Conference on Man-Machine Speech Communication (NCMMSC) and as a country representative of mainland China in Oriental COCOSDA. He was the local chair of ChinaSIP 2013, special session co-chair of ISCSLP 14, and plenary talk co-chair of ISCSLP 16. Dr. Wang is now serving as the vice chair of the SLA track of APSIPA.

Guanyu Li: Northwest University for Nationalities (guanyu-li@163.com)

Dr. Guanyu Li received his PhD from the Northwest University for Nationalities, Gansu Province, China. He worked at several ERP software development companies as a development engineer, and is now an associate professor at the Northwest University for Nationalities and the Key Laboratory of National Language Intelligent Processing, Gansu Province. His research interests include speech processing for minority languages in China, especially speech recognition and speech synthesis. In recent years, he has published more than ten papers in related areas.

Mijit Ablimit: Xinjiang University (mijit@xju.edu.cn)

Dr. Mijit Ablimit received his PhD from Kyoto University, Japan. He is now an associate professor at the Information Technology and Engineering College of Xinjiang University. His research interests cover speech, language, and multilingual information processing for less popular languages of China.

Target track

Speech and Language processing

Introduction

The minor- and multi-lingual phenomenon is important for modern international societies. This special session focuses on minor- and multi-lingual speech and language processing, including but not limited to the following topics:

Minor- and Multi-lingual phonetic and phonological analysis
Minor- and Multi-lingual speech recognition
Minor- and Multi-lingual speaker recognition
Minor- and Multi-lingual speech synthesis
Minor- and Multi-lingual language understanding
Resource construction for minority languages

Potential Papers

Title: AP17-OLR Challenge: Data, Plan, and Baseline

Author: Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen

Abstract: We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge, AP17-OLR. Compared to last year's event (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data are offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines were constructed to assist the participants: one is based on the i-vector model and the other on various neural networks. We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data are free for participants, and the Kaldi recipes for the baselines have been published online.

Title: Memory-augmented Chinese-Uyghur Neural Machine Translation

Author: Shiyue Zhang, Gulnigar Mahmut, Dong Wang, Askar Hamdulla

Abstract: Neural machine translation (NMT) has achieved notable performance recently. However, this approach has not been widely applied to translation between Chinese and Uyghur, partly due to the limited parallel data and the large proportion of rare words caused by the agglutinative nature of Uyghur. In this paper, we collect ~200,000 sentence pairs and show that with this medium-scale database, an attention-based NMT can perform very well on Chinese-Uyghur/Uyghur-Chinese translation. To tackle rare words, we propose a novel memory structure to assist the NMT inference. Our experiments demonstrate that the memory-augmented NMT (M-NMT) outperforms both the vanilla NMT and phrase-based statistical machine translation (SMT). Interestingly, the memory structure provides an elegant way of dealing with out-of-vocabulary words.

Title: Language Resource Construction for Mongolian

Author: Shipeng Xu, Hongzhi Yu, Thomas Fang Zheng, Jinghao Yan

Abstract: Mongolian is a typical low-resource language. The resource limitation spans various aspects, from acoustic analysis and phonetic rules to the lexicon and speech and text data. This paper describes our recent progress on Mongolian resource construction supported by the NSFC project.

Title: Free Linguistic and Speech Resources for Tibetan

Author: Guanyu Li, Hongzhi Yu, Thomas Fang Zheng, Jinghao Yan

Abstract: Tibetan is an important low-resource language in China. A key factor that hinders speech and language research for Tibetan is the lack of resources, particularly free ones. This paper describes our recent progress on Tibetan resource construction supported by the NSFC M2ASR project, including the phone set, the lexicon, and the transcription of a large-scale speech corpus. Following the M2ASR free data program, all the resources are publicly available and free for researchers. We also release a small Tibetan speech database that can be used to build a prototype Tibetan speech recognition system.

Title: A Free Kazak Speech Database and a Speech Recognition Baseline

Author: Ying Shi, Askar Hamdulla, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

Abstract: Automatic speech recognition (ASR) has improved significantly for major languages such as English and Chinese, partly due to the emergence of deep neural networks (DNN) and large amounts of training data. For minority languages, however, progress lags far behind the mainstream. A particular obstacle is that there are almost no large-scale speech databases for minority languages; the few that exist are held by some institutes as private property, are far from open and standard, and are rarely free. Besides speech databases, phonetic and linguistic resources are also scarce, including phone sets, lexicons, and language models.

In this paper, we publish a speech database in Kazak, a major minority language in western China. Accompanying this database, a full set of phonetic and linguistic resources is also published, with which a full-fledged Kazak ASR system can be constructed. We describe the recipe for constructing a baseline system and report our present results. The resources are free for research institutes and can be obtained on request. The publication is supported by the NSFC M2ASR project, which aims to build multilingual ASR systems for minority languages in China.

Title: A Multilingual Language Processing Tool for Uyghur, Kazak and Kirghiz

Author: Mijit Ablimit, Sardar Parhat, Askar Hamdulla, Thomas Fang Zheng

Abstract: Natural language processing for less popular languages is difficult, partly due to the high variation in their written forms. On the other hand, many minority languages in the same region share similar properties and can be processed in a similar way. This paper publishes an integrated multilingual language processing tool. Our aim is to provide an open, free, and standard toolkit for minority language processing tasks, with a uniform user interface that supports multiple languages. The present implementation supports Uyghur, Kazak, and Kirghiz, three major minority languages in western China, and our focus is on phonetic and morphological analysis. For the phonetic analysis, we build a multilingual parallel phoneme list, with similar phonemes grouped and character codes standardized. A multilingual syllable analyzer is also developed to detect spelling mistakes and extract irregular spellings. For the morphological analysis, we build a multilingual morpheme segmentation tool that can extract morphemes by statistical analysis. The toolkit is extendable in terms of both functions and languages.

@@ Line 39: / Line 39: @@
-===Title: Prior-constrained multilingual speech recognition ===
+===Title: AP17-OLR Challenge: Data, Plan, and Baseline ===
-*Author: Ying Shi, Zhiyuan Tang, Dong Wang
+*Author: Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen
-*Abstract: Conventional multilingual speech recognition follows ether a tandem approach (language identification)
+*Abstract: We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR.
-or parallel architecture (parallel decoding). This paper presented a novel prior-constrained approach that
+Compare to the event last year (AP16-OLR), the new challenge involves more languages and focuses more on
-conduct the decoding in a multilingual linguistic space, where a prior of the language is used to constrain
+short utterances. The data are
-the decoding frame by frame. Our experiments found that this approach can realize true simultaneous multilingual
+offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines were constructed to assist the participants,
-speech recognition.
+one is based on the i-vector model and the other is based on various neural networks.
+We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan
+and demonstrate that the combined database is a reasonable data resource for multilingual research.
+All the data are free for participants, and the Kaldi recipes for the baselines have been published online.
-===Title: Memory-based Uyghur-Chinese Translation===
+===Title: Memory-augmented Chinese-Uyghur Neural Machine Translation===
-*Author: Shiyue Zhang, Guli, Mijit Ablimit, Askar Hamdulla
+*Author: Shiyue Zhang, Gulnigar Mahmut, Dong Wang, Askar Hamdulla
-*Abstract: Neural machine translation (NMT) has achieved significant performance. However, this NMT approach
+*Abstract: Neural machine translation (NMT) has achieved notable performance recently. However, this approach has not been widely applied to the translation task between Chinese and Uyghur, partly due to the limited parallel data resource and the large proportion of rare words caused by the agglutinative nature of Uyghur. In this paper, we collect ~200,000 sentence pairs and show that with this middle-scale database, an attention-based NMT can perform very well on Chinese-Uyghur/Uyghur-Chinese translation. To tackle rare words, we propose a novel memory structure to assist the NMT inference. Our experiments demonstrated that the memory-augmented NMT (M-NMT) outperforms both the vanilla NMT and the phrase-based statistical machine translation (SMT).  Interestingly, the memory structure provides an elegant way for dealing with words that are out of vocabulary.
-has not yet effectively applied to minor languages such as Uyghur to Chinese translation. The main problem here
-is that the limited training data does not support an end-to-end neural learning. In this paper, we propose to
-use a memory structure to assist the NMT inference under the condition of limited resource languages. Our experiments
-demonstrated that the this approach is highly efficient compared to the vanilla NMT, and outperforms the conventional
-statistical machine translation (SMT) approach.
-===Title: Resource construction for Mongolia ===
-*Author: Shipeng Xu, Guanyu Li, Hongzhi Yu
+===Title: Language Resource Construction for Mongolian===
+*Author: Shipeng Xu , Hongzhi Yu, Thomas Fang Zheng and Jinghao Yan
 *Abstract: Mongolia is a typical low-resource language. The resource limitation is in various aspects, from acoustic
@@ Line 66: / Line 65: @@
 resource construction supported by the NSFC project.
-===Title: Tibetan speech database construction===
+===Title: Free Linguistic and Speech Resources for Tibetan===
-*Author: Guanyu Li, Hongzhi Yu
+*Author: Guanyu Li, Hongzhi Yu,Thomas Fang Zheng,  Jinghao Yan
+*Abstract: Tibetan is an important low-resource language in China.  A key factor that hinders the speech and language research for Tibetan is the lack of resources, particularly free ones. This paper describes our recent progression on Tibetan resource construction supported by the NSFC M2ASR project, including the phone set, lexicon, as well as the transcription of a large scale speech corpus. Following the M2ASR free data program, all the resources are publicly available and free for researchers. We also release a small Tibetan speech database that can be used to build a proto type Tibetan speech recognition system.
-*Abstract: Tibetan is an important low-resource language in China. The syllable structure of Tibetan is similar
+===Title: A Free Kazak Speech Database and a Speech Recognition Baseline===
-as Chinese, but the composition rules in orthographic forms is highly complex. Additionally, the lexicon
+*Author: Ying Shi, Askar Hamdulla, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng
-resource is far from standard and rich. This paper describes our recent progression on Tibetan
-resource construction supported by the NSFC M2ASR project.
-===Title: A large Kazak speech database and a speech recognition baseline===
+*Abstract: Automatic speech recognition (ASR) has gained significant improvement for major languages such as English and Chinese,
-*Author: Askar Hamdulla, Ying Shi
+partly due to the emergence of deep neural networks (DNN) and large amount of training data. For minority languages,
+however, the progress is largely behind the main stream. A particularly obstacle is that there are almost no large-scale speech
+databases for minority languages, and the only few databases are held by some institutes as private properties, far from
+open and standard, and very few are free. Besides the speech database, phonetic and linguistic resources are also scarce, including
+phone set, lexicon, and language model.
-*Abstract: We describe the construction process of a large scale Kazak speech database. The database involves
+In this paper, we publish a speech database in Kazak, a major minority language in the western China. Accompanying this
-hours of speech signals, recorded by more than 200 speakers. A speech recognition baseline
+database, a full set of phonetic and linguistic resources are also published, by which a full-fledged Kazakh ASR system can be constructed.
+We will describe the recipe for constructing a baseline system, and report our present results.
+The resources are free for research institutes and can be obtained by request. The publication is supported by the M2ASR project
+supported by NSFC, which aims to build multilingual ASR systems for minority languages in China.
-===Title: Multilingual resource construction for Uyghur, Kazak, Kirghiz languages ===
+===Title: A Multilingual Language Processing Tool for Uyghur, Kazak and Kirghiz ===
-*Author: Mijit Ablimit, Askar Hamdulla, Ying Shi,  Dong Wang
+*Author: Mijit Ablimit, Sardar Parhat, Askar Hamdulla, Thomas Fang Zheng
-*Abstract: Minority languages, especially spoken languages, are strongly influenced by major languages or mixing each other. So a platform of uniform phonetic and morphological processing methods can provide a methodology and extra resource for the less popular languages. This paper describes multi-language phonetic and morphological tools and corpus compilation processing for some resource scares languages.
+*Abstract: Natural language processing for less popular languages is difficult, partly due to the high variations in the writing form. On the other hand, many minority languages in the same region share similar properties and can be processed in a similar way. This paper publishes an integrated multilingual language processing tool. Our aim is to provide an open, free and standard toolkit for minority language processing tasks, by a uniform user interface to support multiple languages. The present implementation supports Uyghur, Kazak, Kirghiz, three major minority languages in the Western China, and our focus was put on phonetic and morphological analysis. For the phonetic analysis, we build a multilingual parallel phoneme list, with similar phonemes grouped and character codes standardized. A multilingual syllable analyzer is also developed to detect spelling mistakes, and extract irregular spelling. For the morphological analysis, we build a multilingual morpheme segmentation tool that can extract morphemes by statistical analysis. This toolkit is extendable in terms of both functions and languages.