Difference between revisions of "ASR-events-OC16-details"

From cslt Wiki
Latest revision as of 10:52, 28 July 2016 (Thursday)

OC16 MixASR-CHEN Challenge

The OC16 MixASR-CHEN challenge is part of the special session "mixlingual speech processing" at O-COCOSDA 2016. The challenge is a Chinese-English mixed speech recognition task, where the host language is Chinese and the embedded language is English.

Data

Participants may use only the following three resources:

OC16-CE80

OC16-CE80 is a speech database provided by SpeechOcean (http://www.speechocean.com) for this challenge. Its main features are:

  • 1400+ speakers
  • Mobile channel
  • 80 hours of speech signals
  • Transcriptions are provided
  • The licence file is on the OC16-CE80 page of this wiki
  • The data profile is in OC16-CE80-profile.pdf (a media file on this wiki)

THCHS30

THCHS30 is a pure Chinese speech database provided by CSLT@Tsinghua University. All the resources of THCHS30 can be used to improve the system, especially the lexicon and the language model (LM). The data is available at:

http://www.openslr.org/18/

CMU English dictionary

To recognize English words, participants may use the CMU English dictionary, version 0.7b:

http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b


Test plan

  • Only the three databases mentioned above can be used in system development.
  • Test data will be released on July 15. Participants are required to return the transcriptions generated by their systems.
  • CER will be computed by the organizers and returned to the participants.
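
The page does not say how the organizers compute CER. As a rough illustration only, the sketch below computes a character error rate as the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. Treating every non-whitespace character as one token (including each letter of an English word) is an assumption made for this example, not the official scoring rule, and the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion (reference token missing)
                            curr[j - 1] + 1,      # insertion (extra hypothesis token)
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref_text, hyp_text):
    """Character error rate; whitespace is ignored (an assumption of this sketch)."""
    ref = [c for c in ref_text if not c.isspace()]
    hyp = [c for c in hyp_text if not c.isspace()]
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` gives 0.25: one substitution against four reference characters. A real scorer for mixed speech may instead count each English word as a single token, which would change the numbers.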

Participation rules

  • Participants of both the special session and the OC16 MixASR-CHEN challenge can apply for OC16-CE80 by emailing the organizers (see below).
  • The agreement for the use of OC16-CE80 must be signed and returned to the organizers before the data can be downloaded.
  • Publications based on OC16-CE80 should cite the following paper: "Dong Wang, Zhiyuan Tang, Difei Tang, Qing Chen, OC16-CE80: a Chinese-English Mixlingual database and an ASR baseline" (pdf: http://wangd.cslt.org/public/pdf/mixlingual.pdf)

Challenge procedure

  • June 13: OC16-CE80 is ready and registration requests are accepted.
  • July 15-17: the OC16-CE80 test set is released. Participants can respond with their decoding results before 12:00 PM on July 17, Beijing time.
  • July 20: participants can obtain their own WER.
  • Sept. 30: deadline for extended submissions on OC16-CE80.
  • OC16: a summary is given at the special session.

Extended submission

The 'official submission' deadline has passed, and we received a number of good submissions. The WER results have been returned to the participants individually.

We now accept 'extended submissions'. Any participant may submit results (including new results from participants who have already made an official submission) until Sept. 30. We are happy to evaluate your submissions and, if you agree, report your results as 'the results of extended submission' at the OC16 special session.

Many thanks for your participation. We look forward to your new submissions and to discussing this interesting topic at OC16.


Registration

If you are interested in participating in the challenge, or if you have any questions, comments, or suggestions about it, please email the organizers:

  • Dr. Dong Wang (wangdong99@mails.tsinghua.edu.cn)
  • Ms. Chen Qing (chenqing@speechocean.com)