Difference between revisions of "ASR Status Report 2018-1-2"

From cslt Wiki
Line 34:
 |Lantian Li
 ||
−*
+* Commercial deep speaker model training in process. [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=lilt&step=view_request&cvssid=646]
+* Phone-aware scoring on deep speaker feature. [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=lilt&step=view_request&cvssid=643]
+* Phonetic speaker embedding in process. [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=lilt&step=view_request&cvssid=644]
+* Overlap training for speaker features. [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=lilt&step=view_request&cvssid=645]
 ||
−*
+* Commercial deep speaker model training.
+* Phone-aware scoring on deep speaker feature.
+* Phonetic speaker embedding in process.
+* Overlap training for speaker features.
 ||
 *
 |-

 |-
 |Zhiyuan Tang

Revision as of 04:21, 8 January 2018 (Monday)

Date People Last Week This Week Task Tracking
2018.1.2


Miao Zhang
Ying Shi
Lantian Li
  Last Week:
  • Commercial deep speaker model training in process. [1]
  • Phone-aware scoring on deep speaker feature. [2]
  • Phonetic speaker embedding in process. [3]
  • Overlap training for speaker features. [4]
  This Week:
  • Commercial deep speaker model training.
  • Phone-aware scoring on deep speaker feature.
  • Phonetic speaker embedding in process.
  • Overlap training for speaker features.
Zhiyuan Tang




Date People Last Week This Week Task Tracking
2017.12.25


Miao Zhang
  • Read the 16k model script
  • The cough recognition code left by Xiaofei
  • Check the trivial database and make it more reasonable
  • Test the 16k model on the database
Ying Shi
  • Some functions for voice-printer
    • speaker vector per utterance here
    • speaker vector minus base speaker vector here
  • CTC for Haibo Wang (token accuracy: 92.80% on the training set, 89.74% on the CV set); not yet tested on the test set
  • QRcode
    • speaker vector merge phone grayscale here
    • speaker vector merge phone black-and-white map here
    • speaker vector merge phone black-and-white map minus base vector here
  • i-vector baseline for Kazak-Uyghur LRE: performance is 81.85% (utterance level)
  • Finish the voice-checker copyright application and submit it this Wednesday
Lantian Li
  • Complete the recipe for `VV_FACTOR`.
  • 16K and 8K deep speaker model comparison. [5]
  • Patent for `VV_QuickMark`.
  • Complete the demo for `VV_FACTOR`. [Assigned to Shouyi Dai]
  • Phonetic speaker embedding.
  • Overlap training for speaker features.
Zhiyuan Tang
  • Word-level pronunciation accuracy based on likelihood (label each word as well pronounced, '0', or badly pronounced, '1')
  • Model adaptation
  • If possible, an alpha version of Parrot for testing inside the lab, to collect some data for better configuration