2024-11-25

From cslt Wiki

Latest revision as of 11:04, 25 November 2024 (Mon)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • 2nd round of checking for the AI handbook (middle school).
  • Deal with pictures in the AI handbooks (primary & middle school).
  • Start checking the AI handbook (high school).
  • Check the AI book for Tianjin medical school.
Lantian Li
  • Complete my CSTR Report
  • Continue AI-Graph EN Chapter 4
  • Polish 2025 Daily Sign
Ying Shi
  • Design cohort-conditional chain multi-talker ASR with round-RNN
    • WER results: round-1 32.15%, round-2 69.69%, round-3 92.33%
    • For a 500-utterance sub-test set, only 28% of the sentences have a recognition order that matches the cosine distance.
  • Prepare for Huawei's interview.
Zhenghai You
  • Huawei TSE (train models that better fit the scene) [1]
Junming Yuan
  • Comparable results between Clean-HuBERT, Cocktail-HuBERT, and MT-HuBERT [2]
    • Bad news: Cocktail-HuBERT > Clean-HuBERT > MT-HuBERT
Chen Chen
Xiaolou Li
  • Data processing
    • CVS3: 1/4 already cut from the original video, waiting for pre-processing
    • Copying pre-processed GongAn video data from gonganbu
  • VSR contrastive loss experiment
    • Inspired by paper [3]
    • Main idea: to better align visual features with the LLM input, compute the cosine similarity between target and video features and take the largest as the positive pair.
    • Result: still training
  • Paper reading
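The positive-pair selection in the contrastive-loss idea above can be sketched in a few lines. This is a minimal NumPy illustration with toy features; the function name and shapes are assumptions, not the project's actual training code:

```python
import numpy as np

def pick_positive_pairs(target_feats, video_feats):
    """For each target feature, choose the video feature with the highest
    cosine similarity as its positive pair; the rest act as negatives."""
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    sim = t @ v.T                  # (num_targets, num_video_frames)
    return sim.argmax(axis=1)      # index of the positive frame per target

# Toy check: orthogonal features should match themselves.
feats = np.eye(4, 8)               # 4 orthogonal 8-dim feature vectors
print(pick_positive_pairs(feats, feats))  # → [0 1 2 3]
```

In the real system the selected pairs would presumably feed an InfoNCE-style contrastive loss; only the pair-selection step is shown here.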
Zehua Liu
  • Rebuttal writing
  • Iterative training and inference
    • Iter-1 (45.53%) > Iter-2 (45.00%) > Iter-3 (44.85%)
Pengqi Li
  • Began writing a paper on the phoneme-importance analysis work.
  • Reading a doctoral thesis on speaker explainability [4].
Wan Lin
  • NS: all-transformer
    • 6k spk: EER 2.6%
    • 20k spk: EER 2.3%
    • 20k spk+multi-enroll: EER 1.9%
Tianhao Wang
  • Experiments on the query-embedding conditioning approach:
    • SDR: FiLM (7.492) > self-attention (6.573)
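For readers unfamiliar with FiLM conditioning, the comparison above can be illustrated with a minimal sketch: the query embedding linearly predicts a per-channel scale (gamma) and shift (beta) applied to the separator's feature maps. All names and dimensions below are illustrative assumptions, not the experiment's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, query_emb, w_gamma, w_beta):
    """FiLM conditioning: the query embedding predicts a per-channel
    scale (gamma) and shift (beta) for the feature maps."""
    gamma = query_emb @ w_gamma    # (channels,)
    beta = query_emb @ w_beta      # (channels,)
    return gamma * features + beta # broadcast over the time axis

T, C, Q = 5, 8, 16                 # frames, channels, query-embedding dim
feats = rng.standard_normal((T, C))
query = rng.standard_normal(Q)
w_g = rng.standard_normal((Q, C))
w_b = rng.standard_normal((Q, C))
out = film(feats, query, w_g, w_b)
assert out.shape == (T, C)         # modulation preserves the feature shape
```

Unlike self-attention conditioning, FiLM adds only two small projection matrices per conditioned layer, which may partly explain its edge here.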
Xiaoxue Luo
  • Training of the USS (CED+AudioSep) model
    • Adjusting the audio format to meet the model's needs (in training)
  • Production of the 2025 Daily Sign (March)
Zhenyu Zhou
  • Speaker-identity-based conditional chain proposal [5]
  • Prepare interim report
Junhui Chen
  • Read papers (ICCIP keynote paper and some others)
  • NS
    • Some tests on the transformer feature extractor
Jiaying Wang
Yu Zhang
  • Huawei AED
    • Data augmentation & human-annotated dataset [6]
  • Finance
    • Paper reading; reproduced a local Llama version of StockAgent [7] (an LLM-based market-simulation framework)
Wenqiang Du
  • Training of new language models (Henan) [8]
  • Training of new language models (Chongqing) [9]
Yang Wei
  • Fixed some bugs in keyword sampling in the text-enroll KWS training code.
  • Added spec augmentation to text-enroll KWS training.
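Spec augmentation here presumably means SpecAugment-style masking on the input spectrogram; a minimal sketch, assuming a (mel-bins × frames) array and hypothetical mask-size parameters:

```python
import numpy as np

def spec_augment(spec, max_f=8, max_t=10, rng=None):
    """Zero out one random frequency band and one random time span,
    SpecAugment-style, to make KWS training more robust."""
    if rng is None:
        rng = np.random.default_rng()
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    f = rng.integers(0, max_f + 1)          # band width (may be 0)
    f0 = rng.integers(0, n_mels - f + 1)    # band start
    t = rng.integers(0, max_t + 1)          # span length (may be 0)
    t0 = rng.integers(0, n_frames - t + 1)  # span start
    spec[f0:f0 + f, :] = 0.0
    spec[:, t0:t0 + t] = 0.0
    return spec

aug = spec_augment(np.ones((80, 100)), rng=np.random.default_rng(42))
assert aug.shape == (80, 100)               # shape is unchanged
```

In a text-enroll KWS pipeline this would typically be applied on the fly to each training spectrogram, after feature extraction and before the encoder.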
Lily
Turi
  • Paper reading
  • ICASSP 2025 rebuttal
Yue Gu
  • Synthesized about 1 h of data for each target speaker, then used these data to train the adapter module. [10]
  • Writing the TASLP paper
Qi Qu
  • Finding ideal thresholds and deploying cloud services for KWS models: `zh48_guangdong` and `zh48_haining20`.
  • Located and fixed a bug in FunASR that could lead to a segmentation fault. Built the service with an extended gRPC protocol.
  • Analysis of some AED false alarms (cries and slaps).