Difference between revisions of "2024-11-25"

From cslt Wiki

Latest revision as of 11:04, 25 November 2024 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Second round of checks for the AI handbook (middle school).
  • Deal with pictures in the AI handbooks (primary & middle school).
  • Start checking the AI handbook (high school).
  • Check the AI book for Tianjin medical school.
Lantian Li
  • Complete my CSTR report.
  • Continue AI-Graph EN Chapter 4.
  • Polish the 2025 Daily Sign.
Ying Shi
  • Design cohort-conditional chain multi-talker ASR with round-RNN.
    • WER results: round-1 32.15%, round-2 69.69%, round-3 92.33%.
    • On a 500-utterance sub-test set, only 28% of the sentences have a recognition order that matches the cosine distance.
  • Prepare for the Huawei interview.
Zhenghai You
  • Huawei TSE (train models that better fit the scene) [https://z1et6d3xtb.feishu.cn/docx/AArOdQEQPoFcshxD5OfcB9SLnFg]
Junming Yuan
  • Compared results among Clean-HuBERT, Cocktail-HuBERT, and MT-HuBERT [https://z1et6d3xtb.feishu.cn/docx/YhJadT52mokvPQxfV3qcmtlwnkb]
    • Bad news: Cocktail-HuBERT > Clean-HuBERT > MT-HuBERT
Chen Chen
Xiaolou Li
  • Data processing
    • CVS3: 1/4 already cut from the original video, waiting for pre-processing.
    • Copying pre-processed GongAn video data from gonganbu.
  • VSR contrastive-loss experiment
    • Inspired by the paper [https://arxiv.org/abs/2408.11813]
    • Main idea: to better align visual features with the LLM input, compute the cosine similarity between target and video features and take the highest-scoring pair as the positive pair.
    • Result: still training.
  • Paper reading
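The positive-pair selection described above can be sketched roughly as follows; the function names, the pure-Python `cosine` helper, and the toy feature vectors are illustrative assumptions, not the actual training code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pick_positive_pairs(target_feats, video_feats):
    """For each target feature, pick the index of the video feature with
    the highest cosine similarity as its positive pair; the remaining
    video features serve as negatives for the contrastive loss."""
    pairs = []
    for t in target_feats:
        sims = [cosine(t, v) for v in video_feats]
        pairs.append(max(range(len(video_feats)), key=lambda i: sims[i]))
    return pairs
```

In a real contrastive objective these indices would feed an InfoNCE-style loss; here they only show how the positive pair is chosen by similarity rather than by a fixed alignment.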
Zehua Liu
  • Rebuttal writing
  • Iterative training and inference
    • Iter-1 (45.53%) < Iter-2 (45.00%) < Iter-3 (44.85%)
Pengqi Li
  • Began writing a paper on the phoneme-importance analysis work.
  • Reading a doctoral thesis on speaker explainability [https://theses.hal.science/tel-04634215v1/file/These_BEN_AMOR.pdf].
Wan Lin
  • NS: all-transformer
    • 6k spk: EER 2.6%
    • 20k spk: EER 2.3%
    • 20k spk + multi-enroll: EER 1.9%
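For reference, EER as reported above is the operating point where the false-acceptance and false-rejection rates cross. A minimal threshold-sweep sketch, assuming a simple list-of-scores setup rather than the project's actual evaluation code:

```python
def eer(target_scores, nontarget_scores):
    """Equal error rate: sweep every candidate threshold and return the
    average of FAR and FRR at the point where they are closest.
    FAR = fraction of non-targets scored >= threshold (false accepts);
    FRR = fraction of targets scored < threshold (false rejects)."""
    best_gap, best = float("inf"), 1.0
    for th in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < th for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), (far + frr) / 2
    return best
```

Perfectly separated scores give an EER of 0; fully interleaved scores approach 0.5 (chance).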
Tianhao Wang
  • Experiments on the query-embedding conditioning approach:
    • SDR: FiLM (7.492) > self-attention (6.573)
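FiLM here stands for feature-wise linear modulation: the query embedding is projected to a per-channel scale (gamma) and shift (beta) that modulate the model's features. A minimal pure-Python sketch; the projection weights, shapes, and function names are made-up placeholders, not the experiment's model:

```python
def linear(x, weights, bias):
    """Dense projection: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film(frames, query, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM conditioning: project the query embedding to per-channel
    gamma/beta, then apply gamma * x + beta to every feature frame."""
    gamma = linear(query, w_gamma, b_gamma)
    beta = linear(query, w_beta, b_beta)
    return [[g * x + b for g, x, b in zip(gamma, frame, beta)]
            for frame in frames]
```

Unlike self-attention conditioning, FiLM injects the query as a cheap elementwise affine transform, which may explain the SDR gap observed above.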
Xiaoxue Luo
  • Training the USS (CED+AudioSep) model
    • Adjusting the audio format to meet the model's requirements (in training).
  • Production of the 2025 Daily Sign (March).
Zhenyu Zhou
  • Speaker-identity-based conditional chain proposal [https://z1et6d3xtb.feishu.cn/docx/MzZ8d3cDWokCzCx0MmDcRJDFnke]
  • Prepare the interim report.
Junhui Chen
  • Read papers (an ICCIP keynote paper and some others).
  • NS
    • Some tests of the transformer feature extractor.
Jiaying Wang
Yu Zhang
  • Huawei AED
    • Data augmentation & human-annotated dataset [https://z1et6d3xtb.feishu.cn/wiki/AO2CwQC4gioaq6k1SkkcARBAn2f]
  • Finance
    • Paper reading; reproduced a local Llama version of StockAgent [https://github.com/MingyuJ666/Stockagent] (an LLM-based market simulation framework).
Wenqiang Du
  • Training new language models (HeNan) [https://z1et6d3xtb.feishu.cn/docx/R7uIdGnwBo69bqxXki8cn50cnfh?from=from_copylink]
  • Training new language models (ChongQing) [https://z1et6d3xtb.feishu.cn/docx/FioOdh8Uqo83oCxAJRzcGXcRnae?from=from_copylink]
Yang Wei
  • Fixed some bugs in keyword sampling in the text-enroll KWS training code.
  • Added SpecAugment to text-enroll KWS training.
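SpecAugment masks random frequency and time bands of the input spectrogram during training. A minimal sketch of the idea, assuming a (time x freq) list-of-lists spectrogram and a single mask per axis; the mask widths and policy are simplifications, not the training code's exact settings:

```python
import random

def spec_augment(spec, max_f=2, max_t=2, seed=None):
    """Minimal SpecAugment sketch: zero out one random frequency band
    and one random time band of a (time x freq) spectrogram, without
    mutating the input."""
    rng = random.Random(seed)
    T, F = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    f0 = rng.randrange(F)
    fw = rng.randint(0, min(max_f, F - f0))  # frequency-mask width
    t0 = rng.randrange(T)
    tw = rng.randint(0, min(max_t, T - t0))  # time-mask width
    for t in range(T):                       # frequency mask: all frames
        for f in range(f0, f0 + fw):
            out[t][f] = 0.0
    for t in range(t0, t0 + tw):             # time mask: whole frames
        out[t] = [0.0] * F
    return out
```

The augmentation forces the model not to rely on any single band or frame, which typically improves robustness for small-enrollment KWS.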
Lily
Turi
  • Paper reading
  • ICASSP 2025 rebuttal
Yue Gu
  • Synthesized about 1 hour of data for each target speaker, then used these data to train the adapter module [https://z1et6d3xtb.feishu.cn/wiki/VPZfwx53ei2zkgkSvPtcCiDSnVh?from=from_copylink].
  • Writing the TASLP paper.
Qi Qu
  • Found ideal thresholds and deployed cloud services for KWS models `zh48_guangdong` and `zh48_haining20`.
  • Located and fixed a bug in FunASR that could lead to a segmentation fault; built the service with an extended gRPC protocol.
  • Analyzed some AED (cries and slaps) false alarms (FAs).