2024-11-25

From cslt Wiki

Revision as of 10:58, 25 November 2024 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Second round of checks on the AI handbook (middle school).
  • Deal with the pictures in the AI handbook (primary & middle school).
  • Start checking the AI handbook (high school).
  • Check the AI book for the Tianjin medical school.
Lantian Li
  • Complete my CSTR report.
  • Continue with AI-Graph EN Chapter 4.
  • Polish the 2025 Daily Sign.
Ying Shi
  • Design a cohort-conditional chain multi-talker ASR with round-RNN
    • WER results: round-1: 32.15%, round-2: 69.69%, round-3: 92.33%
    • On a 500-utterance test subset, only 28% of the sentences have a recognition order that matches the cosine-distance order.
  • Prepare for Huawei's interview.
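The 28% order-match figure above can be computed along these lines. This is a minimal sketch under stated assumptions: each utterance is assumed to provide the order in which the chain model emitted its speakers plus one cosine distance per speaker, and the speaker with the smallest distance is assumed to be expected first. The function name `order_match_rate` and the data layout are hypothetical, not taken from the actual experiment code.

```python
from typing import Sequence


def order_match_rate(
    emitted_orders: Sequence[Sequence[int]],
    cosine_distances: Sequence[Sequence[float]],
) -> float:
    """Fraction of utterances whose emission order equals ascending cosine distance.

    emitted_orders[i]   : speaker indices in the order the model recognized them
    cosine_distances[i] : one cosine distance per speaker index (hypothetical layout)
    """
    if not emitted_orders:
        return 0.0
    matches = 0
    for order, dists in zip(emitted_orders, cosine_distances):
        # Expected order: speakers sorted from smallest to largest cosine distance.
        expected = sorted(range(len(dists)), key=lambda spk: dists[spk])
        if list(order) == expected:
            matches += 1
    return matches / len(emitted_orders)


if __name__ == "__main__":
    orders = [[0, 1, 2], [2, 0, 1], [1, 0, 2], [0, 1, 2]]
    dists = [[0.1, 0.4, 0.7], [0.3, 0.8, 0.2], [0.5, 0.2, 0.9], [0.1, 0.6, 0.3]]
    print(order_match_rate(orders, dists))  # prints 0.75 (last utterance mismatches)
```

If the real criterion ranks speakers by descending similarity instead of ascending distance, only the sort key changes.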
Zhenghai You
  • Huawei TSE (train models that better fit the scene) [1]
Junming Yuan
  • Comparison of results for Clean-HuBERT, Cocktail-HuBERT, and MT-HuBERT [2]
    • Bad news: Cocktail-HuBERT > Clean-HuBERT > MT-HuBERT
Chen Chen
Xiaolou Li
  • Data process
    • One quarter of CVS3 has been cut from the original videos; waiting for pre-processing.
    • Copying the pre-processed GongAn video data from gonganbu.
  • VSR Contrastive Loss Exp
    • Inspired by the paper [3]
    • Main idea: to better align the visual features with the LLM input, compute the cosine similarity between target and video features and take the most similar pair as the positive pair.
    • Result: still training.
  • Paper Reading
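The contrastive-loss idea above can be sketched as follows. The report does not give the exact loss, so this is an assumption: an InfoNCE-style loss in which, for each video feature, the target feature with the highest cosine similarity is taken as the positive and all other targets act as negatives. The function names and the `temperature` parameter are hypothetical and not taken from the paper or the experiment code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_sim_contrastive_loss(
    video_feats: Sequence[Vector],
    target_feats: Sequence[Vector],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """InfoNCE-style loss whose positive is the most similar target per video feature.

    Returns the mean of -log softmax(similarities / temperature)[positive]
    over all video features.
    """
    total = 0.0
    for v in video_feats:
        logits = [cosine(v, t) / temperature for t in target_feats]
        # Positive pair: the target with the largest cosine similarity.
        pos = max(range(len(logits)), key=lambda j: logits[j])
        # Numerically stable log-sum-exp over all targets.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(logit - m) for logit in logits))
        total += log_z - logits[pos]
    return total / len(video_feats)
```

Because the positive is chosen as the current best match, this loss mainly sharpens an alignment the model already prefers; whether the actual experiment selects positives per frame, per utterance, or per token is not stated in the report.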
Zehua Liu
  • Rebuttal writing
  • Iterative training and inference
    • Iter-1 (45.53%) < Iter-2 (45.00%) < Iter-3 (44.85%)
Pengqi Li
  • Begin writing a paper on the phoneme-importance analysis work.
  • Reading a doctoral thesis on speaker explainability [4].
Wan Lin
Tianhao Wang
  • Experiments on query-embedding conditioning approaches:
    • SDR: FiLM (7.492) > self-attention (6.573)
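For reference, FiLM (feature-wise linear modulation) conditions a network by predicting a per-channel scale `gamma` and shift `beta` from the conditioning vector and applying `gamma * h + beta` to every frame. The sketch below shows that generic technique only; how the query embedding and the projections are defined in this experiment is not stated, so the function names, shapes, and toy weights are assumptions (in a real model the projections are learned layers).

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def linear(weight: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = W x + b for a weight matrix of shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(
    features: Matrix,          # (frames, channels) separator features
    query: Sequence[float],    # query embedding used as the condition
    w_gamma: Matrix, b_gamma: Sequence[float],
    w_beta: Matrix, b_beta: Sequence[float],
) -> List[List[float]]:
    """Feature-wise linear modulation: gamma(q) * h + beta(q), applied per channel."""
    gamma = linear(w_gamma, b_gamma, query)  # one scale per channel
    beta = linear(w_beta, b_beta, query)     # one shift per channel
    # The same (gamma, beta) pair modulates every frame.
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)] for frame in features]


if __name__ == "__main__":
    out = film(
        [[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0],
        [[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
        [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.5],
    )
    print(out)  # [[2.0, 1.5], [6.0, 1.5]]
```

Unlike self-attention conditioning, FiLM adds only two projections and no interaction between frames, which is one plausible reason it can be easier to train.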
Zhenyu Zhou
  • Speaker-identity-based conditional chain proposal [5]
  • Prepare the interim report.
Junhui Chen
Jiaying Wang
Yu Zhang
  • Huawei AED
    • Data augmentation & human-annotated dataset [6]
  • Finance
    • Paper reading; reproducing a local Llama version of StockAgent [7] (an LLM-based market simulation framework)
Wenqiang Du
  • Training new language models (HeNan) [8]
  • Training new language models (ChongQing) [9]
Yang Wei
  • Fix some bugs in keyword sampling in the text-enroll KWS training code.
  • Add spec augmentation to the text-enroll KWS training.
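Spec augmentation in the common SpecAugment sense masks random frequency bands and time spans of the input features. The sketch below implements frequency and time masking only (no time warping) on a plain list-of-lists spectrogram; the function name, default mask counts and widths, and masking with `0.0` (the mean for mean-normalized features) are assumptions, not the settings used in the KWS training code.

```python
import random
from typing import List, Optional


def spec_augment(
    spec: List[List[float]],     # (frames, mel_bins) log-mel features
    num_freq_masks: int = 2,     # hypothetical defaults
    max_freq_width: int = 8,
    num_time_masks: int = 2,
    max_time_width: int = 10,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """SpecAugment-style frequency and time masking; returns a masked copy."""
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # copy so the input stays unchanged
    n_frames = len(out)
    n_bins = len(out[0]) if out else 0
    # Frequency masks: zero a band of consecutive mel bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for row in out:
            for f in range(start, start + width):
                row[f] = 0.0
    # Time masks: zero a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_bins
    return out
```

Passing a seeded `random.Random` makes the masks reproducible, which helps when checking that the augmentation is actually applied in the training pipeline.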
Lily
Turi
  • Paper reading
  • ICASSP 2025 rebuttal
Yue Gu
  • Synthesize about 1 h of data for each target speaker, then use this data to train the adapter module [10].
  • Writing the TASLP paper.
Qi Qu
  • Finding ideal thresholds and deploying cloud services for KWS models: `zh48_guangdong` and `zh48_haining20`.
  • Located and fixed a bug in FunASR that could cause a segmentation fault. Built a service with an extended gRPC protocol.
  • Analyzed some AED false alarms (cries and slaps).
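Threshold tuning for a KWS model is often framed as "highest recall under a false-alarm budget". The sketch below follows that framing. The criterion, the function name `pick_threshold`, and the idea of measuring false alarms per hour on keyword-free audio are assumptions; the report does not say which criterion was used for `zh48_guangdong` or `zh48_haining20`.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    pos_scores: Sequence[float],  # detector scores on true keyword occurrences
    neg_scores: Sequence[float],  # detection scores on keyword-free audio
    neg_hours: float,             # duration of the keyword-free audio, in hours
    max_fa_per_hour: float,       # false-alarm budget (hypothetical criterion)
) -> Optional[Tuple[float, float]]:
    """Return (threshold, recall) for the lowest candidate within the FA budget.

    A detection fires when score >= threshold. Candidates are the observed
    scores; returns None if no candidate meets the budget.
    """
    if not pos_scores or neg_hours <= 0:
        raise ValueError("need positive trials and a positive amount of negative audio")
    # FA/hour can only fall as the threshold rises, so the first admissible
    # candidate in ascending order is the lowest one, i.e. the highest recall.
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        fa_per_hour = sum(s >= thr for s in neg_scores) / neg_hours
        if fa_per_hour <= max_fa_per_hour:
            recall = sum(s >= thr for s in pos_scores) / len(pos_scores)
            return thr, recall
    return None


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.6, 0.4]
    neg = [0.7, 0.5, 0.3]
    print(pick_threshold(pos, neg, neg_hours=1.0, max_fa_per_hour=1.0))  # (0.6, 0.75)
```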