2025-03-10

From cslt Wiki

Latest revision as of 10:59, 10 March 2025 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Revise the college version of the AI textbook
Lantian Li
  • Submit the high school textbook
  • Proofreading of the EN book (3/4)
Ying Shi
  • Compare Ascend and Nvidia
    • Performance: clean ASR task, 20 epochs, WER 6.91% vs. 7.02% (Ascend vs. Nvidia)
    • Speed: Nvidia is roughly twice as fast as Ascend
  • Start thinking about my thesis
Zhenghai You
  • Training IRA TSE for the noisy-enrollment situation [https://z1et6d3xtb.feishu.cn/wiki/OXubwl2fIip91vkYsgMc1duhnLd]
Junming Yuan
  • Pretraining work:
    • MT-HuBERT & Cocktail-HuBERT will be finished next week.
    • Got a set of comparable fine-tuning results (15/5/3-shot) for each pretrained model at the 400K training step [https://z1et6d3xtb.feishu.cn/docx/ElAKdh07GoD8qKxGFLfc3seAnOh]
  • Checked and added references for the AI junior high school handbook (1/2). (Done)
Xiaolou Li
  • Writing NFSC document
  • VSR training (1500 h) already has some results:
    • cnvsrc-single valid 300: 29.47%
    • cnvsrc-multi valid: 31.60%
    • webVideo valid: 15.54%
  • Finished producing pseudo-labels for CVS3 (4000 h)
Zehua Liu
  • Writing NFSC document
  • LoRA fine-tuning of the VLM (both encoder and LLM decoder): results do not look very good (may need parameter adjustment); see the adapter sketch after this entry
  • Pretrained VSR encoder + VLM (decoder) seems better than a normal LM
  • Design VTS architecture and implement it
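
A minimal sketch, assuming the peft library, of how LoRA adapters might be attached to both the visual encoder and the LLM decoder of a VLM; the checkpoint name, target module names, and hyperparameters are placeholders, not the configuration actually used above.

    # Hypothetical sketch: LoRA on both the vision encoder and the LLM decoder of a VLM.
    import torch
    from transformers import AutoModelForVision2Seq
    from peft import LoraConfig, get_peft_model

    model = AutoModelForVision2Seq.from_pretrained(
        "some/vlm-checkpoint", torch_dtype=torch.float16)  # placeholder checkpoint

    lora_cfg = LoraConfig(
        r=16,                # low-rank dimension (assumed)
        lora_alpha=32,       # scaling factor (assumed)
        lora_dropout=0.05,
        # attention projections in both the vision encoder and the LLM decoder;
        # real module names depend on the model definition
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only the LoRA weights remain trainable
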
Pengqi Li
  • Prepare the AI course for Tsinghua University Junior High School.
  • Add references to the handbook (junior high school version, 1/2). (Done)
Wan Lin
  • Supplementary NS experiments [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink]
  • Help Xiaochen reproduce the diarization SV method
Tianhao Wang
  • 3-mix training: CLAPSep baseline SDR = 5.560; ours SDR = 6.574 (see the SDR sketch after this entry)
  • Subset data training (in progress)
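
For context on the SDR numbers above, a minimal sketch of the plain SDR metric in dB; the actual evaluation may use a scale-invariant variant, so this is illustrative only.

    import numpy as np

    def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
        """Plain signal-to-distortion ratio in dB (not the scale-invariant variant)."""
        error = estimate - reference
        return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

    # Toy usage: a target signal and an estimate with a small residual error.
    ref = np.random.randn(16000)                 # 1 s of target audio at 16 kHz
    est = ref + 0.1 * np.random.randn(16000)
    print(f"SDR = {sdr(ref, est):.2f} dB")
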
Xiaoxue Luo
  • Sound separation
    • Baseline: modified the AudioSep code so that its audio-mixing method during training matches ours (a mixing sketch follows after this entry)
  • Paper reading and sharing last Friday
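
A minimal sketch of the kind of on-the-fly mixture creation the AudioSep baseline change refers to; the SNR range and scaling are assumptions, not the actual recipe.

    import numpy as np

    def mix_at_snr(target: np.ndarray, interference: np.ndarray, snr_db: float) -> np.ndarray:
        """Mix a target source with an interfering source at the given SNR (dB)."""
        p_target = np.mean(target ** 2) + 1e-8
        p_interf = np.mean(interference ** 2) + 1e-8
        # Scale the interference so the target-to-interference power ratio equals snr_db.
        scale = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
        return target + scale * interference

    # On-the-fly mixing inside a training loop (SNR drawn from an assumed [-5, 5] dB range).
    rng = np.random.default_rng(0)
    mixture = mix_at_snr(rng.standard_normal(16000), rng.standard_normal(16000),
                         snr_db=rng.uniform(-5.0, 5.0))
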
Zhenyu Zhou
Junhui Chen
  • Speaker diarization baseline for NS (mix test: EER 15.972% -> 12.983%); other tests still running
  • Make slides about the scaling law on speaker volume.
Jiaying Wang
Yu Zhang
  • Multi Agent Investment
    • Used the top 31 stocks in 11 sectors to build a portfolio for better correlation with input news (no excess return)
    • Analyzed the trading decisions
  • Huawei AED
    • Find the smallest model that keeps AUC above 0.9
    • Split inference into two phases with two smaller models (Phase 1: human voice vs. non-human voice; Phase 2: speech vs. other human voice); see the cascade sketch after this entry
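
A minimal sketch of the two-phase cascade described above; the scorer functions and thresholds are placeholders standing in for the two small models.

    import numpy as np

    def classify(audio: np.ndarray, stage1, stage2,
                 thr1: float = 0.5, thr2: float = 0.5) -> str:
        """Two-phase cascade: stage1 scores P(human voice), stage2 scores P(speech)."""
        if stage1(audio) < thr1:
            return "non-human"          # Phase 1: human voice vs. non-human voice
        # Phase 2: speech vs. other human-voice sounds
        return "speech" if stage2(audio) >= thr2 else "other-human-voice"

    # Toy usage with dummy scorers in place of the two small models.
    print(classify(np.zeros(16000), lambda x: 0.9, lambda x: 0.3))  # -> other-human-voice
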
Wenqiang Du
  • Check the primary handbook V3.0 (Done)
    • Add references (80%)
Yang Wei
  • Adapt the text-enrollment KWS model with synthesized dialect data (recall: 83% -> 94%) [https://z1et6d3xtb.feishu.cn/docx/WFBJdF3D0o6w6bxHCJBcn9DIndg]
Turi
  • Fine-tuned Llama3 on Oromo text (pretraining)
  • Experiment using it as an LM for ASR failed (100%+ WER); see the rescoring sketch after this entry
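
A minimal sketch of one way the fine-tuned Llama3 could be plugged in as an LM for ASR, via n-best rescoring; the checkpoint path and interpolation weight are placeholders, and the actual experiment may have combined the LM with the ASR system differently.

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("path/to/llama3-oromo")      # placeholder path
    lm = AutoModelForCausalLM.from_pretrained("path/to/llama3-oromo").eval()

    @torch.no_grad()
    def lm_logprob(text: str) -> float:
        """Total log-probability of a hypothesis under the causal LM."""
        ids = tok(text, return_tensors="pt").input_ids
        out = lm(ids, labels=ids)        # loss is the mean NLL over predicted tokens
        return -out.loss.item() * (ids.shape[1] - 1)

    def rescore(nbest, lm_weight: float = 0.3):
        """nbest: list of (hypothesis_text, asr_score); return the best hypothesis."""
        return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
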
Yue Gu
  • A 0.4% CER reduction was achieved for one speaker, but no improvement was observed for other speakers; still running experiments.
  • Restarted the synthetic-data experiments, trying to close the gap between synthetic and real data in the model's output distribution.
Qi Qu
  • Technical investigation on Visual Event Detection.
  • Experiment on annotating and auditing audio with Audio LLM: insufficient VRAM; poor I/O in CPU/GPU hybrid mode.