2025-03-10

From cslt Wiki

Latest revision as of 10:59, 10 March 2025 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Revise the college version of the AI textbook
Lantian Li
  • Submit the high school textbook
  • Proofreading of the EN book (3/4)
Ying Shi
  • Compare Ascend and Nvidia
    • Performance: clean ASR task, 20 epochs, WER 6.91% vs. 7.02% (Ascend vs. Nvidia)
    • Speed: Nvidia is roughly twice as fast as Ascend
  • Start thinking about my thesis
Zhenghai You
  • Training IRA TSE for the noisy-enrollment situation [https://z1et6d3xtb.feishu.cn/wiki/OXubwl2fIip91vkYsgMc1duhnLd]
Junming Yuan
  • Pretraining work:
    • MT-HuBERT & Cocktail-HuBERT will be finished next week.
    • Got a set of comparable fine-tuning results (15/5/3-shot) for each pretrained model at the 400K training step [https://z1et6d3xtb.feishu.cn/docx/ElAKdh07GoD8qKxGFLfc3seAnOh]
  • Checked and added references for the AI junior high school handbook (1/2). (Done)
Xiaolou Li
  • Writing NFSC document
  • VSR training (1500 h) already has some results:
    • cnvsrc-single valid 300: 29.47%
    • cnvsrc-multi valid: 31.60%
    • webVideo valid: 15.54%
  • Finished producing pseudo-labels for CVS3 (4000 h)
Zehua Liu
  • Writing NFSC document
  • LoRA fine-tuning of the VLM (both encoder and LLM decoder): results do not look very good (may need parameter adjustment); see the adapter sketch after this entry
  • Pretrained VSR encoder + VLM (decoder) seems better than a normal LM
  • Design VTS architecture and implement it
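
A minimal sketch, assuming the peft library, of how LoRA adapters might be attached to both the visual encoder and the LLM decoder of a VLM; the checkpoint name, target module names, and hyperparameters are placeholders, not the configuration actually used above.

    # Hypothetical sketch: LoRA on both the vision encoder and the LLM decoder of a VLM.
    import torch
    from transformers import AutoModelForVision2Seq
    from peft import LoraConfig, get_peft_model

    model = AutoModelForVision2Seq.from_pretrained(
        "some/vlm-checkpoint", torch_dtype=torch.float16)  # placeholder checkpoint

    lora_cfg = LoraConfig(
        r=16,                # low-rank dimension (assumed)
        lora_alpha=32,       # scaling factor (assumed)
        lora_dropout=0.05,
        # attention projections in both the vision encoder and the LLM decoder;
        # real module names depend on the model definition
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only the LoRA weights remain trainable
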
Pengqi Li
  • Prepare the AI course for Tsinghua University Junior High School.
  • Add references to the handbook (junior high school version, 1/2). (Done)
Wan Lin
  • Supplementary NS experiments [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink]
  • Help Xiaochen reproduce the diarization SV method
Tianhao Wang
  • 3-mix training: CLAPSep baseline SDR = 5.560; ours SDR = 6.574 (see the SDR sketch after this entry)
  • Subset data training (in progress)
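
For context on the SDR numbers above, a minimal sketch of the plain SDR metric in dB; the actual evaluation may use a scale-invariant variant, so this is illustrative only.

    import numpy as np

    def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
        """Plain signal-to-distortion ratio in dB (not the scale-invariant variant)."""
        error = estimate - reference
        return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

    # Toy usage: a target signal and an estimate with a small residual error.
    ref = np.random.randn(16000)                 # 1 s of target audio at 16 kHz
    est = ref + 0.1 * np.random.randn(16000)
    print(f"SDR = {sdr(ref, est):.2f} dB")
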
Xiaoxue Luo
  • Sound separation
    • Baseline: modified the AudioSep code so that its audio-mixing method during training matches ours (a mixing sketch follows after this entry)
  • Paper reading and sharing last Friday
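
A minimal sketch of the kind of on-the-fly mixture creation the AudioSep baseline change refers to; the SNR range and scaling are assumptions, not the actual recipe.

    import numpy as np

    def mix_at_snr(target: np.ndarray, interference: np.ndarray, snr_db: float) -> np.ndarray:
        """Mix a target source with an interfering source at the given SNR (dB)."""
        p_target = np.mean(target ** 2) + 1e-8
        p_interf = np.mean(interference ** 2) + 1e-8
        # Scale the interference so the target-to-interference power ratio equals snr_db.
        scale = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
        return target + scale * interference

    # On-the-fly mixing inside a training loop (SNR drawn from an assumed [-5, 5] dB range).
    rng = np.random.default_rng(0)
    mixture = mix_at_snr(rng.standard_normal(16000), rng.standard_normal(16000),
                         snr_db=rng.uniform(-5.0, 5.0))
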
Zhenyu Zhou
Junhui Chen
  • Speaker diarization baseline for NS (mix test: EER 15.972% -> 12.983%); other tests still running
  • Make slides about the scaling law on speaker volume.
Jiaying Wang
Yu Zhang
  • Multi Agent Investment
    • Used the top 31 stocks in 11 sectors to build a portfolio for better correlation with input news (no excess return)
    • Analyzed the trading decisions
  • Huawei AED
    • Find the smallest model that keeps AUC above 0.9
    • Split inference into two phases with two smaller models (Phase 1: human voice vs. non-human voice; Phase 2: speech vs. other human voice); see the cascade sketch after this entry
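
A minimal sketch of the two-phase cascade described above; the scorer functions and thresholds are placeholders standing in for the two small models.

    import numpy as np

    def classify(audio: np.ndarray, stage1, stage2,
                 thr1: float = 0.5, thr2: float = 0.5) -> str:
        """Two-phase cascade: stage1 scores P(human voice), stage2 scores P(speech)."""
        if stage1(audio) < thr1:
            return "non-human"          # Phase 1: human voice vs. non-human voice
        # Phase 2: speech vs. other human-voice sounds
        return "speech" if stage2(audio) >= thr2 else "other-human-voice"

    # Toy usage with dummy scorers in place of the two small models.
    print(classify(np.zeros(16000), lambda x: 0.9, lambda x: 0.3))  # -> other-human-voice
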
Wenqiang Du
  • Check the primary handbook V3.0 (Done)
    • Add references (80%)
Yang Wei
  • Adapt the text-enrollment KWS model with synthesized dialect data (recall: 83% -> 94%) [https://z1et6d3xtb.feishu.cn/docx/WFBJdF3D0o6w6bxHCJBcn9DIndg]
Turi
  • Fine-tuned Llama3 on Oromo text (pretraining)
  • Experiment using it as an LM for ASR failed (100%+ WER); see the rescoring sketch after this entry
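
A minimal sketch of one way the fine-tuned Llama3 could be plugged in as an LM for ASR, via n-best rescoring; the checkpoint path and interpolation weight are placeholders, and the actual experiment may have combined the LM with the ASR system differently.

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("path/to/llama3-oromo")      # placeholder path
    lm = AutoModelForCausalLM.from_pretrained("path/to/llama3-oromo").eval()

    @torch.no_grad()
    def lm_logprob(text: str) -> float:
        """Total log-probability of a hypothesis under the causal LM."""
        ids = tok(text, return_tensors="pt").input_ids
        out = lm(ids, labels=ids)        # loss is the mean NLL over predicted tokens
        return -out.loss.item() * (ids.shape[1] - 1)

    def rescore(nbest, lm_weight: float = 0.3):
        """nbest: list of (hypothesis_text, asr_score); return the best hypothesis."""
        return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
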
Yue Gu
  • A 0.4% CER reduction was achieved for one speaker, but no improvement was observed for other speakers; still running experiments.
  • Restarted the synthetic-data experiments, trying to close the gap between synthetic and real data in the model's output distribution.
Qi Qu
  • Technical investigation on Visual Event Detection.
  • Experiment on annotating and auditing audio with Audio LLM: insufficient VRAM; poor I/O in CPU/GPU hybrid mode.