Difference between revisions of "2024-09-30"

Revision as of 10:54, 30 September 2024 (Monday)

People	This Week	Next Week	Task Tracking (DeadLine)
Dong Wang
Lantian Li	AI-Graph handbook v0.1; AI-Graph EN (12/50); Huawei TiDing 3.0 - Model Quantization; BUPT/AI-Radiance trivial things
Ying Shi	Optimized Text-enroll KWS code: added 4 kinds of negative sampling strategies (deletion, substitution, insertion, and shuffle) and verified them to ensure no bugs. Found that the new negative sampling increases the difficulty of training, which indicates that depending only on positional embedding is not enough. Reproducing conditional chain overlap ASR (Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals): according to Jiaying's work, the code released with the published paper does not work; writing dominance-based conditional chain overlap ASR by myself (in progress)
Zhenghai You	Exploring the role of speaker encoder in TSE: joint training of Spk Enc gives a better separation effect, but the EER is poor; pretraining & freezing Spk Enc gives good EER, but SI-SDR is poor; further exploring the different impacts of using spk aug on different tasks. The generality of SPK-AUG: refactored DPRNN-TSE results are reliable, and it has been accelerated from 87 hours to 32 hours
Junming Yuan
Chen Chen
Xiaolou Li	Use MFA on LRS3 to cut it into small segments; use discrete embedding of avhubert in vsp-llm training (still training); some ideas for aligning video features and LLM (Dense Connector, CL methods); hand over the data collection and get familiar with the process; data collection: 3138 h (need to re-check, DDL: 10.15)
Zehua Liu
Pengqi Li
Wan Lin	Voxblink1 model training and testing [1]
Tianhao Wang	AudioSep reproduction problem: LAION CLAP needs 48 kHz audio, so the data needs to be upsampled
Xiaoxue Luo	AI-Graph high school handbook (v0.1)
Zhenyu Zhou
Junhui Chen
Jiaying Wang
Yu Zhang	Fri report; change SocioDojo Agent from ChatGPT-3.5-Turbo to Llama-3.1-8B (still working)
Wenqiang Du	Check primary school handbook (43/45); release Chinese and Haining KWS models
Yang Wei
Lily
Turi	Segmented the audio in the dataset into individual words. Paper reading.
Yue Gu	Almost completed the revisions of my journal paper
Qi Qu	KWS: testing zh48 models on a dataset of Mandarin Chinese w/ Guangdong accent; recall drops significantly. AED: evaluating a third-party solution for baby crying detection. Misc.: preparing for live talk.
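
The four negative sampling strategies named in Ying Shi's entry (deletion, substitution, insertion, and shuffle) could look roughly like the sketch below. This is only an illustration under assumptions: the report does not show the actual Text-enroll KWS code, so the function names, the list-of-tokens representation, and the `vocab` argument are all hypothetical.

```python
import random


def delete_token(tokens, rng):
    """Deletion: drop one randomly chosen token (hypothetical helper)."""
    if len(tokens) < 2:
        return list(tokens)  # nothing sensible to delete
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def substitute_token(tokens, vocab, rng):
    """Substitution: replace one token with a different token from vocab."""
    if not tokens:
        return []
    i = rng.randrange(len(tokens))
    choices = [v for v in vocab if v != tokens[i]]
    if not choices:
        return list(tokens)  # vocab offers no alternative
    return tokens[:i] + [rng.choice(choices)] + tokens[i + 1:]


def insert_token(tokens, vocab, rng):
    """Insertion: add one token from vocab at a random position."""
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(vocab)] + tokens[i:]


def shuffle_tokens(tokens, rng):
    """Shuffle: same tokens, different order."""
    if len(set(tokens)) < 2:
        return list(tokens)  # every ordering is identical
    out = list(tokens)
    while out == list(tokens):
        rng.shuffle(out)
    return out


# Example with a made-up keyword and vocabulary (assumed, not from the report).
rng = random.Random(0)
keyword = ["n", "i", "h", "a", "o"]
vocab = ["a", "e", "h", "i", "n", "o", "u"]
negatives = {
    "deletion": delete_token(keyword, rng),
    "substitution": substitute_token(keyword, vocab, rng),
    "insertion": insert_token(keyword, vocab, rng),
    "shuffle": shuffle_tokens(keyword, rng),
}
```

The shuffle negative keeps exactly the same tokens as the enrolled keyword and differs only in their order, so it is the one case where token identity alone cannot separate the positive from the negative.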

@@ Line 47: / Line 47: @@
 |Zhenghai You
 ||
-*
+* Exploring the role of speaker encoder in TSE
+** Joint training of Spk Enc gives a better separation effect, but the EER is poor
+** Pretraining & freezing Spk Enc gives good EER, but SI-SDR is poor
+** Further explore the different impacts of using spk aug on different tasks
+* The generality of SPK-AUG
+** Refactored DPRNN-TSE results are reliable and have been accelerated from 87 hours to 32 hours
+**
 ||
 *
