2024-11-11
From cslt Wiki

Latest revision as of 11:05, 11 November 2024 (Mon)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Tianjian AI book (done)
Lantian Li
  • Completed all the scripts for the 2025 AI calendar
  • AI-Graph EN (32/50)
Ying Shi
Zhenghai You
  • Huawei project with IRA-TSE[1]
Junming Yuan
  • Re-checked some details from the Cocktail HuBERT paper and prepared the code.
    • Pseudo-label preparation finished.
  • Paper reading
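For context on the pseudo-label step above: HuBERT-style pretraining typically derives its discrete targets by clustering acoustic features (e.g. MFCCs) with k-means. The sketch below is a toy version of that idea only — the feature array, cluster count, and iteration budget are all hypothetical, not taken from the actual Cocktail HuBERT code.

```python
import numpy as np

def make_pseudo_labels(features, n_clusters=4, n_iters=20, seed=0):
    """Toy k-means: map each feature frame to a discrete pseudo-label (cluster id).

    features: (num_frames, feat_dim) acoustic features, e.g. MFCC frames.
    """
    rng = np.random.default_rng(seed)
    feats = np.asarray(features, dtype=float)
    # Initialize centroids from randomly chosen frames.
    centroids = feats[rng.choice(len(feats), size=n_clusters, replace=False)]
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(n_iters):
        # Assignment step: each frame goes to its nearest centroid.
        dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: recompute each non-empty cluster's centroid.
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = feats[labels == k].mean(axis=0)
    return labels
```

In the real pipeline the clustering runs over a whole corpus, and later iterations typically cluster hidden features of an earlier-iteration model rather than MFCCs.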
Xiaolou Li
  • Finished the VTS documents with Zehua
  • Processed the CVS3 data
  • Inherited the AV-HuBERT training code and debugged it
Zehua Liu
  • Finished 2 VTS documents with Xiaolou
    • Financial Document
    • Technical Document
  • Paper reading last Friday
Pengqi Li
  • Analyzed the distribution of phoneme importance (PID) in the TIMIT dataset based on more SOTA models (TDNN: 4.4%, ECAPA: 2.8%).
    • Conclusions still need further analysis in conjunction with other databases.[2]
Wan Lin
  • NS: detection
    • clean: 1.479% EER vs. 1.239% EER
    • multi: in training
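As a reference for the EER figures above: equal error rate is the operating point where the false-acceptance and false-rejection rates meet. A minimal sketch of computing it from target and non-target scores (the score arrays in the test are hypothetical, not the actual evaluation data):

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal Error Rate: sweep the threshold and find where FAR ~= FRR."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    scores = np.concatenate([target, nontarget])
    labels = np.concatenate([np.ones(len(target)), np.zeros(len(nontarget))])
    # Sort by score ascending; a threshold just above scores[i] rejects trials 0..i.
    labels = labels[np.argsort(scores)]
    frr = np.cumsum(labels) / len(target)            # targets rejected so far
    far = 1.0 - np.cumsum(1 - labels) / len(nontarget)  # nontargets still accepted
    idx = np.argmin(np.abs(frr - far))
    return float((frr[idx] + far[idx]) / 2)
```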
Tianhao Wang
  • Ablation study on a new approach for sound separation [3]
Xiaoxue Luo
  • Paper reading to investigate new approaches for sound separation
  • Retraining AudioSep with a DPRNN block (AudioSep-DP)
Zhenyu Zhou
  • Attempted to add a silence loss during training (seems ineffective)
  • Conditional Chain 2-5 mix results (still some bugs; speaker-count accuracy is poor)[4]
Junhui Chen
  • VAD frame-level detection loss
    • Loss decreases faster in the early stages of training
  • Changing the test encoder from ResNet34 to a Transformer encoder (coding in progress)
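A frame-level VAD detection loss is, in the common formulation, a per-frame binary cross-entropy between predicted speech probabilities and 0/1 speech labels. A minimal sketch under that assumption (logits and labels are hypothetical; the actual loss in the experiment may differ):

```python
import numpy as np

def frame_vad_bce(logits, targets):
    """Frame-level binary cross-entropy for VAD.

    logits: (num_frames,) raw model scores.
    targets: (num_frames,) labels, 1 = speech, 0 = non-speech.
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: per-frame speech probability
    eps = 1e-12                             # guard against log(0)
    return float(-np.mean(targets * np.log(probs + eps)
                          + (1 - targets) * np.log(1 - probs + eps)))
```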
Jiaying Wang
Yu Zhang
  • SocioDojo
    • Single-stock (TSLA) investment (still running)
  • Investigated text-guided, LLM-centric time-series forecasters and reproduced some of them (Time-LLM, LLM-Process, AutoTimes); ran a toy experiment on how the prompt prefix influences the forecast result
Wenqiang Du
  • Training of new language models (Cantonese)
  • Prepare the PPT for the competition
Yang Wei
  • Trained a text-enroll KWS model with 7000h of data
Lily
Turi
  • KWS data preparation and checking some implementations
  • Paper reading about KWS
Yue Gu
  • Used the CosyVoice model to synthesize target-speaker utterances, employed as a supplement for target-speaker adaptation; the adaptation experiment is running.
  • ICASSP 2025 paper review
  • Paper writing
Qi Qu
  • KWS:
    • Yi (Liangshan, Sichuan) test dataset annotated and finalized. Optimal thresholds for predefined scenes. Cloud model service deployed.
    • Quantization for NPU with more calibration data (6k): mean_loss=1.3e-4, max_loss=6.2e-2.
    • NPU demo: feature extraction + model inference.
    • Text-enroll method: Android demo benchmark.
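The mean_loss/max_loss figures above measure the deviation a quantized model introduces relative to the float reference over calibration data. A toy sketch of that kind of measurement, using symmetric per-tensor int8 quantization of a single tensor (the real NPU toolchain's scheme and the loss definition may differ):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x ~= scale * q, q in [-127, 127]."""
    x = np.asarray(x, dtype=float)
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantization_loss(x):
    """Mean/max absolute error between a float tensor and its dequantized version."""
    q, scale = quantize_int8(x)
    err = np.abs(np.asarray(x, dtype=float) - q.astype(float) * scale)
    return float(err.mean()), float(err.max())
```

In practice the calibration set (6k samples above) is fed through the network and the losses are computed on layer or model outputs rather than a single weight tensor.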