2024-11-11
From cslt Wiki
Latest revision as of 11:05, 11 November 2024 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Tianjian AI book (done)
Lantian Li
  • Completed all the scripts for the 2025 AI calendar
  • AI-Graph EN (32/50)
Ying Shi
Zhenghai You
  • Huawei project with IRA-TSE (https://z1et6d3xtb.feishu.cn/docx/R05DdrPVqoSzQYxNlhicedxenkd)
Junming Yuan
  • Re-checked some details of the Cocktail HuBERT paper and prepared the code.
    • Pseudo-label preparation finished (see the sketch below).
  • Paper reading
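A minimal sketch of the pseudo-label preparation step, assuming the standard HuBERT recipe of k-means clustering over frame-level MFCC features; the paths, cluster count, and file layout are placeholders, not the actual setup used here.

    # Cluster frame-level MFCC features with k-means and store the cluster
    # IDs as per-frame pseudo-labels (first-iteration HuBERT style).
    # WAV_DIR and N_CLUSTERS are placeholders.
    import glob
    import numpy as np
    import librosa
    from sklearn.cluster import MiniBatchKMeans

    WAV_DIR = "data/train"     # placeholder path
    N_CLUSTERS = 100           # assumption: ~100 clusters for MFCC features

    def mfcc_frames(path, sr=16000):
        wav, _ = librosa.load(path, sr=sr)
        # 25 ms window / 10 ms hop, 13-dim MFCC; frames as rows
        m = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
        return m.T.astype(np.float32)

    files = sorted(glob.glob(f"{WAV_DIR}/*.wav"))
    feats = [mfcc_frames(f) for f in files]

    km = MiniBatchKMeans(n_clusters=N_CLUSTERS, batch_size=10000)
    km.fit(np.concatenate(feats))

    # one label sequence per utterance, used as targets for masked prediction
    for f, x in zip(files, feats):
        np.save(f.replace(".wav", ".km.npy"), km.predict(x))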
Xiaolou Li
  • Finished the VTS documents with Zehua
  • Processed the CVS3 data
  • Inherited the AV-HuBERT training code and debugged it
Zehua Liu
  • Finished 2 VTS documents with Xiaolou
    • Financial document
    • Technical document
  • Paper reading last Friday
Pengqi Li
  • Analyzed the distribution of phoneme importance (PID) in the TIMIT dataset based on more SOTA models (TDNN: 4.4%, ECAPA: 2.8%).
    • Conclusions still need further analysis in conjunction with other databases (https://z1et6d3xtb.feishu.cn/docx/VtlIdFxdRodp8Nx8oQjcVLC4nCd); an occlusion-style scoring sketch is given below.
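For reference, one way to estimate phoneme importance is by occlusion: mask the frames of one phoneme at a time and measure how much the speaker embedding shifts. The toy sketch below follows that assumption, with `embed` standing in for a pretrained model (e.g. TDNN or ECAPA) and `phn_segments` for TIMIT .PHN alignments; it is not necessarily the exact PID procedure used in this work.

    # Zero out the frames of one phoneme at a time and measure how much the
    # speaker-embedding similarity to the intact utterance drops.
    import numpy as np

    def phoneme_importance(wav, phn_segments, embed):
        """phn_segments: list of (start_sample, end_sample, phone_label)."""
        ref = embed(wav)                         # embedding of the intact utterance
        scores = {}
        for start, end, phone in phn_segments:
            masked = wav.copy()
            masked[start:end] = 0.0              # occlude this phoneme
            e = embed(masked)
            cos = np.dot(ref, e) / (np.linalg.norm(ref) * np.linalg.norm(e))
            # a larger similarity drop suggests a more important phoneme
            scores.setdefault(phone, []).append(1.0 - cos)
        return {p: float(np.mean(v)) for p, v in scores.items()}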
Wan Lin
  • NS: detection
    • clean: 1.479% EER vs. 1.239% EER (EER computation sketched below)
    • multi: in training
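For reference, a minimal EER computation from trial scores, using the usual convention that `labels` are 1 for target trials and `scores` are detection scores; this is generic, not the project's exact scoring script.

    import numpy as np
    from sklearn.metrics import roc_curve

    def compute_eer(scores, labels):
        # EER is the operating point where false-alarm and miss rates are equal
        fpr, tpr, _ = roc_curve(labels, scores)
        fnr = 1 - tpr
        idx = np.nanargmin(np.abs(fnr - fpr))
        return (fpr[idx] + fnr[idx]) / 2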
Tianhao Wang
  • Ablation study on a new approach for sound separation (https://z1et6d3xtb.feishu.cn/docx/NLlsdyLtuoptjYxjcX0cwlVbnXc)
Xiaoxue Luo
  • Paper reading to investigate new approaches for sound separation
  • Retrained AudioSep with a DPRNN block (AudioSep-DP); a generic DPRNN block is sketched below
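A generic dual-path RNN block is shown below as a reference for what "a DPRNN block" typically looks like; the dimensions, LSTM choice, and normalization are illustrative assumptions, not the exact AudioSep-DP configuration.

    import torch
    import torch.nn as nn

    class DPRNNBlock(nn.Module):
        def __init__(self, feat_dim, hidden_dim):
            super().__init__()
            self.intra_rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.intra_proj = nn.Linear(2 * hidden_dim, feat_dim)
            self.intra_norm = nn.LayerNorm(feat_dim)
            self.inter_rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.inter_proj = nn.Linear(2 * hidden_dim, feat_dim)
            self.inter_norm = nn.LayerNorm(feat_dim)

        def forward(self, x):
            # x: (batch, n_chunks, chunk_len, feat_dim)
            b, s, k, d = x.shape
            # intra-chunk pass: model dependencies within each chunk
            intra = x.reshape(b * s, k, d)
            intra, _ = self.intra_rnn(intra)
            intra = self.intra_proj(intra).reshape(b, s, k, d)
            x = self.intra_norm(x + intra)          # residual connection
            # inter-chunk pass: model dependencies across chunks at each position
            inter = x.transpose(1, 2).reshape(b * k, s, d)
            inter, _ = self.inter_rnn(inter)
            inter = self.inter_proj(inter).reshape(b, k, s, d).transpose(1, 2)
            return self.inter_norm(x + inter)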
Zhenyu Zhou
  • Attempted to add a silence loss during training (seems ineffective); one possible form is sketched below
  • Conditional chain 2-5-mix results (still some bugs; speaker-count accuracy is poor) (https://z1et6d3xtb.feishu.cn/docx/D2UQdxMBvojkF9xCXGfcFBLGned)
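A guess at what a "silence loss" could look like: an auxiliary penalty on the energy the separator emits during frames labelled as silence. `est` and `silence_mask` are assumed names; this is only a sketch of the idea, not the loss actually tried in the experiment.

    import torch

    def silence_loss(est, silence_mask, eps=1e-8):
        # est: estimated source waveform (B, T); silence_mask: 1.0 on silent frames, 0.0 elsewhere
        energy = (est ** 2) * silence_mask
        return energy.sum() / (silence_mask.sum() + eps)

    # combined objective, e.g.: total = separation_loss + lambda_sil * silence_loss(est, mask)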
Junhui Chen
  • VAD frame-level detection loss (a minimal form is sketched below)
    • Loss decreases faster in the early stages of training
  • Changing the test encoder from ResNet34 to a Transformer encoder (coding in progress)
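A minimal sketch of a frame-level VAD detection loss as per-frame binary cross-entropy with padded frames masked out; `logits` and `targets` are assumed to be (B, T) tensors, and this is not the exact training code.

    import torch
    import torch.nn.functional as F

    def vad_frame_loss(logits, targets, frame_mask=None):
        # logits: (B, T) speech/non-speech scores; targets: (B, T) 0/1 frame labels
        loss = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
        if frame_mask is not None:        # ignore padded frames
            loss = loss * frame_mask
            return loss.sum() / frame_mask.sum()
        return loss.mean()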
Jiaying Wang
Yu Zhang
  • SocioDojo
    • Single-stock (TSLA) investment (still running)
  • Investigated text-guided, LLM-centric time-series forecasters and reproduced some of them (Time-LLM, LLM-Process, AutoTimes); ran a toy experiment on how the prompt prefix influences the forecast result (see the harness sketch below)
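A toy harness for the prompt-prefix experiment: feed the same numeric history to the forecaster under different prefixes and compare errors. `forecast(prefix, history, horizon)` is a placeholder for whatever model wrapper is being tested (e.g. a Time-LLM-style forecaster), not a real API.

    import numpy as np

    PREFIXES = [
        "You are an expert time-series forecaster.",
        "Continue the following sequence of daily stock prices.",
        "",                                   # no-prefix baseline
    ]

    def evaluate_prefixes(forecast, history, future, horizon):
        # forecast(prefix, history, horizon) -> sequence of length `horizon`
        results = {}
        for prefix in PREFIXES:
            pred = np.asarray(forecast(prefix, history, horizon), dtype=float)
            err = float(np.mean((pred - np.asarray(future[:horizon], dtype=float)) ** 2))
            results[prefix or "<empty>"] = err
        return results                        # prefix -> MSE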
Wenqiang Du
  • Training of new language models (Cantonese)
  • Prepared the PPT for the competition
Yang Wei
  • Trained a text-enroll KWS model with 7,000 hours of data
Lily
Turi
  • KWS data preparation and checking some implementations
  • Paper reading about KWS
Yue Gu
  • Used the CosyVoice model to synthesize target-speaker utterances, which are employed as a supplement for target-speaker adaptation; the adaptation experiment is running.
  • ICASSP 2025 paper review
  • Paper writing
Qi Qu
  • KWS:
    • Yi (Liangshan, Sichuan) test dataset annotated and finalized; optimal thresholds selected for the predefined scenes; cloud model service deployed.
    • Quantization for NPU with more calibration data (6k): mean_loss=1.3e-4, max_loss=6.2e-2 (see the measurement sketch below).
    • NPU demo: feature extraction + model inference.
    • Text-enroll method: Android demo benchmark.
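For context, one common way to obtain a mean_loss / max_loss report for NPU quantization is to run the float and quantized models on the same calibration inputs and compare outputs. The sketch below assumes placeholder callables `float_model` and `npu_model` and a per-sample MSE metric, which may differ from the toolchain's own definition.

    import numpy as np

    def calibration_report(float_model, npu_model, calib_inputs):
        # calib_inputs: iterable of preprocessed feature tensors (e.g. 6k items)
        per_sample = []
        for x in calib_inputs:
            y_ref = np.asarray(float_model(x), dtype=np.float32)   # float reference output
            y_q = np.asarray(npu_model(x), dtype=np.float32)       # quantized/NPU output
            per_sample.append(float(np.mean((y_ref - y_q) ** 2)))
        return {"mean_loss": float(np.mean(per_sample)),
                "max_loss": float(np.max(per_sample))}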