2024-10-28

Latest revision as of 10:59, 28 October 2024 (Mon)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • AI primary book done
Lantian Li
  • AI-Graph EN (1-20 finalized)
  • Design 2025 Daily Posts
Ying Shi
  • Revise the code for cohort-overlap ASR [the training is in progress]
    • Support arbitrary source-mixing training (a minimal mixing sketch follows this list)
    • Use the real hypothesis as the condition, selected by token error rate
    • Design a stopping criterion
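
A minimal sketch of what "arbitrary source mixing" could look like in a data loader: draw a random number of source utterances, scale each by a random gain, and sum them into one mixture. Function and parameter names are illustrative, not the actual training code.

 import numpy as np

 def mix_sources(sources, max_n=5, gain_db_range=(-5.0, 5.0), rng=None):
     """Mix a random subset (1..max_n) of equal-length source waveforms at random gains.

     Returns (mixture, scaled_sources) so separation/ASR targets stay aligned.
     """
     rng = rng or np.random.default_rng()
     n = int(rng.integers(1, min(max_n, len(sources)) + 1))
     chosen = [sources[i] for i in rng.choice(len(sources), size=n, replace=False)]
     scaled = [s * 10.0 ** (rng.uniform(*gain_db_range) / 20.0) for s in chosen]
     return np.sum(scaled, axis=0), scaled
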
Zhenghai You
  • Introduce more hard samples to improve model performance [https://z1et6d3xtb.feishu.cn/docx/CURxdy3tEorxkrxtjjqcdMaYnJg]
    • SPK-AUG with the same length: there is an improvement, but SI-SDR decreases as the hard-sample rate increases (an SI-SDR sketch follows this list)
    • Design more hard samples
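
SI-SDR is the metric cited above; here is a minimal sketch of how it is commonly computed (scale-invariant SDR between an estimate and a reference), not the project's evaluation code.

 import numpy as np

 def si_sdr(est, ref, eps=1e-8):
     """Scale-invariant SDR (dB) between an estimated and a reference waveform."""
     est = est - est.mean()
     ref = ref - ref.mean()
     # Project the estimate onto the reference to get the scaled target component.
     alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
     target = alpha * ref
     noise = est - target
     return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
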
Junming Yuan
  • Results of time-mask MT-HuBERT [2]
    • Some bad news
Chen Chen
Xiaolou Li
  • VTS with LLM: structure design and baseline code writing [https://z1et6d3xtb.feishu.cn/docx/ZBnOdEMxgo8bs5xrkb1cPZnCnQg?from=from_copylink]
Zehua Liu
  • Reading papers about in-context learning in ASR
  • Training the model with Adaptive Time Mask (a time-mask sketch follows this list)
  • Try in-context learning with only the previous sentence [https://z1et6d3xtb.feishu.cn/docx/JBsidACDVojhCaxFQLbcCVbsnAc?from=from_copylink]
  • Started the VTS project report
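
A rough sketch of a time mask whose width adapts to the utterance length (one common reading of "Adaptive Time Mask"); the ratio and mask count are illustrative, not the project's settings.

 import numpy as np

 def adaptive_time_mask(features, max_ratio=0.1, n_masks=2, rng=None):
     """Zero out n_masks random time spans, each up to max_ratio of the sequence length.

     features: array of shape (T, D), frames by feature dimension.
     """
     rng = rng or np.random.default_rng()
     out = features.copy()
     T = out.shape[0]
     max_width = max(1, int(T * max_ratio))
     for _ in range(n_masks):
         width = int(rng.integers(1, max_width + 1))
         start = int(rng.integers(0, max(1, T - width + 1)))
         out[start:start + width, :] = 0.0
     return out
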
Pengqi Li
  • Consistency of TAO and LayerCAM
    • Changed TAO from the input to the final conv layer and obtained higher consistency (AISHELL: 0.93 for all models); a LayerCAM sketch follows this list
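
For reference, a generic LayerCAM sketch in PyTorch: the ReLU of the gradients weights the activations of a chosen conv layer element-wise, then the result is summed over channels. This assumes a standard 4-D conv activation and is not the project's TAO/LayerCAM code; the hook-based helper is illustrative.

 import torch
 import torch.nn.functional as F

 def layercam(model, layer, x, class_idx):
     """LayerCAM for one input: ReLU(grad) * activation at `layer`, summed over channels."""
     acts, grads = {}, {}
     h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
     h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
     logits = model(x)                    # x: (1, C, H, W)
     model.zero_grad()
     logits[0, class_idx].backward()
     h1.remove(); h2.remove()
     # Positive gradients act as element-wise spatial weights on the activations.
     cam = (F.relu(grads["g"]) * acts["a"]).sum(dim=1, keepdim=True)
     cam = F.relu(cam)
     return cam / (cam.max() + 1e-8)      # (1, 1, H', W'); upsample to input size if needed
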
Wan Lin
  • NS: downsampling is not useful [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink]
  • Shared at the speaker meeting on Friday
Tianhao Wang
  • AudioSep (CLAP) 5-mix exps [https://z1et6d3xtb.feishu.cn/docx/DlR8dZRdEoZIwIxTOFvcQdbGnqg] (a generic query-conditioning sketch follows this list):
    • text-query: SDR = 4.978, SI-SDR = 1.972
    • audio-query: SDR = 6.907, SI-SDR = 5.058
    • These results are with the loudness limitation
  • AudioSep (CLAP) without the loudness limitation
  • Project things
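
AudioSep-style systems condition the separator on a query embedding (text or audio, e.g. from CLAP). A generic way to inject such a conditioning vector is FiLM-style scale-and-shift; the sketch below is illustrative and not AudioSep's actual architecture.

 import torch
 import torch.nn as nn

 class QueryFiLM(nn.Module):
     """Modulate separator features with a query embedding via scale-and-shift (FiLM)."""
     def __init__(self, query_dim=512, feat_dim=256):
         super().__init__()
         self.to_scale = nn.Linear(query_dim, feat_dim)
         self.to_shift = nn.Linear(query_dim, feat_dim)

     def forward(self, feats, query_emb):
         # feats: (B, T, feat_dim); query_emb: (B, query_dim) from a text or audio query encoder
         scale = self.to_scale(query_emb).unsqueeze(1)
         shift = self.to_shift(query_emb).unsqueeze(1)
         return feats * (1.0 + scale) + shift
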
Xiaoxue Luo
  • Comparative experiment between AudioSep and the baseline system (CLIPSep)
  • Prepare the report
Zhenyu Zhou
  • Reproduce 5-mix speech separation results (a PIT loss sketch follows this list):
    • PIT: 2-mix: 16.04; 5-mix: 6.87
    • Conditional: 5-mix: 5.38 (40 epochs)
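
For context, a minimal permutation-invariant training (PIT) loss sketch: evaluate the loss under every assignment of estimated to reference sources and keep the best one per utterance. The L1 base loss and names are illustrative; with 5 sources this enumerates 120 permutations, which is part of why conditional (query-based) training avoids the permutation search.

 import itertools
 import torch
 import torch.nn.functional as F

 def pit_loss(est, ref):
     """est, ref: (B, N, T). Mean over the batch of the best loss over all N! permutations."""
     B, N, _ = est.shape
     best = None
     for perm in itertools.permutations(range(N)):
         per_pair = [F.l1_loss(est[:, i], ref[:, j], reduction="none").mean(dim=-1)
                     for i, j in enumerate(perm)]          # each (B,)
         loss = torch.stack(per_pair, dim=1).mean(dim=1)   # (B,)
         best = loss if best is None else torch.minimum(best, loss)
     return best.mean()
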
Junhui Chen
  • NS: speaker detection (method survey & debugging)
  • Got sick
Jiaying Wang
Yu Zhang
  • SocioDojo (still worse than the Nasdaq-100 baseline)
    • Changed information sources; judging from the reports generated by the LLM, more new information sources are now referenced.
    • Prompt the Actuator to consider the current cash ratio before investing (without this, the asset ratio goes up to 100%, which leads to high risk; still running); a prompt sketch follows this list
  • Read some papers about integrating time series into LLMs
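
A hypothetical sketch of how the cash-ratio constraint could be surfaced to the Actuator; the function, field names and wording are illustrative, not the actual SocioDojo prompt.

 def build_actuator_prompt(total_assets, cash, proposed_action):
     """Prepend the current cash ratio so the Actuator can weigh risk before trading."""
     cash_ratio = cash / total_assets
     return (
         f"Current cash ratio: {cash_ratio:.1%} of total assets.\n"
         "Keep enough cash in reserve; do not push the invested ratio to 100%.\n"
         f"Proposed action: {proposed_action}\n"
         "Decide whether to execute, scale down, or skip this trade."
     )
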
Wenqiang Du
  • Prepare data, code, and environment for Prof. Mijiti
Yang Wei
  • Trained a text-enroll KWS model with Aibabel training data; it did not work.
Lily
Turi
  • Whisper-large-v3 fine-tuning (a layer-freezing sketch follows this list)
    • Freezing 20 encoder layers achieved 9.75 WER; vanilla fine-tuning achieved 8.02 WER
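
One plausible way to freeze the first 20 encoder layers of Whisper-large-v3 with Hugging Face transformers; the exact modules frozen in the experiment are not stated, and this sketch freezes only the Transformer encoder layers.

 from transformers import WhisperForConditionalGeneration

 model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

 # whisper-large-v3 has 32 encoder layers; freeze the first 20, keep the rest trainable.
 for layer in model.model.encoder.layers[:20]:
     for p in layer.parameters():
         p.requires_grad = False

 trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
 print(f"Trainable parameters: {trainable / 1e6:.1f}M")
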
Yue Gu
  • Sought suggestions from the other authors. Many suggestions conflict, so I'm trying to figure out the reasons and resolve these issues.
Qi Qu
  • KWS:
    • Text-enroll models exported to ONNX (an export sketch follows this list).
    • C/JNI libs built on the ONNX models, ready for on-device testing.
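
A minimal ONNX export sketch for a text-enroll KWS model using torch.onnx.export; the dummy architecture, input shapes, and tensor names below are illustrative placeholders, not the Aibabel/production model.

 import torch
 import torch.nn as nn

 class DummyTextEnrollKWS(nn.Module):
     """Placeholder model: acoustic features plus a text-enrollment embedding -> keyword probability."""
     def __init__(self, feat_dim=80, emb_dim=256, hidden=128):
         super().__init__()
         self.feat_proj = nn.Linear(feat_dim, hidden)
         self.emb_proj = nn.Linear(emb_dim, hidden)
         self.head = nn.Linear(hidden, 1)

     def forward(self, feats, text_emb):
         # feats: (B, T, feat_dim); text_emb: (B, emb_dim)
         h = self.feat_proj(feats).mean(dim=1) + self.emb_proj(text_emb)
         return torch.sigmoid(self.head(h))

 model = DummyTextEnrollKWS().eval()
 torch.onnx.export(
     model,
     (torch.randn(1, 100, 80), torch.randn(1, 256)),   # (feats, text_emb)
     "kws_text_enroll.onnx",
     input_names=["feats", "text_emb"],
     output_names=["keyword_prob"],
     dynamic_axes={"feats": {1: "frames"}},            # allow variable-length inputs
     opset_version=17,
 )
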