Difference between revisions of "2024-10-14"

From cslt Wiki

Latest revision as of 11:02, 14 October 2024 (Mon)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • AI handbook high-education version, experiment booklet
  • Check AI primary school handbook (1-20)
Lantian Li
  • AI-Graph EN (20/50)
  • Prepare CSTR intro report
Ying Shi
  • Finish the text-enroll keyword spotting code & documentation and deliver to Wei & Du
  • Cohort Overlap ASR code v0.0
    • code is finished and training is done
  • Cohort Speech separation code v0.0
    • code is finished; training is in progress
  • here
Zhenghai You
  • Exploring the role of the speaker encoder in TSE and the generality of SPK-AUG [1]
Junming Yuan
  • MT-Hubert exp[2]:
    • codebook set + infoNCE ---> FC+softmax+CE / FC+sigmoid+BCE
      • Reducing the learning rate can work.
    • Verified the feat-mask MT-Hubert with different learning rates
    • Time-mask MT-Hubert verification (in progress)
Chen Chen
Xiaolou Li
  This week:
  • AV-HuBERT discrete unit training (WER: ↓1.5-3%)
    • Rethink how to demonstrate the advantages or disadvantages of discrete units
  • Dense connector experiments (in training)
  • Double-check the existing 3000h of data in CVS2
  • Paper reading (discrete unit, VTS)
  Next week:
  • Design an experiment to explain the performance of discrete units
  • Finish the data double check
  • Try to establish a simple VTS system based on our VSR system
Zehua Liu
  This week:
  • AV-HuBERT (frozen) as encoder performs very badly (CER: 80%) [https://z1et6d3xtb.feishu.cn/docx/JBsidACDVojhCaxFQLbcCVbsnAc?from=from_copylink]
    • may improve after fine-tuning, but still bad
  • Qwen-14B performs better (47%) than Qwen-7B (50%)
  • Finished the In-Context-Learning code; training is in progress
    • results may come very soon
  Next week:
  • Verify collected data with Xiaolou
  • Finish the VTS data acceptance report
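The WER and CER figures quoted above are not defined on this page. The following is a minimal sketch assuming the usual definition (Levenshtein edit distance divided by the reference length). The names `edit_distance`, `cer`, and `wer` are illustrative and are not taken from the group's code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting all i reference tokens so far
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref_text, hyp_text):
    """Character error rate: edit distance over characters / reference length."""
    if not ref_text:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_text, hyp_text) / len(ref_text)


def wer(ref_text, hyp_text):
    """Word error rate: edit distance over whitespace-separated words / reference length."""
    ref_words, hyp_words = ref_text.split(), hyp_text.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Under this definition the rate can exceed 100% when the hypothesis contains many insertions.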
Pengqi Li
  • Evaluate the reliability of TAO and LayerCAM (verification).
    • Exploring the consistency of TAO and LayerCAM results on different models and datasets.
Wan Lin
  • NS
    • poster
    • data preparation and processing
    • adjust the training code
Tianhao Wang
  • CLIPSep experiments for 2-mix and 5-mix [https://z1et6d3xtb.feishu.cn/docx/DnJgdwtNhotEpIxH7zfcksETnte]
    • 2-mix (whole VGGSound, 300 classes): SDR-mix = -1.1748, SDR-separate = 5.0145
    • 5-mix (50 classes of VGGSound): SDR-mix = -11.4529, SDR-separate = -0.4764
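The page does not define SDR-mix or SDR-separate. The sketch below assumes SDR-mix is the SDR of the unprocessed mixture against the target source, and SDR-separate is the SDR of the separated output against the same target. It uses the plain energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²). The actual experiments may use a BSS-eval style variant instead, and `sdr_db` is a hypothetical helper name.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB between two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log(0) for silent or perfect signals
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy signals (not from the experiments): a target and an equal-energy interferer.
target = [1.0, 0.0, -1.0, 0.0]
interferer = [0.0, 1.0, 0.0, -1.0]
mixture = [t + i for t, i in zip(target, interferer)]

sdr_mix = sdr_db(target, mixture)                         # baseline before separation
sdr_separate = sdr_db(target, [0.9 * t for t in target])  # a nearly clean estimate
```

Under this definition a negative value means the distortion carries more energy than the target, so more interfering sources in the mixture lower the SDR-mix baseline.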
Xiaoxue Luo
  • Paper reading about sound separation
  • AudioSep reproduction
    • Training takes too long -> replaced with a smaller dataset (in training)
Zhenyu Zhou
  • Model quantization, version 2
  • Multi-talker mix data preparation
Junhui Chen
  • Prepare vb2 data
    • Too many utterances for training (out of memory); thinking of a smart way to divide them.
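One simple way to bound memory is to split the utterance list into fixed-size shards and process one shard at a time. This is not necessarily the "smart" division being sought. The sketch assumes the utterances are available as an in-memory list of IDs; `shard_utterances` and `max_per_shard` are hypothetical names.

```python
def shard_utterances(utterances, max_per_shard):
    """Yield consecutive slices holding at most max_per_shard utterances each."""
    if max_per_shard <= 0:
        raise ValueError("max_per_shard must be positive")
    for start in range(0, len(utterances), max_per_shard):
        yield utterances[start:start + max_per_shard]


# Example with placeholder IDs: 7 utterances split into shards of at most 3.
utt_ids = [f"utt{i:03d}" for i in range(7)]
shards = list(shard_utterances(utt_ids, 3))
shard_sizes = [len(s) for s in shards]
```

Because the function is a generator, only one shard needs to be materialized at a time when it is consumed in a loop.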
Jiaying Wang
Yu Zhang
  • SocioDojo Llama version
    • News integration is adjusted once every 12 hours
    • Wikipedia & Google search are banned
Wenqiang Du
  • Check the data from previously trained models and update the KWS model again (model testing)
    • Chinese, Cantonese, Minnan, Haining and Uyghur
Yang Wei
  • Train text enroll KWS model with updated code (in progress)
Lily
Turi
  • Whisper model finetuning[5]
Yue Gu
  • revise the TASLP paper
  • read several papers about accent and prosody
Qi Qu
  • AED: classifiers retrained with the new method (suppression on negative stimuli); improvement confirmed.