Difference between revisions of "2024-10-21"

From cslt Wiki

Latest revision as of 11:01, 21 October 2024 (Mon)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Primary school AI handbook (20-30)
Lantian Li
  • AI-Graph EN (25/50)
  • Complete CSTR intro report (11.18)
Ying Shi
  • Cohort-Overlap ASR
    • Condition on real decoding results
    • Design a stopping criterion
  • Cohort-Speech separation
    • Several configurations for the dual-path model
  • Group work
Zhenghai You
  • Weekly report
Junming Yuan
  • Results of feat-mask/time-mask MT-HuBERT [1]
Xiaolou Li
  • AVHuBERT unit experiments
    • dc connector (↑0.8% vs. discrete units)
    • Concatenate features and embeddings (↑2% vs. discrete units, ↓0.3% vs. baseline; see the sketch below)
  • CVS3 quality check (30 h in total) [2]
  • This work was helped by Zehua, Linwan, and Tianhao
  • Design of an MLLM system with audio output
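
A minimal sketch of what the "concatenate features and embeddings" variant could look like, assuming continuous AVHuBERT features are concatenated with discrete-unit embeddings along the feature axis and projected before the decoder; module names and dimensions are illustrative, not the project's actual code.

 # Hypothetical connector: fuse continuous AVHuBERT features with
 # discrete-unit embeddings by concatenation, then project.
 import torch
 import torch.nn as nn

 class ConcatConnector(nn.Module):
     def __init__(self, feat_dim=1024, n_units=2000, unit_dim=256, out_dim=1024):
         super().__init__()
         self.unit_emb = nn.Embedding(n_units, unit_dim)       # discrete-unit lookup table
         self.proj = nn.Linear(feat_dim + unit_dim, out_dim)   # fuse both views

     def forward(self, feats, units):
         # feats: (B, T, feat_dim) continuous features; units: (B, T) unit IDs
         fused = torch.cat([feats, self.unit_emb(units)], dim=-1)
         return self.proj(fused)

 connector = ConcatConnector()
 out = connector(torch.randn(2, 50, 1024), torch.randint(0, 2000, (2, 50)))
 print(out.shape)  # torch.Size([2, 50, 1024])
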
Zehua Liu
  • Verify VSR data
  • Finish the data verification report
  • ICL work (CER 47.87%, down from 51.08%)
  • Time mask matters [3] (see the masking sketch below)
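
Since the note only says that the time mask matters, here is a minimal SpecAugment-style time-masking sketch for reference; the mask width and count are assumptions, not the values used in the experiment.

 # SpecAugment-style time masking on a (T, F) feature matrix.
 import numpy as np

 def apply_time_mask(feats, max_width=30, n_masks=2, rng=None):
     """Zero out a few random spans along the time axis."""
     rng = rng or np.random.default_rng()
     out = feats.copy()
     T = out.shape[0]
     for _ in range(n_masks):
         width = int(rng.integers(0, max_width + 1))
         if width == 0 or width >= T:
             continue
         start = int(rng.integers(0, T - width))
         out[start:start + width, :] = 0.0   # masked frames are set to zero
     return out

 masked = apply_time_mask(np.random.randn(200, 80))
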
Pengqi Li
  • Complete the final report of the doctoral innovation project (school)
  • Exploring the consistency of TAO and LayerCAM results across different models and datasets (see the sketch below)
    • Conclusion and hypothesis [4]
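
The report does not say how consistency between TAO and LayerCAM results is quantified; one plausible sketch is to rank-correlate the two attribution maps, as below. The metric choice and the map shapes are assumptions.

 # Rank correlation between two attribution maps as a consistency score.
 import numpy as np
 from scipy.stats import spearmanr

 def attribution_consistency(map_a, map_b):
     """Spearman correlation between two flattened attribution maps."""
     rho, _ = spearmanr(np.ravel(map_a), np.ravel(map_b))
     return rho

 tao_map = np.random.rand(12, 64)        # e.g. layer x channel importance (dummy data)
 layercam_map = np.random.rand(12, 64)
 print(attribution_consistency(tao_map, layercam_map))
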
Wan Lin
  • Helped with VSR data verification
  • Experiments on VoxBlink2 [5]
Tianhao Wang
  • Adjust the AudioSep (CLAP) code to support multi-mix and audio queries (in training)
  • Some project testing
Xiaoxue Luo
  • AudioSep reproduction
    • Evaluate the performance of AudioSep
    • Comparative experiments between AudioSep and the baseline system (CLIPSep)
      • Adjusting the code
Zhenyu Zhou
  • Conditional-chain 2-mix results reproduction (SI-SDR: 10.714 -> 15.6; see the metric sketch below)
  • Model quantization: final version submitted
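
For reference, the metric behind the 10.714 -> 15.6 numbers (read here as SI-SDR in dB) can be computed as below; this is the standard definition, not the project's evaluation code.

 # Scale-invariant SDR (SI-SDR) in dB between an estimate and a reference.
 import numpy as np

 def si_sdr(estimate, reference, eps=1e-8):
     reference = reference - reference.mean()
     estimate = estimate - estimate.mean()
     # Project the estimate onto the reference (optimal scaling of the target).
     scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
     target = scale * reference
     noise = estimate - target
     return 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))

 ref = np.random.randn(16000)
 est = ref + 0.1 * np.random.randn(16000)
 print(si_sdr(est, ref))
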
Junhui Chen
  • Experiments for NS
  • Look for a speaker detection model with ResNet34 for frame labels (see the sketch below)
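
A hypothetical sketch of how frame labels could be produced with a ResNet34-style speaker embedding model: score short sliding windows against an enrollment embedding with cosine similarity. The extractor, window sizes, and threshold are placeholders, not a chosen model.

 # Frame-level speaker labeling via windowed embeddings and cosine scoring.
 import numpy as np

 def cosine(a, b, eps=1e-8):
     return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

 def frame_labels(wave, enroll_emb, extract_embedding, sr=16000,
                  win=0.5, hop=0.1, threshold=0.6):
     """Return one 0/1 label per hop: 1 if the window matches the enrolled speaker."""
     win_n, hop_n = int(win * sr), int(hop * sr)
     labels = []
     for start in range(0, max(len(wave) - win_n, 1), hop_n):
         emb = extract_embedding(wave[start:start + win_n])
         labels.append(int(cosine(emb, enroll_emb) > threshold))
     return np.array(labels)

 # Dummy extractor so the sketch runs; a real ResNet34 model would replace it.
 dummy_extractor = lambda chunk: np.random.randn(256)
 print(frame_labels(np.random.randn(3 * 16000), np.random.randn(256), dummy_extractor))
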
Jiaying Wang
Yu Zhang
  • SocioDojo Llama 3.1 8B investment task
    • Accumulated return is about 10% below the NASDAQ 100 index (see the sketch below)
  • Next week:
    • Add more professional information sources, such as WSJ (the current source, Tweets Trending, is too entertainment-oriented)
    • Control the BUY/SELL amount of the Actuator (the current investment ratio is too high)
    • Reproduce other multi-agent investment pipelines such as FinAgent or FinRobot
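
The "about 10% below the NASDAQ 100" gap is simply a difference of cumulative returns; a small illustrative computation is below (the numbers are made up, not SocioDojo output).

 # Cumulative return of the agent's equity curve vs. the benchmark index.
 import numpy as np

 def cumulative_return(prices):
     """Total return over the period, e.g. 0.05 means +5%."""
     prices = np.asarray(prices, dtype=float)
     return prices[-1] / prices[0] - 1.0

 agent_equity = [100.0, 101.2, 99.8, 102.5]        # hypothetical portfolio values
 nasdaq100 = [18000.0, 18300.0, 18550.0, 18900.0]  # hypothetical index levels

 gap = cumulative_return(agent_equity) - cumulative_return(nasdaq100)
 print(f"agent minus benchmark: {gap:+.1%}")
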
Wenqiang Du
  • Participated in an AI competition
Yang Wei
  • Train a text-enrollment KWS model and test it with Aibabel dialect data.
Lily
Turi
  • Whisper fine-tuning on Sagalee
    • With the encoder frozen, whisper-large-v3 reaches 20.5 WER (see the sketch below)
  • Fine-tuning LLM
    • Fine-tuned Qwen2.5-0.5B on a conversation dataset translated from English to Oromo
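
A minimal sketch of the "encoder frozen" setup with Hugging Face transformers; the Sagalee data pipeline and the trainer are omitted, so this only shows how the encoder parameters are excluded from training.

 # Load whisper-large-v3 and freeze its encoder so only the decoder is fine-tuned.
 from transformers import WhisperForConditionalGeneration, WhisperProcessor

 model_name = "openai/whisper-large-v3"
 processor = WhisperProcessor.from_pretrained(model_name)
 model = WhisperForConditionalGeneration.from_pretrained(model_name)

 for p in model.model.encoder.parameters():
     p.requires_grad = False            # encoder weights stay fixed during training

 n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
 print(f"trainable parameters: {n_trainable / 1e6:.1f}M")
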
Yue Gu
  • Write the cover letter
  • Design a new speaker adaptation framework
Qi Qu
  • AED:
    • New CED-based classifiers deployed onto devices, yielding acceptable performance.
  • KWS:
    • Quantization and format conversion of production models for deployment on an embedded device with an NPU. The default quantization mode leads to an unacceptable loss of precision; will try hybrid quantization.
    • Text-enrollment KWS: some dynamic dimensions were misinterpreted as constant during export to ONNX (see the sketch below).
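
The ONNX issue above is usually a dynamic_axes problem: unless a dimension is declared dynamic at export time, torch.onnx.export freezes it to the size of the example input. A minimal sketch with a toy model follows; the real KWS model and its input layout are assumptions.

 # Declare batch and time dimensions as dynamic when exporting to ONNX.
 import torch
 import torch.nn as nn

 model = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 4))  # stand-in model
 example = torch.randn(1, 100, 80)  # (batch, time, feature) example input

 torch.onnx.export(
     model, example, "kws_text_enroll.onnx",
     input_names=["feats"], output_names=["logits"],
     dynamic_axes={                     # without this, 1 and 100 become fixed sizes
         "feats": {0: "batch", 1: "time"},
         "logits": {0: "batch", 1: "time"},
     },
     opset_version=17,
 )
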