People |
This Week |
Next Week |
Task Tracking (Deadline)
|
Dong Wang
|
- Primary School AI handbook (20-30)
|
|
|
Lantian Li
|
- AI-Graph EN (25/50)
- Complete CSTR intro report (11.18)
|
|
|
Ying Shi
|
- Cohort-Overlap ASR
- condition on real decode result
- Design stop criterion
- Cohort-Speech separation
- several configs for Dual-path model
- group work
|
|
|
Zhenghai You
|
|
|
|
Junming Yuan
|
- The result of feat-mask/time-mask MT-HuBERT [1]
|
|
|
Xiaolou Li
|
- AVHuBERT unit exp
  - dc connector (↑0.8% vs. discrete unit)
  - concat feature and embedding (↑2% vs. discrete unit, ↓0.3% vs. baseline)
- CVS3 quality check (30 h in total) [https://z1et6d3xtb.feishu.cn/drive/folder/HGHbfyCJRlLYzUdSlEicOEztnYc]
- This work was done with help from Zehua, Linwan, and Tianhao
- MLLM system with audio output design
|
|
|
Zehua Liu
|
- Verify VSR data
- Finish Data Verification Report
- ICL work (CER: 47.87% < 51.08%)
- Time mask matters [3]
|
|
|
Pengqi Li
|
- Complete the final report of the doctoral innovation project (School)
- Exploring the consistency of TAO and LayerCAM results on different models and datasets
- Conclusion and hypothesis [4]
|
|
|
Wan Lin
|
- Helped with VSR data verification
- Experiment on voxblink2 [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink]
|
|
|
Tianhao Wang
|
- adjust the code of AudioSep (CLAP) to support multi-mix and audio-query (in training)
- some project testing
|
|
|
Xiaoxue Luo
|
- AudioSep reproduction
- evaluate the performance of AudioSep
- comparative experiment between AudioSep and the baseline system (CLIPSep)
|
|
|
Zhenyu Zhou
|
- conditional chain 2-mix results reproduction (SI-SDR: 10.714 -> 15.6)
- model quantization final version submission
|
|
|
Junhui Chen
|
- Experiments for NS
- Look for a speaker detection model with ResNet34 for frame labels
|
|
|
Jiaying Wang
|
|
|
|
Yu Zhang
|
- SocioDojo Llama 3.1 8B investment task
  - accumulated return is about 10% below the Nasdaq-100 index
|
- add more professional information sources, such as WSJ (the current source is Tweets Trending, which is too entertainment-oriented)
- control the BUY/SELL amount of the Actuator (the current investment ratio is too high)
- reproduce other multi-agent investment pipelines such as FinAgent or FinRobot
|
|
Wenqiang Du
|
- Participated in an AI competition
|
|
|
Yang Wei
|
- Train a text-enrollment KWS model and test it on Aibabel dialect data.
|
|
|
Lily
|
|
|
|
Turi
|
- Whisper finetuning on sagalee
- whisper-large-v3 with the encoder frozen (WER: 20.5)
- Finetuning LLM
- Finetuned Qwen2.5-0.5B on conversation dataset translated from English to Oromo
|
|
Yue Gu
|
- write the cover letter
- design a new speaker adaptation framework
|
|
|
Qi Qu
|
- AED:
- New CED-based classifiers deployed onto devices, yielding acceptable performance.
- KWS:
- Quantization and format conversion of production models for deployment on an embedded device with an NPU. The default quantization mode leads to an unacceptable loss of precision. Will try hybrid quantization.
- Text-enrollment KWS: some dynamic dimensions are misinterpreted as constant during export to ONNX.
|
|
|
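
For reference, the CER and WER figures quoted in this report (for example CER 47.87% vs. 51.08%) are conventionally computed as the edit distance between reference and hypothesis divided by the reference length. The following is a minimal sketch under that standard definition; the function names `edit_distance`, `cer`, and `wer` are illustrative assumptions and are not taken from any team member's code or scoring toolkit.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and deletions
    needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate: tokens are individual characters."""
    return error_rate(list(ref), list(hyp))


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())
```

Because the rate is normalised by the reference length only, it can exceed 1.0 when the hypothesis contains many insertions. Real scoring pipelines typically also apply text normalisation (case, punctuation) before scoring, which this sketch omits.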