Difference between revisions of "2024-10-28"
From cslt Wiki
(16 intermediate revisions by 12 users not shown) | |||
Line 6: | Line 6: | ||
|Dong Wang | |Dong Wang | ||
|| | || | ||
− | * | + | * AI primary book done |
|| | || | ||
* | * | ||
Line 17: | Line 17: | ||
|Lantian Li | |Lantian Li | ||
|| | || | ||
− | * | + | * AI-Graph EN (1-20 finalized) |
+ | * Design 2025 Daily Posts | ||
|| | || | ||
* | * | ||
Line 28: | Line 29: | ||
|Ying Shi | |Ying Shi | ||
|| | || | ||
− | * | + | * Revise the code for cohort-overlap ASR [training is in progress] |
+ | ** Support arbitrary source mixing training | ||
+ | ** Use the real hypothesis as condition by Token error rate | ||
+ | ** Design stop criterion | ||
|| | || | ||
* | * | ||
Line 39: | Line 43: | ||
|Zhenghai You | |Zhenghai You | ||
|| | || | ||
− | * | + | * Introduce more hard samples to improve model performance [https://z1et6d3xtb.feishu.cn/docx/CURxdy3tEorxkrxtjjqcdMaYnJg] |
+ | ** SPK-AUG with same length: there is an improvement, but SI-SDR decreases as the hard-sample rate increases | ||
+ | ** Design more hard samples | ||
|| | || | ||
* | * | ||
Line 72: | Line 78: | ||
|Xiaolou Li | |Xiaolou Li | ||
|| | || | ||
− | * | + | * VTS with LLM structure design and baseline code writing [https://z1et6d3xtb.feishu.cn/docx/ZBnOdEMxgo8bs5xrkb1cPZnCnQg?from=from_copylink] |
|| | || | ||
* | * | ||
Line 97: | Line 103: | ||
|Pengqi Li | |Pengqi Li | ||
|| | || | ||
− | * | + | * Consistency of TAO and LayerCAM |
+ | ** Change TAO from the input to the final conv layer to obtain higher consistency (Aishell: 0.93 for any model) | ||
|| | || | ||
* | * | ||
Line 108: | Line 115: | ||
|Wan Lin | |Wan Lin | ||
|| | || | ||
− | * | + | * NS: downsampling is not useful [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink] |
+ | * Share at the speaker meeting on Friday | ||
|| | || | ||
* | * | ||
Line 119: | Line 127: | ||
|Tianhao Wang | |Tianhao Wang | ||
|| | || | ||
− | * AudioSep (CLAP) 5-mix exps: | + | * AudioSep (CLAP) 5-mix exps[https://z1et6d3xtb.feishu.cn/docx/DlR8dZRdEoZIwIxTOFvcQdbGnqg]: |
** text-query: SDR=4.978, SI-SDR=1.972 | ** text-query: SDR=4.978, SI-SDR=1.972 | ||
** audio-query: SDR=6.907, SI-SDR=5.058 | ** audio-query: SDR=6.907, SI-SDR=5.058 | ||
Line 146: | Line 154: | ||
|Zhenyu Zhou | |Zhenyu Zhou | ||
|| | || | ||
− | * | + | * Reproduce 5-mix speech separation results: |
+ | ** PIT: 2-mix: 16.04; 5-mix: 6.87 | ||
+ | ** Conditional: 5-mix: 5.38 (40 epochs) | ||
|| | || | ||
* | * | ||
Line 157: | Line 167: | ||
|Junhui Chen | |Junhui Chen | ||
|| | || | ||
− | * | + | * NS: speaker detection (method survey & debugging) |
+ | * Got sick | ||
|| | || | ||
* | * | ||
Line 204: | Line 215: | ||
|Yang Wei | |Yang Wei | ||
|| | || | ||
− | * | + | * Trained a text-enroll KWS model with Aibabel training data. It did not work. |
|| | || | ||
* | * | ||
Line 224: | Line 235: | ||
|Turi | |Turi | ||
|| | || | ||
− | * | + | * Whisper-large-v3 fine-tuning |
+ | ** Freezing 20 encoder layers achieved 9.75 WER; vanilla fine-tuning achieved 8.02 WER | ||
|| | || | ||
* | * | ||
Line 232: | Line 244: | ||
|Yue Gu | |Yue Gu | ||
|| | || | ||
− | * seek | + | * Seek suggestions from other authors. Many suggestions conflict, so I'm trying to figure out the reasons and fix these issues. |
|| | || | ||
* | * | ||
Line 241: | Line 253: | ||
|Qi Qu | |Qi Qu | ||
|| | || | ||
− | * | + | * KWS: |
+ | ** Text-enroll models exported to ONNX. | ||
+ | ** C/JNI libs built based on ONNX models and ready for on-device test. | ||
|| | || | ||
* | * |
Latest revision as of 10:59, 28 October 2024 (Mon)
People | This Week | Next Week | Task Tracking (DeadLine) |
---|---|---|---|
Dong Wang |
|
|
|
Lantian Li |
|
|
|
Ying Shi |
|
|
|
Zhenghai You |
|
|
|
Junming Yuan |
|
|
|
Chen Chen |
|
|
|
Xiaolou Li |
|
|
|
Zehua Liu |
|
|
|
Pengqi Li |
|
|
|
Wan Lin |
|
|
|
Tianhao Wang |
|
|
|
Xiaoxue Luo |
|
|
|
Zhenyu Zhou |
|
|
|
Junhui Chen |
|
|
|
Jiaying Wang |
|
|
|
Yu Zhang |
|
|
|
Wenqiang Du |
|
|
|
Yang Wei |
|
|
|
Lily |
|
|
|
Turi |
|
|
|
Yue Gu |
|
|
|
Qi Qu |
|
|
|