Difference between revisions of "CN-CVS"
From cslt Wiki
Latest revision as of 11:47, 30 October 2022 (Sunday)
Introduction
- CN-CVS is a large-scale Chinese Mandarin audio-visual dataset published by the Center for Speech and Language Technology (CSLT) at Tsinghua University.
Members
- Current: Dong Wang, Chen Chen
Description
- Collect audio and video data from more than 2,500 Mandarin speakers.
- Automatically clip videos through a pipeline comprising shot detection, voice activity detection (VAD), face detection, face tracking, and audio-visual synchronization detection.
- Manually annotate speaker identity and check data quality by hand.
- Create a benchmark database for the video-to-speech synthesis task.
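One way to picture the automatic clipping step is as an intersection of the time intervals that each detector accepts. The sketch below is illustrative only and is not taken from the CN-CVS collector: the names `intersect` and `usable_clips`, the `(start, end)` interval format in seconds, and the `min_len` threshold are all assumptions. Audio-visual synchronization checking is left out.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); hypothetical format


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def usable_clips(
    shots: List[Interval],
    speech: List[Interval],
    faces: List[Interval],
    min_len: float = 1.0,  # assumed minimum clip length in seconds
) -> List[Interval]:
    """Keep spans that stay inside one shot, contain speech, and show a tracked face."""
    spans = intersect(intersect(shots, speech), faces)
    return [(s, e) for s, e in spans if e - s >= min_len]


if __name__ == "__main__":
    shots = [(0.0, 10.0), (10.0, 25.0)]
    speech = [(2.0, 12.0), (14.0, 20.0)]
    faces = [(0.0, 9.0), (15.0, 30.0)]
    print(usable_clips(shots, speech, faces))
```

Because shot intervals are intersected first, a candidate clip never crosses a shot boundary, which is one plausible reason for running shot detection before the other stages.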
Basic Methods
- Environments: PyTorch, OpenCV, FFmpeg
- Shot detection: FFmpeg
- VAD: pydub
- Face detection and tracking: dlib
- Audio-visual synchronization detection: SyncNet model
- Input: JSON files of video information
- Output: video clips and WAV files, as well as metadata JSON files
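The page states only that shot detection uses FFmpeg, not how. A minimal sketch of one common approach follows, assuming FFmpeg's `select` filter with its `scene` score and the `showinfo` filter for logging. The threshold of 0.4 and the helper names `scene_cmd`, `parse_cut_times`, and `detect_shots` are hypothetical and may differ from what the CN-CVS collector actually does.

```python
import re
import subprocess
from typing import List


def scene_cmd(video_path: str, threshold: float = 0.4) -> List[str]:
    """Build an FFmpeg command that logs frames whose scene score exceeds threshold."""
    # Single quotes protect the comma inside gt(...) from the filtergraph parser.
    vf = f"select='gt(scene,{threshold})',showinfo"
    return ["ffmpeg", "-hide_banner", "-i", video_path,
            "-vf", vf, "-an", "-f", "null", "-"]


_PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_cut_times(ffmpeg_log: str) -> List[float]:
    """Extract pts_time values (in seconds) from showinfo log lines."""
    times: List[float] = []
    for line in ffmpeg_log.splitlines():
        if "showinfo" not in line:
            continue
        match = _PTS_TIME.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


def detect_shots(video_path: str, threshold: float = 0.4) -> List[float]:
    """Run FFmpeg (must be on PATH) and return candidate shot-boundary times."""
    proc = subprocess.run(scene_cmd(video_path, threshold),
                          capture_output=True, text=True)
    return parse_cut_times(proc.stderr)  # showinfo logs to stderr
```

Calling `detect_shots("input.mp4")` on a local file (the path is a placeholder) would return a list of timestamps in seconds. Splitting the command builder from the log parser keeps the parsing logic testable without FFmpeg installed.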
Reports
Publications
Source Code
- Collection Pipeline: https://github.com/sectum1919/cncvs_data_collector
- xTS: TODO
- VCA-GAN: TODO
Download
- Public (recommended)
https://cloud.tsinghua.edu.cn/d/83f13126daec49deb8a3/
- Local (not recommended)
https://cloud.tsinghua.edu.cn/d/83f13126daec49deb8a3/
Future Plans
- Extract text transcriptions via OCR, ASR, and human checking
- Extend baseline to benchmark
License
- All resources contained in the database are free for research institutes and individuals.
- No commercial use is permitted.