CN-Celeb
From cslt Wiki
Revision as of 12:13, 29 October 2019
Introduction
- CN-Celeb, a large-scale dataset of Chinese celebrities collected "in the wild".
Members
- Current: Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
- History: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang
Target
- Collect audio data of 1,000 Chinese celebrities.
- Automatically clip videos through a pipeline including face detection, face recognition, active speaker verification and speaker diarization.
- Create a benchmark database for the speaker recognition community.
Future Plans
- Augment the database to 10,000 people.
- Build an LSTM-based model that bridges SyncNet and speaker diarization and learns the relationship between them.
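The planned SyncNet-diarization bridge is not specified here, so the sketch below is only a rough illustration of the idea: a toy single-unit LSTM cell (plain Python, no framework) stepping over a fused per-frame feature made of a SyncNet-style sync score and a diarization speaker flag. All feature values, weights, and names are hypothetical placeholders, not the project's model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM step. x: 2-d input, h/c: scalar hidden/cell state,
    w: per-gate weights [w_x1, w_x2, w_h, bias] for gates i, f, o, g."""
    z = [w[g][0] * x[0] + w[g][1] * x[1] + w[g][2] * h + w[g][3]
         for g in range(4)]
    i, f, o = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[2])
    g = math.tanh(z[3])
    c = f * c + i * g        # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c

# Fused per-frame features: (SyncNet sync score, diarization speaker flag).
# Both values are made-up placeholders, not real model outputs.
sequence = [(0.9, 1.0), (0.8, 1.0), (0.1, 0.0), (0.7, 1.0)]

weights = [[0.5, 0.5, 0.1, 0.0]] * 4  # toy shared weights for all four gates
h, c = 0.0, 0.0
for x in sequence:
    h, c = lstm_step(x, h, c, weights)

print(round(h, 3))  # a bounded score in (-1, 1) summarizing the sequence
```

A real implementation would presumably use a framework LSTM (e.g. PyTorch `nn.LSTM`, one of the environments listed below) with learned weights; the point here is only the data flow from fused SyncNet/diarization features to a sequence-level state.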
Basic Methods
- Environments: Tensorflow, PyTorch, Keras, MxNet
- Face detection and tracking: RetinaFace and ArcFace models.
- Active speaker verification: SyncNet model.
- Speaker diarization: UIS-RNN model.
- Double check by speaker recognition: VGG model.
- Input: pictures and videos of POIs (Persons of Interest).
- Output: well-labelled videos of POIs (Persons of Interest).
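The four stages above can be sketched as a filtering pipeline over per-frame annotations. This is a minimal illustration with stub logic, not the project's implementation: the real system would call RetinaFace/ArcFace, SyncNet, UIS-RNN and a VGG speaker model, whereas here each stage just filters dummy fields.

```python
def detect_and_track_faces(frames):
    # RetinaFace + ArcFace would detect faces and match them to the POI;
    # here we keep frames whose dummy "face" field matches the POI.
    return [f for f in frames if f.get("face") == "poi"]

def verify_active_speaker(frames):
    # SyncNet would check audio-visual lip sync; here we keep frames
    # flagged as containing synchronized speech.
    return [f for f in frames if f.get("speaking")]

def diarize(frames):
    # UIS-RNN would segment speech by speaker; here we simply group
    # consecutive frame indices into contiguous segments.
    segments, current = [], []
    for f in frames:
        if current and f["t"] != current[-1]["t"] + 1:
            segments.append(current)
            current = []
        current.append(f)
    if current:
        segments.append(current)
    return segments

def double_check_speaker(segments, threshold=0.5):
    # A VGG-style speaker model would score each segment against the
    # POI's voiceprint; here we filter on a dummy confidence score.
    return [s for s in segments if min(f["score"] for f in s) >= threshold]

frames = [
    {"t": 0, "face": "poi",   "speaking": True,  "score": 0.9},
    {"t": 1, "face": "poi",   "speaking": True,  "score": 0.8},
    {"t": 2, "face": "other", "speaking": True,  "score": 0.2},
    {"t": 3, "face": "poi",   "speaking": False, "score": 0.7},
    {"t": 4, "face": "poi",   "speaking": True,  "score": 0.9},
]

kept = double_check_speaker(
    diarize(verify_active_speaker(detect_and_track_faces(frames))))
print([[f["t"] for f in seg] for seg in kept])  # → [[0, 1], [4]]
```

Each stage narrows the candidate frames, so only segments where the POI's face is visible, lips are in sync with the audio, and the voice matches the POI survive into the labelled output.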
GitHub of This Project
Reports
Download
Publications
References
- Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
- Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
- Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
- Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
- Zhong et al., "GhostVLAD for set-based face recognition", 2018. http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf
- Chung et al., "Out of time: automated lip sync in the wild", 2016. http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf
- Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019. https://arxiv.org/pdf/1902.10107.pdf
- Zhang et al., "Fully Supervised Speaker Diarization", 2018. https://arxiv.org/pdf/1810.04719v1.pdf