Difference between revisions of "CN-Celeb"
From cslt Wiki
Revision as of 12:06, 29 October 2019 (Tuesday)
Introduction
- CN-Celeb is a large-scale dataset of Chinese celebrities collected 'in the wild'.
Members
- Current: Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
- Former: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang
Target
- Collect audio data of 1,000 Chinese celebrities.
- Automatically clip videos through a pipeline that includes face detection, face recognition, speaker validation, and speaker diarization.
- Create a benchmark database for the speaker recognition community.
Future Plans
- Augment the database to 10,000 people.
- Build an LSTM-based model that links SyncNet and speaker diarization, so that the relationship between the two can be learned.
Basic method
- Environments: TensorFlow, PyTorch, Keras, MXNet
- Face detection and tracking based on the RetinaFace and ArcFace models.
- Active speaker verification based on the SyncNet model.
- Speaker diarization based on the UIS-RNN model.
- Double-check by speaker recognition based on a VGG model.
- Input: Pictures and videos of POIs (Persons of Interest).
- Output: Well-labelled videos of POIs (Persons of Interest).
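The flow of the steps above can be illustrated with a small sketch. This is not the project's code: it assumes each model has already produced a score for every candidate clip, and it treats each step as a simple accept/reject test. The `Segment` fields, the helper names, and all thresholds are hypothetical and chosen only for illustration. Diarization, which splits audio by speaker, is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Segment:
    """A candidate clip (times in seconds) with hypothetical per-stage scores."""
    start: float
    end: float
    face_match: float   # similarity of the tracked face to the POI's pictures
    sync_conf: float    # audio-visual sync confidence (is the visible face speaking?)
    speaker_sim: float  # similarity of the clip's voice to the POI's reference voice


# A stage is any function that accepts or rejects a segment.
Stage = Callable[[Segment], bool]


def make_threshold_stage(field: str, threshold: float) -> Stage:
    """Build a stage that keeps a segment when the named score reaches the threshold."""
    def stage(segment: Segment) -> bool:
        return getattr(segment, field) >= threshold
    return stage


def run_pipeline(segments: Iterable[Segment], stages: List[Stage]) -> List[Segment]:
    """Keep only the segments that pass every stage, preserving input order."""
    return [seg for seg in segments if all(stage(seg) for stage in stages)]


# Illustrative thresholds only; real values would need tuning on labelled data.
stages = [
    make_threshold_stage("face_match", 0.5),   # face detection / recognition
    make_threshold_stage("sync_conf", 0.5),    # active speaker verification
    make_threshold_stage("speaker_sim", 0.6),  # double-check by speaker recognition
]

demo_segments = [
    Segment(0.0, 2.0, face_match=0.9, sync_conf=0.8, speaker_sim=0.7),
    Segment(2.0, 4.0, face_match=0.9, sync_conf=0.2, speaker_sim=0.7),  # face visible, not speaking
    Segment(4.0, 6.0, face_match=0.3, sync_conf=0.9, speaker_sim=0.9),  # someone else on screen
]

kept = run_pipeline(demo_segments, stages)
```

Chaining independent checks this way means a clip is labelled as the POI only when the visual and audio evidence agree, which is the purpose of the final double-check step.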
GitHub of our project
- celebrity-audio-collection: https://github.com/celebrity-audio-collection/videoprocess
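Reports
- Stage Report v1.0: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf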
References
- Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. https://arxiv.org/pdf/1905.00641.pdf
- Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. https://arxiv.org/abs/1801.07698
- Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018.
- Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017.
- Zhong et al., "GhostVLAD for Set-based Face Recognition", 2018.
- Chung et al., "Out of Time: Automated Lip Sync in the Wild", 2016.
- Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019.
- Zhang et al., "Fully Supervised Speaker Diarization", 2018.