Difference between revisions of "CN-Celeb"

Revision as of 07:29, 31 October 2019 (Thursday)

Augment the database to 10,000 people.
Build an LSTM-based model that links SyncNet and Speaker_Diarization and learns the relationship between them.

All the resources contained in the database are free for research institutes and individuals.
No commercial use is permitted.

Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
Zhong et al., "GhostVLAD for set-based face recognition", 2018. [5]
Chung et al., "Out of time: automated lip sync in the wild", 2016. [6]
Xie et al., "Utterance-level Aggregation For Speaker Recognition In The Wild", 2019. [7]
Zhang et al., "Fully Supervised Speaker Diarization", 2018. [8]

@@ Line 11: / Line 11: @@
 * Collect audio data of 1,000 Chinese celebrities.
-* Automatically clip videoes through a pipeline including face detection, face recognition, speaker validation and speaker diarization.
+* Automatically clip videos through a pipeline including face detection, face recognition, speaker validation and speaker diarization.
 * Create a benchmark database for speaker recognition community.
@@ Line 25: / Line 25: @@
 ===GitHub of This Project===
 [https://github.com/celebrity-audio-collection/videoprocess celebrity-audio-collection]
 ===Reports===
 [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf Stage report v1.0]
@@ Line 45: / Line 47: @@
 ===References===
 * Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [https://arxiv.org/pdf/1905.00641.pdf]
 * Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [https://arxiv.org/abs/1801.07698]