Differences between revisions of "CN-Celeb"

Revision as of 07:20, 31 October 2019 (Thursday)

Augment the database to 10,000 people.
Build a model between SyncNet and Speaker_Diarization based on LSTM, which can learn the relationship of them.

Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
Zhong et al., "GhostVLAD for Set-based Face Recognition", 2018. [5]
Chung et al., "Out of Time: Automated Lip Sync in the Wild", 2016. [6]
Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019. [7]
Zhang et al., "Fully Supervised Speaker Diarization", 2018. [8]

@@ Line 8 / Line 8 @@
 * History：Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang
-===Target===
+===Description===
 * Collect audio data of 1,000 Chinese celebrities.
 * Automatically clip videos through a pipeline including face detection, face recognition, speaker validation, and speaker diarization.
 * Create a benchmark database for the speaker recognition community.
-===Future Plans===
-* Augment the database to 10,000 people.
-* Build a model between SyncNet and Speaker_Diarization based on LSTM, which can learn the relationship of them.
 ===Basic Methods===
@@ Line 38 / Line 33 @@
 ===Publications===
+===Future Plans===
+* Augment the database to 10,000 people.
+* Build a model between SyncNet and Speaker_Diarization based on LSTM, which can learn the relationship of them.
 ===References===