CN-Celeb
From cslt Wiki
Revision as of 12:13, 29 October 2019
Introduction
- CN-Celeb, a large-scale dataset of Chinese celebrities collected "in the wild".
Members
- Current: Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
- History: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang
Target
- Collect audio data of 1,000 Chinese celebrities.
- Automatically clip videos through a pipeline including face detection, face recognition, active speaker verification and speaker diarization.
- Create a benchmark database for the speaker recognition community.
Future Plans
- Augment the database to 10,000 people.
- Build an LSTM-based model that bridges SyncNet and speaker diarization and learns the relationship between them.
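The planned SyncNet-diarization bridge is not specified here, so the sketch below is only a rough illustration of the idea: a toy single-unit LSTM cell (plain Python, no framework) stepping over a fused per-frame feature made of a SyncNet-style sync score and a diarization speaker flag. All feature values, weights, and names are hypothetical placeholders, not the project's model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM step. x: 2-d input, h/c: scalar hidden/cell state,
    w: per-gate weights [w_x1, w_x2, w_h, bias] for gates i, f, o, g."""
    z = [w[g][0] * x[0] + w[g][1] * x[1] + w[g][2] * h + w[g][3]
         for g in range(4)]
    i, f, o = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[2])
    g = math.tanh(z[3])
    c = f * c + i * g        # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c

# Fused per-frame features: (SyncNet sync score, diarization speaker flag).
# Both values are made-up placeholders, not real model outputs.
sequence = [(0.9, 1.0), (0.8, 1.0), (0.1, 0.0), (0.7, 1.0)]

weights = [[0.5, 0.5, 0.1, 0.0]] * 4  # toy shared weights for all four gates
h, c = 0.0, 0.0
for x in sequence:
    h, c = lstm_step(x, h, c, weights)

print(round(h, 3))  # a bounded score in (-1, 1) summarizing the sequence
```

A real implementation would presumably use a framework LSTM (e.g. PyTorch `nn.LSTM`, one of the environments listed below) with learned weights; the point here is only the data flow from fused SyncNet/diarization features to a sequence-level state.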
Basic Methods
- Environments: Tensorflow, PyTorch, Keras, MxNet
- Face detection and tracking: RetinaFace and ArcFace models.
- Active speaker verification: SyncNet model.
- Speaker diarization: UIS-RNN model.
- Double check by speaker recognition: VGG model.
- Input: pictures and videos of POIs (Persons of Interest).
- Output: well-labelled videos of POIs (Persons of Interest).
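The four stages above can be sketched as a filtering pipeline over per-frame annotations. This is a minimal illustration with stub logic, not the project's implementation: the real system would call RetinaFace/ArcFace, SyncNet, UIS-RNN and a VGG speaker model, whereas here each stage just filters dummy fields.

```python
def detect_and_track_faces(frames):
    # RetinaFace + ArcFace would detect faces and match them to the POI;
    # here we keep frames whose dummy "face" field matches the POI.
    return [f for f in frames if f.get("face") == "poi"]

def verify_active_speaker(frames):
    # SyncNet would check audio-visual lip sync; here we keep frames
    # flagged as containing synchronized speech.
    return [f for f in frames if f.get("speaking")]

def diarize(frames):
    # UIS-RNN would segment speech by speaker; here we simply group
    # consecutive frame indices into contiguous segments.
    segments, current = [], []
    for f in frames:
        if current and f["t"] != current[-1]["t"] + 1:
            segments.append(current)
            current = []
        current.append(f)
    if current:
        segments.append(current)
    return segments

def double_check_speaker(segments, threshold=0.5):
    # A VGG-style speaker model would score each segment against the
    # POI's voiceprint; here we filter on a dummy confidence score.
    return [s for s in segments if min(f["score"] for f in s) >= threshold]

frames = [
    {"t": 0, "face": "poi",   "speaking": True,  "score": 0.9},
    {"t": 1, "face": "poi",   "speaking": True,  "score": 0.8},
    {"t": 2, "face": "other", "speaking": True,  "score": 0.2},
    {"t": 3, "face": "poi",   "speaking": False, "score": 0.7},
    {"t": 4, "face": "poi",   "speaking": True,  "score": 0.9},
]

kept = double_check_speaker(
    diarize(verify_active_speaker(detect_and_track_faces(frames))))
print([[f["t"] for f in seg] for seg in kept])  # → [[0, 1], [4]]
```

Each stage narrows the candidate frames, so only segments where the POI's face is visible, lips are in sync with the audio, and the voice matches the POI survive into the labelled output.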
GitHub of This Project
Reports
Download
Publications
References
- Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
- Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
- Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
- Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
- Zhong et al., "GhostVLAD for set-based face recognition", 2018. http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf
- Chung et al., "Out of time: automated lip sync in the wild", 2016. http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf
- Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019. https://arxiv.org/pdf/1902.10107.pdf
- Zhang et al., "Fully Supervised Speaker Diarization", 2018. https://arxiv.org/pdf/1810.04719v1.pdf