"CN-Celeb": Difference between revisions

Latest revision as of 02:14, 26 November 2024 (Tue)

Environments: TensorFlow, PyTorch, Keras, MXNet
Face detection and tracking: RetinaFace and ArcFace models.
Active speaker verification: SyncNet model.
Speaker diarization: UIS-RNN model.
Double check by speaker recognition: VGG model.
Input: pictures and videos of POIs (Persons of Interest).
Output: well-labelled videos of POIs (Persons of Interest).
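
The stages listed above can be read as a cascade in which a candidate clip is kept only if every stage accepts it. The following is a minimal sketch of that control flow, not the project's implementation: the `Segment` fields, the `at_least` helper, and all thresholds are hypothetical stand-ins for the outputs of the named models, and no real model is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Segment:
    """A candidate clip with hypothetical per-stage scores in [0, 1]."""
    start: float        # seconds
    end: float          # seconds
    face_match: float   # stand-in for face detection/recognition (RetinaFace + ArcFace)
    lip_sync: float     # stand-in for active speaker verification (SyncNet)
    poi_speech: float   # stand-in for the share of speech diarization assigns to the POI (UIS-RNN)
    voice_match: float  # stand-in for the speaker recognition double check (VGG)


Stage = Callable[[Segment], bool]


def at_least(field: str, threshold: float) -> Stage:
    """Build a stage that accepts a segment when the named score reaches the threshold."""
    def stage(seg: Segment) -> bool:
        return getattr(seg, field) >= threshold
    return stage


# Thresholds are illustrative assumptions, not values from the project.
STAGES: List[Stage] = [
    at_least("face_match", 0.8),
    at_least("lip_sync", 0.5),
    at_least("poi_speech", 0.9),
    at_least("voice_match", 0.7),
]


def run_pipeline(segments: Iterable[Segment], stages: List[Stage] = STAGES) -> List[Segment]:
    """Keep only the segments that every stage accepts, in input order."""
    return [seg for seg in segments if all(stage(seg) for stage in stages)]


if __name__ == "__main__":
    candidates = [
        Segment(0.0, 5.0, 0.95, 0.80, 0.97, 0.90),   # passes every stage
        Segment(5.0, 9.0, 0.90, 0.20, 0.95, 0.85),   # POI on screen but not speaking
        Segment(9.0, 12.0, 0.92, 0.75, 0.93, 0.40),  # voice does not match the POI
    ]
    for seg in run_pipeline(candidates):
        print(f"keep {seg.start:.1f}-{seg.end:.1f}s")
```

Because `all` stops at the first rejecting stage, ordering cheap checks before expensive ones would avoid unnecessary work in a real pipeline; the order here simply follows the list above.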

Reports

Stage report v1.0

Publications

@misc{fan2019cnceleb,
  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
  year={2019},
  eprint={1911.01799},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

@misc{li2020cn,
  title={CN-Celeb: multi-genre speaker recognition},
  author={Lantian Li and Ruiqi Liu and Jiawen Kang and Yue Fan and Hao Cui and Yunqi Cai and Ravichander Vipperla and Thomas Fang Zheng and Dong Wang},
  year={2020},
  eprint={2012.12468},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
 }

Source Code

Collection Pipeline: celebrity-audio-collection
Baseline Systems: kaldi-cn-celeb

Download

Public (recommended)

OpenSLR: http://www.openslr.org/82/

Local (not recommended)

CSLT@Tsinghua: http://index.cslt.org/~data/CN-Celeb/

Future Plans

Augment the database to 10,000 people.
Build an LSTM-based model between SyncNet and speaker diarization that can learn the relationship between them.

License

All the resources contained in the database are free for research institutes and individuals.
No commercial usage is permitted.

References

Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
Zhong et al., "GhostVLAD for set-based face recognition", 2018. [5]
Chung et al., "Out of time: automated lip sync in the wild", 2016. [6]
Xie et al., "Utterance-level Aggregation For Speaker Recognition In The Wild", 2019. [7]
Zhang et al., "Fully Supervised Speaker Diarization", 2018. [8]

@@ Line 1: / Line 1: @@
 ===Introduction===
-* CN-Celeb, a large-scale Chinese celebrities dataset collected `in the wild'.
+* CN-Celeb, a large-scale Chinese celebrities dataset published by Center for Speech and Language Technology (CSLT) at Tsinghua University.
 ===Members===
@@ Line 8: / Line 8: @@
 * History: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang
-===Target===
+===Description===
 * Collect audio data of 1,000 Chinese celebrities.
-* Automatically clip videoes through a pipeline including face detection, face recognition, speaker validation and speaker diarization.
+* Automatically clip videos through a pipeline including face detection, face recognition, speaker validation and speaker diarization.
 * Create a benchmark database for speaker recognition community.
-===Future Plans===
-* Augment the database to 10,000 people.
-* Build an LSTM-based model between SyncNet and speaker diarization that can learn the relationship between them.
 ===Basic Methods===
@@ Line 24: / Line 19: @@
 * Face detection and tracking: RetinaFace and ArcFace models.
 * Active speaker verification: SyncNet model.
-* Speaker Diarization: UIS-RNN model.
+* Speaker diarization: UIS-RNN model.
 * Double check by speaker recognition: VGG model.
-* Input: Pictures and videos of POIs (Persons of Interest).
+* Input: pictures and videos of POIs (Persons of Interest).
 * Output: well-labelled videos of POIs (Persons of Interest).
-===GitHub of This Project===
-[https://github.com/celebrity-audio-collection/videoprocess celebrity-audio-collection]
 ===Reports===
-[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf Stage Report v1.0]
-===Download===
+* [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf Stage report v1.0]
 ===Publications===
+<pre>
+@misc{fan2019cnceleb,
+  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
+  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
+  year={2019},
+  eprint={1911.01799},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS}
+}
+@misc{li2020cn,
+  title={CN-Celeb: multi-genre speaker recognition},
+  author={Lantian Li and Ruiqi Liu and Jiawen Kang and Yue Fan and Hao Cui and Yunqi Cai and Ravichander Vipperla and Thomas Fang Zheng and Dong Wang},
+  year={2020},
+  eprint={2012.12468},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS}
+ }
+</pre>
+===Source Code===
+* Collection Pipeline: [https://github.com/celebrity-audio-collection/videoprocess celebrity-audio-collection]
+* Baseline Systems: [https://github.com/csltstu/kaldi/tree/cnceleb/egs/cnceleb kaldi-cn-celeb]
+===Download===
+* Public (recommended)
+OpenSLR: http://www.openslr.org/82/
+* Local (not recommended)
+CSLT@Tsinghua: http://index.cslt.org/~data/CN-Celeb/
+===Future Plans===
+* Augment the database to 10,000 people.
+* Build an LSTM-based model between SyncNet and speaker diarization that can learn the relationship between them.
+===License===
+* All the resources contained in the database are free for research institutes and individuals.
+* <b>No commercial usage is permitted</b>.
 ===References===
 * Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [https://arxiv.org/pdf/1905.00641.pdf]
 * Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018, [https://arxiv.org/abs/1801.07698]
 * Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018, [https://arxiv.org/pdf/1801.09414.pdf]
 * Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017[https://arxiv.org/pdf/1704.08063.pdf]
-* Zhong et al., "GhostVLAD for set-based face recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf link]
+* Zhong et al., "GhostVLAD for set-based face recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf]
-* Chung et al., "Out of time: automated lip sync in the wild", 2016.[http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf link]
+* Chung et al., "Out of time: automated lip sync in the wild", 2016.[http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf]
-* Xie et al., "UTTERANCE-LEVEL AGGREGATION FOR SPEAKER RECOGNITION IN THE WILD", 2019. [https://arxiv.org/pdf/1902.10107.pdf link]
+* Xie et al., "Utterance-level Aggregation For Speaker Recognition In The Wild", 2019. [https://arxiv.org/pdf/1902.10107.pdf]
-* Zhang et al., "FULLY SUPERVISED SPEAKER DIARIZATION", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf link]
+* Zhang et al., "Fully Supervised Speaker Diarization", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf]
