Difference between revisions of "CN-Celeb"

Revision as of 10:06, 6 January 2021 (Wednesday)

Environments: TensorFlow, PyTorch, Keras, MXNet
Face detection and tracking: RetinaFace and ArcFace models.
Active speaker verification: SyncNet model.
Speaker diarization: UIS-RNN model.
Double check by speaker recognition: VGG model.
Input: pictures and videos of POIs (Persons of Interest).
Output: well-labelled videos of POIs (Persons of Interest).
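
The stages listed above can be read as a chain of filters over candidate video segments. The following is a minimal sketch of that idea only, not the project's implementation (which lives in the celebrity-audio-collection repository). It does not call RetinaFace, ArcFace, SyncNet, UIS-RNN, or the VGG model: every field of `Segment`, the helper `at_least`, the function `select_segments`, and all thresholds are hypothetical stand-ins for the outputs of those models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Segment:
    """Candidate clip with placeholder scores (all fields are hypothetical)."""
    start: float            # seconds
    end: float              # seconds
    face_match: float       # stand-in for a face detection/recognition score
    lip_sync: float         # stand-in for an active-speaker (lip sync) confidence
    single_speaker: bool    # stand-in for a diarization result
    voice_match: float      # stand-in for a speaker-recognition score


# A stage is a predicate: True keeps the segment, False drops it.
Stage = Callable[[Segment], bool]


def at_least(field: str, threshold: float) -> Stage:
    """Build a stage that keeps segments whose score reaches the threshold."""
    def stage(seg: Segment) -> bool:
        return getattr(seg, field) >= threshold
    return stage


# Stage order mirrors the list above; thresholds are illustrative only.
STAGES: List[Stage] = [
    at_least("face_match", 0.6),      # face detection and tracking
    at_least("lip_sync", 0.5),        # active speaker verification
    lambda seg: seg.single_speaker,   # speaker diarization
    at_least("voice_match", 0.7),     # double check by speaker recognition
]


def select_segments(candidates: Iterable[Segment],
                    stages: List[Stage] = STAGES) -> List[Segment]:
    """Return the candidates that pass every stage, in input order."""
    return [seg for seg in candidates if all(stage(seg) for stage in stages)]


if __name__ == "__main__":
    candidates = [
        Segment(0.0, 4.2, 0.9, 0.8, True, 0.85),   # passes every stage
        Segment(4.2, 7.0, 0.9, 0.2, True, 0.80),   # face visible, not speaking
        Segment(7.0, 9.5, 0.8, 0.7, False, 0.90),  # more than one speaker
    ]
    for seg in select_segments(candidates):
        print(f"keep {seg.start:.1f}-{seg.end:.1f}s")
```

Modelling each stage as an independent predicate keeps the order of the checks explicit and lets a stage be replaced or re-thresholded without touching the others.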

Reports

Stage report v1.0

Publications

@misc{fan2019cnceleb,
  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
  year={2019},
  eprint={1911.01799},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

@misc{li2020cn,
  title={CN-Celeb: multi-genre speaker recognition},
  author={Lantian Li and Ruiqi Liu and Jiawen Kang and Yue Fan and Hao Cui and Yunqi Cai and Ravichander Vipperla and Thomas Fang Zheng and Dong Wang},
  year={2020},
  eprint={2012.12468},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

Source Code

Collection Pipeline: celebrity-audio-collection
Baseline Systems: kaldi-cn-celeb

Download

Public (recommended)

OpenSLR: http://www.openslr.org/82/

Local (not recommended)

CSLT@Tsinghua: http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/

Future Plans

Expand the database to 10,000 people.
Build an LSTM-based model that links SyncNet and speaker diarization and learns the relationship between them.

License

All the resources contained in the database are free for research institutes and individuals.
No commercial use is permitted.

References

Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
Zhong et al., "GhostVLAD for set-based face recognition", 2018. [5]
Chung et al., "Out of time: automated lip sync in the wild", 2016. [6]
Xie et al., "Utterance-level Aggregation For Speaker Recognition In The Wild", 2019. [7]
Zhang et al., "Fully Supervised Speaker Diarization", 2018. [8]

@@ Line 1: / Line 1: @@
-=Introduction=
+===Introduction===
-* CN-Celeb, a large-scale Chinese celebrities dataset collected `in the wild'.
+* CN-Celeb, a large-scale Chinese celebrities dataset published by Center for Speech and Language Technology (CSLT) at Tsinghua University.
-=Members=
+===Members===
 * Current：Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
 * History：Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang
-===Target===
+===Description===
 * Collect audio data of 1,000 Chinese celebrities.
-* Automatically clip videoes through a pipeline including face detection, face recognition, speaker validation and speaker diarization.
+* Automatically clip videos through a pipeline including face detection, face recognition, speaker validation and speaker diarization.
 * Create a benchmark database for speaker recognition community.
+===Basic Methods===
+* Environments: Tensorflow, PyTorch, Keras, MxNet
+* Face detection and tracking: RetinaFace and ArcFace models.
+* Active speaker verification: SyncNet model.
+* Speaker diarization: UIS-RNN model.
+* Double check by speaker recognition: VGG model.
+* Input: pictures and videos of POIs (Persons of Interest).
+* Output: well-labelled videos of POIs (Persons of Interest).
+===Reports===
+* [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf Stage report v1.0]
+===Publications===
+<pre>
+@misc{fan2019cnceleb,
+  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
+  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
+  year={2019},
+  eprint={1911.01799},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS}
+}
+@misc{li2020cn,
+  title={CN-Celeb: multi-genre speaker recognition},
+  author={Lantian Li and Ruiqi Liu and Jiawen Kang and Yue Fan and Hao Cui and Yunqi Cai and Ravichander Vipperla and Thomas Fang Zheng and Dong Wang},
+  year={2020},
+  eprint={2012.12468},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS}
+ }
+</pre>
+===Source Code===
+* Collection Pipeline: [https://github.com/celebrity-audio-collection/videoprocess celebrity-audio-collection]
+* Baseline Systems: [https://github.com/csltstu/kaldi/tree/cnceleb/egs/cnceleb kaldi-cn-celeb]
+===Download===
+* Public (recommended)
+OpenSLR: http://www.openslr.org/82/
+* Local (not recommended)
+CSLT@Tsinghua: http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/
 ===Future Plans===
@@ Line 19: / Line 66: @@
 * Build a model between SyncNet and Speaker_Diarization based on LSTM, which can learn the relationship of them.
-===Basic method===
+===License===
-* Environments: Tensorflow, PyTorch, Keras, MxNet
+* All the resources contained in the database are free for research institutes and individuals.
-* Face detection and tracking based on RetinaFace and ArcFace models.
+* <b>No commercial usage is permitted</b>.
-* Active speaker verification based on SyncNet model.
-* Speaker Diarization based on UIS-RNN model.
-* Double check by speaker recognition based on VGG model.
-* Input: Pictures and videos of POIs (Persons of Interest).
-* Output: well-labelled videos of POIs (Persons of Interest).
-===GitHub of our project===
-[https://github.com/celebrity-audio-collection/videoprocess celebrity-audio-collection]
-===Reports===
-[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf Stage Report v1.0]
 ===References===
 * Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [https://arxiv.org/pdf/1905.00641.pdf]
 * Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018, [https://arxiv.org/abs/1801.07698]
 * Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018, [https://arxiv.org/pdf/1801.09414.pdf]
 * Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017[https://arxiv.org/pdf/1704.08063.pdf]
-* Zhong et al., "GhostVLAD for set-based face recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf link]
+* Zhong et al., "GhostVLAD for set-based face recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf]
-* Chung et al., "Out of time: automated lip sync in the wild", 2016.[http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf link]
+* Chung et al., "Out of time: automated lip sync in the wild", 2016.[http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf]
-* Xie et al., "UTTERANCE-LEVEL AGGREGATION FOR SPEAKER RECOGNITION IN THE WILD", 2019. [https://arxiv.org/pdf/1902.10107.pdf link]
+* Xie et al., "Utterance-level Aggregation For Speaker Recognition In The Wild", 2019. [https://arxiv.org/pdf/1902.10107.pdf]
-* Zhang1 et al., "FULLY SUPERVISED SPEAKER DIARIZATION", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf link]
+* Zhang1 et al., "Fully Supervised Speaker Diarization", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf]
