Differences between revisions of "CN-Celeb"

Revision as of 01:50, 14 November 2019 (Thursday)

Environments: TensorFlow, PyTorch, Keras, MXNet
Face detection and tracking: RetinaFace and ArcFace models.
Active speaker verification: SyncNet model.
Speaker diarization: UIS-RNN model.
Double check by speaker recognition: VGG model.
Input: pictures and videos of POIs (Persons of Interest).
Output: well-labelled videos of POIs (Persons of Interest).
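
The stages listed above can be read as a chain of filters over candidate video segments. The page does not say how the stages are wired together, so the following is only a minimal sketch under that assumption: `Segment`, `run_pipeline`, and `min_duration` are hypothetical names, and `min_duration` is a toy stand-in for a real stage that would wrap RetinaFace/ArcFace, SyncNet, UIS-RNN, or the speaker recognition check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A candidate clip believed to contain one POI (hypothetical structure)."""
    video_id: str
    start: float  # seconds
    end: float    # seconds
    poi: str      # name of the hypothesised Person of Interest


# A stage receives candidate segments and returns the ones it accepts.
Stage = Callable[[List[Segment]], List[Segment]]


def run_pipeline(segments: List[Segment], stages: List[Stage]) -> List[Segment]:
    """Apply each stage in order; every stage can only narrow the candidates."""
    for stage in stages:
        segments = stage(segments)
    return segments


def min_duration(seconds: float) -> Stage:
    """Toy stand-in stage: keep segments that last at least `seconds`."""
    def stage(segments: List[Segment]) -> List[Segment]:
        return [s for s in segments if s.end - s.start >= seconds]
    return stage
```

In such a layout, each real model would be wrapped as one `Stage`, in the order given in the list: face detection and tracking, active speaker verification, speaker diarization, and the final speaker recognition check.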

Reports

Stage report v1.0

Publications

@misc{fan2019cnceleb,
  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
  year={2019},
  eprint={1911.01799},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

Source Code

Collection Pipeline: celebrity-audio-collection
Baseline Systems: kaldi-cn-celeb

Download

Local (not recommended)

wav.tgz: speech data [30GB], http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/wav.tgz
info.txt: info, http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/info.txt
about.html: about, http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/about.html
index.html: this file, http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/index.html
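
For scripted access, the files above could be fetched and unpacked roughly as follows. This is a sketch using only the Python standard library; the helper names (`file_url`, `download`, `extract`) are illustrative assumptions, the internal layout of wav.tgz is not described on this page, and the server may be slow or unavailable given that the local mirror is marked as not recommended.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the download links listed on this page.
BASE_URL = "http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/"


def file_url(name: str) -> str:
    """Build the full URL of one of the listed files."""
    return BASE_URL + name


def download(name: str, dest_dir: Path) -> Path:
    """Download `name` into `dest_dir` unless it is already there."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / name
    if not target.exists():
        urllib.request.urlretrieve(file_url(name), str(target))
    return target


def extract(archive: Path, dest_dir: Path) -> None:
    """Unpack a gzip-compressed tar archive into `dest_dir`."""
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest_dir), **kwargs)
```

A typical call would be `extract(download("wav.tgz", Path("cn-celeb")), Path("cn-celeb"))`; note that the archive is about 30GB, so it is worth starting with `download("info.txt", Path("cn-celeb"))` to check connectivity.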

Future Plans

Augment the database to 10,000 people.
Build an LSTM-based model that links SyncNet and speaker diarization, so that it can learn the relationship between the two.

License

All the resources contained in the database are free for research institutes and individuals.
No commercial usage is permitted.

References

Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. [1]
Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [2]
Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [3]
Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [4]
Zhong et al., "GhostVLAD for set-based face recognition", 2018. [5]
Chung et al., "Out of time: automated lip sync in the wild", 2016. [6]
Xie et al., "Utterance-level Aggregation For Speaker Recognition In The Wild", 2019. [7]
Zhang et al., "Fully Supervised Speaker Diarization", 2018. [8]
