Datasets
Overview of all training datasets and evaluation benchmarks used by UniFace models.
Quick Reference
| Task | Dataset | Scale | Models |
|---|---|---|---|
| Detection | WIDER FACE | 32K images | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |
| Recognition | MS1MV2 | 5.8M images, 85.7K IDs | MobileFace, SphereFace |
| Recognition | WebFace600K | 600K images | ArcFace |
| Recognition | WebFace4M / WebFace12M | 4M / 12M images | AdaFace |
| Gaze | Gaze360 | 238 subjects | MobileGaze |
| Parsing | CelebAMask-HQ | 30K images | BiSeNet |
| Attributes | CelebA | 200K images | AgeGender |
| Attributes | FairFace | Balanced demographics | FairFace |
| Attributes | AffectNet | Emotion labels | Emotion |
Training Datasets
Face Detection
WIDER FACE
Large-scale face detection benchmark built from images spanning 61 event categories. Its faces show a high degree of variability in scale, pose, occlusion, expression, and illumination.
| Property | Value |
|---|---|
| Images | ~32,000 (40% train / 10% val / 50% test) |
| Faces | ~394,000 annotated |
| Subsets | Easy, Medium, Hard |
| Used by | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |
Download & References
**Paper**: WIDER FACE: A Face Detection Benchmark
**Download**: [http://shuoyang1213.me/WIDERFACE/](http://shuoyang1213.me/WIDERFACE/)
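WIDER FACE distributes its ground-truth boxes as plain-text files (for example `wider_face_train_bbx_gt.txt`). The minimal parser below assumes this layout: an image path line, a face-count line, then one row per face that starts with `x y w h` and continues with attribute flags. It also assumes that images with zero faces still carry a single all-zero placeholder row. Check both assumptions against your copy of the split files. The function name `parse_wider_annotations` is hypothetical.

```python
def parse_wider_annotations(text):
    """Parse WIDER FACE ground-truth text into {image_path: [(x, y, w, h), ...]}.

    Assumed layout per image: a path line, a face-count line, then one row per face
    beginning with x y w h (the remaining columns are attribute flags and are ignored).
    Zero-face images are assumed to have one all-zero placeholder row.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    boxes_by_image = {}
    i = 0
    while i < len(lines):
        image_path = lines[i]
        count = int(lines[i + 1])
        # A zero-face entry still occupies one placeholder row.
        n_rows = max(count, 1)
        rows = lines[i + 2 : i + 2 + n_rows]
        boxes = []
        for row in rows[:count]:
            x, y, w, h = (int(v) for v in row.split()[:4])
            boxes.append((x, y, w, h))
        boxes_by_image[image_path] = boxes
        i += 2 + n_rows
    return boxes_by_image
```

Usage: `parse_wider_annotations(Path("wider_face_train_bbx_gt.txt").read_text())`. Boxes are kept in the file's `x, y, w, h` form. Convert to corner form (`x2 = x + w`, `y2 = y + h`) before computing IoU.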
Face Recognition
MS1MV2
Refined version of the MS-Celeb-1M dataset, cleaned by InsightFace. Widely used for training face recognition models.
| Property | Value |
|---|---|
| Identities | 85.7K |
| Images | 5.8M |
| Format | Aligned and cropped to 112x112 |
| Used by | MobileFace, SphereFace |
Download
**Kaggle** (aligned 112x112): ms1m-arcface-dataset (from InsightFace)
**Training code**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
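MS1MV2 and the other recognition sets on this page are distributed already aligned and cropped to 112x112. To prepare your own images the same way, a common approach (used by InsightFace) is to fit a similarity transform from five detected landmarks onto a fixed 112x112 template. The sketch below makes a few assumptions. The landmark order is left eye, right eye, nose tip, left mouth corner, right mouth corner. The template values are the widely used InsightFace reference coordinates and are not taken from this page. The names `ARCFACE_TEMPLATE_112` and `estimate_similarity` are hypothetical.

```python
import numpy as np

# Assumed ArcFace/InsightFace 5-point template for 112x112 crops:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
ARCFACE_TEMPLATE_112 = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float64,
)


def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama, 1991) mapping src points onto dst.

    Returns a 2x3 matrix M such that dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Cross-covariance between centered target and source points.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0  # flip one axis so the result is a rotation, not a reflection
    R = U @ np.diag(d) @ Vt
    # Scale = trace(D S) / variance of the source points.
    scale = (S * d).sum() / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])
```

Usage: `M = estimate_similarity(landmarks, ARCFACE_TEMPLATE_112)`, then `cv2.warpAffine(image, M, (112, 112))` produces the aligned crop. A similarity transform allows only rotation, uniform scale, and translation. This preserves face proportions, which a full affine fit would distort.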
WebFace600K
Medium-scale face recognition dataset from the WebFace series.
| Property | Value |
|---|---|
| Images | ~600K |
| Used by | ArcFace |
Source
**Origin**: InsightFace
**Paper**: [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
WebFace4M / WebFace12M
Large-scale face recognition datasets drawn from the WebFace260M collection. They are used to train the AdaFace models, which apply a quality-adaptive margin.
| Property | WebFace4M | WebFace12M |
|---|---|---|
| Images | ~4M | ~12M |
| Used by | AdaFace IR_18 | AdaFace IR_101 |
Source
**Paper**: AdaFace: Quality Adaptive Margin for Face Recognition
**Original code**: [mk-minchul/AdaFace](https://github.com/mk-minchul/AdaFace)
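The "quality-adaptive margin" works roughly as follows, as described in the AdaFace paper. The embedding's feature norm ‖z‖ serves as an image-quality proxy. It is normalized with batch statistics, multiplied by h, and clipped to [-1, 1], giving ẑ. The target-class logit then becomes s·(cos(θ + g_angle) - g_add), with g_angle = -m·ẑ and g_add = m·ẑ + m. The NumPy sketch below is not UniFace or reference code. It uses plain batch mean/std, whereas the reference implementation keeps running averages. The defaults m=0.4, h=0.333, s=64 are the commonly cited AdaFace settings and are assumptions here. `adaface_logits` is a hypothetical name.

```python
import numpy as np


def adaface_logits(cos_theta, labels, feature_norms, m=0.4, h=0.333, s=64.0, eps=1e-3):
    """Apply an AdaFace-style quality-adaptive margin to a batch of cosine logits.

    cos_theta: (B, C) cosines between normalized embeddings and class centers.
    labels: (B,) ground-truth class indices.
    feature_norms: (B,) L2 norms of the un-normalized embeddings (quality proxy).
    """
    cos_theta = np.clip(np.asarray(cos_theta, dtype=np.float64), -1.0 + 1e-7, 1.0 - 1e-7)
    labels = np.asarray(labels)
    norms = np.asarray(feature_norms, dtype=np.float64)
    # Standardize norms with batch statistics, scale by h, clip to [-1, 1].
    norm_hat = np.clip((norms - norms.mean()) / (norms.std() + eps) * h, -1.0, 1.0)
    g_angle = -m * norm_hat       # angular margin: larger for low-quality samples
    g_add = m * norm_hat + m      # additive margin: larger for high-quality samples
    rows = np.arange(len(labels))
    theta = np.arccos(cos_theta[rows, labels])
    theta_m = np.clip(theta + g_angle, eps, np.pi - eps)
    out = cos_theta.copy()
    out[rows, labels] = np.cos(theta_m) - g_add  # only target logits get the margin
    return s * out
```

When every sample has the same norm, ẑ = 0 and the target logit reduces to s·(cos θ - m), a CosFace-style margin. Varying ẑ shifts the emphasis between the angular and additive terms per sample.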
CASIA-WebFace
Smaller-scale face recognition dataset suitable for academic research and lighter training runs.
| Property | Value |
|---|---|
| Identities | 10.6K |
| Images | 491K |
| Format | Aligned and cropped to 112x112 |
| Used by | Supported alternative (not used for UniFace weights) |
Download
**Kaggle** (aligned 112x112): webface-112x112 (from OpenSphere)
VGGFace2
Large-scale dataset with wide variations in pose, age, illumination, ethnicity, and profession.
| Property | Value |
|---|---|
| Identities | 8.6K |
| Images | 3.1M |
| Format | Aligned and cropped to 112x112 |
| Used by | Supported alternative (not used for UniFace weights) |
Download
**Kaggle** (aligned 112x112): vggface2-112x112 (from OpenSphere)
Gaze Estimation
Gaze360
Large-scale gaze estimation dataset collected in indoor and outdoor environments with diverse head poses and wide gaze ranges (up to 360 degrees).
| Property | Value |
|---|---|
| Subjects | 238 |
| Environment | Indoor and outdoor |
| Used by | All MobileGaze models |
Download & Preprocessing
**Download**: gaze360.csail.mit.edu/download.php
**Preprocessing**: [GazeHub - Gaze360](https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#gaze360)
UniFace Models
All MobileGaze models shipped with UniFace are trained exclusively on Gaze360 for 200 epochs.
Dataset structure:
MPIIFaceGaze
Appearance-based gaze estimation dataset of webcam images captured while participants used their laptops in everyday settings. The gaze estimation training code supports it, but it was not used for the UniFace pretrained weights.
| Property | Value |
|---|---|
| Subjects | 15 |
| Environment | Everyday laptop usage |
| Used by | Supported (not used for UniFace weights) |
Download & Preprocessing
**Download**: MPIIFaceGaze download page
**Preprocessing**: [GazeHub - MPIIFaceGaze](https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#mpiifacegaze)
Dataset structure:
Face Parsing
CelebAMask-HQ
High-quality face parsing dataset with pixel-level annotations for 19 facial component classes.
| Property | Value |
|---|---|
| Images | 30,000 |
| Classes | 19 facial components |
| Resolution | 1024x1024 images, 512x512 masks |
| Used by | BiSeNet (ResNet18, ResNet34) |
Source
**GitHub**: switchablenorms/CelebAMask-HQ
**Training code**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)
Dataset structure:
```
dataset/
├── images/    # Input face images
│   ├── image1.jpg
│   └── ...
└── labels/    # Segmentation masks
    ├── image1.png
    └── ...
```
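With the `images/` + `labels/` layout above, each image must be paired with its mask before training. The minimal sketch below assumes that pairs share a file stem (`image1.jpg` and `image1.png`) and that the extensions are exactly `.jpg` and `.png`, as in the tree. It fails loudly when a mask is missing instead of silently skipping the image. `pair_images_with_masks` is a hypothetical helper, not part of the training code.

```python
from pathlib import Path


def pair_images_with_masks(root):
    """Return sorted (image_path, mask_path) pairs from root/images and root/labels.

    Assumes images are *.jpg, masks are *.png, and pairs share the same file stem.
    """
    root = Path(root)
    masks = {p.stem: p for p in (root / "labels").glob("*.png")}
    pairs = []
    for image in sorted((root / "images").glob("*.jpg")):
        mask = masks.get(image.stem)
        if mask is None:
            raise FileNotFoundError(f"No mask found for {image.name}")
        pairs.append((image, mask))
    return pairs
```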
Attribute Analysis
CelebA
Large-scale face attributes dataset with 40 binary attribute labels per image, used to train the UniFace AgeGender model.
| Property | Value |
|---|---|
| Images | ~200K |
| Attributes | 40 binary attributes |
| Used by | AgeGender |
Reference
**Paper**: Deep Learning Face Attributes in the Wild
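CelebA's 40 binary attributes are usually read from its `list_attr_celeba.txt` file. The sketch below assumes this layout: line 1 holds the image count, line 2 lists the attribute names, and each remaining line is a filename followed by one value per attribute in {-1, 1}. Values are converted to booleans. `parse_celeba_attributes` is a hypothetical name, and the function takes the file contents as a string.

```python
def parse_celeba_attributes(text):
    """Parse list_attr_celeba.txt contents into {filename: {attribute: bool}}.

    Assumed layout: line 1 = image count, line 2 = attribute names,
    then '<filename> <one value in {-1, 1} per attribute>' per line.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[1].split()
    attrs = {}
    for line in lines[2:]:
        filename, *values = line.split()
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} values, got {len(values)}")
        # 1 means the attribute is present, -1 means absent.
        attrs[filename] = {name: value == "1" for name, value in zip(names, values)}
    return attrs
```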
FairFace
Face attribute dataset designed for balanced representation across race, gender, and age groups. Models trained on it tend to give more equitable predictions across demographic groups than models trained on demographically imbalanced datasets.
| Property | Value |
|---|---|
| Attributes | Race (7), Gender (2), Age Group (9) |
| Used by | FairFace |
| License | CC BY 4.0 |
Reference
**Paper**: FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age
**ONNX inference**: [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx)
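The table lists 7 race, 2 gender, and 9 age-group classes, 18 outputs in total. The sketch below makes several assumptions that are not confirmed on this page. It treats the model as producing one 18-logit vector ordered race, gender, then age, as in the FairFace reference code. It takes the label order from that repository. It applies a softmax to each head separately. Check the order against the model you actually load. `decode_fairface` is a hypothetical helper.

```python
import numpy as np

# Label orders assumed from the FairFace reference repository.
RACE7 = ["White", "Black", "Latino_Hispanic", "East Asian", "Southeast Asian", "Indian", "Middle Eastern"]
GENDER2 = ["Male", "Female"]
AGE9 = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def decode_fairface(logits):
    """Split an assumed 18-logit output (7 race + 2 gender + 9 age) into labels."""
    logits = np.asarray(logits, dtype=np.float64)
    if logits.shape != (18,):
        raise ValueError(f"expected 18 logits, got shape {logits.shape}")
    heads = {
        "race": (logits[:7], RACE7),
        "gender": (logits[7:9], GENDER2),
        "age": (logits[9:], AGE9),
    }
    result = {}
    for name, (head, labels) in heads.items():
        probs = softmax(head)  # each head is normalized independently
        idx = int(probs.argmax())
        result[name] = (labels[idx], float(probs[idx]))
    return result
```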
AffectNet
Large-scale facial expression dataset for emotion recognition training.
| Property | Value |
|---|---|
| Classes | 7 (Neutral, Happy, Sad, Surprise, Fear, Disgust, Angry) or 8 (adds Contempt) |
| Used by | Emotion (AFFECNET7, AFFECNET8) |
Reference
**Paper**: AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild
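The 7- and 8-class variants differ only in the Contempt class. The sketch below decodes either kind of output. It assumes the model's outputs follow the class order listed in the table above, which matches AffectNet's label indices 0 to 7, and that a 7-class model simply omits Contempt. Neither assumption is confirmed here for the UniFace Emotion weights. `decode_emotion` is a hypothetical helper.

```python
import numpy as np

# Assumed output order; a 7-class model is assumed to drop the final "Contempt".
AFFECTNET8 = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Angry", "Contempt"]


def decode_emotion(logits):
    """Map a 7- or 8-way logit vector to (label, probability)."""
    logits = np.asarray(logits, dtype=np.float64)
    if logits.shape not in [(7,), (8,)]:
        raise ValueError(f"expected 7 or 8 logits, got shape {logits.shape}")
    labels = AFFECTNET8[: len(logits)]
    exp = np.exp(logits - logits.max())  # stable softmax
    probs = exp / exp.sum()
    idx = int(probs.argmax())
    return labels[idx], float(probs[idx])
```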
Evaluation Benchmarks
Face Detection
WIDER FACE Validation Set
The standard benchmark for face detection models. Results are reported across three difficulty subsets.
| Subset | Criteria |
|---|---|
| Easy | Large, clear, unoccluded faces |
| Medium | Moderate scale and occlusion |
| Hard | Small, heavily occluded, or challenging faces |
See Model Zoo - Detection for per-model accuracy on each subset.
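WIDER FACE scoring rests on matching detections to ground-truth boxes by intersection over union (IoU), conventionally at a 0.5 threshold, before precision/recall and average precision are computed per subset. The sketch below shows only the matching step for a single image. It matches greedily, highest score first, and expects boxes in `[x1, y1, x2, y2]` form. It omits the official toolkit's per-subset bookkeeping and ignored-face handling, and it is not the official evaluation code. The names `iou` and `match_detections` are hypothetical.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def match_detections(pred, scores, gt, thresh=0.5):
    """Greedily match predictions (highest score first) to unmatched ground truth.

    Returns a boolean array marking each prediction as a true positive.
    """
    pred = np.asarray(pred, dtype=np.float64).reshape(-1, 4)
    gt = np.asarray(gt, dtype=np.float64).reshape(-1, 4)
    tp = np.zeros(len(pred), dtype=bool)
    used = np.zeros(len(gt), dtype=bool)
    for i in np.argsort(-np.asarray(scores, dtype=np.float64)):
        if len(gt) == 0:
            break
        overlaps = iou(pred[i], gt)
        overlaps[used] = -1.0  # each ground-truth face can be matched only once
        j = int(overlaps.argmax())
        if overlaps[j] >= thresh:
            tp[i] = True
            used[j] = True
    return tp
```

Sweeping a score threshold over the `tp` flags gathered from all images yields the precision-recall curve from which per-subset AP is computed.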
Face Recognition
Recognition models are evaluated on the verification benchmarks listed below. Aligned 112x112 versions of LFW, CALFW, CPLFW, and AgeDB-30 are available as a single download.
Download
**Kaggle**: agedb-30-calfw-cplfw-lfw-aligned-112x112
| Benchmark | Description | Used by |
|---|---|---|
| LFW | Labeled Faces in the Wild - standard face verification benchmark | ArcFace, MobileFace, SphereFace |
| CALFW | Cross-Age LFW - face verification across age gaps | MobileFace, SphereFace |
| CPLFW | Cross-Pose LFW - face verification across pose variations | MobileFace, SphereFace |
| AgeDB-30 | Age database with 30-year age gaps | ArcFace, MobileFace, SphereFace |
| CFP-FP | Celebrities in Frontal-Profile - frontal vs. profile verification | ArcFace |
| IJB-B | IARPA Janus Benchmark B - TAR@FAR=1e-4 | AdaFace |
| IJB-C | IARPA Janus Benchmark C - TAR@FAR=1e-4 | AdaFace, ArcFace |
See Model Zoo - Recognition for per-model accuracy on each benchmark.
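Two metrics appear in the table. The LFW-style benchmarks report pair verification accuracy, and IJB-B/IJB-C report the true accept rate (TAR) at a fixed false accept rate (FAR). The sketch below computes both from precomputed similarity scores, such as cosine similarities between embeddings, and simplifies in two ways. First, it picks one best threshold over all pairs, whereas the usual LFW protocol selects the threshold with 10-fold cross-validation. Second, it compares individual pairs, whereas the IJB protocols compare aggregated template features. Both function names are hypothetical.

```python
import numpy as np


def best_threshold_accuracy(scores, same):
    """Best verification accuracy over candidate thresholds (score >= t means 'same')."""
    scores = np.asarray(scores, dtype=np.float64)
    same = np.asarray(same, dtype=bool)
    best_acc, best_t = 0.0, None
    # Candidates: every observed score, plus +inf ("reject all pairs").
    for t in np.append(np.unique(scores), np.inf):
        acc = float(np.mean((scores >= t) == same))
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t


def tar_at_far(genuine, impostor, far=1e-4):
    """TAR at the threshold that lets through at most far * len(impostor) impostors."""
    impostor = np.sort(np.asarray(impostor, dtype=np.float64))[::-1]  # descending
    k = int(np.floor(far * len(impostor)))  # number of impostor accepts allowed
    threshold = impostor[k] if k < len(impostor) else -np.inf
    # Accepting only scores strictly above the (k+1)-th highest impostor score
    # lets at most k impostor pairs through.
    tar = float(np.mean(np.asarray(genuine, dtype=np.float64) > threshold))
    return tar, threshold
```

With `far=1e-4`, reaching one allowed false accept needs at least 10,000 impostor pairs. This is one reason the IJB benchmarks, with their very large numbers of impostor comparisons, are used for low-FAR reporting.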
Gaze Estimation
| Benchmark | Metric | Description |
|---|---|---|
| Gaze360 test set | MAE (degrees) | Mean Absolute Error in gaze angle prediction |
See Model Zoo - Gaze for per-model MAE scores.
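For Gaze360, the "MAE in degrees" is commonly computed as the mean angle between the predicted and ground-truth 3D gaze directions. Some codebases instead report the absolute pitch and yaw errors separately, so check which definition a given score uses. The sketch below assumes gaze is given as (pitch, yaw) in radians. It converts each pair to a unit vector using one common sign convention. The resulting angle does not depend on that convention as long as predictions and labels use the same one. The function names are hypothetical.

```python
import numpy as np


def pitchyaw_to_vector(pitch, yaw):
    """Convert gaze angles in radians to 3D unit vectors (one common convention)."""
    pitch = np.asarray(pitch, dtype=np.float64)
    yaw = np.asarray(yaw, dtype=np.float64)
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.stack([x, y, z], axis=-1)


def mean_angular_error_deg(pred_pitchyaw, true_pitchyaw):
    """Mean angle in degrees between predicted and ground-truth gaze directions.

    Both inputs are (N, 2) arrays of (pitch, yaw) in radians.
    """
    pred = np.asarray(pred_pitchyaw, dtype=np.float64)
    true = np.asarray(true_pitchyaw, dtype=np.float64)
    a = pitchyaw_to_vector(pred[..., 0], pred[..., 1])
    b = pitchyaw_to_vector(true[..., 0], true[..., 1])
    # Clip guards against dot products drifting just outside [-1, 1].
    cos = np.clip(np.sum(a * b, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```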
Training Repositories
For training your own models or reproducing results, see the following repositories:
| Task | Repository | Datasets Supported |
|---|---|---|
| Detection | yakhyo/retinaface-pytorch | WIDER FACE |
| Recognition | yakhyo/face-recognition | MS1MV2, CASIA-WebFace, VGGFace2 |
| Gaze | yakhyo/gaze-estimation | Gaze360, MPIIFaceGaze |
| Parsing | yakhyo/face-parsing | CelebAMask-HQ |