
Datasets

Overview of all training datasets and evaluation benchmarks used by UniFace models.


Quick Reference

| Task | Dataset | Scale | Models |
|------|---------|-------|--------|
| Detection | WIDER FACE | 32K images | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |
| Recognition | MS1MV2 | 5.8M images, 85.7K IDs | MobileFace, SphereFace |
| Recognition | WebFace600K | 600K images | ArcFace |
| Recognition | WebFace4M / WebFace12M | 4M / 12M images | AdaFace |
| Gaze | Gaze360 | 238 subjects | MobileGaze |
| Parsing | CelebAMask-HQ | 30K images | BiSeNet |
| Attributes | CelebA | 200K images | AgeGender |
| Attributes | FairFace | Balanced demographics | FairFace |
| Attributes | AffectNet | Emotion labels | Emotion |

Training Datasets

Face Detection

WIDER FACE

Large-scale face detection benchmark with images across 61 event categories. Contains faces with a high degree of variability in scale, pose, occlusion, expression, and illumination.

| Property | Value |
|----------|-------|
| Images | ~32,000 (train/val/test split) |
| Faces | ~394,000 annotated |
| Subsets | Easy, Medium, Hard |
| Used by | RetinaFace, SCRFD, YOLOv5-Face, YOLOv8-Face |

Download & References

Paper: WIDER FACE: A Face Detection Benchmark

**Download**: [http://shuoyang1213.me/WIDERFACE/](http://shuoyang1213.me/WIDERFACE/)
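The face annotations are distributed as plain-text ground-truth files. The parser below is a minimal sketch, assuming the layout of the official files (for example `wider_face_train_bbx_gt.txt`): an image-path line, a face-count line, then one row per face with ten integers (x1, y1, w, h, blur, expression, illumination, invalid, occlusion, pose). `FaceBox` and `parse_wider_gt` are hypothetical names, not part of UniFace or the training repositories.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FaceBox:
    x1: int
    y1: int
    w: int
    h: int
    invalid: bool  # "invalid" flag, 8th column in the assumed row format


def parse_wider_gt(text: str) -> Dict[str, List[FaceBox]]:
    """Parse WIDER FACE-style ground truth into {image_path: [FaceBox, ...]}."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    annotations: Dict[str, List[FaceBox]] = {}
    i = 0
    while i < len(lines):
        path = lines[i]
        count = int(lines[i + 1])
        i += 2
        # Assumption: an image with zero faces is still followed by one all-zero row.
        n_rows = max(count, 1)
        boxes: List[FaceBox] = []
        for row in lines[i:i + count]:
            # Columns: x1 y1 w h blur expression illumination invalid occlusion pose
            vals = [int(v) for v in row.split()[:10]]
            boxes.append(FaceBox(vals[0], vals[1], vals[2], vals[3], invalid=vals[7] == 1))
        annotations[path] = boxes
        i += n_rows
    return annotations
```

For a real split, read the file first, e.g. `parse_wider_gt(open("wider_face_train_bbx_gt.txt").read())`. Skipping the extra all-zero row for face-less images is the detail most likely to desynchronize a naive parser.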

Face Recognition

MS1MV2

Refined version of the MS-Celeb-1M dataset, cleaned by InsightFace. Widely used for training face recognition models.

| Property | Value |
|----------|-------|
| Identities | 85.7K |
| Images | 5.8M |
| Format | Aligned and cropped to 112x112 |
| Used by | MobileFace, SphereFace |

Download

Kaggle (aligned 112x112): ms1m-arcface-dataset (from InsightFace)

**Training code**: [yakhyo/face-recognition](https://github.com/yakhyo/face-recognition)
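"Aligned and cropped to 112x112" typically means each face is warped with a similarity transform (rotation, uniform scale, translation) that maps five detected landmarks onto a fixed reference template. The sketch below solves that transform by linear least squares with NumPy. The reference coordinates are the ArcFace-style 112x112 template used in InsightFace; `similarity_matrix` is a hypothetical helper and not necessarily the exact code used to produce the Kaggle copy.

```python
import numpy as np

# Reference landmarks (left eye, right eye, nose tip, left mouth corner,
# right mouth corner) for a 112x112 ArcFace-style crop.
ARCFACE_112 = np.array(
    [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
     [41.5493, 92.3655], [70.7299, 92.2041]],
    dtype=np.float64,
)


def similarity_matrix(src: np.ndarray, dst: np.ndarray = ARCFACE_112) -> np.ndarray:
    """Least-squares similarity transform mapping src points onto dst points.

    Returns a 2x3 affine matrix M with
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    # Unknowns p = [a, b, tx, ty]; rows alternate between x' and y' equations.
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved x0', y0', x1', y1', ...
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])
```

The resulting matrix can be passed to OpenCV, e.g. `cv2.warpAffine(image, M, (112, 112))`, to produce the aligned crop. Restricting the fit to a similarity transform (rather than a full affine) avoids shearing the face when landmarks are noisy.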

WebFace600K

Medium-scale face recognition dataset from the WebFace series.

| Property | Value |
|----------|-------|
| Images | ~600K |
| Used by | ArcFace |

Source

Origin: InsightFace

**Paper**: [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)

WebFace4M / WebFace12M

Large-scale face recognition datasets from the WebFace260M collection. Used for training AdaFace models with adaptive quality-aware margin.

| Property | WebFace4M | WebFace12M |
|----------|-----------|------------|
| Images | ~4M | ~12M |
| Used by | AdaFace IR_18 | AdaFace IR_101 |

Source

Paper: AdaFace: Quality Adaptive Margin for Face Recognition

**Original code**: [mk-minchul/AdaFace](https://github.com/mk-minchul/AdaFace)

CASIA-WebFace

Smaller-scale face recognition dataset suitable for academic research and lighter training runs.

| Property | Value |
|----------|-------|
| Identities | 10.6K |
| Images | 491K |
| Format | Aligned and cropped to 112x112 |
| Used by | Alternative training set |

Download

Kaggle (aligned 112x112): webface-112x112 (from OpenSphere)


VGGFace2

Large-scale dataset with wide variations in pose, age, illumination, ethnicity, and profession.

| Property | Value |
|----------|-------|
| Identities | 8.6K |
| Images | 3.1M |
| Format | Aligned and cropped to 112x112 |
| Used by | Alternative training set |

Download

Kaggle (aligned 112x112): vggface2-112x112 (from OpenSphere)


Gaze Estimation

Gaze360

Large-scale gaze estimation dataset collected in indoor and outdoor environments with diverse head poses and wide gaze ranges (up to 360 degrees).

| Property | Value |
|----------|-------|
| Subjects | 238 |
| Environment | Indoor and outdoor |
| Used by | All MobileGaze models |

Download & Preprocessing

Download: gaze360.csail.mit.edu/download.php

**Preprocessing**: [GazeHub - Gaze360](https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#gaze360)

UniFace Models

All MobileGaze models shipped with UniFace are trained exclusively on Gaze360 for 200 epochs.

Dataset structure:

data/
└── Gaze360/
    ├── Image/
    └── Label/
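Because the training code expects this exact layout after GazeHub preprocessing, a quick sanity check before launching a run can save a failed job. This is a minimal sketch; `missing_gaze_dirs` is a hypothetical helper, and it only checks that the `Image/` and `Label/` directories exist, not their contents. The same check applies to the MPIIFaceGaze layout by passing its folder name.

```python
from pathlib import Path
from typing import List


def missing_gaze_dirs(data_root: str, dataset: str = "Gaze360") -> List[str]:
    """Return the expected sub-directories missing under <data_root>/<dataset>/."""
    base = Path(data_root) / dataset
    expected = [base / "Image", base / "Label"]
    return [str(p) for p in expected if not p.is_dir()]
```

For example, `missing_gaze_dirs("data")` returns an empty list when `data/Gaze360/Image` and `data/Gaze360/Label` are both present.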

MPIIFaceGaze

Appearance-based gaze estimation dataset of webcam images captured from participants during everyday laptop use. The gaze estimation training code supports it, but it was not used for the UniFace pretrained weights.

| Property | Value |
|----------|-------|
| Subjects | 15 |
| Environment | Everyday laptop usage |
| Used by | Supported by training code (not used for UniFace weights) |

Download & Preprocessing

Download: MPIIFaceGaze download page

**Preprocessing**: [GazeHub - MPIIFaceGaze](https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#mpiifacegaze)

Dataset structure:

data/
└── MPIIFaceGaze/
    ├── Image/
    └── Label/

Face Parsing

CelebAMask-HQ

High-quality face parsing dataset with pixel-level annotations for 19 facial component classes.

| Property | Value |
|----------|-------|
| Images | 30,000 |
| Classes | 19 facial components |
| Resolution | 1024x1024 images, 512x512 masks |
| Used by | BiSeNet (ResNet18, ResNet34) |

Source

GitHub: switchablenorms/CelebAMask-HQ

**Training code**: [yakhyo/face-parsing](https://github.com/yakhyo/face-parsing)

Dataset structure:

dataset/
├── images/           # Input face images
│   ├── image1.jpg
│   └── ...
└── labels/           # Segmentation masks
    ├── image1.png
    └── ...
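In this layout each mask shares its filename stem with the image it annotates (`image1.jpg` pairs with `image1.png`). A minimal pairing sketch, assuming exactly that naming convention; `pair_images_and_masks` is a hypothetical helper, not the loader used in yakhyo/face-parsing.

```python
from pathlib import Path
from typing import List, Tuple


def pair_images_and_masks(root: str) -> List[Tuple[Path, Path]]:
    """Match images/<stem>.jpg with labels/<stem>.png by filename stem.

    Images without a matching mask are skipped.
    """
    root_path = Path(root)
    masks = {p.stem: p for p in (root_path / "labels").glob("*.png")}
    pairs: List[Tuple[Path, Path]] = []
    for img in sorted((root_path / "images").glob("*.jpg")):
        mask = masks.get(img.stem)
        if mask is not None:
            pairs.append((img, mask))
    return pairs
```

Comparing `len(pair_images_and_masks("dataset"))` with the number of images is a cheap way to detect missing masks before training.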

Attribute Analysis

CelebA

Large-scale face attributes dataset widely used for training age and gender prediction models.

| Property | Value |
|----------|-------|
| Images | ~200K |
| Attributes | 40 binary attributes |
| Used by | AgeGender |

Reference

Paper: Deep Learning Face Attributes in the Wild
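The 40 binary attributes are usually read from a text annotation file. The sketch below assumes the layout of CelebA's `list_attr_celeba.txt`: a first line with the image count, a second line with the attribute names, then one row per image of the form `<filename> v1 ... v40` with each value `1` or `-1`. `parse_celeba_attrs` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def parse_celeba_attrs(text: str) -> Tuple[List[str], Dict[str, List[bool]]]:
    """Parse CelebA-style attribute annotations.

    Returns (attribute_names, {filename: [bool per attribute]}),
    mapping 1 -> True and -1 -> False.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[1].split()  # line 0 holds the image count
    attrs: Dict[str, List[bool]] = {}
    for row in lines[2:]:
        fields = row.split()
        attrs[fields[0]] = [v == "1" for v in fields[1:1 + len(names)]]
    return names, attrs
```

Among the 40 attributes, `Male` and `Young` are the ones closest to gender and age; CelebA does not provide numeric age labels.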


FairFace

Face attribute dataset designed for balanced representation across race, gender, and age groups. Models trained on it tend to produce more equitable predictions across demographic groups than models trained on imbalanced datasets.

| Property | Value |
|----------|-------|
| Attributes | Race (7), Gender (2), Age Group (9) |
| Used by | FairFace |
| License | CC BY 4.0 |

Reference

Paper: FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age

**ONNX inference**: [yakhyo/fairface-onnx](https://github.com/yakhyo/fairface-onnx)
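The three attribute groups (7 race, 2 gender, 9 age classes) can be served by a single classifier whose 18 outputs are split per head. The sketch below assumes that concatenated 18-logit layout and the label order of the original FairFace release; the UniFace / fairface-onnx export may expose its outputs differently, so treat `decode_fairface` and the label lists as hypothetical and verify them against the model you load.

```python
from typing import Dict

import numpy as np

# Assumed label order (original FairFace release); verify before relying on it.
RACE = ["White", "Black", "Latino_Hispanic", "East Asian",
        "Southeast Asian", "Indian", "Middle Eastern"]
GENDER = ["Male", "Female"]
AGE = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]


def decode_fairface(logits: np.ndarray) -> Dict[str, str]:
    """Split an 18-dim output into race / gender / age heads and take each argmax."""
    logits = np.asarray(logits).reshape(-1)
    expected = len(RACE) + len(GENDER) + len(AGE)
    if logits.size != expected:
        raise ValueError(f"expected {expected} logits, got {logits.size}")
    race = logits[:7]
    gender = logits[7:9]
    age = logits[9:18]
    return {
        "race": RACE[int(np.argmax(race))],
        "gender": GENDER[int(np.argmax(gender))],
        "age": AGE[int(np.argmax(age))],
    }
```

Taking the argmax per head (rather than over all 18 values) keeps the three predictions independent, which is what the separate attribute groups require.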

AffectNet

Large-scale facial expression dataset for emotion recognition training.

| Property | Value |
|----------|-------|
| Classes | 7 (Neutral, Happy, Sad, Surprise, Fear, Disgust, Angry) or 8 (adds Contempt) |
| Used by | Emotion (AFFECNET7, AFFECNET8) |

Reference

Paper: AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild


Evaluation Benchmarks

Face Detection

WIDER FACE Validation Set

The standard benchmark for face detection models. Results are reported across three difficulty subsets.

| Subset | Criteria |
|--------|----------|
| Easy | Large, clear, unoccluded faces |
| Medium | Moderate scale and occlusion |
| Hard | Small, heavily occluded, or challenging faces |

See Model Zoo - Detection for per-model accuracy on each subset.


Face Recognition

Recognition models are evaluated across multiple benchmarks. Aligned 112x112 validation datasets are available as a single download.

Download

Kaggle: agedb-30-calfw-cplfw-lfw-aligned-112x112

| Benchmark | Description | Used by |
|-----------|-------------|---------|
| LFW | Labeled Faces in the Wild: standard face verification benchmark | ArcFace, MobileFace, SphereFace |
| CALFW | Cross-Age LFW: face verification across age gaps | MobileFace, SphereFace |
| CPLFW | Cross-Pose LFW: face verification across pose variations | MobileFace, SphereFace |
| AgeDB-30 | Age database with 30-year age gaps | ArcFace, MobileFace, SphereFace |
| CFP-FP | Celebrities in Frontal-Profile: frontal vs. profile verification | ArcFace |
| IJB-B | IARPA Janus Benchmark B: TAR@FAR=1e-4 (0.01%) | AdaFace |
| IJB-C | IARPA Janus Benchmark C: TAR@FAR=1e-4 (0.01%) | AdaFace, ArcFace |

See Model Zoo - Recognition for per-model accuracy on each benchmark.
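The pair benchmarks (LFW, CALFW, CPLFW, AgeDB-30, CFP-FP) report verification accuracy on same/different pairs, while IJB-B/C report the true accept rate at a fixed false accept rate. The sketch below shows both metrics on precomputed similarity scores. It is a simplification: the standard pair protocols choose the threshold with 10-fold cross-validation, and the IJB protocols compare templates aggregated from several images. All function names are hypothetical.

```python
from typing import Tuple

import numpy as np


def cosine_scores(emb1: np.ndarray, emb2: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (N, D) embedding arrays."""
    e1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    e2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    return np.sum(e1 * e2, axis=1)


def best_threshold_accuracy(scores: np.ndarray, same: np.ndarray) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximizing pair-verification accuracy.

    Simplification: uses all pairs at once instead of 10-fold cross-validation.
    """
    best_thr, best_acc = 0.0, 0.0
    for thr in np.unique(scores):
        acc = float(np.mean((scores >= thr) == same))
        if acc > best_acc:
            best_thr, best_acc = float(thr), acc
    return best_thr, best_acc


def tar_at_far(genuine: np.ndarray, impostor: np.ndarray, far: float = 1e-4) -> float:
    """True accept rate at a threshold accepting at most `far` of impostor pairs."""
    imp = np.sort(impostor)[::-1]  # highest impostor scores first
    k = int(np.floor(far * len(imp)))  # number of impostors allowed above threshold
    # Accept scores strictly above imp[k], so at most k impostors pass.
    thr = imp[k] if k < len(imp) else -np.inf
    return float(np.mean(genuine > thr))
```

At FAR=1e-4, only one impostor comparison in ten thousand may be accepted, which is why IJB results need far more impostor pairs than LFW-style benchmarks provide.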


Gaze Estimation

| Benchmark | Metric | Description |
|-----------|--------|-------------|
| Gaze360 test set | MAE (degrees) | Mean Absolute Error in gaze angle prediction |

See Model Zoo - Gaze for per-model MAE scores.
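One common way to compute this error in degrees is the mean angle between predicted and ground-truth 3D gaze directions, each converted from (pitch, yaw). The sketch below assumes angles in radians ordered as [pitch, yaw] and a conversion convention often used in gaze estimation code; neither is confirmed as the exact evaluation code behind the Model Zoo numbers, and the function names are hypothetical.

```python
import numpy as np


def pitchyaw_to_vector(pitchyaw: np.ndarray) -> np.ndarray:
    """Convert (N, 2) [pitch, yaw] in radians to (N, 3) unit gaze vectors.

    Assumed convention: x = -cos(p)sin(y), y = -sin(p), z = -cos(p)cos(y).
    """
    p, y = pitchyaw[:, 0], pitchyaw[:, 1]
    return np.stack([-np.cos(p) * np.sin(y), -np.sin(p), -np.cos(p) * np.cos(y)], axis=1)


def mean_angular_error_deg(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean angle in degrees between predicted and ground-truth gaze directions."""
    a = pitchyaw_to_vector(pred)
    b = pitchyaw_to_vector(target)
    cos = np.sum(a * b, axis=1)  # both are unit vectors
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))
```

Measuring the angle between 3D vectors, rather than averaging pitch and yaw errors separately, counts a deviation the same way regardless of the direction in which the prediction is off.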


Training Repositories

For training your own models or reproducing results, see the following repositories:

| Task | Repository | Datasets Supported |
|------|------------|--------------------|
| Detection | yakhyo/retinaface-pytorch | WIDER FACE |
| Recognition | yakhyo/face-recognition | MS1MV2, CASIA-WebFace, VGGFace2 |
| Gaze | yakhyo/gaze-estimation | Gaze360, MPIIFaceGaze |
| Parsing | yakhyo/face-parsing | CelebAMask-HQ |