Model Zoo
Complete guide to all available models and their performance characteristics.
Face Detection Models
RetinaFace Family
RetinaFace models are trained on the WIDER FACE dataset.
| Model Name | Params | Size | Easy | Medium | Hard |
|---|---|---|---|---|---|
MNET_025 |
0.4M | 1.7MB | 88.48% | 87.02% | 80.61% |
MNET_050 |
1.0M | 2.6MB | 89.42% | 87.97% | 82.40% |
MNET_V1 |
3.5M | 3.8MB | 90.59% | 89.14% | 84.13% |
MNET_V2 |
3.2M | 3.5MB | 91.70% | 91.03% | 86.60% |
RESNET18 |
11.7M | 27MB | 92.50% | 91.02% | 86.63% |
RESNET34 |
24.8M | 56MB | 94.16% | 93.12% | 88.90% |
Accuracy & Benchmarks
Accuracy: WIDER FACE validation set (Easy/Medium/Hard subsets) - from RetinaFace paper
Speed: Benchmark on your own hardware using python tools/detection.py --source <image> --iterations 100
SCRFD Family
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models trained on WIDER FACE dataset.
| Model Name | Params | Size | Easy | Medium | Hard |
|---|---|---|---|---|---|
SCRFD_500M |
0.6M | 2.5MB | 90.57% | 88.12% | 68.51% |
SCRFD_10G |
4.2M | 17MB | 95.16% | 93.87% | 83.05% |
Accuracy & Benchmarks
Accuracy: WIDER FACE validation set - from SCRFD paper
Speed: Benchmark on your own hardware using python tools/detection.py --source <image> --iterations 100
YOLOv5-Face Family
YOLOv5-Face models provide detection with 5-point facial landmarks, trained on WIDER FACE dataset.
| Model Name | Size | Easy | Medium | Hard |
|---|---|---|---|---|
YOLOV5N |
11MB | 93.61% | 91.52% | 80.53% |
YOLOV5S |
28MB | 94.33% | 92.61% | 83.15% |
YOLOV5M |
82MB | 95.30% | 93.76% | 85.28% |
Accuracy & Benchmarks
Accuracy: WIDER FACE validation set - from YOLOv5-Face paper
Speed: Benchmark on your own hardware using python tools/detection.py --source <image> --iterations 100
Fixed Input Size
All YOLOv5-Face models use a fixed input size of 640×640.
YOLOv8-Face Family
YOLOv8-Face models use anchor-free design with DFL (Distribution Focal Loss) for bbox regression. Provides detection with 5-point facial landmarks.
| Model Name | Size | Easy | Medium | Hard |
|---|---|---|---|---|
YOLOV8_LITE_S |
7.4MB | 93.4% | 91.2% | 78.6% |
YOLOV8N |
12MB | 94.6% | 92.3% | 79.6% |
Accuracy & Benchmarks
Accuracy: WIDER FACE validation set (Easy/Medium/Hard subsets)
Speed: Benchmark on your own hardware using python tools/detection.py --source <image> --method yolov8face
Fixed Input Size
All YOLOv8-Face models use a fixed input size of 640×640.
Face Recognition Models
AdaFace
Face recognition using adaptive margin based on image quality.
| Model Name | Backbone | Dataset | Size | IJB-B TAR | IJB-C TAR |
|---|---|---|---|---|---|
IR_18 |
IR-18 | WebFace4M | 92 MB | 93.03% | 94.99% |
IR_101 |
IR-101 | WebFace12M | 249 MB | - | 97.66% |
Training Data & Accuracy
Dataset: WebFace4M (4M images) / WebFace12M (12M images)
Accuracy: IJB-B and IJB-C benchmarks, TAR@FAR=0.01%
Key Innovation
AdaFace introduces adaptive margin that adjusts based on image quality, providing better performance on low-quality images compared to fixed-margin approaches.
ArcFace
Face recognition using additive angular margin loss.
| Model Name | Backbone | Params | Size | LFW | CFP-FP | AgeDB-30 | IJB-C |
|---|---|---|---|---|---|---|---|
MNET |
MobileNet | 2.0M | 8MB | 99.70% | 98.00% | 96.58% | 95.02% |
RESNET |
ResNet50 | 43.6M | 166MB | 99.83% | 99.33% | 98.23% | 97.25% |
Training Data
Dataset: Trained on WebFace600K (600K images)
Accuracy: IJB-C accuracy reported as TAR@FAR=1e-4
MobileFace
Lightweight face recognition models with MobileNet backbones.
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
|---|---|---|---|---|---|---|---|
MNET_025 |
MobileNetV1 0.25 | 0.36M | 1MB | 98.76% | 92.02% | 82.37% | 90.02% |
MNET_V2 |
MobileNetV2 | 2.29M | 4MB | 99.55% | 94.87% | 86.89% | 95.16% |
MNET_V3_SMALL |
MobileNetV3-S | 1.25M | 3MB | 99.30% | 93.77% | 85.29% | 92.79% |
MNET_V3_LARGE |
MobileNetV3-L | 3.52M | 10MB | 99.53% | 94.56% | 86.79% | 95.13% |
Training Data
Dataset: Trained on MS1M-V2 (5.8M images, 85K identities)
Accuracy: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
SphereFace
Face recognition using angular softmax loss.
| Model Name | Backbone | Params | Size | LFW | CALFW | CPLFW | AgeDB-30 |
|---|---|---|---|---|---|---|---|
SPHERE20 |
Sphere20 | 24.5M | 50MB | 99.67% | 95.61% | 88.75% | 96.58% |
SPHERE36 |
Sphere36 | 34.6M | 92MB | 99.72% | 95.64% | 89.92% | 96.83% |
Training Data
Dataset: Trained on MS1M-V2 (5.8M images, 85K identities)
Accuracy: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks
Architecture
SphereFace uses angular softmax loss, an earlier approach before ArcFace. These models provide good accuracy with moderate resource requirements.
Facial Landmark Models
106-Point Landmark Detection
Facial landmark localization model.
| Model Name | Points | Params | Size |
|---|---|---|---|
2D106 |
106 | 3.7M | 14MB |
Landmark Groups:
| Group | Points | Count |
|---|---|---|
| Face contour | 0-32 | 33 points |
| Eyebrows | 33-50 | 18 points |
| Nose | 51-62 | 12 points |
| Eyes | 63-86 | 24 points |
| Mouth | 87-105 | 19 points |
Attribute Analysis Models
Age & Gender Detection
| Model Name | Attributes | Params | Size |
|---|---|---|---|
AgeGender |
Age, Gender | 2.1M | 8MB |
Training Data
Dataset: Trained on CelebA
Accuracy Note
Accuracy varies by demographic and image quality. Test on your specific use case.
FairFace Attributes
| Model Name | Attributes | Params | Size |
|---|---|---|---|
FairFace |
Race, Gender, Age Group | - | 44MB |
Training Data
Dataset: Trained on FairFace dataset with balanced demographics
Equitable Predictions
FairFace provides more equitable predictions across different racial and gender groups.
Race Categories (7): White, Black, Latino Hispanic, East Asian, Southeast Asian, Indian, Middle Eastern
Age Groups (9): 0-2, 3-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70+
Emotion Detection
| Model Name | Classes | Params | Size |
|---|---|---|---|
AFFECNET7 |
7 | 0.5M | 2MB |
AFFECNET8 |
8 | 0.5M | 2MB |
Classes (7): Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger
Classes (8): Above + Contempt
Training Data
Dataset: Trained on AffectNet
Accuracy Note
Emotion detection accuracy depends heavily on facial expression clarity and cultural context.
Gaze Estimation Models
MobileGaze Family
Gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.
| Model Name | Params | Size | MAE* |
|---|---|---|---|
RESNET18 |
11.7M | 43 MB | 12.84 |
RESNET34 |
24.8M | 81.6 MB | 11.33 |
RESNET50 |
25.6M | 91.3 MB | 11.34 |
MOBILENET_V2 |
3.5M | 9.59 MB | 13.07 |
MOBILEONE_S0 |
2.1M | 4.8 MB | 12.58 |
*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better
Training Data
Dataset: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)
Training: 200 epochs with classification-based approach (binned angles)
Input Requirements
Requires face crop as input. Use face detection first to obtain bounding boxes.
Face Parsing Models
BiSeNet Family
BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.
| Model Name | Params | Size | Classes |
|---|---|---|---|
RESNET18 |
13.3M | 50.7 MB | 19 |
RESNET34 |
24.1M | 89.2 MB | 19 |
Training Data
Dataset: Trained on CelebAMask-HQ
Architecture: BiSeNet with ResNet backbone
Input Size: 512×512 (automatically resized)
19 Facial Component Classes:
| # | Class | # | Class | # | Class |
|---|---|---|---|---|---|
| 1 | Background | 8 | Left Ear | 15 | Neck |
| 2 | Skin | 9 | Right Ear | 16 | Neck Lace |
| 3 | Left Eyebrow | 10 | Ear Ring | 17 | Cloth |
| 4 | Right Eyebrow | 11 | Nose | 18 | Hair |
| 5 | Left Eye | 12 | Mouth | 19 | Hat |
| 6 | Right Eye | 13 | Upper Lip | ||
| 7 | Eye Glasses | 14 | Lower Lip |
Applications:
- Face makeup and beauty applications
- Virtual try-on systems
- Face editing and manipulation
- Facial feature extraction
- Portrait segmentation
Input Requirements
Input should be a cropped face image. For full pipeline, use face detection first to obtain face crops.
XSeg
XSeg from DeepFaceLab outputs masks for face regions. Requires 5-point landmarks for face alignment.
| Model Name | Size | Output |
|---|---|---|
DEFAULT |
67 MB | Mask [0, 1] |
Model Details
Origin: DeepFaceLab
Input: NHWC format, normalized to [0, 1]
Alignment: Requires 5-point landmarks (not bbox crops)
Applications:
- Face region extraction
- Face swapping pipelines
- Occlusion handling
Input Requirements
Requires 5-point facial landmarks. Use a face detector like RetinaFace to obtain landmarks first.
Anti-Spoofing Models
MiniFASNet Family
Face anti-spoofing models for liveness detection. Detect if a face is real (live) or fake (photo, video replay, mask).
| Model Name | Size | Scale |
|---|---|---|
V1SE |
1.2 MB | 4.0 |
V2 |
1.2 MB | 2.7 |
Output Format
Output: Returns SpoofingResult(is_real, confidence) where is_real: True=Real, False=Fake
Input Requirements
Requires face bounding box from a detector.
Model Management
Models are automatically downloaded and cached on first use.
- Cache location:
~/.uniface/models/ - Verification: Models are verified with SHA-256 checksums
- Manual download: Use
python tools/download_model.pyto pre-download models
References
Model Training & Architectures
- RetinaFace Training: yakhyo/retinaface-pytorch - PyTorch implementation and training code
- YOLOv5-Face Original: deepcam-cn/yolov5-face - Original PyTorch implementation
- YOLOv5-Face ONNX: yakhyo/yolov5-face-onnx-inference - ONNX inference implementation
- YOLOv8-Face Original: derronqi/yolov8-face - Original PyTorch implementation
- YOLOv8-Face ONNX: yakhyo/yolov8-face-onnx-inference - ONNX inference implementation
- AdaFace Original: mk-minchul/AdaFace - Original PyTorch implementation
- AdaFace ONNX: yakhyo/adaface-onnx - ONNX export and inference
- Face Recognition Training: yakhyo/face-recognition - ArcFace, MobileFace, SphereFace training code
- Gaze Estimation Training: yakhyo/gaze-estimation - MobileGaze training code and pretrained weights
- Face Parsing Training: yakhyo/face-parsing - BiSeNet training code and pretrained weights
- Face Segmentation: yakhyo/face-segmentation - XSeg ONNX Inference
- Face Anti-Spoofing: yakhyo/face-anti-spoofing - MiniFASNet ONNX inference (weights from minivision-ai/Silent-Face-Anti-Spoofing)
- FairFace: yakhyo/fairface-onnx - FairFace ONNX inference for race, gender, age prediction
- InsightFace: deepinsight/insightface - Model architectures and pretrained weights
Papers
- RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild
- SCRFD: Sample and Computation Redistribution for Efficient Face Detection
- YOLOv5-Face: YOLO5Face: Why Reinventing a Face Detector
- AdaFace: AdaFace: Quality Adaptive Margin for Face Recognition
- ArcFace: Additive Angular Margin Loss for Deep Face Recognition
- SphereFace: Deep Hypersphere Embedding for Face Recognition
- BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation