Model Zoo

Complete guide to all available models and their performance characteristics.

Face Detection Models

RetinaFace Family

RetinaFace models are trained on the WIDER FACE dataset.

Model Name	Params	Size	Easy	Medium	Hard
`MNET_025`	0.4M	1.7MB	88.48%	87.02%	80.61%
`MNET_050`	1.0M	2.6MB	89.42%	87.97%	82.40%
`MNET_V1`	3.5M	3.8MB	90.59%	89.14%	84.13%
`MNET_V2`	3.2M	3.5MB	91.70%	91.03%	86.60%
`RESNET18`	11.7M	27MB	92.50%	91.02%	86.63%
`RESNET34`	24.8M	56MB	94.16%	93.12%	88.90%

Accuracy & Benchmarks

Accuracy: WIDER FACE validation set (Easy/Medium/Hard subsets) - from RetinaFace paper

Speed: Benchmark on your own hardware using python tools/detect.py --source <image>

SCRFD Family

SCRFD (Sample and Computation Redistribution for Efficient Face Detection) models trained on WIDER FACE dataset.

Model Name	Params	Size	Easy	Medium	Hard
`SCRFD_500M_KPS`	0.6M	2.5MB	90.57%	88.12%	68.51%
`SCRFD_10G_KPS`	4.2M	17MB	95.16%	93.87%	83.05%

Accuracy & Benchmarks

Accuracy: WIDER FACE validation set - from SCRFD paper

Speed: Benchmark on your own hardware using python tools/detect.py --source <image>

YOLOv5-Face Family

YOLOv5-Face models provide detection with 5-point facial landmarks, trained on WIDER FACE dataset.

Model Name	Size	Easy	Medium	Hard
`YOLOV5N`	11MB	93.61%	91.52%	80.53%
`YOLOV5S`	28MB	94.33%	92.61%	83.15%
`YOLOV5M`	82MB	95.30%	93.76%	85.28%

Accuracy & Benchmarks

Accuracy: WIDER FACE validation set - from YOLOv5-Face paper

Speed: Benchmark on your own hardware using python tools/detect.py --source <image>

Fixed Input Size

All YOLOv5-Face models use a fixed input size of 640×640.

YOLOv8-Face Family

YOLOv8-Face models use anchor-free design with DFL (Distribution Focal Loss) for bbox regression. Provides detection with 5-point facial landmarks.

Model Name	Size	Easy	Medium	Hard
`YOLOV8_LITE_S`	7.4MB	93.4%	91.2%	78.6%
`YOLOV8N`	12MB	94.6%	92.3%	79.6%

Accuracy & Benchmarks

Accuracy: WIDER FACE validation set (Easy/Medium/Hard subsets)

Speed: Benchmark on your own hardware using python tools/detect.py --source <image> --method yolov8face

Fixed Input Size

All YOLOv8-Face models use a fixed input size of 640×640.

Face Recognition Models

AdaFace

Face recognition using adaptive margin based on image quality.

Model Name	Backbone	Dataset	Size	IJB-B TAR	IJB-C TAR
`IR_18`	IR-18	WebFace4M	92 MB	93.03%	94.99%
`IR_101`	IR-101	WebFace12M	249 MB	-	97.66%

Training Data & Accuracy

Dataset: WebFace4M / WebFace12M (4M / 12M images)

Accuracy: IJB-B and IJB-C benchmarks, TAR@FAR=0.01%

Key Innovation

AdaFace introduces adaptive margin that adjusts based on image quality, providing better performance on low-quality images compared to fixed-margin approaches.

ArcFace

Face recognition using additive angular margin loss.

Model Name	Backbone	Params	Size	LFW	CFP-FP	AgeDB-30	IJB-C
`MNET`	MobileNet	2.0M	8MB	99.70%	98.00%	96.58%	95.02%
`RESNET`	ResNet50	43.6M	166MB	99.83%	99.33%	98.23%	97.25%

Training Data

Dataset: Trained on WebFace600K (600K images)

Accuracy: IJB-C accuracy reported as TAR@FAR=1e-4

MobileFace

Lightweight face recognition models with MobileNet backbones.

Model Name	Backbone	Params	Size	LFW	CALFW	CPLFW	AgeDB-30
`MNET_025`	MobileNetV1 0.25	0.36M	1MB	98.76%	92.02%	82.37%	90.02%
`MNET_V2`	MobileNetV2	2.29M	4MB	99.55%	94.87%	86.89%	95.16%
`MNET_V3_SMALL`	MobileNetV3-S	1.25M	3MB	99.30%	93.77%	85.29%	92.79%
`MNET_V3_LARGE`	MobileNetV3-L	3.52M	10MB	99.53%	94.56%	86.79%	95.13%

Training Data

Dataset: Trained on MS1MV2 (5.8M images, 85K identities)

Accuracy: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks

SphereFace

Face recognition using angular softmax loss.

Model Name	Backbone	Params	Size	LFW	CALFW	CPLFW	AgeDB-30
`SPHERE20`	Sphere20	24.5M	50MB	99.67%	95.61%	88.75%	96.58%
`SPHERE36`	Sphere36	34.6M	92MB	99.72%	95.64%	89.92%	96.83%

Training Data

Dataset: Trained on MS1MV2 (5.8M images, 85K identities)

Accuracy: Evaluated on LFW, CALFW, CPLFW, and AgeDB-30 benchmarks

Architecture

SphereFace uses angular softmax loss, an earlier approach before ArcFace. These models provide good accuracy with moderate resource requirements.

Facial Landmark Models

106-Point Landmark Detection

Facial landmark localization model.

Model Name	Points	Params	Size
`2D106`	106	3.7M	14MB

Landmark Groups:

Group	Points	Count
Face contour	0-32	33 points
Eyebrows	33-50	18 points
Nose	51-62	12 points
Eyes	63-86	24 points
Mouth	87-105	19 points

Attribute Analysis Models

Age & Gender Detection

Model Name	Attributes	Params	Size
`AgeGender`	Age, Gender	2.1M	8MB

Training Data

Dataset: Trained on CelebA

Accuracy Note

Accuracy varies by demographic and image quality. Test on your specific use case.

FairFace Attributes

Model Name	Attributes	Params	Size
`FairFace`	Race, Gender, Age Group	-	44MB

Training Data

Dataset: Trained on FairFace dataset with balanced demographics

Equitable Predictions

FairFace provides more equitable predictions across different racial and gender groups.

Race Categories (7): White, Black, Latino Hispanic, East Asian, Southeast Asian, Indian, Middle Eastern

Age Groups (9): 0-2, 3-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70+

Emotion Detection

Model Name	Classes	Params	Size
`AFFECNET7`	7	0.5M	2MB
`AFFECNET8`	8	0.5M	2MB

Classes (7): Neutral, Happy, Sad, Surprise, Fear, Disgust, Angry

Classes (8): Above + Contempt

Training Data

Dataset: Trained on AffectNet

Accuracy Note

Emotion detection accuracy depends heavily on facial expression clarity and cultural context.

Gaze Estimation Models

MobileGaze Family

Gaze direction prediction models trained on Gaze360 dataset. Returns pitch (vertical) and yaw (horizontal) angles in radians.

Model Name	Params	Size	MAE*
`RESNET18`	11.7M	43 MB	12.84
`RESNET34`	24.8M	81.6 MB	11.33
`RESNET50`	25.6M	91.3 MB	11.34
`MOBILENET_V2`	3.5M	9.59 MB	13.07
`MOBILEONE_S0`	2.1M	4.8 MB	12.58

*MAE (Mean Absolute Error) in degrees on Gaze360 test set - lower is better

Training Data

Dataset: Trained on Gaze360 (indoor/outdoor scenes with diverse head poses)

Training: 200 epochs with classification-based approach (binned angles)

Input Requirements

Requires face crop as input. Use face detection first to obtain bounding boxes.

Face Parsing Models

BiSeNet Family

BiSeNet (Bilateral Segmentation Network) models for semantic face parsing. Segments face images into 19 facial component classes.

Model Name	Params	Size	Classes
`RESNET18`	13.3M	50.7 MB	19
`RESNET34`	24.1M	89.2 MB	19

Training Data

Dataset: Trained on CelebAMask-HQ

Architecture: BiSeNet with ResNet backbone

Input Size: 512×512 (automatically resized)

19 Facial Component Classes:

#	Class	#	Class	#	Class
0	Background	7	Left Ear	14	Neck
1	Skin	8	Right Ear	15	Neck Lace
2	Left Eyebrow	9	Ear Ring	16	Cloth
3	Right Eyebrow	10	Nose	17	Hair
4	Left Eye	11	Mouth	18	Hat
5	Right Eye	12	Upper Lip
6	Eye Glasses	13	Lower Lip

Applications:

Face makeup and beauty applications
Virtual try-on systems
Face editing and manipulation
Facial feature extraction
Portrait segmentation

Input Requirements

Input should be a cropped face image. For full pipeline, use face detection first to obtain face crops.

XSeg

XSeg from DeepFaceLab outputs masks for face regions. Requires 5-point landmarks for face alignment.

Model Name	Size	Output
`DEFAULT`	67 MB	Mask [0, 1]

Model Details

Origin: DeepFaceLab

Input: NHWC format, normalized to [0, 1]

Alignment: Requires 5-point landmarks (not bbox crops)

Applications:

Face region extraction
Face swapping pipelines
Occlusion handling

Input Requirements

Requires 5-point facial landmarks. Use a face detector like RetinaFace to obtain landmarks first.

Anti-Spoofing Models

MiniFASNet Family

Face anti-spoofing models for liveness detection. Detect if a face is real (live) or fake (photo, video replay, mask).

Model Name	Size	Scale
`V1SE`	1.2 MB	4.0
`V2`	1.2 MB	2.7

Output Format

Output: Returns SpoofingResult(is_real, confidence) where is_real: True=Real, False=Fake

Input Requirements

Requires face bounding box from a detector.

Model Management

Models are automatically downloaded and cached on first use.

Cache location: ~/.uniface/models/ (configurable via set_cache_dir() or UNIFACE_CACHE_DIR env var)
Inspect cache path: get_cache_dir() returns the resolved active path
Verification: Models are verified with SHA-256 checksums
Concurrent download: download_models([...]) fetches multiple models in parallel
Manual download: Use python tools/download_model.py to pre-download models

See Model Cache & Offline Use for full details.

References

Model Training & Architectures

RetinaFace Training: yakhyo/retinaface-pytorch - PyTorch implementation and training code
YOLOv5-Face Original: deepcam-cn/yolov5-face - Original PyTorch implementation
YOLOv5-Face ONNX: yakhyo/yolov5-face-onnx-inference - ONNX inference implementation
YOLOv8-Face Original: derronqi/yolov8-face - Original PyTorch implementation
YOLOv8-Face ONNX: yakhyo/yolov8-face-onnx-inference - ONNX inference implementation
AdaFace Original: mk-minchul/AdaFace - Original PyTorch implementation
AdaFace ONNX: yakhyo/adaface-onnx - ONNX export and inference
Face Recognition Training: yakhyo/face-recognition - ArcFace, MobileFace, SphereFace training code
Gaze Estimation Training: yakhyo/gaze-estimation - MobileGaze training code and pretrained weights
Face Parsing Training: yakhyo/face-parsing - BiSeNet training code and pretrained weights
Face Segmentation: yakhyo/face-segmentation - XSeg ONNX Inference
Face Anti-Spoofing: yakhyo/face-anti-spoofing - MiniFASNet ONNX inference (weights from minivision-ai/Silent-Face-Anti-Spoofing)
FairFace: yakhyo/fairface-onnx - FairFace ONNX inference for race, gender, age prediction
InsightFace: deepinsight/insightface - Model architectures and pretrained weights

Papers

RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild
SCRFD: Sample and Computation Redistribution for Efficient Face Detection
YOLOv5-Face: YOLO5Face: Why Reinventing a Face Detector
AdaFace: AdaFace: Quality Adaptive Margin for Face Recognition
ArcFace: Additive Angular Margin Loss for Deep Face Recognition
SphereFace: Deep Hypersphere Embedding for Face Recognition
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation