RetinaFace: Single-Stage Face Detection with MobileNet and ResNet Backbones
RetinaFace is a single-stage face detector that predicts face bounding boxes and 5-point landmarks. This implementation adds multiple backbones, WIDER FACE evaluation, webcam inference, and both PyTorch and ONNX weights. The project is hosted at github.com/yakhyo/retinaface-pytorch.

Backbones
The repository supports lightweight MobileNet models and heavier ResNet models.
| Backbone | Notes |
|---|---|
| MobileNetV1 0.25 | smallest MobileNetV1 width multiplier |
| MobileNetV1 0.50 | wider MobileNetV1 variant |
| MobileNetV1 | standard lightweight backbone |
| MobileNetV2 | stronger mobile backbone |
| ResNet18 | moderate ResNet option |
| ResNet34 | strongest reported model in the available tables |
| ResNet50 | listed as supported, but the README table does not list release weights for it |
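The MobileNet models target smaller runtime budgets. The ResNet models are larger and generally more accurate: in the results below, ResNet34 scores highest on every split, while ResNet18 and MobileNetV2 stay within about a point of each other.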
WIDER FACE Results
Multi-scale Image Resizing
| Backbone | Easy | Medium | Hard |
|---|---|---|---|
| MobileNetV1 0.25 | 88.48% | 87.02% | 80.61% |
| MobileNetV1 0.50 | 89.42% | 87.97% | 82.40% |
| MobileNetV1 | 90.59% | 89.14% | 84.13% |
| MobileNetV2 | 91.70% | 91.03% | 86.60% |
| ResNet18 | 92.50% | 91.02% | 86.63% |
| ResNet34 | 94.16% | 93.12% | 88.90% |
Original Image Size
| Backbone | Easy | Medium | Hard |
|---|---|---|---|
| MobileNetV1 0.25 | 90.70% | 88.12% | 73.82% |
| MobileNetV1 0.50 | 91.56% | 89.46% | 76.56% |
| MobileNetV1 | 92.19% | 90.41% | 79.56% |
| MobileNetV2 | 94.04% | 92.26% | 83.59% |
| ResNet18 | 94.28% | 92.69% | 82.95% |
| ResNet34 | 95.07% | 93.48% | 84.40% |
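
To make the difference between the two evaluation settings easier to see, the following sketch computes the per-backbone gap (multi-scale minus original size) from the two tables above. The dictionary names and the helper `setting_gap` are illustrative only and are not part of the repository.

```python
# Scores (percent) copied from the two tables above: (easy, medium, hard).
MULTI_SCALE = {
    "MobileNetV1 0.25": (88.48, 87.02, 80.61),
    "MobileNetV1 0.50": (89.42, 87.97, 82.40),
    "MobileNetV1": (90.59, 89.14, 84.13),
    "MobileNetV2": (91.70, 91.03, 86.60),
    "ResNet18": (92.50, 91.02, 86.63),
    "ResNet34": (94.16, 93.12, 88.90),
}
ORIGINAL_SIZE = {
    "MobileNetV1 0.25": (90.70, 88.12, 73.82),
    "MobileNetV1 0.50": (91.56, 89.46, 76.56),
    "MobileNetV1": (92.19, 90.41, 79.56),
    "MobileNetV2": (94.04, 92.26, 83.59),
    "ResNet18": (94.28, 92.69, 82.95),
    "ResNet34": (95.07, 93.48, 84.40),
}


def setting_gap(backbone):
    """Return multi-scale minus original-size scores (easy, medium, hard), in points."""
    multi = MULTI_SCALE[backbone]
    original = ORIGINAL_SIZE[backbone]
    return tuple(round(m - o, 2) for m, o in zip(multi, original))


if __name__ == "__main__":
    for name in MULTI_SCALE:
        easy, medium, hard = setting_gap(name)
        print(f"{name:<17} easy {easy:+.2f}  medium {medium:+.2f}  hard {hard:+.2f}")
```

In these numbers, multi-scale resizing lowers the easy and medium scores slightly for every backbone and raises the hard score by roughly 3 to 7 points.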
Small-Face Filtering
The README includes an additional set of WIDER FACE results obtained after filtering out faces smaller than 16 pixels during training.
The change improves the easy and medium splits in several cases, because removing very small, noisy annotations leads to fewer false positives. The tradeoff shows up on the hard split, where performance drops sharply because the evaluation depends on very small faces.
That makes the choice task-dependent. If the deployment mostly sees normal-sized faces, filtering can be useful. If the task involves crowd scenes or surveillance-style images, the hard-split drop matters.
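As a rough illustration of the filtering step, the sketch below drops annotations whose shorter side falls under a threshold. The box layout `(x, y, width, height)`, the function name `filter_small_faces`, and the decision to compare the shorter side are assumptions made for this example. The README only states that faces smaller than 16 pixels were filtered, so the repository's actual criterion may differ.

```python
from typing import List, Tuple

# Assumed annotation layout for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def filter_small_faces(boxes: List[Box], min_size: float = 16.0) -> List[Box]:
    """Keep only boxes whose shorter side is at least `min_size` pixels."""
    return [box for box in boxes if min(box[2], box[3]) >= min_size]


if __name__ == "__main__":
    annotations = [
        (10, 10, 8, 9),      # shorter side 8: dropped
        (50, 40, 16, 20),    # shorter side 16: kept
        (120, 80, 64, 72),   # shorter side 64: kept
    ]
    print(filter_small_faces(annotations))
```

Raising or lowering `min_size` is the knob that trades cleaner training labels against recall on very small faces.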
Large Selfie Result
The repository also includes a large selfie example using MobileNetV2. The README notes that the model finds 632 faces in this image.
PyTorch and ONNX
The repository provides PyTorch and ONNX weights for the published MobileNet and ResNet variants. It also includes training, WIDER FACE evaluation, image inference, video/webcam inference, and ONNX export code.
For application code that only needs detection as part of a larger face-analysis pipeline, this model family is also available through UniFace.