gaze-estimation

MobileGaze: Pre-trained mobile nets for Gaze-Estimation


Video by Yan Krukau: https://www.pexels.com/video/male-teacher-with-his-students-8617126/

This project performs gaze estimation with several deep learning backbones: ResNet, MobileNet v2, and MobileOne. It supports both classification and regression for predicting gaze direction. Built on top of L2CS-Net, it adds further pre-trained models and code reworked for reproducibility and adaptability.
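
The combination of classification and regression can be illustrated with a minimal sketch, assuming the L2CS-Net style of decoding: each angle is predicted as logits over discrete bins, and a continuous angle is recovered as the softmax-weighted expectation of the bin indices. The bin count (90), bin width (4 degrees), and offset (180 degrees) are assumed values for illustration and are not read from this repository's configuration; `decode_angle` is a hypothetical helper name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_angle(logits, bin_width=4.0, offset=180.0):
    """Turn per-bin logits into one continuous angle in degrees.

    bin_width and offset are assumptions for this sketch.
    """
    probs = softmax(logits)
    # Expected bin index under the predicted distribution.
    expected_bin = sum(i * p for i, p in enumerate(probs))
    return expected_bin * bin_width - offset


# Example: a distribution sharply peaked at bin 45 of 90.
logits = [0.0] * 90
logits[45] = 50.0
angle = decode_angle(logits)
```

Taking the expectation rather than the arg-max bin is what lets a classifier over coarse bins still produce a fine-grained angle.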

Features

ONNX Inference: Export PyTorch weights to ONNX and run inference with ONNX Runtime.
ResNet: Deep Residual Networks - Enables deeper networks with better accuracy through residual learning.
MobileNet v2: Inverted Residuals and Linear Bottlenecks - Efficient model for mobile applications, balancing performance and computational cost.
MobileOne (s0-s4): An Improved One millisecond Mobile Backbone - Targets inference latency on the order of one millisecond, suited to real-time mobile applications.
Face Detection: uniface - Face detection library that uses the RetinaFace model.

Note: All models are trained only on the Gaze360 dataset.

Installation

Clone the repository:

git clone https://github.com/yakyo/gaze-estimation.git
cd gaze-estimation

Install the required dependencies:

pip install -r requirements.txt

Download weight files:

a) Download weights from the following links:

Model	PyTorch Weights	ONNX Weights	Size	Epochs	MAE
ResNet-18	resnet18.pt	resnet18_gaze.onnx	43 MB	200	12.84
ResNet-34	resnet34.pt	resnet34_gaze.onnx	81.6 MB	200	11.33
ResNet-50	resnet50.pt	resnet50_gaze.onnx	91.3 MB	200	11.34
MobileNet V2	mobilenetv2.pt	mobilenetv2_gaze.onnx	9.59 MB	200	13.07
MobileOne S0	mobileone_s0_fused.pt	mobileone_s0_gaze.onnx	4.8 MB	200	12.58
MobileOne S1	not available	not available	xx MB	200	*
MobileOne S2	not available	not available	xx MB	200	*
MobileOne S3	not available	not available	xx MB	200	*
MobileOne S4	not available	not available	xx MB	200	*

'*' - weights will be uploaded soon. Due to limited computing resources the remaining weights are not yet published, but you can still train these models with the provided code.

b) Run the command below to download weights to the weights directory (Linux):

sh download.sh [model_name]
               resnet18
               resnet34
               resnet50
               mobilenetv2
               mobileone_s0
               mobileone_s1
               mobileone_s2
               mobileone_s3
               mobileone_s4

Usage

Datasets

Dataset folder structure:

data/
├── Gaze360/
│   ├── Image/
│   └── Label/
└── MPIIFaceGaze/
    ├── Image/
    └── Label/
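
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, assuming the dataset root is passed as a path; `missing_dirs` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path

# Sub-directories taken from the folder structure shown above.
EXPECTED_SUBDIRS = [
    Path("Gaze360") / "Image",
    Path("Gaze360") / "Label",
    Path("MPIIFaceGaze") / "Image",
    Path("MPIIFaceGaze") / "Label",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [sub.as_posix() for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

For example, `missing_dirs("data")` returns an empty list when both datasets are fully prepared.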

Gaze360

Link to download dataset: https://gaze360.csail.mit.edu/download.php
Data pre-processing code: https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#gaze360

MPIIGaze

Link to download dataset: download page
Data pre-processing code: https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#mpiifacegaze

Training

python main.py --data [dataset_path] --dataset [dataset_name] --arch [architecture_name]

main.py arguments:

usage: main.py [-h] [--data DATA] [--dataset DATASET] [--output OUTPUT] [--checkpoint CHECKPOINT] [--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE] [--arch ARCH] [--alpha ALPHA] [--lr LR] [--num-workers NUM_WORKERS]

Gaze estimation training.

options:
  -h, --help            show this help message and exit
  --data DATA           Directory path for gaze images.
  --dataset DATASET     Dataset name, available `gaze360`, `mpiigaze`.
  --output OUTPUT       Path of output models.
  --checkpoint CHECKPOINT
                        Path to checkpoint for resuming training.
  --num-epochs NUM_EPOCHS
                        Maximum number of training epochs.
  --batch-size BATCH_SIZE
                        Batch size.
  --arch ARCH           Network architecture, currently available: resnet18/34/50, mobilenetv2, mobileone_s0-s4.
  --alpha ALPHA         Regression loss coefficient.
  --lr LR               Base learning rate.
  --num-workers NUM_WORKERS
                        Number of workers for data loading.
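
The `--alpha` option weights the regression term of the loss. A minimal sketch of how such a coefficient might combine the two terms, assuming the L2CS-Net formulation (cross-entropy on the bin label plus alpha times the squared error of the decoded angle). This is plain Python for illustration, not the PyTorch code in main.py; the function names, bin width, and offset are assumptions.

```python
import math


def cross_entropy(logits, target_bin):
    """Cross-entropy of one sample: log-sum-exp minus the target logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_bin]


def expected_angle(logits, bin_width=4.0, offset=180.0):
    """Softmax-weighted expectation of bin indices, mapped to degrees."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    expected_bin = sum(i * e / total for i, e in enumerate(exps))
    return expected_bin * bin_width - offset


def combined_loss(logits, target_bin, target_angle, alpha=1.0):
    """Classification loss plus alpha-weighted squared regression error."""
    cls_term = cross_entropy(logits, target_bin)
    reg_term = (expected_angle(logits) - target_angle) ** 2
    return cls_term + alpha * reg_term
```

A larger alpha pushes the decoded continuous angle toward the label, while the cross-entropy term keeps the bin distribution peaked in the right place.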

Evaluation

python evaluate.py --data [dataset_path] --dataset [dataset_name] --weights [weight_path] --arch [architecture_name]

evaluate.py arguments:

usage: evaluate.py [-h] [--data DATA] [--dataset DATASET] [--weights WEIGHTS] [--batch-size BATCH_SIZE] [--arch ARCH] [--num-workers NUM_WORKERS]

Gaze estimation evaluation.

options:
  -h, --help            show this help message and exit
  --data DATA           Directory path for gaze images.
  --dataset DATASET     Dataset name, available `gaze360`, `mpiigaze`
  --weights WEIGHTS     Path to model weight for evaluation.
  --batch-size BATCH_SIZE
                        Batch size.
  --arch ARCH           Network architecture, currently available: resnet18/34/50, mobilenetv2, mobileone_s0-s4.
  --num-workers NUM_WORKERS
                        Number of workers for data loading.
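
The MAE column in the weights table is an angular error. A minimal sketch of how a mean angular error between predicted and ground-truth gaze can be computed, assuming gaze is given as (yaw, pitch) in radians and converted to a 3D unit vector. The axis convention and function names are assumptions for illustration and may differ from what evaluate.py uses.

```python
import math


def gaze_to_vector(yaw, pitch):
    """Convert (yaw, pitch) in radians to a 3D unit direction (assumed convention)."""
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(pred, target):
    """Angle in degrees between two gaze directions given as (yaw, pitch)."""
    a = gaze_to_vector(*pred)
    b = gaze_to_vector(*target)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def mean_angular_error(preds, targets):
    """Average angular error over paired predictions and targets."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```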

Inference

python inference.py --model [model_name] --weight [model_weight_path] --view --source [source_video / cam_index] --output [output_file] --dataset [dataset_name]

inference.py arguments:

usage: inference.py [-h] [--model MODEL] [--weight WEIGHT] [--view] [--source SOURCE] [--output OUTPUT] [--dataset DATASET]

Gaze estimation inference

options:
  -h, --help         show this help message and exit
  --model MODEL      Model name, default `resnet18`
  --weight WEIGHT    Path to gaze estimation model weights
  --view             Display the inference results
  --source SOURCE    Path to source video file or camera index
  --output OUTPUT    Path to save output file
  --dataset DATASET  Dataset name to get dataset related configs
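
Because `--source` accepts either a video file path or a camera index, the string from the command line has to be interpreted one way or the other. A minimal sketch of one common approach, assuming that an all-digit value means a camera index; `parse_source` is a hypothetical helper, and whether inference.py does exactly this is an assumption.

```python
def parse_source(source):
    """Return an int camera index for all-digit input, else the path unchanged."""
    text = str(source).strip()
    return int(text) if text.isdigit() else text
```

The result can be passed directly to OpenCV's `cv2.VideoCapture`, which accepts either an integer device index or a file name.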

ONNX Export and Inference

Export to ONNX

python onnx_export.py --weight [model_path] --model [model_name] --dynamic

onnx_export.py arguments:

usage: onnx_export.py [-h] [-w WEIGHT] [-n {resnet18,resnet34,resnet50,mobilenetv2,mobileone_s0}] [-d {gaze360}] [--dynamic]

Gaze Estimation Model ONNX Export

options:
  -h, --help            show this help message and exit
  -w WEIGHT, --weight WEIGHT
                        Trained state_dict file path to open
  -n {resnet18,resnet34,resnet50,mobilenetv2,mobileone_s0}, --model {resnet18,resnet34,resnet50,mobilenetv2,mobileone_s0}
                        Backbone network architecture to use
  -d {gaze360,mpiigaze}, --dataset {gaze360,mpiigaze}
                        Dataset name for bin configuration
  --dynamic             Enable dynamic batch size and input dimensions for ONNX export
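
The `--dynamic` flag corresponds to what PyTorch's ONNX exporter calls dynamic axes. A minimal sketch of the mapping that could be passed to `torch.onnx.export(..., dynamic_axes=...)`, assuming an NCHW image input; the tensor names `input`, `yaw`, and `pitch` and the helper `build_dynamic_axes` are hypothetical and not taken from onnx_export.py.

```python
def build_dynamic_axes(input_name="input", output_names=("yaw", "pitch"), dynamic=True):
    """Build a dynamic_axes mapping for torch.onnx.export, or None for a fixed shape.

    Tensor names are assumptions for this sketch.
    """
    if not dynamic:
        return None
    # Axis 0 is the batch; axes 2 and 3 are height and width in NCHW layout.
    axes = {input_name: {0: "batch_size", 2: "height", 3: "width"}}
    for name in output_names:
        axes[name] = {0: "batch_size"}
    return axes


# Intended use (requires PyTorch, a model, and a dummy input tensor):
# torch.onnx.export(model, dummy_input, "model.onnx",
#                   input_names=["input"], output_names=["yaw", "pitch"],
#                   dynamic_axes=build_dynamic_axes())
```

Without dynamic axes the exported graph is tied to the dummy input's exact shape, so batched inference would require a re-export.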

ONNX Inference

python onnx_inference.py --source [source video / webcam index] --model [onnx model path] --output [path to save video]

onnx_inference.py arguments:

usage: onnx_inference.py [-h] --source SOURCE --model MODEL [--output OUTPUT]

Gaze Estimation ONNX Inference

options:
  -h, --help       show this help message and exit
  --source SOURCE  Video path or camera index (e.g., 0 for webcam)
  --model MODEL    Path to ONNX model
  --output OUTPUT  Path to save output video (optional)
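
For a quick check that an exported model loads and runs outside of onnx_inference.py, the following minimal sketch feeds a dummy tensor through ONNX Runtime. It assumes `numpy` and `onnxruntime` are installed; the fallback input size of 448 and both function names are assumptions for illustration, and real use needs the same face cropping and normalization as the provided scripts.

```python
def resolve_input_shape(declared_shape, batch=1, size=448):
    """Replace symbolic or unknown dimensions with concrete values.

    Models exported with dynamic axes report strings or None for those
    dimensions. The default size is an assumption, not a repository value.
    """
    resolved = []
    for axis, dim in enumerate(declared_shape):
        if isinstance(dim, int) and dim > 0:
            resolved.append(dim)      # static dimension, keep as declared
        elif axis == 0:
            resolved.append(batch)    # dynamic batch axis
        else:
            resolved.append(size)     # dynamic spatial axis
    return resolved


def run_once(model_path):
    """Run one forward pass on zeros and return the raw model outputs."""
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    model_input = session.get_inputs()[0]
    shape = resolve_input_shape(model_input.shape)
    dummy = np.zeros(shape, dtype=np.float32)
    # Passing None as the first argument requests all model outputs.
    return session.run(None, {model_input.name: dummy})
```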

Citation

If you use this work in your research, please cite it as:

Valikhujaev, Y. (2024). MobileGaze: Pre-trained mobile nets for Gaze-Estimation. Zenodo. https://doi.org/10.5281/zenodo.14257640

Alternatively, in BibTeX format:

@misc{valikhujaev2024mobilegaze,
  author       = {Valikhujaev, Y.},
  title        = {MobileGaze: Pre-trained mobile nets for Gaze-Estimation},
  year         = {2024},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.14257640},
  url          = {https://doi.org/10.5281/zenodo.14257640}
}

Reference

This project is built on top of L2CS-Net. Most of the code has been rewritten for reproducibility and adaptability, and several additional backbones are provided with pre-trained weights.
MobileOne: https://github.com/apple/ml-mobileone
uniface: face detection library used for inference in inference.py.
