# Tracking
Multi-object tracking using BYTETracker with Kalman filtering and IoU-based association. The tracker assigns persistent IDs to detected objects across video frames using a two-stage association strategy — first matching high-confidence detections, then low-confidence ones.
## How It Works
BYTETracker takes detection bounding boxes as input and returns tracked bounding boxes with persistent IDs. It does not depend on any specific detector; any source of `[x1, y1, x2, y2, score]` arrays will work.
Each frame, the tracker:
- Splits detections into high-confidence and low-confidence groups
- Matches high-confidence detections to existing tracks using IoU
- Matches remaining tracks to low-confidence detections (second chance)
- Starts new tracks for unmatched high-confidence detections
- Removes tracks that have been lost for too long
The Kalman filter predicts where each track will be in the next frame, which helps maintain associations even when detections are noisy.
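For example, with the default thresholds a frame's detections split like this (the arrays below are made-up values for illustration):

```python
import numpy as np

# Hypothetical detections: [x1, y1, x2, y2, score]
dets = np.array([
    [100.0,  50.0, 200.0, 160.0, 0.92],  # high confidence
    [300.0,  80.0, 380.0, 200.0, 0.35],  # low confidence, second-chance pool
    [ 10.0,  10.0,  40.0,  40.0, 0.05],  # below low_thresh, discarded
])

track_thresh, low_thresh = 0.5, 0.1
scores = dets[:, 4]
high = dets[scores >= track_thresh]                           # first-pass association
low = dets[(scores >= low_thresh) & (scores < track_thresh)]  # second-pass association
```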
## Basic Usage

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)

cap = cv2.VideoCapture("video.mp4")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # 1. Detect faces
    faces = detector.detect(frame)

    # 2. Build detections array: [x1, y1, x2, y2, score]
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))

    # 3. Update tracker
    tracks = tracker.update(dets)

    # 4. Map track IDs back to face objects (nearest box center)
    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)
        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]
        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    # 5. Draw
    tracked_faces = [f for f in faces if f.track_id is not None]
    draw_tracks(image=frame, faces=tracked_faces)

    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
Each track ID gets a deterministic color via golden-ratio hue stepping, so the same person keeps the same color across the entire video.
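The idea behind golden-ratio hue stepping is to advance the hue by the golden-ratio conjugate for each new ID, which spreads colors roughly evenly around the color wheel so neighboring IDs never look alike. A minimal sketch of the idea (illustrative only; uniface's exact constants and saturation/value choices may differ):

```python
import colorsys

GOLDEN_RATIO_CONJUGATE = 0.61803398875

def track_color(track_id: int) -> tuple[int, int, int]:
    # Step the hue by the golden-ratio conjugate per track ID
    hue = (track_id * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 0.95)
    return int(b * 255), int(g * 255), int(r * 255)  # BGR, as OpenCV expects
```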
## Webcam Tracking

```python
import cv2
import numpy as np
from uniface.common import xyxy_to_cxcywh
from uniface.detection import SCRFD
from uniface.tracking import BYTETracker
from uniface.draw import draw_tracks

detector = SCRFD()
tracker = BYTETracker(track_thresh=0.5, track_buffer=30)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    dets = np.array([[*f.bbox, f.confidence] for f in faces])
    dets = dets if len(dets) > 0 else np.empty((0, 5))
    tracks = tracker.update(dets)

    if len(tracks) > 0 and len(faces) > 0:
        face_bboxes = np.array([f.bbox for f in faces], dtype=np.float32)
        track_ids = tracks[:, 4].astype(int)
        face_centers = xyxy_to_cxcywh(face_bboxes)[:, :2]
        track_centers = xyxy_to_cxcywh(tracks[:, :4])[:, :2]
        for ti in range(len(tracks)):
            dists = (track_centers[ti, 0] - face_centers[:, 0]) ** 2 + (track_centers[ti, 1] - face_centers[:, 1]) ** 2
            faces[int(np.argmin(dists))].track_id = track_ids[ti]

    draw_tracks(image=frame, faces=[f for f in faces if f.track_id is not None])
    cv2.imshow("Face Tracking - Press 'q' to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
## Parameters

```python
from uniface.tracking import BYTETracker

tracker = BYTETracker(
    track_thresh=0.5,
    track_buffer=30,
    match_thresh=0.8,
    low_thresh=0.1,
)
```
| Parameter | Default | Description |
|---|---|---|
| `track_thresh` | `0.5` | Detections above this score go through first-pass association |
| `track_buffer` | `30` | How many frames to keep a lost track before removing it |
| `match_thresh` | `0.8` | IoU threshold for matching tracks to detections |
| `low_thresh` | `0.1` | Detections below this score are discarded entirely |
## Input / Output

**Input**: `(N, 5)` numpy array with `[x1, y1, x2, y2, confidence]` per detection:
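```python
import numpy as np

# Two detections for one frame (illustrative values)
detections = np.array([
    [100.0, 50.0, 200.0, 160.0, 0.92],
    [300.0, 80.0, 380.0, 200.0, 0.85],
])
```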
**Output**: `(M, 5)` numpy array with `[x1, y1, x2, y2, track_id]` per active track:

```python
tracks = tracker.update(detections)
# array([[101.2,  51.3, 199.8, 159.8,   1.],
#        [300.5,  80.2, 379.7, 200.1,   2.]])
```
The output bounding boxes come from the Kalman filter prediction, so they may differ slightly from the input. Track IDs are integers that persist across frames for the same object.
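For example, consuming the result using only the column layout described above:

```python
for x1, y1, x2, y2, tid in tracks:
    print(f"track {int(tid)}: ({x1:.1f}, {y1:.1f}) to ({x2:.1f}, {y2:.1f})")
```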
## Resetting the Tracker
When switching to a different video or scene, reset the tracker to clear all internal state:
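```python
tracker.reset()  # start fresh before the next video or scene
```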
This clears all active, lost, and removed tracks, resets the frame counter, and resets the ID counter back to zero.
## Visualization

`draw_tracks` draws bounding boxes color-coded by track ID:

```python
from uniface.draw import draw_tracks

draw_tracks(
    image=frame,
    faces=tracked_faces,
    draw_landmarks=True,
    draw_id=True,
    corner_bbox=True,
)
```
## Small Face Performance
The tracker relies on IoU (Intersection over Union) to match detections across frames. When faces occupy a small portion of the image — for example in surveillance footage or wide-angle cameras — even slight movement between frames can cause a large drop in IoU. This makes it harder for the tracker to maintain consistent IDs, and you may see IDs switching or resetting more often than expected.
This is not specific to BYTETracker; it applies to any IoU-based tracker. A few things that can help:
- Lower `match_thresh` (e.g. `0.5` or `0.6`) so the tracker accepts lower overlap as a valid match.
- Increase `track_buffer` (e.g. `60` or higher) to hold onto lost tracks longer before discarding them.
- Use a higher-resolution input if possible, so face bounding boxes are larger in pixel terms.
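For example, a tracker loosened along the first two suggestions might look like this (the values are illustrative starting points, not tuned defaults):

```python
from uniface.tracking import BYTETracker

# Looser matching and a longer buffer for small, fast-moving faces
tracker = BYTETracker(
    track_thresh=0.5,
    track_buffer=60,   # hold lost tracks longer
    match_thresh=0.5,  # accept lower IoU as a valid match
)
```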
## CLI Tool

```bash
# Track faces in a video
python tools/track.py --source video.mp4

# Webcam
python tools/track.py --source 0

# Save output
python tools/track.py --source video.mp4 --output tracked.mp4

# Use RetinaFace instead of SCRFD
python tools/track.py --source video.mp4 --detector retinaface

# Keep lost tracks longer
python tools/track.py --source video.mp4 --track-buffer 60
```
## References
- yakhyo/bytetrack-tracker — standalone BYTETracker implementation used in UniFace
- ByteTrack paper — Zhang et al., "ByteTrack: Multi-Object Tracking by Associating Every Detection Box"
## See Also
- Detection — face detection models
- Video & Webcam — video processing patterns
- Inputs & Outputs — data types and formats