Large scale Real-world Multi Person Tracking Dataset

Dataset Description

PersonPath22 is a multi-person tracking dataset that is over an order of magnitude larger than the most popular tracking datasets, MOT17 and MOT20, while maintaining the high annotation quality of those datasets.

Video source composition: Videos are collected from sources that grant us the right to redistribute the content and whose participants have given explicit consent. The dataset consists of 236 videos, captured mostly from statically mounted cameras. Approximately 80% of these videos were carefully sourced from scratch from stock footage websites, and 20% were collected from existing datasets.

Video diversity: Videos in the dataset are highly diverse in terms of 1) camera angles (from bird's-eye view to low-angle view); 2) scenery and weather conditions (sunny, rainy, cloudy, night); 3) lighting conditions; and 4) crowd densities.

Person-level metadata: Videos in the PersonPath22 dataset are exhaustively annotated with both amodal and visible bounding boxes, as well as unique person identifiers. In addition, each person is further labeled with the following tags:



We provide scripts to automatically download all the source videos and their corresponding tracking annotations, which include amodal and visible bounding boxes for each person. In addition, each person is annotated with a unique person identifier throughout each video, and we assume that the same person does not appear in different videos. The scripts are included in our official GitHub repo.
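As a rough illustration of how the two box types relate, the sketch below models a single per-frame annotation and computes the fraction of the amodal (full-extent) box that is covered by the visible box. The record layout, the field names (`frame_index`, `person_id`, `amodal_box`, `visible_box`), and the `(x, y, width, height)` box convention are assumptions made for this example; they are not the official PersonPath22 annotation schema, which is defined by the files the download scripts produce.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box convention for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class PersonAnnotation:
    """One person in one frame (illustrative layout, not the official schema)."""
    frame_index: int
    person_id: int      # unique within a video
    amodal_box: Box     # full extent of the person, including occluded parts
    visible_box: Box    # only the part that is actually visible


def box_area(box: Box) -> float:
    """Area of a box, treating negative sizes as empty."""
    _, _, width, height = box
    return max(width, 0.0) * max(height, 0.0)


def visibility_ratio(ann: PersonAnnotation) -> float:
    """Visible area divided by amodal area, clipped to [0, 1]."""
    amodal_area = box_area(ann.amodal_box)
    if amodal_area == 0.0:
        return 0.0
    return min(box_area(ann.visible_box) / amodal_area, 1.0)


# A person whose lower half is hidden behind an obstacle.
example = PersonAnnotation(
    frame_index=0,
    person_id=7,
    amodal_box=(100.0, 50.0, 40.0, 120.0),
    visible_box=(100.0, 50.0, 40.0, 60.0),
)
print(visibility_ratio(example))  # 0.5
```

A ratio like this is one simple way to bucket people by occlusion level when analysing tracker failures, assuming both boxes are available for every annotated person.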

Evaluation Code

To set up a consistent protocol for all researchers, evaluation scripts are available in the open-source TrackEval library.
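TrackEval implements several standard tracking metrics, among them the CLEAR MOT family. As a small worked example of the kind of number such a script reports, the sketch below computes MOTA from aggregate error counts using the standard definition MOTA = 1 - (FN + FP + IDSW) / GT. The function name and the example counts are invented for illustration; this is not TrackEval's API, and official results should be produced with TrackEval itself.

```python
def mota(num_false_negatives: int,
         num_false_positives: int,
         num_id_switches: int,
         num_gt_boxes: int) -> float:
    """Multiple Object Tracking Accuracy from counts summed over all frames.

    MOTA = 1 - (FN + FP + IDSW) / GT. The value can be negative when the
    tracker makes more errors than there are ground-truth boxes.
    """
    if num_gt_boxes <= 0:
        raise ValueError("num_gt_boxes must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_gt_boxes


# Made-up counts: 100 misses, 100 false alarms, 50 identity switches
# against 1000 ground-truth boxes.
print(mota(100, 100, 50, 1000))  # 0.75
```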

Public detection

For researchers who would like to compare their data association models in isolation, we provide standard public detection results (from FCOS, Fully Convolutional One-Stage Object Detection) for all videos so that comparisons are consistent.
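To make "data association alone" concrete: once the detections are fixed, the tracker's remaining job is to decide which detection in the current frame continues which existing track. The sketch below is a deliberately simple greedy IoU matcher, written under assumed conventions (boxes as `(x1, y1, x2, y2)` corners and a hypothetical `iou_threshold` of 0.5). It is not the baseline used in the paper; practical trackers often replace the greedy step with Hungarian matching and add motion or appearance cues.

```python
from typing import Dict, List, Tuple

# Assumed box convention for this sketch: (x1, y1, x2, y2) corners in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box],
              detections: List[Box],
              iou_threshold: float = 0.5) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match tracks to detections by descending IoU.

    Returns ({track_id: detection_index}, [unmatched detection indices]).
    """
    candidates = sorted(
        ((iou(track_box, det_box), track_id, det_index)
         for track_id, track_box in tracks.items()
         for det_index, det_box in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, track_id, det_index in candidates:
        if score < iou_threshold:
            break  # remaining candidates overlap even less
        if track_id in matches or det_index in used_detections:
            continue
        matches[track_id] = det_index
        used_detections.add(det_index)
    unmatched = [i for i in range(len(detections)) if i not in used_detections]
    return matches, unmatched


# Last known boxes of two tracks, and three public detections in the next frame.
tracks = {1: (0.0, 0.0, 10.0, 10.0), 2: (20.0, 20.0, 30.0, 30.0)}
detections = [(21.0, 21.0, 31.0, 31.0), (1.0, 0.0, 11.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
matches, unmatched = associate(tracks, detections)
print(matches)    # {1: 1, 2: 0}
print(unmatched)  # [2]
```

Unmatched detections would typically start new tracks, which is where per-video person identifiers begin.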



This paper presents a new large scale multi-person tracking dataset. Our dataset is over an order of magnitude larger than currently available high-quality multi-object tracking datasets such as MOT17, HiEve, and MOT20. The lack of large scale training and test data for this task has limited the community's ability to understand the performance of their tracking systems across a wide range of scenarios and conditions, such as variations in person density, actions being performed, weather, and time of day. Our dataset was specifically sourced to provide a wide variety of these conditions, and our annotations include rich metadata so that the performance of a tracker can be evaluated along these different dimensions. The lack of training data has also limited the ability to perform end-to-end training of tracking systems. As a result, the highest-performing tracking systems all rely on strong detectors trained on external image datasets. We hope that the release of this dataset will enable new lines of research that take advantage of large scale video-based training data.


If you use this dataset in a publication, we kindly ask you to give credit by citing our ECCV 2022 paper, which provides a detailed description of the dataset and benchmarks:

@inproceedings{shuai2022large,
  title={Large scale Real-world Multi Person Tracking},
  author={Shuai, Bing and Bergamo, Alessandro and Buechler, Uta and Berneshawi, Andrew and Boden, Alyssa and Tighe, Joe},
  booktitle={European Conference on Computer Vision},
  year={2022}
}

Credit for the annotations must be given to:

Credit for building the dataset must be given to:

Bing Shuai

Alessandro Bergamo

Uta Buechler

Andrew Berneshawi

Alyssa Boden

Joseph Tighe


This dataset is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License (CC BY-NC 4.0). We ask users of this dataset to use the data in a socially responsible manner, and request that they not use the data to identify or generate biometric information about the people in the videos.

© 2022.