The 300 Videos in the Wild (300-VW) Facial Landmark Tracking in-the-Wild Challenge & Workshop, to be held in conjunction with the International Conference on Computer Vision (ICCV) 2015, Santiago, Chile.
Stefanos Zafeiriou, Imperial College London, UK firstname.lastname@example.org
Georgios Tzimiropoulos, University of Nottingham, UK email@example.com
Maja Pantic, Imperial College London, UK firstname.lastname@example.org
Jie Shen, Imperial College London, UK email@example.com
Grigorios Chrysos, Imperial College London, UK firstname.lastname@example.org
Jean Kossaifi, Imperial College London, UK email@example.com
Even though comprehensive benchmarks exist for facial landmark localization in static images, very limited effort has been made towards benchmarking facial landmark tracking in videos. At ICCV 2015, we take a significant step further and present a new comprehensive benchmark, as well as organize the first workshop-challenge for landmark tracking/detection of a set of 68 fiducial points in long-term 'in-the-wild' facial videos (duration of each video ~1 min). The challenge will represent the very first thorough quantitative evaluation on the topic. Furthermore, the competition will explore how far we are from attaining satisfactory facial landmark tracking results in various scenarios. The results of the Challenge will be presented at the 300 Videos in the Wild (300-VW) Workshop to be held in conjunction with ICCV 2015.
In order to develop a comprehensive benchmark for evaluating facial landmark tracking
algorithms in the wild, we have collected a large number of long facial videos recorded
in the wild. Each video has a duration of ~1 minute (at 25-30 fps). All frames have been annotated
using the same mark-up (i.e. set of facial landmarks) that was used in the 300-W competition
[1,2] (a total of 68 landmarks; please see Fig. 1).
The training videos and annotations are available to download from here. Participants will be able to train their facial landmark tracking algorithms using the above training set and the data from the 300-W competition.
The training data folder is structured as follows:
Participants will have their algorithms tested on other facial videos (the 300-VW test set). This dataset aims at testing the ability of current systems to fit unseen subjects, independently of variations in pose, expression, illumination, background, occlusion, and image quality.
The following three scenarios will be considered:
Scenario 1: A number of testing videos will be of people recorded in well-lit conditions displaying arbitrary expressions in various head poses (occlusions such as glasses and beards are possible but cases of occlusions by hand or another person will not be considered here). This scenario aims to evaluate algorithms that could be suitable for facial motion analysis in laboratory and naturalistic well-lit conditions.
Scenario 2: A number of testing videos will be of people recorded in unconstrained conditions (different illuminations, dark rooms, overexposed shots, etc.), displaying arbitrary expressions in various head poses but without large occlusions (occlusions such as glasses and beards are possible but cases of heavy occlusions by hand or another person will not be considered here). This scenario aims to evaluate algorithms that could be suitable for facial motion analysis in real-world human-computer interaction applications.
Scenario 3: A number of testing videos will be of people recorded in completely unconstrained conditions including the illumination conditions, occlusions, make-up, expression, head pose, etc. This scenario aims to assess the performance of facial landmark tracking in arbitrary conditions.
Sample frames from videos of Scenarios 1, 2, and 3 are shown in Fig. 2, Fig. 3 and Fig. 4, respectively. Also, an entire video from Scenario 2 can be seen in Video 1 below.
Fig. 2: Scenario 1 Fig. 3: Scenario 2 Fig. 4: Scenario 3
A winner for each category will be announced. Participants should send binaries of their trained algorithms to the organisers, who will run each algorithm on the 300-VW test set. The participants can take part in one or more of the above-mentioned scenarios. As is the case for all such competitions, neither the landmark annotations nor the videos of the 300-VW test set will be released prior to the competition. We believe that this is the only viable way to ensure the integrity and objectivity of performance results attained in the competition. It goes without saying that the 300-VW Challenge organisers will not take part in the competition. The test set videos are similar in nature to those of the 300-VW training set.
Fitting performance will be assessed on the same mark-up provided for the training, using well-known error
measures. In particular, the average point-to-point Euclidean error, normalized as in [1,2], will be used. Matlab code for calculating the error can be downloaded here. The error will be calculated over (a) all landmarks, and (b) the facial feature landmarks (eyebrows, eyes, nose, and mouth). The cumulative curve corresponding to the percentage of test images for which the error was less than a specific value will be produced. Additionally, fitting times will be recorded. Finally, the results will be returned to the participants for inclusion in their papers. Results of a standard approach that combines generic face detection with generic facial landmark detection (e.g., Viola-Jones plus Active Appearance Models) will be used as a baseline.
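As a rough illustration of the evaluation described above, the following Python sketch computes a normalized point-to-point error for one frame and the cumulative error curve over a set of frames. It is not the official Matlab code. The normalization by the distance between the outer eye corners, the 0-based landmark indices (36 and 45 for the eye corners, 17 to 67 for the facial feature landmarks), and all function names are assumptions made for this example.

```python
import math

# Assumptions for this sketch (not taken from the challenge description):
# 0-based indices into the 68-point mark-up, with 36 and 45 as the outer
# eye corners and 17..67 as the facial feature landmarks (no jawline).
LEFT_OUTER_EYE = 36
RIGHT_OUTER_EYE = 45
FEATURE_POINTS = range(17, 68)


def normalized_error(predicted, ground_truth, indices=None):
    """Mean point-to-point Euclidean error over `indices`, divided by the
    ground-truth distance between the outer eye corners."""
    if indices is None:
        indices = range(len(ground_truth))
    indices = list(indices)
    norm = math.dist(ground_truth[LEFT_OUTER_EYE], ground_truth[RIGHT_OUTER_EYE])
    total = sum(math.dist(predicted[i], ground_truth[i]) for i in indices)
    return total / (len(indices) * norm)


def cumulative_error_curve(errors, thresholds):
    """For each threshold, the fraction of frames whose error is below it."""
    n = len(errors)
    return [sum(e < t for e in errors) / n for t in thresholds]
```

Calling `normalized_error(pred, gt)` would give case (a), all landmarks, and `normalized_error(pred, gt, FEATURE_POINTS)` case (b). Passing the per-frame errors to `cumulative_error_curve` with an increasing list of thresholds yields the points of the cumulative curve.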
The binaries submitted for the competition will be handled confidentially. They will be used only for the purposes of the competition and will be erased after its completion. The binaries should be compiled on a 64-bit machine, and dependencies on publicly available vision libraries (such as OpenCV) should be explicitly stated in the document that accompanies the binary. The submitted trackers should take no more than 2 seconds per frame.
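Participants may want to check their tracker against the 2 seconds per frame limit before submitting. The sketch below measures the mean wall-clock time per frame for any callable. The names `mean_seconds_per_frame` and `dummy_tracker` are hypothetical, and the organisers' own timing procedure is not specified here, so this is only an approximate self-check.

```python
import time

MAX_SECONDS_PER_FRAME = 2.0  # limit quoted in the challenge description


def mean_seconds_per_frame(track_frame, frames):
    """Run `track_frame` on every frame and return the mean wall-clock
    time per frame, in seconds."""
    frames = list(frames)
    start = time.perf_counter()
    for frame in frames:
        track_frame(frame)
    return (time.perf_counter() - start) / len(frames)


def dummy_tracker(frame):
    # Hypothetical stand-in for a real tracker: returns 68 placeholder points.
    return [(0.0, 0.0)] * 68
```

For example, `mean_seconds_per_frame(dummy_tracker, range(100)) <= MAX_SECONDS_PER_FRAME` evaluates the check on 100 placeholder frames; a real run would pass decoded video frames and the actual tracking function.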
Our aim is to accept up to 10 papers to be orally presented at the workshop.
Challenge participants should submit a paper to the 300-VW Workshop, which summarizes the methodology and the achieved performance of their algorithm. Submissions should adhere to the main ICCV 2015 proceedings style. The workshop papers will be published in the ICCV 2015 proceedings. Please sign up in the submissions system to submit your paper.
Workshop Administrator: firstname.lastname@example.org
[1] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "300 faces in-the-wild challenge: The first facial landmark localization challenge," in Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on, pp. 397-403, December 2013.
[2] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "A semi-automatic methodology for facial landmark annotation," Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR-W), 5th Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2013.
[3] G. Tzimiropoulos, J. Alabort, S. Zafeiriou, and M. Pantic, "Generic active appearance models revisited,"
[4] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, "Multi-pie," IVC, 28(5):807-813, 2010.
The 300-VW Challenge & Workshop has been generously supported by the Horizon 2020 SEWA project [grant agreement no. 645094] and the EPSRC project ADAManT (EP/L026813/1). The SEWA project aims at building technology for human facial and vocal behaviour analysis in the wild. The ADAManT project aims at building automatically personalised facial deformable models for tracking. The main coordinators of the SEWA and ADAManT projects are Prof. Maja Pantic and Dr. Stefanos Zafeiriou, respectively, two of the organisers of this challenge.