300 Videos in the Wild (300-VW) Challenge & Workshop (ICCV 2015)

Latest News

The complete 300VW dataset has been released and can be downloaded from here.
If you use the above data please cite the following papers:
- G. S. Chrysos, E. Antonakos, S. Zafeiriou and P. Snape. Offline deformable face tracking in arbitrary videos. In IEEE International Conference on Computer Vision Workshops (ICCVW), 2015. IEEE, 2015,
- J.Shen, S.Zafeiriou, G. S. Chrysos, J.Kossaifi, G.Tzimiropoulos, and M. Pantic. The first facial landmark tracking in-the-wild challenge: Benchmark and results. In IEEE International Conference on Computer Vision Workshops (ICCVW), 2015. IEEE, 2015.
- G. Tzimiropoulos. Project-out cascaded regression with an application to face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3659–3667, 2015.

300-VW

The 300 Videos in the Wild (300-VW) Facial Landmark Tracking in-the-Wild Challenge & Workshop to be held in conjunction with International Conference on Computer Vision (ICCV) 2015, Santiago, Chile.

Organisers

Chairs:

Stefanos Zafeiriou, Imperial College London, UK s.zafeiriou@imperial.ac.uk

Georgios Tzimiropoulos, University of Nottingham, UK yorgos.tzimiropoulos@nottingham.ac.uk

Maja Pantic, Imperial College London, UK m.pantic@imperial.ac.uk

Data Chairs:

Jie Shen, Imperial College London, UK js1907@imperial.ac.uk

Grigorios Chrysos, Imperial College London, UK g.chrysos@imperial.ac.uk

Jean Kossaifi, Imperial College London, UK jean.kossaifi12@imperial.ac.uk

Scope

Even though comprehensive benchmarks exist for facial landmark localization in static images, very limited effort has been made towards benchmarking facial landmark tracking in videos. In ICCV 2015, we make a significant step further and present a new comprehensive benchmark, as well as organize the first workshop-challenge for landmark tracking/detection of a set of 68 fiducial points in long-term 'in-the-wild' facial videos (duration of each video ~ 1 min). The challenge will represent the very first thorough quantitative evaluation on the topic. Furthermore, the competition will explore how far we are from attaining satisfactory facial landmark tracking results in various scenarios. The results of the Challenge will be presented at the 300 Videos in the Wild (300-VW) Workshop to be held in conjunction with ICCV 2015.

The 300-VW Competition

In order to develop a comprehensive benchmark for evaluating facial landmark tracking
algorithms in the wild, we have collected a large number of long facial videos recorded
in the wild. Each video has duration of ~1 minute (at 25-30 fps). All frames have been annotated
with regards to the same mark-up (i.e. set of facial landmarks) used in the 300 W competition
as well [1,2] (a total of 68 landmarks, please see Fig. 1).

Fig. 1: The 68 points mark-up used for our annotations.

Training

The training videos and annotations are available to download from here. Participants will be able to train their facial landmark tracking algorithms using the above training set and the data from 300W competition.

The training data folder is structured as follows:

Every clip has its own folder (001, 002, etc.).
"vid.avi" is the video file compressed using the XVID codec.
"annot" folder contains the landmark files, each pts file corresponding to one frame. The frame number starts from 1.

Testing

Participants will have their algorithms tested on the other facial videos (300-VW test set). This dataset aims at testing the ability of current systems forfitting unseen subjects, independently of variations in pose, expression, illumination, background, occlusion, and image quality.

The following three scenarios will be considered:
Scenario 1: A number of testing videos will be of people recorded in well-lit conditions displaying arbitrary expressions in various head poses (occlusions such as glasses and beards are possible but cases of occlusions by hand or another person will not be considered here). This scenario aims to evaluate algorithms that could be suitable for facial motion analysis in laboratory and naturalistic well-lit conditions.
Scenario 2: A number of testing videos will be of people recorded in unconstrained conditions (different illuminations, dark rooms, overexposed shots, etc.), displaying arbitrary expressions in various head poses but without large occlusions (occlusions such as glasses and beards are possible but cases of heavy occlusions by hand or another person will not be considered here). This scenario aims to evaluate algorithms that could be suitable for facial motion analysis in real-world human-computer interaction applications.
Scenario 3: A number of testing videos will be of people recorded in completely unconstrained conditions including the illumination conditions, occlusions, make-up, expression, head pose, etc. This scenario aims to assess the performance of facial landmark tracking in arbitrary conditions.

Sample frames from videos of Scenario 1, 2, and 3, are shown in Fig. 2, Fig. 3 and Fig. 4, respectively. Also, an entire video from Scenario 2 can be seen in Video 1 below.

Figure 2: Scenario 1 Figure 3: Scenario 2 Figure 4: Scenario 3

Fig. 2: Scenario 1 Fig. 3: Scenario 2 Fig. 4: Scenario 3

Video 1: Sample video from Scenario 2

A winner for each category will be announced. Participants should send binaries of their trained algorithms to the organisers, who will run each algorithm on the 300-VW test set. The participants can take part in one or more of the above-mentioned scenarios. As is the case for all such competitions, neither the landmark annotations nor the videos of the 300-VW test set will be released prior to the competition. We believe that this is the only viable way to ensure the integrity and objectivity of performance results attained in the competition. It goes without mentioning that the 300-VW Challenge organisers will not take part in the competition. The test set videos are similar in nature to those of 300-VW training set.

Performance Assessment

Fitting performance will be assessed on the same mark-up provided for the training using well-known error
measures. In particular, the average Euclidean point-to-point error normalized distance will be used [1,2]. Matlab code for calculating the error can be downloaded here. The error will be calculated over (a) all landmarks, and (b) the facial feature landmarks (eyebrows, eyes, nose, and mouth). The cumulative curve corresponding to the percentage of test images for which the error was less than a specific value will be produced. Additionally, fitting times should be recorded. Finally, these results will then be returned to the participants for inclusion in their papers. Benchmark results of a standard approach of generic face detection plus generic facial landmark detection will be used (e.g., Viola Jones plus Active Appearance Models [3]).

Operating characteristics

The binaries submitted for the competition will be handled confidentially. They will be used only for the scope of the competition and will be erased after the completion. The binaries should be complied in a 64bit machine and dependencies to publicly available vision repositories (such as Open CV) should be explicitly stated in the document that accompanies the binary. The submitted trackers should track with a speed of at least 2 secs/frame.

300-VW 2015 Workshop

Our aim is to accept up to 10 papers to be orally presented at the workshop.

Submission Information:

Challenge participants should submit a paper to the 300-VW Workshop, which summarizes the methodology and the achieved performance of their algorithm. Submissions should adhere to the main ICCV 2015 proceedings style. The workshop papers will be published in the ICCV 2015 proceedings. Please sign up in the submissions system to submit your paper.

Important Dates:

16 July: Announcement of the Challenge
16 July: Release of first batch of the training videos
15 September: Deadline for code submission
22 September: Results returned to the authors for inclusion in the paper
2 October: Deadline for paper submission
8 October: Final decision
16 October: Submission of the camera ready
17 Dec: Release of the complete dataset

Contact:

Workshop Administrator: 300vw.challenge@gmail.com

References

[1] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, & M. Pantic, (2013, December). 300 faces in-the-wild
challenge: The first facial landmark localization challenge. In Computer Vision Workshops (ICCVW), 2013
IEEE International Conference on (pp. 397-403).

[2] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, M. Pantic, A semi-automatic methodology for facial landmark annotation, Proceedings of IEEE International Conference Computer Vision and Pattern Recognition (CVPR-W), 5th Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2013

[3] G. Tzimiropoulos., J. Alabort., S. Zafeiriou., and M. Pantic, “Generic active appearance models revisited,”
ACCV ’12.

[4] R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker. “Multi-pie,” IVC, 28(5):807–813, 2010

Program Committee:

Jorge Batista, University of Coimbra (Portugal)
Richard Bowden, University of Surrey (UK)
Jeff Cohn, CMU/University of Pittsburgh (USA)
Roland Goecke, University of Canberra (AU)
Peter Corcoran, NUI Galway (Ireland)
Fred Nicolls, University of Cape Town (South Africa)
Mircea C. Ionita, Daon (Ireland)
Ioannis Kakadiaris, University of Houston (USA)
Stan Z. Li, Institute of Automation Chinese Academy of Sciences (China)
Simon Lucey, CMU (USA)
Iain Matthews, Disney Research (USA)
Aleix Martinez, University of Ohio (USA)
Dimitris Metaxas, Rutgers University (USA)
Stephen Milborrow, sonic.net
Louis P. Morency, University of South California (USA)
Ioannis Patras, Queen Mary University (UK)
Matti Pietikainen, University of Oulu (Finland)
Deva Ramaman, University of Irvine (USA)
Jason Saragih, Commonwealth Sc. & Industrial Research Organisation (AU)
Nicu Sebe, University of Trento (Italy)
Jian Sun, Microsoft Research Asia
Xiaoou Tang, Chinese University of Hong Kong (China)
Fernando De La Torre, Carnegie Mellon University (USA)
Philip A. Tresadern, University of Manchester (UK)
Michel Valstar, University of Nottingham (UK)
Xiaogang Wang, Chinese University of Hong Kong (China)
Fang Wen, Microsoft Research Asia
Lijun Yin, Binghamton University (USA)

Sponsors:

The 300-VW Challenge & Workshop has been generously supported by Horizon 2020 SEWA project [grant agreement no. 645094] and the EPSRC project ADAManT (EP/L026813/1). SEWA project aims at building technology for human facial and vocal behaviour analysis in the wild. ADAManT project aims at building automatically personalised facial deformable models for tracking. The main coordinators of the SEWA and ADAManT projects are Prof. Maja Pantic and Dr. Stefanos Zafeiriou, respectively, two of the organisers of this challenge.