The first Automatic Facial Landmark Detection in-the-Wild Challenge (300-W 2013), to be held in conjunction with the International Conference on Computer Vision (ICCV) 2013, Sydney, Australia.
Georgios Tzimiropoulos, University of Lincoln, UK
Stefanos Zafeiriou, Imperial College London, UK
Maja Pantic, Imperial College London, UK
Automatic facial landmark detection is a longstanding problem in computer vision, and the 300-W Challenge is the first event of its kind organized exclusively to benchmark efforts in the field. The focus is on facial landmark detection in real-world datasets of facial images captured in the wild. The results of the Challenge will be presented at the 300-W Faces in-the-Wild Workshop, held in conjunction with ICCV 2013.
A special issue of the Image and Vision Computing journal will present the best-performing methods and summarize the results of the Challenge.
The 300-W Challenge
Landmark annotations (following the Multi-PIE 68-point mark-up; see Fig. 1) for four popular data sets are available for download. All participants in the Challenge will be able to train their algorithms using these data. Performance evaluation will be carried out on the 300-W test set, using the same Multi-PIE mark-up and the same face bounding box initialization.
Figure 1: The 68-point and 51-point mark-ups used for our annotations.
The LFPW, AFW, HELEN, and XM2VTS datasets have been re-annotated using the mark-up of Fig. 1. We also provide annotations for another 135 images with difficult poses and expressions (the IBUG training set). Each annotation file has the same name as its corresponding image. For the LFPW, AFW, HELEN, and IBUG datasets we also provide the images; the XM2VTS images can be obtained from the database authors' website. All annotations are available for download.
Participants are strongly encouraged to train their algorithms using these training data. If you use any of the provided annotations, please cite Sagonas et al. (2013) and the paper presenting the corresponding database.
Please note that the re-annotated data for this Challenge follow the Matlab convention of 1-based indexing, i.e. the top-left pixel of an image has coordinates x=1, y=1.
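Most C, C++ and Python image libraries index pixels from 0, so the annotations must be shifted by one pixel before use and shifted back afterwards. The sketch below assumes landmarks are held as lists of (x, y) tuples; the helper names are hypothetical and not part of any provided tool.

```python
# Hypothetical helpers: convert between the 1-based annotation convention
# (top-left pixel at x=1, y=1) and 0-based image indexing.

def to_zero_based(points):
    """Shift 1-based (Matlab-style) (x, y) landmarks so the top-left pixel is (0, 0)."""
    return [(x - 1.0, y - 1.0) for x, y in points]


def to_one_based(points):
    """Shift 0-based (x, y) landmarks back to the 1-based annotation convention."""
    return [(x + 1.0, y + 1.0) for x, y in points]


if __name__ == "__main__":
    annotated = [(1.0, 1.0), (120.5, 80.25)]  # coordinates as stored in the files
    zero_based = to_zero_based(annotated)
    print(zero_based)  # [(0.0, 0.0), (119.5, 79.25)]
    assert to_one_based(zero_based) == annotated
```

Keeping the conversion at the input/output boundary means the rest of a pipeline can stay in a single convention.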
Participants' algorithms will be tested on a newly collected data set of 2x300 face images (300 indoor and 300 outdoor) captured in the wild (the 300-W test set). Sample images are shown in Fig. 2 and Fig. 3.
Figure 2: Outdoor.
Figure 3: Indoor.
The 300-W test set is designed to test the ability of current systems to handle unseen subjects, regardless of variations in pose, expression, illumination, background, occlusion, and image quality.
Participants should send binaries of their trained algorithms to the organisers, who will run each algorithm on the 300-W test set using the same bounding box initialization. This bounding box is provided by our in-house face detector, which was trained on the face region defined by the bounding box of the landmark annotations (see Fig. 4).
Figure 4: Face region (bounding box) that our face detector was trained on.
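Fig. 4 defines the face region as the bounding box computed from the landmark annotations. Assuming that box is simply the tightest axis-aligned rectangle around the annotated points (the page does not say whether any margin is added), it can be sketched as follows; the function name is hypothetical.

```python
def bbox_from_landmarks(points):
    """Return [xmin, ymin, xmax, ymax] enclosing all (x, y) landmarks.

    Assumption: the ground-truth face region is the tight axis-aligned box
    around the annotated points, with no added margin.
    """
    if not points:
        raise ValueError("at least one landmark is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Same ordering as the 4x1 bounding-box vector passed to the binaries.
    return [min(xs), min(ys), max(xs), max(ys)]
```

Because the box is computed directly from the annotation coordinates, it stays in the same 1-based convention as the annotation files.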
Examples of bounding box initialisations, along with the ground-truth bounding boxes, are shown in Fig. 5. For each database in the training set, we provide the bounding box initialisations produced by our in-house detector, as well as the ground-truth bounding boxes.
Figure 5: Examples of bounding box initialisations for images from the test set of LFPW.
Participants should expect that initialisations for the 300-W test set are of similar accuracy.
Each binary should accept two inputs: an input image (RGB, .png extension) and the coordinates of the bounding box, given as a 4x1 vector [xmin, ymin, xmax, ymax] (see Fig. 6). The output of the binary should be a 68x2 matrix of detected landmarks, saved in the same format (.pts) and point ordering as the provided annotations.
Figure 6: Coordinates of the bounding box (the coordinates of the top left pixel are x=1, y=1).
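The output must match the .pts format of the provided annotations. The sketch below assumes the plain-text layout those files use (a `version: 1` line, an `n_points:` line, then one "x y" pair per line between braces); check it against a downloaded annotation file before relying on it. The function names are hypothetical.

```python
# Assumed .pts layout (mirrors the provided annotation files):
#   version: 1
#   n_points:  68
#   {
#   x1 y1
#   ...
#   }

def format_pts(points, n_expected=68):
    """Serialize (x, y) landmarks as .pts text; rejects shapes of the wrong size."""
    if n_expected is not None and len(points) != n_expected:
        raise ValueError(f"expected {n_expected} points, got {len(points)}")
    body = "\n".join(f"{x:.3f} {y:.3f}" for x, y in points)
    return f"version: 1\nn_points:  {len(points)}\n{{\n{body}\n}}\n"


def parse_pts(text):
    """Parse .pts text back into a list of (x, y) float tuples."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    n = next(int(line.split(":", 1)[1]) for line in lines if line.startswith("n_points"))
    start = lines.index("{") + 1
    points = []
    for line in lines[start:start + n]:
        x, y = (float(v) for v in line.split())
        points.append((x, y))
    if len(points) != n:
        raise ValueError(f"header announces {n} points, found {len(points)}")
    return points
```

To save a prediction, write the returned string to a file named after the input image with a .pts extension, e.g. `open("image_0001.pts", "w").write(format_pts(landmarks))`, keeping the coordinates in the 1-based convention of the annotations.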
Facial landmark detection performance will be assessed on both the 68-point mark-up of Fig. 1 and the 51-point mark-up, which omits the 17 face-border points (see Fig. 1). The error measure is the average point-to-point Euclidean error normalized by the inter-ocular distance, measured as the Euclidean distance between the outer corners of the eyes. Matlab code for calculating the error can be downloaded from http://ibug.doc.ic.ac.uk/media/uploads/competitions/compute_error.m. Finally, a cumulative curve giving the percentage of test images for which the error is less than a given value will be produced. Fitting times will also be recorded. These results will be returned to the participants for inclusion in their papers.
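For participants who want to check their own results before submission, the measure can be re-expressed in Python. This is a sketch, not the official script: the linked compute_error.m remains authoritative. It assumes the standard 68-point ordering in which the outer eye corners are points 37 and 46 (1-based) and the 17 face-border points are points 1-17, and that the inter-ocular distance is taken from the ground-truth shape.

```python
import math

# Assumption: 0-based indices of the two outer eye corners in the 68-point
# ordering (points 37 and 46 of the 1-based mark-up).
OUTER_EYE_CORNERS = (36, 45)
# Assumption: the 51-point mark-up drops the 17 face-border points (points 1-17).
N_BORDER_POINTS = 17


def normalized_error(pred, gt, use_51_points=False):
    """Mean point-to-point Euclidean error divided by the inter-ocular distance.

    `pred` and `gt` are 68-point lists of (x, y) tuples in the same ordering.
    """
    if len(pred) != 68 or len(gt) != 68:
        raise ValueError("expected two 68-point shapes")
    # Assumption: inter-ocular distance is computed from the ground-truth shape.
    iod = math.dist(gt[OUTER_EYE_CORNERS[0]], gt[OUTER_EYE_CORNERS[1]])
    start = N_BORDER_POINTS if use_51_points else 0
    errors = [math.dist(p, g) for p, g in zip(pred[start:], gt[start:])]
    return sum(errors) / (len(errors) * iod)


def cumulative_curve(errors, thresholds):
    """Percentage of images whose error is less than each threshold."""
    n = len(errors)
    return [100.0 * sum(e < t for e in errors) / n for t in thresholds]
```

Computing one `normalized_error` per test image and passing the list to `cumulative_curve` over a grid of thresholds gives the points of the cumulative curve described above.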
The binaries submitted for the competition will be handled confidentially. They will be used only for the scope of the competition and will be erased after its completion. The binaries should be compiled on a 64-bit machine, and any dependencies on publicly available vision libraries (such as OpenCV) should be explicitly stated in the document that accompanies the binary.
Results (figures not reproduced): 51-point and 68-point panels for each test subset, the last pair covering Indoor + Outdoor.
1. S. Milborrow, T. Bishop, and F. Nicolls. Multiview active shape models with SIFT descriptors for the 300-W face landmark challenge.
2. S. Jaiswal, T. Almaev, and M. Valstar. Guided unsupervised learning of mode specific models for facial point detection in the wild.
3. T. Baltrusaitis, L.-P. Morency, and P. Robinson. Constrained local neural fields for robust facial landmark detection in the wild.
4. E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin. Facial landmark localization with coarse-to-fine convolutional network cascade.
5. K. Hasan Md., S. Moalem, and C. Pal. Localizing facial keypoints with global descriptor search, neighbour alignment and locally linear models.
6. J. Yan, Z. Lei, D. Yi, and S. Z. Li. Learn to combine multiple hypotheses for face alignment.
Challenge participants should submit a paper to the 300-W Workshop summarizing the methodology and the achieved performance of their algorithm. Submissions should follow the main ICCV 2013 proceedings style and have a maximum length of 8 pages; each paper will be charged a fee of $200, regardless of length. The workshop papers will be published in the ICCV 2013 proceedings. Please sign up in the submission system to submit your paper.
Dr. Georgios Tzimiropoulos
Intelligent Behaviour Understanding Group (iBUG)
R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-PIE. Image and Vision Computing, 28(5):807–813, 2010.
P. Belhumeur, D. Jacobs, D. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In Computer Vision and Pattern Recognition (CVPR), 2011.
X. Zhu and D. Ramanan. Face detection, pose estimation and landmark localization in the wild. In Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, June 2012.
V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang. Interactive facial feature localization. In European Conference on Computer Vision (ECCV), 2012.
K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre. XM2VTSDB: The extended M2VTS database. In 2nd International Conference on Audio and Video-based Biometric Person Authentication, volume 964, 1999.
C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic. A semi-automatic methodology for facial landmark annotation. In IEEE Int'l Conf. Computer Vision and Pattern Recognition Workshops (CVPR-W'13), 5th Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2013), Portland, Oregon, USA, June 2013 (accepted for publication).