Aff-Wild2 database

Frames of Aff-Wild2, showing subjects of different ethnicities, age groups, emotional states, head poses, illumination conditions and occlusions

Affective computing has largely been limited by the available data resources. The need to collect and annotate diverse in-the-wild datasets has become apparent with the rise of deep learning models as the default approach to computer vision tasks.
Some in-the-wild databases have recently been proposed. However: i) their size is small, ii) they are not audiovisual, iii) only a small part is manually annotated, iv) they contain a small number of subjects, or v) they are not annotated for all main behavior tasks (valence-arousal estimation, action unit detection and basic expression classification).
To address these issues, we substantially extend the largest available in-the-wild database (Aff-Wild) for studying continuous emotion dimensions such as valence and arousal. Furthermore, we annotate parts of the database with basic expressions and action units. We call this database Aff-Wild2. In total, Aff-Wild2 contains 558 videos with around 2.8 million frames. To the best of our knowledge, Aff-Wild2 is the first large-scale in-the-wild database containing annotations for all 3 main behavior tasks. It is also the first audiovisual database with annotations for AUs; existing AU-annotated databases contain no audio, only images or videos.



How to acquire the data


  • For the Valence-Arousal (VA) Set: the training and validation videos and annotations can be found here. In more detail: the training videos can be found here, the validation videos here and the annotations for these sets here. The corresponding test set videos can be found here. Note that the test set annotations will not be made publicly available, as we are currently running a Competition and a Workshop at the IEEE International Conference on Automatic Face & Gesture Recognition (FG 2020).
  • For the Action Unit (AU) Set: the training, validation and test videos and annotations can be found here. Note again that the test set annotations will not be made publicly available, as we are currently running a Competition and a Workshop at the IEEE International Conference on Automatic Face & Gesture Recognition (FG 2020).
  • For the Expression (Expr) Set: the training, validation and test videos and annotations can be found here. Note again that the test set annotations will not be made publicly available, as we are currently running a Competition and a Workshop at the IEEE International Conference on Automatic Face & Gesture Recognition (FG 2020).

 

README for the Data:

 

Each video file name takes one of the following forms (a small filename-parsing sketch is given after the list):

  1. a 3-digit number followed by the extension .mp4 or .avi, such as 137.avi: these are the videos of the (former) Aff-Wild
  2. video{id}.mp4, such as video5.mp4: id is an integer
  3. video{id}_number1.mp4, such as video86_2.mp4: id and number1 are integers
  4. number1-number2-number3xnumber4.mp4, such as 140-30-632x360.mp4: number1 and number2 are random numbers, while number3 and number4 correspond to the dimensions of the video
  5. number1-number2-number3xnumber4-number5.mp4, such as 5-60-1920x1080-4.mp4: in these cases we initially had a video in the format described in (4), which we split into smaller segments; number5 is the position of the specific segment, so all such videos show the same subject (videos 5-60-1920x1080-1.mp4, 5-60-1920x1080-2.mp4, 5-60-1920x1080-3.mp4 and 5-60-1920x1080-4.mp4 all show the same subject)
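To make the naming scheme concrete, here is a minimal sketch (our own helper, not part of any official Aff-Wild2 tooling) that classifies a file name into one of the five patterns above using regular expressions; the pattern labels are ours.

import re

# Hypothetical helper that recognises which documented naming pattern a video file follows.
PATTERNS = [
    ("aff_wild_original", re.compile(r"\d{3}\.(mp4|avi)")),         # e.g. 137.avi
    ("video_id",          re.compile(r"video\d+\.mp4")),            # e.g. video5.mp4
    ("video_id_part",     re.compile(r"video\d+_\d+\.mp4")),        # e.g. video86_2.mp4
    ("dims",              re.compile(r"\d+-\d+-\d+x\d+\.mp4")),     # e.g. 140-30-632x360.mp4
    ("dims_segment",      re.compile(r"\d+-\d+-\d+x\d+-\d+\.mp4")), # e.g. 5-60-1920x1080-4.mp4
]

def classify_video_name(name):
    """Return the label of the naming pattern that `name` matches, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(name):
            return label
    return "unknown"

print(classify_video_name("5-60-1920x1080-4.mp4"))  # prints: dims_segment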



VA Set:

 

Each annotation file is named after the video that it corresponds to. The first line of each annotation file is always: valence,arousal
Each line after the first shows the valence and arousal values separated by a comma. Valence and arousal take values in the range [-1,1].
In some annotation files, some lines contain the values -5,-5, which means that the corresponding frame is not annotated with valence-arousal values; such frames should be disregarded.
In some videos (such as 30-30-1920x1080.mp4) two subjects appear, reacting to what they watch. Both subjects have been annotated, and the corresponding annotation files are distinguished by the suffixes '_left' and '_right'. For instance, for the video 30-30-1920x1080.mp4 the two annotation files are 30-30-1920x1080_left.txt and 30-30-1920x1080_right.txt (corresponding to the left and right person respectively). In all these videos the two people keep the same position, so the annotation file with the suffix '_left' always refers to the person on the left, and likewise for '_right'. We make a special mention (exception) for the videos: i) 10-60-1280x720.mp4 (in the train set), ii) video59.mp4 (in the validation set) and iii) video2.mp4 (in the test set). Their annotation files are named: i) 10-60-1280x720.txt and 10-60-1280x720_right.txt, ii) video59.txt and video59_right.txt, and iii) video2.txt and video2_left.txt. In these cases the video contains one (main) subject; for short periods the camera shows another person (entering from the left or right side), who is annotated in the file with the suffix '_left' or '_right' as listed in (i)-(iii) above.
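As an illustration, the following minimal sketch (our own helper, with a placeholder file path) reads a VA annotation file and marks the unannotated -5,-5 frames as None so they can be disregarded:

def read_va_annotations(path):
    """Read a VA annotation file: a 'valence,arousal' header followed by one
    comma-separated valence/arousal pair per frame, each value in [-1,1].
    Frames annotated as -5,-5 are unannotated and are returned as None here."""
    annotations = []
    with open(path) as f:
        f.readline()  # skip the header line: "valence,arousal"
        for line in f:
            valence, arousal = map(float, line.strip().split(","))
            if valence == -5 and arousal == -5:  # unannotated frame -> disregard
                annotations.append(None)
            else:
                annotations.append((valence, arousal))
    return annotations

# e.g. (placeholder path): read_va_annotations("VA_annotations/30-30-1920x1080_left.txt")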

 

AU Set:

 

Each annotation file is named after the video that it corresponds to. The first line of each annotation file is always:
AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25

Each line after the first shows the AU annotation values separated by commas. Each value can be 0, 1 or -1. Frames annotated with the value -1 should be discarded.
Again, some videos contain two subjects. Both subjects have been annotated, and the corresponding annotation files are distinguished by the suffixes '_left' and '_right', as explained for the VA Set above. The special mention (exception) here is the video video59.mp4, which has the annotation files video59.txt and video59_right.txt.
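Similarly, a minimal reading sketch for an AU annotation file (again our own helper, not official code) could look like this; frames containing a -1 are returned as None so they can be discarded:

def read_au_annotations(path):
    """Read an AU annotation file: a header naming the 8 AUs, then one line of
    8 comma-separated values (0, 1 or -1) per frame. Frames that contain a -1
    should be discarded and are returned as None here."""
    frames = []
    with open(path) as f:
        au_names = f.readline().strip().split(",")  # AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25
        for line in f:
            values = [int(v) for v in line.strip().split(",")]
            frames.append(None if -1 in values else dict(zip(au_names, values)))
    return frames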

 

Expression Set:

 

Each annotation file is named after the video that it corresponds to. The first line of each annotation file is always: 
Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
Each line after the first has one annotation value in {0,1,2,3,4,5,6}. These values correspond to the emotions {Neutral, Anger, Disgust, Fear, Happiness, Sadness, Surprise}. The annotation value can also be -1, in which case the frame should be disregarded.
Again, some videos contain two subjects. Both subjects have been annotated, and the corresponding annotation files are distinguished by the suffixes '_left' and '_right', as explained for the VA and AU Sets above.
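A corresponding minimal sketch for an Expression annotation file (our own helper) maps each per-frame integer to its emotion name and marks -1 frames as None:

EXPRESSIONS = ["Neutral", "Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise"]

def read_expr_annotations(path):
    """Read an Expression annotation file: a header listing the 7 expressions,
    then one integer per frame in {0,...,6}, or -1 for frames to disregard
    (returned as None here)."""
    labels = []
    with open(path) as f:
        f.readline()  # skip the header line
        for line in f:
            value = int(line.strip())
            labels.append(EXPRESSIONS[value] if value != -1 else None)
    return labels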

 

If you have any questions, experience any problems with the database, or want to evaluate your models on the Aff-Wild2 test sets, send an email to: dimitrios.kollias15@imperial.ac.uk



References:


If you use the above data, you must cite all following papers: 


  • D. Kollias, S. Zafeiriou: "Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace". BMVC, 2019

@article{kollias2019expression, title={Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1910.04855}, year={2019} }


  • D. Kollias, S. Zafeiriou: "Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition", arXiv, 2018

@article{kollias2018aff, title={Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1811.07770}, year={2018} }


  • D. Kollias, S. Zafeiriou: "A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions", arXiv, 2018

@article{kollias2018multi, title={A Multi-Task Learning \& Generation Framework: Valence-Arousal, Action Units \& Primary Expressions}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1811.07771}, year={2018} }

 

  • D. Kollias, et al.: "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision (IJCV), 2019

@article{kollias2019deep, title={Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond}, author={Kollias, Dimitrios and Tzirakis, Panagiotis and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Schuller, Bj{\"o}rn and Kotsia, Irene and Zafeiriou, Stefanos}, journal={International Journal of Computer Vision}, pages={1--23}, year={2019}, publisher={Springer} }


  • S. Zafeiriou, et al.: "Aff-Wild: Valence and Arousal in-the-wild Challenge", CVPRW, 2017

@inproceedings{zafeiriou2017aff, title={Aff-wild: Valence and arousal ‘in-the-wild’ challenge}, author={Zafeiriou, Stefanos and Kollias, Dimitrios and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Kotsia, Irene}, booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on}, pages={1980--1987}, year={2017}, organization={IEEE} }


  • D. Kollias, et al.: "Recognition of affect in the wild using deep neural networks", CVPRW, 2017

@inproceedings{kollias2017recognition, title={Recognition of affect in the wild using deep neural networks}, author={Kollias, Dimitrios and Nicolaou, Mihalis A and Kotsia, Irene and Zhao, Guoying and Zafeiriou, Stefanos}, booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on}, pages={1972--1979}, year={2017}, organization={IEEE} }



Evaluation of your predictions on the test set


First of all, you should clarify to which set (VA, AU, Expression) your predictions correspond. The format of the predictions should follow the same format as the annotation files that we provide. In detail:

In the VA case:  Send files named after the corresponding videos. Each line of each file should contain the valence and arousal values for the corresponding frame, separated by a comma, e.g. for file 271.csv:

line 1 should be: valence,arousal
line 2 should be: valence_of_first_frame,arousal_of_first_frame     (for instance it could be: 0.53,0.28)
line 3 should be: valence_of_second_frame,arousal_of_second_frame
...
last line: valence_of_last_frame,arousal_of_last_frame

 

In the Expression case:  Send files named after the corresponding videos. Each line of each file should contain the corresponding basic expression prediction (0,1,2,3,4,5,6, where 0 denotes neutral, 1 anger, 2 disgust, 3 fear, 4 happiness, 5 sadness and 6 surprise). For instance, for file 282.csv:

first line should be: Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
second line should be: basic_expression_prediction_of_first_frame    (such as 5)
...
last line should be: basic_expression_prediction_of_last_frame    


In the AU case:  Send files named after the corresponding videos. Each line of each file should contain 8 comma-separated numbers (0 or 1) that correspond to the 8 Action Units (AU1, AU2, AU4, AU6, AU12, AU15, AU20, AU25). For instance, for file video18.csv:

first line should be: AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25
second line should be: AU1_of_first_frame,AU2_of_first_frame,AU4_of_first_frame,AU6_of_first_frame,AU12_of_first_frame,AU15_of_first_frame,AU20_of_first_frame,AU25_of_first_frame    (such as: 0,1,1,0,0,0,0,1)
...
last line should be: AU1_of_last_frame,AU2_of_last_frame,AU4_of_last_frame,AU6_of_last_frame,AU12_of_last_frame,AU15_of_last_frame,AU20_of_last_frame,AU25_of_last_frame      

Note that your files should include predictions for all frames in the video (regardless of whether the bounding box/face detection failed or not).
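For reference, here is a minimal sketch (our own helper functions, under the assumption that predictions are already available as per-frame Python lists) that writes prediction files in the three formats described above:

def write_va_predictions(path, predictions):
    """predictions: one (valence, arousal) pair per frame, values in [-1,1]."""
    with open(path, "w") as f:
        f.write("valence,arousal\n")
        for valence, arousal in predictions:
            f.write("{},{}\n".format(valence, arousal))

def write_expr_predictions(path, predictions):
    """predictions: one integer per frame in {0,...,6}."""
    with open(path, "w") as f:
        f.write("Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise\n")
        for label in predictions:
            f.write("{}\n".format(label))

def write_au_predictions(path, predictions):
    """predictions: one list of 8 binary (0/1) values per frame."""
    with open(path, "w") as f:
        f.write("AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25\n")
        for aus in predictions:
            f.write(",".join(str(a) for a in aus) + "\n")

# e.g. write_va_predictions("271.csv", [(0.53, 0.28), (0.51, 0.30)])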

 

 

Trained Models - Source Code

 

As we are currently organizing i) a Competition (split into 3 Tracks/Challenges) and ii) a related Workshop at the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), trained models and source code will be made publicly available after these have been completed.

 

 

 

Important Information:

  • The dataset and annotations are available for non-commercial research purposes only.
  • All the training/validation/testing images of the dataset are obtained from YouTube. We are not responsible for the content or the meaning of these images.
  • You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes any portion of the images or any portion of derived data.
  • You agree not to further copy, publish or distribute any portion of the dataset's annotations. As an exception, it is allowed to make copies of the dataset for internal use at a single site within the same organization.
  • We reserve the right to terminate your access to the dataset at any time.