Frames of Aff-Wild2, showing subjects of different ethnicities, age groups, emotional states, head poses, illumination conditions and occlusions
Affective computing has long been constrained by the available data resources. With the rise of deep learning models as the default approach to most computer vision tasks, the need to collect and annotate diverse in-the-wild datasets has become apparent.
Some in-the-wild databases have recently been proposed. However: i) their size is small, ii) they are not audiovisual, iii) only a small part is manually annotated, iv) they contain a small number of subjects, or v) they are not annotated for all three main behavior tasks (valence-arousal estimation, action unit detection and basic expression classification).
To address these limitations, we substantially extend the largest available in-the-wild database (Aff-Wild) to study continuous emotions such as valence and arousal. Furthermore, we annotate parts of the database with basic expressions and action units. We call this database Aff-Wild2. In total, Aff-Wild2 contains 558 videos with around 2.8 million frames. To the best of our knowledge, Aff-Wild2 is the first large-scale in-the-wild database containing annotations for all three main behavior tasks. It is also the first audiovisual database with annotations for AUs; all other AU-annotated databases contain only images or videos, without audio.
Latest News
We are currently organizing a Competition (split into three Tracks-Challenges) on an updated version of Aff-Wild2 (augmented with more videos). Please check here for more information on how to acquire the data and participate, or email: dimitrios.kollias15@imperial.ac.uk
Each video file name is in the form of:
Each annotation file is named after the video that it corresponds to. The first line of each annotation file is always: valence,arousal
Each line after the first shows the valence and arousal values separated by a comma. Note that valence and arousal take values in the range [-1,1].
In some annotation files, some lines contain the values -5,-5, which indicate that the corresponding frame is not annotated with valence-arousal values; such frames should be disregarded.
Some videos (such as 30-30-1920x1080.mp4) contain two subjects reacting to what they see. Both subjects have been annotated, and the corresponding annotation files are distinguished by the suffixes '_left' and '_right'. For instance, for the video 30-30-1920x1080.mp4 the two annotation files are 30-30-1920x1080_left.txt and 30-30-1920x1080_right.txt (corresponding to the left and right persons respectively). In all these videos the subjects keep the same position throughout, so the '_left' file always annotates the person on the left and the '_right' file the person on the right. Three videos are exceptions: i) 10-60-1280x720.mp4 (found in the train set), ii) video59.mp4 (found in the validation set) and iii) video2.mp4 (found in the test set). Their annotation files are named: i) 10-60-1280x720.txt and 10-60-1280x720_right.txt, ii) video59.txt and video59_right.txt and iii) video2.txt and video2_left.txt. These videos contain one main subject; for short durations the camera shows another person (entering from the left or right side), who is annotated in the file with the '_left' or '_right' suffix as listed above.
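The VA annotation format described above can be read with a short helper. A minimal sketch (the function name is illustrative); unannotated frames (-5,-5) are returned as None so they can be filtered out downstream:

```python
def read_va_annotations(path):
    """Read an Aff-Wild2 valence-arousal annotation file.

    Returns one entry per frame: a (valence, arousal) tuple of floats
    in [-1, 1], or None for frames marked -5,-5 (not annotated).
    """
    frames = []
    with open(path) as f:
        header = f.readline().strip()
        assert header == "valence,arousal", f"unexpected header: {header}"
        for line in f:
            v, a = (float(x) for x in line.strip().split(","))
            if v == -5 and a == -5:
                frames.append(None)  # unannotated frame: disregard
            else:
                frames.append((v, a))
    return frames
```

For a two-subject video, the same helper would simply be called once on the '_left' file and once on the '_right' file.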
Each annotation file is named after the video that it corresponds to. The first line of each annotation file is always:
AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25
Each line after the first shows the AU annotation values separated by commas. Each value can be 0, 1 or -1; frames annotated with the value -1 should be discarded.
Again, some videos contain two subjects reacting to what they see. Both subjects have been annotated, and the corresponding annotation files are distinguished by the '_left' and '_right' suffixes as explained in the VA case above. The exception here is video59.mp4, whose annotation files are video59.txt and video59_right.txt.
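Reading the AU files works the same way. A minimal sketch (the function name is illustrative); here any row containing a -1 is treated as unannotated, which is an assumption about how the -1 marker is used:

```python
AU_HEADER = "AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25"

def read_au_annotations(path):
    """Read an Aff-Wild2 action unit annotation file.

    Returns one entry per frame: a list of eight 0/1 values, or None
    for frames whose row contains a -1 (unannotated; to be discarded).
    """
    frames = []
    with open(path) as f:
        header = f.readline().strip()
        assert header == AU_HEADER, f"unexpected header: {header}"
        for line in f:
            aus = [int(x) for x in line.strip().split(",")]
            frames.append(None if -1 in aus else aus)
    return frames
```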
Each annotation file is named after the video that it corresponds to. The first line of each annotation file is always:
Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
Each line after the first has one annotation value in {0,1,2,3,4,5,6}, corresponding to the emotions Neutral, Anger, Disgust, Fear, Happiness, Sadness and Surprise respectively. The annotation value can also be -1, in which case the frame should be disregarded.
Again, some videos contain two subjects reacting to what they see. Both subjects have been annotated, and the corresponding annotation files are distinguished by the '_left' and '_right' suffixes as explained in the VA and AU cases above.
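The expression files can be decoded to label names using the mapping given above. A minimal sketch (the function name is illustrative); -1 frames are again returned as None:

```python
# Index i in this list is the emotion denoted by annotation value i.
EXPRESSIONS = ["Neutral", "Anger", "Disgust", "Fear",
               "Happiness", "Sadness", "Surprise"]

def read_expression_annotations(path):
    """Read an Aff-Wild2 basic expression annotation file.

    Returns one entry per frame: the emotion name, or None for
    frames annotated with -1 (to be disregarded).
    """
    labels = []
    with open(path) as f:
        f.readline()  # header: Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
        for line in f:
            value = int(line.strip())
            labels.append(None if value == -1 else EXPRESSIONS[value])
    return labels
```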
If you have any questions or experience any problems with the database, or you want to evaluate your models on the Aff-Wild2 test sets, send an email to: dimitrios.kollias15@imperial.ac.uk
If you use the above data, you must cite all of the following papers:
@article{kollias2019expression,
  title={Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace},
  author={Kollias, Dimitrios and Zafeiriou, Stefanos},
  journal={arXiv preprint arXiv:1910.04855},
  year={2019}
}

@article{kollias2018aff,
  title={Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition},
  author={Kollias, Dimitrios and Zafeiriou, Stefanos},
  journal={arXiv preprint arXiv:1811.07770},
  year={2018}
}

@article{kollias2018multi,
  title={A Multi-Task Learning \& Generation Framework: Valence-Arousal, Action Units \& Primary Expressions},
  author={Kollias, Dimitrios and Zafeiriou, Stefanos},
  journal={arXiv preprint arXiv:1811.07771},
  year={2018}
}

@article{kollias2019deep,
  title={Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond},
  author={Kollias, Dimitrios and Tzirakis, Panagiotis and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Schuller, Bj{\"o}rn and Kotsia, Irene and Zafeiriou, Stefanos},
  journal={International Journal of Computer Vision},
  pages={1--23},
  year={2019},
  publisher={Springer}
}

@inproceedings{zafeiriou2017aff,
  title={Aff-wild: Valence and arousal ‘in-the-wild’ challenge},
  author={Zafeiriou, Stefanos and Kollias, Dimitrios and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Kotsia, Irene},
  booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on},
  pages={1980--1987},
  year={2017},
  organization={IEEE}
}

@inproceedings{kollias2017recognition,
  title={Recognition of affect in the wild using deep neural networks},
  author={Kollias, Dimitrios and Nicolaou, Mihalis A and Kotsia, Irene and Zhao, Guoying and Zafeiriou, Stefanos},
  booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on},
  pages={1972--1979},
  year={2017},
  organization={IEEE}
}
First of all, you should clarify to which task (VA, AU, Expression) the predictions correspond. The format of the predictions should follow the same format as the annotation files that we provide. In detail:
In the VA case: Send files named after the corresponding videos. Each line of each file should contain the valence and arousal values for the corresponding frame, separated by a comma. For instance, for file 271.csv:
line 1 should be: valence,arousal
line 2 should be: valence_of_first_frame,arousal_of_first_frame (for instance it could be: 0.53,0.28)
line 3 should be: valence_of_second_frame,arousal_of_second_frame
...
last line: valence_of_last_frame,arousal_of_last_frame
In the Expression case: Send files named after the corresponding videos. Each line of each file should contain the corresponding basic expression prediction (0,1,2,3,4,5,6, where: 0 denotes neutral, 1 denotes anger, 2 denotes disgust, 3 denotes fear, 4 denotes happiness, 5 denotes sadness and 6 denotes surprise). For instance, for file 282.csv:
first line should be: Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
second line should be: basic_expression_prediction_of_first_frame (such as 5)
...
last line should be: basic_expression_prediction_of_last_frame
In the AU case: Send files named after the corresponding videos. Each line of each file should contain 8 comma-separated numbers (0 or 1) that correspond to the 8 Action Units (AU1, AU2, AU4, AU6, AU12, AU15, AU20, AU25). For instance, for file video18.csv:
first line should be: AU1,AU2,AU4,AU6,AU12,AU15,AU20,AU25
second line should be: AU1_of_first_frame,AU2_of_first_frame,AU4_of_first_frame,AU6_of_first_frame,AU12_of_first_frame,AU15_of_first_frame,AU20_of_first_frame,AU25_of_first_frame (such as: 0,1,1,0,0,0,0,1)
...
last line should be: AU1_of_last_frame,AU2_of_last_frame,AU4_of_last_frame,AU6_of_last_frame,AU12_of_last_frame,AU15_of_last_frame,AU20_of_last_frame,AU25_of_last_frame
Note that your files should include predictions for all frames in the video (regardless of whether the face bounding box detection failed or not).
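Since every frame must appear in the submitted file, missing predictions (e.g. frames where face detection failed) need a fallback value. A minimal sketch for the VA case (the function name and the neutral 0.0,0.0 fallback are illustrative choices, not part of the submission rules):

```python
def write_va_predictions(path, predictions, n_frames, fill=(0.0, 0.0)):
    """Write a VA prediction file in the required submission format.

    `predictions` maps frame index -> (valence, arousal). Frames with
    no prediction are written with the `fill` value, so that the file
    covers every frame of the video as required.
    """
    with open(path, "w") as f:
        f.write("valence,arousal\n")
        for i in range(n_frames):
            v, a = predictions.get(i, fill)
            f.write(f"{v},{a}\n")
```

The Expression and AU files follow the same pattern, with their respective headers and per-frame values.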
As we are currently organizing i) a Competition (split into three Tracks-Challenges) and ii) a related Workshop at the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), trained models and source code will be made publicly available after these are completed.