ICCV 2021: 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW)

Latest News

The 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) will be held in conjunction with the International Conference on Computer Vision (ICCV) 2021.


For any requests or enquiries, please contact: D.Kollias@greenwich.ac.uk 



Dimitrios Kollias, University of Greenwich, UK, D.Kollias@greenwich.ac.uk

Stefanos Zafeiriou, Imperial College London, UK, s.zafeiriou@imperial.ac.uk

Irene Kotsia, Middlesex University London, UK, i.kotsia@mdx.ac.uk

Attila Schulc, Realeyes - Emotional Intelligence, attila.schulc@realeyesit.com



This Workshop tackles the problem of affective behavior analysis in-the-wild, which is a major targeted characteristic of HCI systems used in real-life applications. The target is to create machines and robots that are capable of understanding people's feelings, emotions and behaviors, and that can therefore interact with them in a 'human-centered' and engaging manner and serve them effectively as digital assistants. This interaction should not depend on the respective context, nor on the human's age, sex, ethnicity, educational level, profession, or social position. As a result, the development of intelligent systems able to analyze human behaviors in-the-wild can contribute to the generation of trust, understanding and closeness between humans and machines in real-life environments.


Representing human emotions has been a basic topic of research. The most frequently used emotion representation is the categorical one, which includes the seven basic categories, i.e., Anger, Disgust, Fear, Happiness, Sadness, Surprise and Neutral. Discrete emotion representation can also be described in terms of the Facial Action Coding System model, in which all possible facial actions are described in terms of Action Units. Finally, the dimensional model of affect has been proposed as a means to distinguish between subtly different displays of affect and to encode small changes in the intensity of each emotion on a continuous scale. The 2-D Valence and Arousal Space (VA-Space) is the most common dimensional emotion representation; valence shows how positive or negative an emotional state is, whilst arousal shows how passive or active it is.


To this end, the developed systems should automatically sense and interpret facial and audio-visual signals relevant to emotions, traits, appraisals and intentions. Furthermore, since real-world settings entail uncontrolled conditions, where subjects operate in a diversity of contexts and environments, systems that perform automatic analysis of human behavior and emotion recognition should be robust to video recording conditions, diversity of contexts and timing of display.

These goals are scientifically and technically challenging.


Call for participation: 

First, this Workshop hosts a Competition, which is split into 3 Challenges. More details can be found in the relevant section below.

Apart from the Competition, this Workshop will solicit contributions on the recent progress of recognition, analysis, generation and modelling of face, body, and gesture, while embracing the most advanced systems available for face and gesture analysis, particularly, in-the-wild (i.e., in unconstrained environments) and across modalities like face to voice.

Original high-quality contributions, including:

- databases or

- surveys and comparative studies or

- Artificial Intelligence / Machine Learning / Deep Learning / AutoML / (Data-driven or physics-based) Generative Modelling Methodologies (either Uni-Modal or Multi-Modal ones)

are solicited on the following topics:

i) "in-the-wild" facial expression or micro-expression analysis,

ii) "in-the-wild" facial action unit detection,

iii) "in-the-wild" valence-arousal estimation,

iv) "in-the-wild" physiological-based (e.g., EEG, EDA) affect analysis,

v) domain adaptation for affect recognition in the previous 4 cases,

vi) "in-the-wild" face recognition, detection or tracking,

vii) "in-the-wild" body recognition, detection or tracking,

viii) "in-the-wild" gesture recognition or detection,

ix) "in-the-wild" pose estimation or tracking,

x) "in-the-wild" activity recognition or tracking,

xi) "in-the-wild" lip reading and voice understanding,

xii) "in-the-wild" face and body characterization (e.g., behavioral understanding),

xiii) "in-the-wild" characteristic analysis (e.g., gait, age, gender, ethnicity recognition),

xiv) "in-the-wild" group understanding via social cues (e.g., kinship, non-blood relationships, personality) 


Accepted papers will appear in the ICCV 2021 proceedings.


Workshop Important Dates: 

  • Paper Submission Deadline:              
    18 July, 2021
  • Review decisions sent to authors; Notification of acceptance:
    7 August, 2021
  • Camera ready version:
    17 August, 2021



Submission Information

The paper format should adhere to the paper submission guidelines and style of the main ICCV 2021 proceedings.

The submission process will be handled through CMT.





The Competition


The Competition is a continuation of the ABAW Competition held last year at IEEE FG. It is split into 3 Challenges-Tracks, which are based on the same database; these target dimensional and categorical affect recognition.

In particular, the 3 Challenges-Tracks are: 

  • valence-arousal estimation
  • seven basic expression classification
  • facial action unit detection

These Challenges will produce a significant step forward when compared to previous events. In particular, they use Aff-Wild2, the first comprehensive benchmark for all three affect recognition tasks in-the-wild.

Participants are invited to participate in one or more of these Challenges.

There will be one winner per Challenge-Track. The winners are expected to contribute a paper describing their approach, methodology and results; the accepted winning papers will be part of the ICCV 2021 proceedings. All other teams are also able to submit a paper describing their solutions and final results; the accepted papers will be part of the ICCV 2021 proceedings.

For the purpose of the Challenges and to facilitate training, especially for people that do not have access to face detectors/tracking algorithms, we provide the cropped images and the cropped & aligned ones.



Data: Aff-Wild2  


Aff-Wild2 is an extension of the Aff-Wild database (both in terms of annotations and videos).

Aff-Wild2 is: i) an in-the-wild audiovisual database; ii) a large scale database consisting of 564 videos of around 2.8M frames (the largest existing one); iii) the first database to contain annotations for all 3 behavior tasks (and also the first audiovisual database with annotations for AUs). 558 videos contain annotations for valence-arousal, 539 videos contain annotations for the 7 basic expressions and 558 videos contain annotations for 17 AUs (AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU10, AU11, AU12, AU15, AU17, AU20, AU23, AU24, AU25, AU26).

How to participate


To participate, you need to register your team.

For this, please send us an email to: D.Kollias@greenwich.ac.uk  with the title "2nd Affective Behavior Analysis in-the-wild Competition: Team Registration".

In this email include the following information:

Team Name

Team Members


There is no maximum number of participants in each team.

As a reply, you will receive access to the dataset's videos, annotations, cropped and cropped-aligned images and other important information.


At the end of the Challenges, each team will have to send us: i) their predictions on the test set, ii) a link to a GitHub repository where their solution/source code will be stored, and iii) a link to an arXiv paper of 2-6 pages describing their proposed methodology, the data used and the results. After that, the winner of each Challenge will be announced and will be invited to submit a paper describing the solution and results. All other (non-winning) teams will also be able to submit a paper describing their solutions and final results to our Workshop.




• Participants can contribute to any of the 3 Challenges.

• In order to take part in any Challenge, participants will have to register by sending an email to the organizers containing the following information: Team Name, Team Members, Affiliation.

• Participants can use scene/background/body pose etc. information along with the face information.

• Any face detector whether commercial or academic can be used in the challenge. The paper accompanying the challenge result submission should contain clear details of the detectors/libraries used.

• The participants are free to use external data for training along with the Aff-Wild2 partitions. However, this should be clearly discussed in the accompanying paper.

• The participants are free to use any pre-trained network, even the publicly available ones (CNN, AffWildNet) that displayed the best performance in the (former) Aff-Wild database (part of Aff-Wild2). 


Performance Assessment


1) For Challenge-Track 1: Valence-Arousal Estimation, the performance metric will be the Concordance Correlation Coefficient (CCC).


2) For Challenge-Track 2: 7 Basic Expression Classification, the performance metric will be:

0.67* F1_Score + 0.33* Accuracy

Note: F1 Score is the unweighted mean and Accuracy is the total accuracy


3) For Challenge-Track 3: 17 Action Unit Detection, the performance metric will be:

0.5* F1_Score + 0.5* Accuracy

Note: F1 Score is the unweighted mean and Accuracy is the total accuracy
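
The three metrics above can be sketched in a few lines of Python. This is only an illustration and not the organizers' evaluation code: the function names are hypothetical, NumPy is assumed to be available, and several details that the text does not specify are assumptions (population variance in the CCC, an F1 of 0 for a class that never occurs, and AU accuracy computed over all frame/AU entries). How valence and arousal CCC values are combined into a single ranking score is not stated here, so the sketch computes CCC for one sequence at a time.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Assumption: population variances (ddof=0), matching the covariance above.
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def binary_f1(true_mask, pred_mask):
    """F1 for one class given boolean masks; 0.0 if the class never appears."""
    tp = np.sum(true_mask & pred_mask)
    fp = np.sum(~true_mask & pred_mask)
    fn = np.sum(true_mask & ~pred_mask)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def expr_score(y_true, y_pred, num_classes=7):
    """0.67 * unweighted-mean F1 + 0.33 * total accuracy, labels in 0..6."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1 = np.mean([binary_f1(y_true == c, y_pred == c) for c in range(num_classes)])
    acc = np.mean(y_true == y_pred)
    return 0.67 * f1 + 0.33 * acc


def au_score(y_true, y_pred):
    """0.5 * mean per-AU F1 + 0.5 * accuracy for (frames, num_aus) 0/1 arrays."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    f1 = np.mean([binary_f1(y_true[:, k], y_pred[:, k])
                  for k in range(y_true.shape[1])])
    # Assumption: "total accuracy" is taken over every frame/AU entry.
    acc = np.mean(y_true == y_pred)
    return 0.5 * f1 + 0.5 * acc
```

Note that ccc is undefined (division by zero) when both sequences are constant and equal; a real evaluation script would need to decide how to handle that case.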



Test Set Submissions:

Participating teams are allowed to have at most 7 different submissions per Challenge-Track. A submission is considered valid if we receive the code, the paper and the results; so if you fail to submit one of these items, your submission will be invalid.


When sending your final results, make sure to clarify which Challenge-Track they correspond to, for example by storing them in a folder named after the Challenge-Track.


The format of the predictions should follow the format of the annotation files that we provided. So if the test set contains, for instance, 200 videos, the submission should also contain 200 text files (or more, if some videos contain two subjects). The names of the files should match the ones in the attached files.

Each file should contain in its first line one of the following (as was the case with the annotation files), depending on which Challenge-Track it corresponds to:

  • valence,arousal
  • Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
  • AU1,AU2,AU4,AU5,AU6,AU7,AU9,AU10,AU11,AU12,AU15,AU17,AU20,AU23,AU24,AU25,AU26

After that, each line should contain the predictions corresponding to each video frame. 

For the VA Challenge-Track, each following line should have the valence and arousal values (first the valence value and then the arousal value), comma separated, as was the case in the annotation files.

For the AU Challenge-Track, each following line should have the 17 action unit values, comma separated, as was the case in the annotation files.

For the Expr Challenge-Track, each following line should have the expression value (as was the case in the annotation files), which is in:

{0,1,2,3,4,5,6} (which correspond to the emotions {Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise}).

Note that your files should include predictions for all frames in the video (regardless of whether the bounding box detection failed or not). So the total number of lines in a file should be equal to the total number of frames of this video plus one (the format of the first line of each file is stated above).
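
As an illustration of the file format described above, the following Python sketch writes one prediction file and checks its basic shape (header line, one line per frame, number of values per line). It is not an official tool: the function names and the track keys "VA", "EXPR" and "AU" are hypothetical, and any values passed to it in an example are placeholders, not real predictions.

```python
from pathlib import Path

# First line of each file, per Challenge-Track (track keys are hypothetical).
HEADERS = {
    "VA": "valence,arousal",
    "EXPR": "Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise",
    "AU": "AU1,AU2,AU4,AU5,AU6,AU7,AU9,AU10,AU11,AU12,"
          "AU15,AU17,AU20,AU23,AU24,AU25,AU26",
}

# Number of comma-separated values expected on each prediction line.
VALUES_PER_LINE = {"VA": 2, "EXPR": 1, "AU": 17}


def write_predictions(path, track, rows):
    """Write the header line, then one comma-separated line per video frame."""
    lines = [HEADERS[track]]
    for row in rows:
        values = row if isinstance(row, (list, tuple)) else [row]
        lines.append(",".join(str(v) for v in values))
    Path(path).write_text("\n".join(lines) + "\n")


def check_predictions(path, track, num_frames):
    """Return a list of problems; an empty list means the shape looks right."""
    lines = Path(path).read_text().splitlines()
    problems = []
    if not lines or lines[0] != HEADERS[track]:
        problems.append("first line is not the expected header")
    if len(lines) != num_frames + 1:
        problems.append(f"expected {num_frames + 1} lines, found {len(lines)}")
    width = VALUES_PER_LINE[track]
    for number, line in enumerate(lines[1:], start=2):
        if len(line.split(",")) != width:
            problems.append(f"line {number}: expected {width} value(s)")
    return problems
```

For instance, write_predictions("video1.txt", "VA", [(0.1, -0.2), (0.3, 0.4)]) would produce a three-line file for a two-frame video, and check_predictions("video1.txt", "VA", 2) would then return an empty list. The check covers only the layout, not whether the values fall in a valid range.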



Regarding the database:


• The database and annotations are available for non-commercial research purposes only. 

• All the training/validation/testing images of the dataset have been obtained from YouTube. We are not responsible for the content or the meaning of these images.

• Participants will agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the images and any portion of derived data. They will also agree not to further copy, publish or distribute any portion of the annotations of the dataset. As an exception, copies of the dataset may be made for internal use at a single site within the same organization.

• We reserve the right to terminate participants’ access to the dataset at any time.

• If a participant’s face is displayed in any video and (s)he wants it to be removed, (s)he can email us at any time.


Important Dates (updated):

  • Call for participation announced, team registration begins, data available:
    3 May, 2021
  • Final submission deadline (Results, Code and arXiv paper):
    4 July, 2021
  • Winners Announcement:
    5 July, 2021
  • Final paper submission deadline:
    18 July, 2021
  • Review decisions sent to authors; Notification of acceptance:
    7 August, 2021
  • Camera ready version deadline:
    17 August, 2021