ICCV 2021: 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW)

Latest News

The 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) will be held in conjunction with the International Conference on Computer Vision (ICCV) 2021.


For any requests or enquiries, please contact: D.Kollias@greenwich.ac.uk 



Dimitrios Kollias, University of Greenwich, UK (D.Kollias@greenwich.ac.uk)

Stefanos Zafeiriou, Imperial College London, UK (s.zafeiriou@imperial.ac.uk)

Irene Kotsia, Middlesex University London, UK (i.kotsia@mdx.ac.uk)

Elnar Hajiyev, Realeyes - Emotional Intelligence (elnar@realeyesit.com)



This Workshop tackles the problem of affective behavior analysis in-the-wild, which is a key requirement for HCI systems used in real-life applications. The aim is to create machines and robots that are capable of understanding people's feelings, emotions and behaviors, and are thus able to interact with them in a 'human-centered' and engaging manner and to serve them effectively as digital assistants. This interaction should not depend on the context, nor on the person's age, sex, ethnicity, educational level, profession or social position. The development of intelligent systems able to analyze human behavior in-the-wild can therefore contribute to building trust, understanding and closeness between humans and machines in real-life environments.


Representing human emotions has long been a core topic of research. The most frequently used emotion representation is the categorical one, including the seven basic categories, i.e., Anger, Disgust, Fear, Happiness, Sadness, Surprise and Neutral. A discrete representation can also be obtained with the Facial Action Coding System (FACS), in which all possible facial actions are described in terms of Action Units (AUs). Finally, the dimensional model of affect has been proposed as a means to distinguish between subtly different displays of affect and to encode small changes in the intensity of each emotion on a continuous scale. The 2-D Valence-Arousal Space (VA-Space) is the most common dimensional emotion representation; valence shows how positive or negative an emotional state is, whilst arousal shows how passive or active it is.


To this end, the developed systems should automatically sense and interpret facial and audio-visual signals relevant to emotions, traits, appraisals and intentions. Furthermore, since real-world settings entail uncontrolled conditions, in which subjects operate in diverse contexts and environments, systems that perform automatic analysis of human behavior and emotion recognition should be robust to video recording conditions, the diversity of contexts and the timing of displays.

These goals are scientifically and technically challenging.


Call for participation: 

First, this Workshop hosts a Competition, which is split into 3 Challenges. More details can be found in the relevant section below.

Apart from the Competition, this Workshop will solicit contributions on recent progress in the recognition, analysis, generation and modelling of face, body and gesture, while embracing the most advanced systems available for face and gesture analysis, particularly in-the-wild (i.e., in unconstrained environments) and across modalities such as face to voice.

Original high-quality contributions, including:

- databases or

- surveys and comparative studies or

- Artificial Intelligence / Machine Learning / Deep Learning / AutoML / (Data-driven or physics-based) Generative Modelling Methodologies (either Uni-Modal or Multi-Modal ones)

are solicited on the following topics:

i) "in-the-wild" facial expression or micro-expression analysis,

ii) "in-the-wild" facial action unit detection,

iii) "in-the-wild" valence-arousal estimation,

iv) "in-the-wild" physiological-based (e.g., EEG, EDA) affect analysis,

v) domain adaptation for affect recognition in the previous 4 cases,

vi) "in-the-wild" face recognition, detection or tracking,

vii) "in-the-wild" body recognition, detection or tracking,

viii) "in-the-wild" gesture recognition or detection,

ix) "in-the-wild" pose estimation or tracking,

x) "in-the-wild" activity recognition or tracking,

xi) "in-the-wild" lip reading and voice understanding,

xii) "in-the-wild" face and body characterization (e.g., behavioral understanding),

xiii) "in-the-wild" characteristic analysis (e.g., gait, age, gender, ethnicity recognition),

xiv) "in-the-wild" group understanding via social cues (e.g., kinship, non-blood relationships, personality) 


Accepted papers will appear in the ICCV 2021 proceedings.


Workshop Important Dates: 

  • Paper Submission Deadline:
    29 July, 2021 (UPDATED)
  • Review decisions sent to authors; Notification of acceptance:
    10 August, 2021
  • Camera ready version:
    17 August, 2021



Submission Information

The paper format should adhere to the submission guidelines and style of the main ICCV 2021 proceedings. Please have a look at the Submission Guidelines section.

The submission process will be handled through CMT.

All accepted manuscripts will be part of ICCV 2021 conference proceedings. 




The Competition


The Competition is a continuation of the ABAW Competition held last year at IEEE FG 2020. It is split into 3 Challenge-Tracks, all based on the same database, which target dimensional and categorical affect recognition.

In particular, the 3 Challenge-Tracks are:

  • valence-arousal estimation
  • seven basic expression classification
  • facial action unit detection

These Challenges are a significant step forward compared to previous events: they use Aff-Wild2, the first comprehensive benchmark for all three affect recognition tasks in-the-wild.

Teams are invited to take part in one or more of these Challenges.

There will be one winner per Challenge-Track. The winners are expected to contribute a paper describing their approach, methodology and results; the accepted winning papers will be part of the ICCV 2021 proceedings. All other teams may also submit a paper describing their solutions and final results; accepted papers will likewise be part of the ICCV 2021 proceedings.

For the purposes of the Challenges, and to facilitate training, especially for people who do not have access to face detection/tracking algorithms, we provide both the cropped images and the cropped-and-aligned images.



Data: Aff-Wild2  


Aff-Wild2 is an extension of the Aff-Wild database (both in terms of annotations and videos).

Aff-Wild2 is: i) an in-the-wild audiovisual database; ii) a large-scale database consisting of 564 videos of around 2.8M frames (the largest existing one); iii) the first database to contain annotations for all 3 behavior tasks (and also the first audiovisual database with annotations for AUs). 561 videos contain annotations for valence-arousal, 546 videos contain annotations for the 7 basic expressions and 541 videos contain annotations for 12 AUs (AU1, AU2, AU4, AU6, AU7, AU10, AU12, AU15, AU23, AU24, AU25, AU26).

How to participate


To participate, you need to register your team.

For this, please send an email to D.Kollias@greenwich.ac.uk with the subject "2nd Affective Behavior Analysis in-the-wild Competition: Team Registration".

In this email, include the following information:

Team Name

Team Members (Name and Surname)

Affiliation

Job Title / Position (e.g., PhD Student, UG or PG Student)

There is no maximum number of participants in each team.

In reply, you will receive access to the dataset's videos, annotations, cropped and cropped-aligned images, and other important information.


At the end of the Challenges, each team will have to send us: i) their predictions on the test set, ii) a link to a GitHub repository where their solution/source code will be stored, and iii) a link to an arXiv paper of 2-6 pages describing their proposed methodology, data used and results. After that, the winner of each Challenge will be announced and invited to submit a paper describing the solution and results. All non-winning teams will also be able to submit a paper describing their solutions and final results to our Workshop.







• Participants can contribute to any of the 3 Challenges.

• In order to take part in any Challenge, participants will have to register by sending an email to the organizers containing the following information: Team Name, Team Members, Affiliation and Job title.

• Participants can use scene/background/body pose etc. information along with the face information.

• Any face detector whether commercial or academic can be used in the challenge. The paper accompanying the challenge result submission should contain clear details of the detectors/libraries used.

• Participants are free to use external data for training along with the Aff-Wild2 partitions. However, this should be clearly discussed in the accompanying paper.

• Participants are free to use any pre-trained network, including the publicly available ones (CNN, AffWildNet) that achieved the best performance on the original Aff-Wild database (part of Aff-Wild2).


Performance Assessment


1) For Challenge-Track 1: Valence-Arousal Estimation, the Concordance Correlation Coefficient (CCC) will be the metric used to judge the performance of the models.


2) For Challenge-Track 2: 7 Basic Expression Classification, the performance metric will be:

0.67 * F1_Score + 0.33 * Accuracy

Note: F1_Score is the unweighted mean of the per-class F1 scores (over the 7 expression classes), and Accuracy is the total accuracy.


3) For Challenge-Track 3: 12 Action Unit Detection, the performance metric will be:

0.5 * F1_Score + 0.5 * Accuracy

Note: F1_Score is the unweighted mean of the per-AU F1 scores (over the 12 AUs), and Accuracy is the total accuracy.
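The three metrics above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the organizers' official evaluation script, and the function names are hypothetical. It assumes that the Track 1 score is the average of the valence CCC and the arousal CCC, that every frame passed in carries a valid annotation (unannotated frames would have to be filtered out beforehand), and that a class or AU absent from both ground truth and predictions contributes an F1 of 0.

```python
import numpy as np


def ccc(x, y):
    """Concordance Correlation Coefficient:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    computed with population (1/N) statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)


def va_score(v_true, v_pred, a_true, a_pred):
    """Assumption: Track 1 averages the valence and arousal CCCs."""
    return 0.5 * (ccc(v_true, v_pred) + ccc(a_true, a_pred))


def binary_f1(t, p):
    """F1 for one class/AU; t and p are boolean arrays.
    Returns 0.0 when there are no positives at all (assumed convention)."""
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0


def expr_score(y_true, y_pred, num_classes=7):
    """Track 2: 0.67 * unweighted-mean F1 + 0.33 * total accuracy.
    Labels are integers in {0, ..., num_classes - 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1 = np.mean([binary_f1(y_true == c, y_pred == c) for c in range(num_classes)])
    acc = np.mean(y_true == y_pred)
    return 0.67 * f1 + 0.33 * acc


def au_score(y_true, y_pred):
    """Track 3: inputs are (num_frames, 12) arrays of 0/1 AU activations.
    0.5 * unweighted-mean F1 over AUs + 0.5 * accuracy over all decisions."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    f1 = np.mean([binary_f1(y_true[:, k], y_pred[:, k])
                  for k in range(y_true.shape[1])])
    acc = np.mean(y_true == y_pred)
    return 0.5 * f1 + 0.5 * acc
```

Computing F1 per class and then taking the plain mean is what makes the score "unweighted": rare expressions or AUs count as much as frequent ones, whereas the accuracy term is dominated by frequent ones.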





40 teams participated in the VA Challenge, 55 in the EXPR Challenge and 51 in the AU Challenge. 20, 30 and 26 teams submitted their results in the VA, EXPR and AU Challenges, respectively. 10, 13 and 11 teams scored higher than the baseline and made valid submissions in the VA, EXPR and AU Challenges, respectively; their results are shown in the leaderboard below.

The winner of the VA Challenge is NISL-2021 (as was the case in the first ABAW VA Challenge), consisting of Didan Deng and Liang Wu (Hong Kong University of Science and Technology).
The runner-up (with a slight difference from the winning team: 49.315 vs 49.045) is Netease Fuxi Virtual Human, consisting of Wei Zhang, Zunhu Guo, Keyu Chen, Lincheng Li, Zhimeng Zhang and Yu Ding (Netease Fuxi AI Lab).

The winner of the EXPR Challenge is Netease Fuxi Virtual Human, consisting of Wei Zhang, Zunhu Guo, Keyu Chen, Lincheng Li, Zhimeng Zhang and Yu Ding (Netease Fuxi AI Lab).
The runner-up is CPIC-DIR2021, consisting of Yue Jin, Tianqing Zheng, Chao Gao, Shijie Zhang and Guoqiang Xu (China Pacific Insurance Group Co).

The winner of the AU Challenge is Netease Fuxi Virtual Human, consisting of Wei Zhang, Zunhu Guo, Keyu Chen, Lincheng Li, Zhimeng Zhang and Yu Ding (Netease Fuxi AI Lab).
The runner-up (with a small difference from the winning team: 69.70 vs 69.04) is CPIC-DIR2021, consisting of Yue Jin, Tianqing Zheng, Chao Gao, Shijie Zhang and Guoqiang Xu (China Pacific Insurance Group Co).

The leaderboards for the 3 Challenges can be found below (if you download the PDFs, the GitHub/arXiv links are clickable):






Congratulations to you all, winning and non-winning teams! Thank you very much for participating in our Competition.

All teams are invited to submit their methodology papers (please see the Submission Information section above). All accepted papers will be part of the ICCV 2021 proceedings.

We are looking forward to receiving your submissions! 



Competition Continuation:

We have decided to extend the Competition until the event takes place in October. Therefore, if you want to participate and be part of the leaderboard, follow the procedure described on this page.



Test Set Submissions:

Participating teams are allowed to make at most 7 different submissions per Challenge-Track. A submission is valid only if we receive the code, the paper and the results; if any of these items is missing, the submission will be invalid.


When sending your final results, make sure it is clear which Challenge-Track they correspond to, for example by storing them in a folder named after the Challenge-Track.


The predictions should follow the same format as the annotation files that we provided. So if, for instance, the test set contains 200 videos, the submission should also contain 200 text files (or slightly more if some videos contain two subjects). The names of the files should match those in the attached files.

The first line of each file should contain one of the following headers (as was the case with the annotation files), depending on the Challenge-Track it corresponds to:

  • valence,arousal
  • Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise
  • AU1,AU2,AU4,AU6,AU7,AU10,AU12,AU15,AU23,AU24,AU25,AU26

Each subsequent line should contain the predictions for one video frame.

For the VA Challenge-Track, each subsequent line should contain the valence and arousal values (first valence, then arousal), comma-separated (as in the annotation files).



For the AU Challenge-Track, each subsequent line should contain the 12 action unit values, comma-separated (as in the annotation files).




For the EXPR Challenge-Track, each subsequent line should contain a single expression value (as in the annotation files), which is one of:

{0,1,2,3,4,5,6} (corresponding to the emotions {Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise}).




Note that your files should include predictions for all frames of the video, regardless of whether the bounding box (face detection) failed. The total number of lines in a file should therefore equal the total number of frames of the video plus one (for the header line described above).
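The file rules above (one header line per Challenge-Track, one prediction line per frame, one folder per Challenge-Track) can be sketched as follows. This is a minimal illustration, not a tool provided by the organizers: the helper names, the folder layout `<out_dir>/<track>/<video_name>.txt` and the forward-fill used for frames where face detection failed are all hypothetical choices, and the file names must still match those in the provided files.

```python
from pathlib import Path

# First-line headers per Challenge-Track, as listed above.
HEADERS = {
    "VA": "valence,arousal",
    "EXPR": "Neutral,Anger,Disgust,Fear,Happiness,Sadness,Surprise",
    "AU": "AU1,AU2,AU4,AU6,AU7,AU10,AU12,AU15,AU23,AU24,AU25,AU26",
}


def fill_missing(preds, default):
    """Replace None entries (e.g. frames where face detection failed) with
    the most recent valid prediction, or `default` before the first one.
    Forward-filling is only one possible strategy, not a challenge rule."""
    filled, last = [], default
    for p in preds:
        if p is not None:
            last = p
        filled.append(last)
    return filled


def format_row(track, pred):
    """Turn one frame's prediction into a line of the submission file."""
    if track == "EXPR":
        label = int(pred)
        if not 0 <= label <= 6:
            raise ValueError(f"expression label out of range: {label}")
        return str(label)
    values = list(pred)
    expected = 2 if track == "VA" else 12  # VA: valence, arousal; AU: 12 AUs
    if len(values) != expected:
        raise ValueError(f"{track} rows need {expected} values, got {len(values)}")
    return ",".join(str(v) for v in values)


def write_predictions(out_dir, track, video_name, preds, num_frames):
    """Write the header line plus exactly one line per frame."""
    if track not in HEADERS:
        raise ValueError(f"unknown track: {track}")
    if len(preds) != num_frames:
        raise ValueError(f"expected {num_frames} predictions, got {len(preds)}")
    track_dir = Path(out_dir) / track  # hypothetical: one folder per track
    track_dir.mkdir(parents=True, exist_ok=True)
    path = track_dir / f"{video_name}.txt"
    lines = [HEADERS[track]] + [format_row(track, p) for p in preds]
    path.write_text("\n".join(lines) + "\n")
    return path


def check_file(path, track, num_frames):
    """True if the header matches and the file has num_frames + 1 lines."""
    lines = Path(path).read_text().splitlines()
    return len(lines) == num_frames + 1 and lines[0] == HEADERS[track]
```

For example, `fill_missing([(0.1, -0.2), None, (0.3, 0.05)], default=(0.0, 0.0))` reuses the first frame's values for the failed second frame, and passing the result to `write_predictions(out_dir, "VA", "video_1", preds, num_frames=3)` produces a 4-line file that `check_file` accepts. Rejecting a prediction list whose length differs from the frame count is what enforces the "frames plus one" rule.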



Regarding the database:


• The database and annotations are available for academic non-commercial research purposes only. If you want to use them for any other purpose (e.g., industrial research or commercial use), email D.Kollias@greenwich.ac.uk.

• All the training/validation/testing images of the dataset have been obtained from YouTube. We are not responsible for the content or the meaning of these images.

• Participants agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes any portion of the images or any portion of derived data. They also agree not to further copy, publish or distribute any portion of the dataset's annotations. The only exception is that copies of the dataset may be made for internal use at a single site within the same organization.

• We reserve the right to terminate participants’ access to the dataset at any time.

• If a participant’s face is displayed in any video and (s)he wants it to be removed, (s)he can email us at any time



Important Dates: 

  • Call for participation announced, team registration begins, data available:
    3 May, 2021
  • Final submission deadline (Results, Code and arXiv paper):
    10 July, 2021
  • Winners Announcement:
    15 July, 2021
  • Final paper submission deadline:
    29 July, 2021 (UPDATED)
  • Review decisions sent to authors; Notification of acceptance:
    10 August, 2021
  • Camera ready version deadline:
    17 August, 2021





If you use the above data, you must cite all of the following papers:


  • D. Kollias, et al.: "Analysing Affective Behavior in the second ABAW2 Competition". ICCV, 2021

@inproceedings{kollias2021analysing, title={Analysing affective behavior in the second abaw2 competition}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, pages={3652--3660}, year={2021}}


  • D. Kollias, et al.: "Analysing Affective Behavior in the First ABAW 2020 Competition". IEEE FG, 2020

@inproceedings{kollias2020analysing, title={Analysing Affective Behavior in the First ABAW 2020 Competition}, author={Kollias, D and Schulc, A and Hajiyev, E and Zafeiriou, S}, booktitle={2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)(FG)}, pages={794--800}, year={2020}}


  • D. Kollias, et al.: "Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study", 2021

@article{kollias2021distribution, title={Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study}, author={Kollias, Dimitrios and Sharmanska, Viktoriia and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:2105.03790}, year={2021} }


  • D. Kollias, S. Zafeiriou: "Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework", 2021

@article{kollias2021affect, title={Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:2103.15792}, year={2021}}


  • D. Kollias, S. Zafeiriou: "Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace". BMVC, 2019

@article{kollias2019expression, title={Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1910.04855}, year={2019} }

  • D. Kollias, et al.: "Face Behavior a la carte: Expressions, Affect and Action Units in a Single Network", 2019

@article{kollias2019face,title={Face Behavior a la carte: Expressions, Affect and Action Units in a Single Network}, author={Kollias, Dimitrios and Sharmanska, Viktoriia and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1910.11111}, year={2019}}


  • D. Kollias, et al.: "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision (IJCV), 2019

@article{kollias2019deep, title={Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond}, author={Kollias, Dimitrios and Tzirakis, Panagiotis and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Schuller, Bj{\"o}rn and Kotsia, Irene and Zafeiriou, Stefanos}, journal={International Journal of Computer Vision}, pages={1--23}, year={2019}, publisher={Springer} }


  • S. Zafeiriou, et al.: "Aff-Wild: Valence and Arousal in-the-wild Challenge". CVPR, 2017

@inproceedings{zafeiriou2017aff, title={Aff-wild: Valence and arousal ‘in-the-wild’ challenge}, author={Zafeiriou, Stefanos and Kollias, Dimitrios and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Kotsia, Irene}, booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on}, pages={1980--1987}, year={2017}, organization={IEEE} }



Keynote Speakers


Viktoriia Sharmanska

Viktoriia Sharmanska is currently a Lecturer in AI at the Department of Informatics, University of Sussex, and an honorary lecturer at Imperial College London, UK. During 2017-2020, she was a recipient of a prestigious Imperial College Research Fellowship at the Department of Informatics, working on deep learning methods for human behavior analysis.
Dr Sharmanska has co-authored numerous papers published at CVPR, ICCV/ECCV and NeurIPS on novel statistical machine learning methodologies applied to computer vision problems, such as attribute-based object recognition, learning using privileged information, cross-modal learning, and, more recently, human facial behavior analysis and algorithmic fairness methods.
She has built an international reputation, serving as one of the youngest Area Chairs at top-tier international conferences in computer vision and deep learning, such as ICLR (since 2019) and CVPR 2021. Dr Sharmanska has received a number of prestigious awards, including the Imperial College Research Fellowship (2017) and an Outstanding Reviewer Award at CVPR 2019.
Her current research interests include deep learning methods for human behavior understanding from facial and bodily cues, video data synthesis, and algorithmic fairness methods to mitigate machine bias in visual data.



The Affective Behavior Analysis in-the-wild Challenge has been generously supported by:


  • MorphCast (https://www.morphcast.com)

MorphCast is an innovative company which is developing an Emotional Interactive Video Platform, the first solution to create and watch Interactive Videos, where contents can adapt, in real time, based on the demographics and facial expressions of the viewers.

MorphCast VP uses an Artificial Intelligence JavaScript SDK for the detection of 130+ expressions and features through facial analysis (including emotions, affects, arousal, valence, attention level, and much more), combined with MorphCast HTML5 Video Player and MorphCast Studio, a professional tool to create Interactive Videos.

Many industries can benefit significantly from the use of this high-tech service, such as digital ADV, digital learning, RTC, HR, and many others.


  • Headroom (https://www.goheadroom.com)

People work better when they are free to focus on what they enjoy most — forming relationships, thinking creatively, and solving problems. We let A.I. take care of the rest.

Currently, working together over video can be frustrating and hard to navigate in a group: note taking can be a huge distraction from really engaging in the conversation, not everyone loves speaking up, and grabbing everyone's attention can be hard.

Current video conferencing solutions may send pixels, but deliver fatigue, frustration and failed communication. Formed in San Francisco, CA by veterans of Google and Magic Leap Machine Learning products, Headroom is looking to change that.


  • Facesoft 

Facesoft has developed facial recognition technology which can be used across sectors like security, healthcare, and entertainment. The core of the technology lies in machine learning algorithms for 3D face reconstruction and facial recognition. The company has trained its face reconstruction algorithm parameters using a proprietary database consisting of 2.5 million high-resolution 3D scans of real faces. The trained reconstruction model allows the platform to create billions of realistic computer-generated faces, more than any existing database of real faces.