The ABAW Workshop and Competition is a continuation of the respective Workshops and Competitions held at ICCV 2021, IEEE FG 2020 (a), IEEE FG 2020 (b) and IEEE CVPR 2017 Conferences.
The ABAW Workshop and Competition has a unique aspect of fostering cross-pollination of different disciplines, bringing together experts (from academia, industry, and government) and researchers of mobile and ubiquitous computing, computer vision and pattern recognition, artificial intelligence and machine learning, multimedia, robotics, HCI, ambient intelligence and psychology. The diversity of human behavior, the richness of multi-modal data that arises from its analysis, and the multitude of applications that demand rapid progress in this area ensure that our events provide a timely and relevant discussion and dissemination platform.
The 3rd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) will be held in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
The event will take place on 19 June, from 8 am until 1 pm (CST), and will be a hybrid event (with both in-person and online attendance).
The workshop's agenda can be found here. All displayed times are in the US Central Standard Time (CST) zone.
Dimitrios Kollias, Queen Mary University of London, UK d.kollias@qmul.ac.uk
Stefanos Zafeiriou, Imperial College London, UK s.zafeiriou@imperial.ac.uk
Viktoriia Sharmanska, University of Sussex, UK sharmanska.v@sussex.ac.uk
Elnar Hajiyev, Realeyes - Emotional Intelligence elnar@realeyesit.com
Björn W. Schuller is Full Professor of Artificial Intelligence and the Head of GLAM at Imperial College London/UK, Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING – an Audio Intelligence company based near Munich and in Berlin/Germany, independent research leader within the Alan Turing Institute and Royal Statistical Society Lab’s Data, Analytics and Surveillance Group, as part of the UK Health Security Agency, and permanent Visiting Professor at HIT/China amongst other Professorships and Affiliations. Previous stays include Guest Professor at Southeast University in Nanjing/China, Full Professor at the University of Passau/Germany, Key Researcher at Joanneum Research in Graz/Austria, and the CNRS-LIMSI in Orsay/France. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the BCS, Fellow of the ISCA, Fellow and President-Emeritus of the AAAC, and Senior Member of the ACM. He (co-)authored 1,200+ publications (45k+ citations, h-index=97), is Field Chief Editor of Frontiers in Digital Health and was Editor in Chief of the IEEE Transactions on Affective Computing amongst manifold further commitments and service to the community. His 30+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He served as Coordinator/PI in 15+ European Projects, is an ERC Starting and DFG Reinhart-Koselleck Grantee, and consultant of companies such as Barclays, GN, Huawei, Informetis, or Samsung.
Stavros Petridis is a scientific research manager at Meta AI and an honorary research fellow at the intelligent behaviour understanding group (iBUG) at Imperial College London. He studied electrical and computer engineering at the Aristotle University of Thessaloniki, Greece and completed the MSc degree in Advanced Computing at Imperial College London. He also did his PhD in Computer Science at the same university. Stavros has been a visiting researcher at the image processing group at University College London, at the Robotics Institute, Carnegie Mellon University and at the affect analysis group at the University of Pittsburgh. His research interests are in the area of audio-visual recognition and generation of human behaviour.
This Workshop tackles the problem of affective behavior analysis in-the-wild, which is a major targeted characteristic of HCI systems used in real-life applications. The goal is to create machines and robots that are capable of understanding people's feelings, emotions and behaviors, so that they can interact with humans in a 'human-centered' and engaging manner and effectively serve them as digital assistants. This interaction should not depend on the respective context, nor on the human's age, sex, ethnicity, educational level, profession, or social position. As a result, the development of intelligent systems able to analyze human behavior in-the-wild can contribute to the generation of trust, understanding and closeness between humans and machines in real-life environments.
Representing human emotions has been a fundamental topic of research. The most frequently used representation is the categorical one, comprising seven categories: the six basic expressions (Anger, Disgust, Fear, Happiness, Sadness and Surprise) plus the Neutral state. A discrete emotion representation can also be described in terms of the Facial Action Coding System (FACS) model, in which all possible facial actions are described in terms of Action Units (AUs). Finally, the dimensional model of affect has been proposed as a means to distinguish between subtly different displays of affect and to encode small changes in the intensity of each emotion on a continuous scale. The 2-D Valence and Arousal space (VA-space) is the most usual dimensional emotion representation; valence shows how positive or negative an emotional state is, whilst arousal shows how passive or active it is.
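To make the three representation schemes concrete, the Python sketch below shows how a single annotated frame could be encoded under each model. It is purely illustrative (not part of any official ABAW tooling); all names and example values are our assumptions.

from dataclasses import dataclass
from typing import List

# The seven categorical labels mentioned above (six basic expressions plus Neutral).
CATEGORIES = ["Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise", "Neutral"]

@dataclass
class FrameAnnotation:
    """Hypothetical per-frame annotation under the three representation models."""
    expression: str          # categorical model: one of CATEGORIES (or 'Other')
    action_units: List[int]  # FACS model: indices of activated Action Units, e.g. [1, 2, 25]
    valence: float           # dimensional model: how positive/negative, in [-1, 1]
    arousal: float           # dimensional model: how passive/active, in [-1, 1]

# Example with illustrative values only (AU6 + AU12 roughly correspond to a smile):
frame = FrameAnnotation(expression="Happiness", action_units=[6, 12], valence=0.7, arousal=0.4)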
To this end, the developed systems should automatically sense and interpret facial and audio-visual signals relevant to emotions, traits, appraisals and intentions. Furthermore, since real-world settings entail uncontrolled conditions, where subjects operate in a diversity of contexts and environments, systems that perform automatic analysis of human behavior and emotion recognition should be robust to video recording conditions, diversity of contexts and timing of display.
Recently a lot of attention has been brought towards understanding and mitigating algorithmic bias in models. In the context of in-the-wild generalisation, the subgroup distribution shift is a challenging task. In this scenario, a difference in performance is observed across subgroups (e.g. demographic sub-populations of the training data), which can degrade the performance of the model deployed in-the-wild. The aim is to build fair machine learning models that perform well on all subgroups and improve in-the-wild generalisation.
All these goals are scientifically and technically challenging.
This Workshop will solicit contributions on the recent progress of recognition, analysis, generation and modelling of face, body and gesture, while embracing the most advanced systems available for face and gesture analysis, particularly in-the-wild (i.e., in unconstrained environments) and across modalities such as face and voice. In parallel, this Workshop will solicit contributions towards building fair models that perform well on all subgroups and improve in-the-wild generalisation.
Original high-quality contributions, including:
- databases, or
- surveys and comparative studies, or
- Artificial Intelligence / Machine Learning / Deep Learning / AutoML / (Data-driven or physics-based) Generative Modelling Methodologies (either Uni-Modal or Multi-Modal; Uni-Task or Multi-Task ones)
are solicited on the following topics:
i) "in-the-wild" facial expression or micro-expression analysis,
ii) "in-the-wild" facial action unit detection,
iii) "in-the-wild" valence-arousal estimation,
iv) "in-the-wild" physiological-based (e.g.,EEG, EDA) affect analysis,
v) domain adaptation for affect recognition in the previous 4 cases
vi) "in-the-wild" face recognition, detection or tracking,
vii) "in-the-wild" body recognition, detection or tracking,
viii) "in-the-wild" gesture recognition or detection,
ix) "in-the-wild" pose estimation or tracking,
x) "in-the-wild" activity recognition or tracking,
xi) "in-the-wild" lip reading and voice understanding,
xii) "in-the-wild" face and body characterization (e.g., behavioral understanding),
xiii) "in-the-wild" characteristic analysis (e.g., gait, age, gender, ethnicity recognition),
xiv) "in-the-wild" group understanding via social cues (e.g., kinship, non-blood relationships, personality)
xv) subgroup distribution shift analysis in affect recognition
xvi) subgroup distribution shift analysis in face and body behaviour
xvii) subgroup distribution shift analysis in characteristic analysis
Accepted papers will appear in the CVPR 2022 proceedings.
The paper format should adhere to the paper submission guidelines and proceedings style of the main CVPR 2022 conference. Please have a look at the Submission Guidelines section here.
All papers should be submitted using CMT website https://cmt3.research.microsoft.com/ABAW2022.
All accepted manuscripts will be part of CVPR 2022 conference proceedings.
The Competition is a continuation of the ABAW Competition held last year at ICCV and the year before at IEEE FG.
It is split into the four Challenges listed below, which represent a significant step forward compared to previous events.
Teams are invited to participate in at least one of these Challenges.
The leaderboards for the 4 Challenges can be found below:
CVPR2022_ABAW3_VA_Leaderboard
CVPR2022_ABAW3_EXPR_Leaderboard
CVPR2022_ABAW3_AU_Leaderboard
CVPR2022_ABAW3_MTL_Leaderboard
Congratulations to all teams, winning and non-winning ones! Thank you very much for participating in our Competition.
All teams are invited to submit their methodologies-papers (please see Submission Information section above). All accepted papers will be part of the IEEE CVPR 2022 proceedings.
We are looking forward to receiving your submissions!
In order to participate, teams will have to register; the lead researcher should send an email from their official address (no personal emails will be accepted) to d.kollias@qmul.ac.uk with:
i) subject "3rd ABAW Competition: Team Registration";
ii) this EULA (if the team is composed of only academics) or this EULA (if the team has at least one member coming from the industry) filled in, signed and attached;
iii) the lead researcher's official academic/industrial website; the lead researcher cannot be a student (UG/PG/Ph.D.);
iv) the emails of each team member
v) the team's name
Each team may consist of at most 8 members.
As a reply, you will receive access to the dataset's videos, annotations, cropped and cropped-aligned images and other important information.
At the end of the Challenges, each team will have to send us:
i) their predictions on the test set,
ii) a link to a Github repository where their solution/source code will be stored, and
iii) a link to an arXiv paper of 2-6 pages describing their proposed methodology, the data used and the results.
After that, the winner of each Challenge, along with a leaderboard, will be announced.
There will be one winner per Challenge. The top-3 performing teams of each Challenge will have to contribute paper(s) describing their approach, methodology and results to our Workshop; all other teams are also welcome to submit paper(s) describing their solutions and final results. All accepted papers will be part of the CVPR 2022 proceedings.
For the purpose of the Challenges and to facilitate training, especially for people that do not have access to face detectors/tracking algorithms, we provide the cropped images and the cropped & aligned ones.
The Competition's white paper (describing the Competition, the data, the baselines and results) is ready and can be found here.
For this Challenge, the Aff-Wild2 database will be used. Aff-Wild2 is an extension of the Aff-Wild database.
In total, 564 videos of around 2.8M frames, annotated in terms of valence and arousal, will be used.
Only uni-task solutions will be accepted for this Challenge; this means that the teams should only develop uni-task (valence-arousal estimation task) solutions.
Teams are allowed to use any pre-trained model (publicly available or not), as long as it has not been pre-trained on Aff-Wild2. The model can be pre-trained on any task (e.g., VA estimation, expression classification, AU detection, face recognition). However, when refining the model and developing the methodology, teams should not use any other annotations (expressions or AUs): the methodology should be purely uni-task, using only the VA annotations (teams may, if they wish, use other databases' VA annotations for data augmentation, knowledge extraction, etc.).
The performance measure will be the mean Concordance Correlation Coefficient (CCC) of valence and arousal:
P = 0.5 * (CCC_arousal + CCC_valence)
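For reference, a minimal NumPy sketch of the Concordance Correlation Coefficient and the resulting challenge score is given below. It uses one common formulation of the CCC and our own variable names; it is not the organisers' official evaluation code.

import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Challenge score: mean CCC of valence and arousal.
# P = 0.5 * (ccc(valence_true, valence_pred) + ccc(arousal_true, arousal_pred))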
The baseline network is an ImageNet pre-trained ResNet-50; its performance on the validation set is:
CCC_valence = 0.31
CCC_arousal = 0.17
P = 0.5 * (CCC_arousal + CCC_valence) = 0.24
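A rough sketch of how such a baseline could be set up with torchvision is shown below. The two-unit linear head and the tanh squashing to [-1, 1] are our assumptions; the exact baseline configuration is described in the Competition's white paper.

import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained ResNet-50 with its classifier replaced by a 2-unit
# regression head (valence, arousal); tanh keeps the outputs in [-1, 1].
backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Sequential(nn.Linear(backbone.fc.in_features, 2), nn.Tanh())

dummy = torch.randn(1, 3, 224, 224)   # one cropped-aligned face image
valence_arousal = backbone(dummy)     # shape: (1, 2)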
For this Challenge, the Aff-Wild2 database will be used.
In total, 548 videos of around 2.7M frames will be used, annotated in terms of the 6 basic expressions, plus the neutral state, plus an 'other' category that denotes expressions/affective states other than the 6 basic ones.
Only uni-task solutions will be accepted for this Challenge; this means that the teams should only develop uni-task (expression classification task) solutions.
Teams are allowed to use any pre-trained model (publicly available or not), as long as it has not been pre-trained on Aff-Wild2. The model can be pre-trained on any task (e.g., VA estimation, expression classification, AU detection, face recognition). However, when refining the model and developing the methodology, teams should not use any other annotations (VA or AUs): the methodology should be purely uni-task, using only the expression annotations (teams may, if they wish, use other databases' expression annotations for data augmentation, knowledge extraction, etc.).
The performance measure will be the average F1 Score across all 8 categories:
P = ∑ (F1) / 8
The baseline network is a pre-trained VGGFACE (with fixed convolutional weights) and its performance on the validation set is:
P = 0.23
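For reference, the average F1 over the 8 categories corresponds to a macro-averaged F1 score, e.g. as computed with scikit-learn in the sketch below (variable names and example labels are ours, not the official evaluation script).

import numpy as np
from sklearn.metrics import f1_score

# y_true, y_pred: integer class labels in {0, ..., 7}
# (6 basic expressions + Neutral + Other), one entry per frame.
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 7, 3])   # illustrative ground truth
y_pred = np.array([0, 1, 2, 3, 4, 5, 6, 6, 7, 3])   # illustrative predictions

# Macro averaging = unweighted mean of the per-class F1 scores, i.e. P = sum(F1) / 8.
P = f1_score(y_true, y_pred, average="macro", labels=list(range(8)))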
For this Challenge, the Aff-Wild2 database will be used.
In total, 547 videos of around 2.7M frames will be used, annotated in terms of 12 action units, namely AU1, AU2, AU4, AU6, AU7, AU10, AU12, AU15, AU23, AU24, AU25 and AU26.
Only uni-task solutions will be accepted for this Challenge; this means that the teams should only develop uni-task (action unit detection task) solutions.
Teams are allowed to use any pre-trained model (publicly available or not), as long as it has not been pre-trained on Aff-Wild2. The model can be pre-trained on any task (e.g., VA estimation, expression classification, AU detection, face recognition). However, when refining the model and developing the methodology, teams should not use any other annotations (VA or expressions): the methodology should be purely uni-task, using only the AU annotations (teams may, if they wish, use other databases' AU annotations for data augmentation, knowledge extraction, etc.).
The performance measure will be the average F1 Score across all 12 categories:
P = ∑ (F1) / 12
The baseline network is a pre-trained VGGFACE (with fixed convolutional weights) and its performance on the validation set is:
P = 0.39
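Since AU detection is a multi-label problem (several AUs can be active in the same frame), the average F1 over the 12 AUs can be computed as a macro average over binary indicator arrays, as in the sketch below (variable names and random example data are ours).

import numpy as np
from sklearn.metrics import f1_score

# Binary indicator matrices of shape (num_frames, 12): one column per AU, in the
# order AU1, AU2, AU4, AU6, AU7, AU10, AU12, AU15, AU23, AU24, AU25, AU26.
au_true = np.random.randint(0, 2, size=(100, 12))   # illustrative ground truth
au_pred = np.random.randint(0, 2, size=(100, 12))   # illustrative predictions

# Macro averaging over the 12 binary AU columns, i.e. P = sum(F1) / 12.
P = f1_score(au_true, au_pred, average="macro")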
For this Challenge, the s-Aff-Wild2 database will be used. s-Aff-Wild2 is a static version of the Aff-Wild2 database; it contains specific frames/images selected from Aff-Wild2.
In total, around 175K images will be used, annotated in terms of valence-arousal; the 6 basic expressions, plus the neutral state, plus the 'other' category; and 12 action units.
Unlike the previous Challenges, there are no uni-task restrictions; any solution will be accepted for this Challenge.
The performance measure will be the sum of: the mean Concordance Correlation Coefficient (CCC) of valence and arousal; the average F1 Score across all 8 expression categories; and the average F1 Score across all 12 action units:
P = 0.5 * (CCC_valence + CCC_arousal) + (1/8) * ∑ (F1_expr) + (1/12) * ∑ (F1_au)
The baseline network is a pre-trained VGGFACE (with fixed convolutional weights) and its performance on the validation set is:
P =
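Putting the three terms together, the overall MTL score could be computed as in the sketch below, reusing a CCC helper like the one given for the first Challenge; the function and variable names are our assumptions, not the official evaluation script.

import numpy as np
from sklearn.metrics import f1_score

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    return 2 * cov / (y_true.var() + y_pred.var() + (y_true.mean() - y_pred.mean()) ** 2)

def mtl_score(v_true, v_pred, a_true, a_pred, expr_true, expr_pred, au_true, au_pred):
    """P = 0.5*(CCC_valence + CCC_arousal) + (1/8)*sum(F1_expr) + (1/12)*sum(F1_au)."""
    p_va = 0.5 * (ccc(v_true, v_pred) + ccc(a_true, a_pred))
    p_expr = f1_score(expr_true, expr_pred, average="macro", labels=list(range(8)))
    p_au = f1_score(au_true, au_pred, average="macro")  # multi-label indicator arrays
    return p_va + p_expr + p_au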
• Participants can contribute to any of the 4 Challenges.
• In order to take part in any Challenge, participants will have to register as described above.
• Participants can use scene/background/body pose etc. information along with the face information.
• Any face detector whether commercial or academic can be used in the challenge. The paper accompanying the challenge result submission should contain clear details of the detectors/libraries used.
• The participants are free to use external data for training along with the Aff-Wild2 partitions. However, this should be clearly discussed in the accompanying paper.
• The participants are free to use any pre-trained network, as long as this is not using Aff-Wild2's annotations.
• All the training/validation/testing images of the dataset have been obtained from YouTube. We are not responsible for the content or the meaning of these images.
• Participants will agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes any portion of the images or any portion of derived data. They will also agree not to further copy, publish or distribute any portion of the dataset's annotations. An exception is made for internal use at a single site within the same organization, where making copies of the dataset is allowed.
• We reserve the right to terminate participants’ access to the dataset at any time.
• If a participant's face is displayed in any video and (s)he wants it to be removed, (s)he can email us at any time.
If you use the above data, you must cite all following papers:
@inproceedings{kollias2022abaw, title={Abaw: Valence-arousal estimation, expression recognition, action unit detection \& multi-task learning challenges}, author={Kollias, Dimitrios}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages={2328--2336}, year={2022} }
@inproceedings{kollias2021analysing, title={Analysing affective behavior in the second abaw2 competition}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, pages={3652--3660}, year={2021}}
@inproceedings{kollias2020analysing, title={Analysing Affective Behavior in the First ABAW 2020 Competition}, author={Kollias, D and Schulc, A and Hajiyev, E and Zafeiriou, S}, booktitle={2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)(FG)}, pages={794--800}}
@article{kollias2021distribution, title={Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study}, author={Kollias, Dimitrios and Sharmanska, Viktoriia and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:2105.03790}, year={2021} }
@article{kollias2021affect, title={Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:2103.15792}, year={2021}}
@article{kollias2019expression, title={Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace}, author={Kollias, Dimitrios and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1910.04855}, year={2019} }
@article{kollias2019face,title={Face Behavior a la carte: Expressions, Affect and Action Units in a Single Network}, author={Kollias, Dimitrios and Sharmanska, Viktoriia and Zafeiriou, Stefanos}, journal={arXiv preprint arXiv:1910.11111}, year={2019}}
@article{kollias2019deep, title={Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond}, author={Kollias, Dimitrios and Tzirakis, Panagiotis and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Schuller, Bj{\"o}rn and Kotsia, Irene and Zafeiriou, Stefanos}, journal={International Journal of Computer Vision}, pages={1--23}, year={2019}, publisher={Springer} }
@inproceedings{zafeiriou2017aff, title={Aff-wild: Valence and arousal 'in-the-wild' challenge}, author={Zafeiriou, Stefanos and Kollias, Dimitrios and Nicolaou, Mihalis A and Papaioannou, Athanasios and Zhao, Guoying and Kotsia, Irene}, booktitle={Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on}, pages={1980--1987}, year={2017}, organization={IEEE} }
The Affective Behavior Analysis in-the-wild Challenge has been generously supported by: