In [1]:
%matplotlib inline
import numpy as np

# Understanding Active Appearance Models

i·BUG group tutorial

Joan Alabort-i-Medina

ja310@imperial.ac.uk

## Part 1: Building Active Appearance Models

"Active Apperance Models (AAMs) are non-linear, generative, and parametric models of a certain visual phenomenon".

I. Matthews and S. Baker, 2004.
  • generative: they generate images of a particular object class (e.g. the human face).

  • parametric: they are controlled by a set of parameters.

  • non-linear: they are non-linear in terms of pixel intensities.

In [2]:
from menpofit.visualize import visualize_aam
from alabortcvpr2015.utils import pickle_load

aam = pickle_load('/Users/joan/PhD/Models/aam_int.menpo')

visualize_aam(aam)

A bit of history...

They were originally proposed in 1998 by G. Edwards, C. J. Taylor, and T. F. Cootes from the Department of Medical Biophysics (not Computing!) at the University of Manchester.

  • G. Edwards, C. J. Taylor, and T. F. Cootes, "Interpreting face images using active appearance models". FG 1998.

Luckily for the original authors, quite a lot of research stemmed from the original paper:

  • T. F. Cootes, K. Walker, and C. J. Taylor, “View-based active appearance models,” FG, 2000.
  • T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” TPAMI, 2001.
  • T. F. Cootes and C. J. Taylor, “On representing edge structure for model matching,” CVPR, 2001.
  • I. Matthews and S. Baker, “Active appearance models revisited,” IJCV, 2004.
  • R. Gross, I. Matthews, and S. Baker, “Generic vs. person specific active appearance models,” IVC, 2005.
  • A. U. Batur and M. H. Hayes, “Adaptive active appearance models,” TIP, 2005.
  • G. Papandreou and P. Maragos, “Adaptive and constrained algorithms for inverse compositional active appearance model fitting,” CVPR, 2008.
  • J. Saragih and R. Goecke, “Learning AAM fitting through simulation,” PR, 2009.
  • B. Amberg, A. Blake, and T. Vetter, “On compositional image alignment, with an application to active appearance models,” CVPR, 2009.
  • G. Tzimiropoulos, J. Alabort-i-Medina, S. Zafeiriou, and M. Pantic, “Generic active appearance models revisited,” ACCV, 2012.
  • G. Tzimiropoulos and M. Pantic, “Optimization problems for fast AAM fitting in-the-wild,” ICCV, 2013.
  • G. Tzimiropoulos and M. Pantic, “Gauss-Newton deformable part models for face alignment in-the-wild,” CVPR, 2014.
  • J. Alabort-i-Medina and S. Zafeiriou, “Bayesian active appearance models,” CVPR, 2014.
  • E. Antonakos, J. Alabort-i-Medina, G. Tzimiropoulos, and S. Zafeiriou, “HOG active appearance models,” ICIP, 2014.
  • J. Kossaifi, G. Tzimiropoulos, and M. Pantic, “Fast Newton active appearance models,” ICIP, 2014.
  • J. Alabort-i-Medina, E. Antonakos, J. Booth, P. Snape, and S. Zafeiriou, “Menpo: A comprehensive platform for parametric image alignment and visual deformable models,” ACM Multimedia, Open Source Software Competition, 2014.

Note that although AAMs are (typically) linear in terms of both shape and texture, the image formation process is non-linear in terms of pixel intensities.

To get us started with AAMs, we will assume that the object we are interested in modelling is none other than the human face:


The first thing we will need is a large collection of face images:

In [3]:
import menpo.io as mio
from menpo.landmark import labeller, ibug_face_66

images = []
for i in mio.import_images('/Users/joan/PhD/DataBases/faces/lfpw/trainset/',
                           max_images=None, verbose=True):
    
    i.crop_to_landmarks_proportion_inplace(0.5)
    i = i.rescale_landmarks_to_diagonal_range(100)
    labeller(i, 'PTS', ibug_face_66)
    if i.n_channels == 3:
        i = i.as_greyscale(mode='luminosity')
    
    images.append(i)
- Loading 811 assets: [====================] 100%
In [4]:
from menpo.visualize import visualize_images

visualize_images(images) 

Wait a moment... What are these points?

Fair enough, I kind of lied before...

What we really need is a large collection of carefully *annotated* face images.

The previous annotations try to encode the notion of face shape...

A shape is the form of an object or its external boundary, outline, or external surface, as opposed to other properties such as colour, texture or material composition.

Wikipedia

...by consistently identifying the positions of a small set of landmarks defining the faces in all images.

In morphometrics, a landmark point, or simply a landmark, is a point in a shape object at which correspondences between and within the populations of the object are preserved.

Wikipedia

Mathematically, a shape can be defined as:

$$\mathbf{s} = (x_1, y_1, \ldots, x_v, y_v)^T \in \mathbb{R}^{2v \times 1}$$
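
To make the notation concrete, here is a minimal numpy sketch (the 3-point shape and its coordinates are made up for illustration) of how a set of v landmark points is flattened into a single 2v-dimensional shape vector:

```python
import numpy as np

# a toy shape with v=3 landmark points, one (x, y) pair per row
points = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.5, 1.0]])

# flatten to s = (x1, y1, ..., xv, yv)^T, a vector in R^(2v)
s = points.flatten()
print(s)  # [0.  0.  1.  0.  0.5 1. ]
```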

In [5]:
from menpo.visualize import visualize_shapes

visualize_shapes([i.landmarks for i in images])

In AAMs, images of a particular object are generated by combining linear models describing the shape and texture of the object using a specific motion model (also referred to as the warp).

Let us start by formally defining the shape model:

$$\mathbf{s} = \bar{\mathbf{s}} + \sum_{i=1}^{n_s} p_i \mathbf{s}_i = \bar{\mathbf{s}} + \mathbf{S} \mathbf{p}$$

where:

$$\bar{\mathbf{s}} = (\bar{x}_1, \bar{y}_1, \ldots, \bar{x}_v, \bar{y}_v)^T \in \mathbb{R}^{2v}$$

$$\mathbf{p} = (p_1, \ldots, p_{n_s})^T \in \mathbb{R}^{n_s}$$

$$\mathbf{S} = (\mathbf{s}_1, \ldots, \mathbf{s}_{n_s}) \in \mathbb{R}^{2v \times n_s}$$

The previous shape model can be learned by applying Principal Component Analysis (PCA) to the set of manually annotated points defining the object shape in the images (usually after registering all shapes using Procrustes Analysis).
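
The next cell uses menpo's PCAModel to do exactly this. Conceptually, it boils down to the following minimal numpy sketch (function and variable names are mine, and X is assumed to hold one aligned shape vector per row):

```python
import numpy as np

def pca(X, n_components):
    # mean shape: the average of all aligned shape vectors
    mean = X.mean(axis=0)
    # centre the data and take its SVD; the right singular vectors
    # are the eigenvectors of the data covariance matrix
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]  # rows span the shape subspace
    eigenvalues = (S ** 2)[:n_components] / (X.shape[0] - 1)
    return mean, components, eigenvalues

# a novel shape instance is then a linear combination:
#   s = mean + components.T @ p
```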

In [6]:
from menpo.transform import Translation, GeneralizedProcrustesAnalysis
from menpo.model import PCAModel

# extract shapes from images
shapes = [i.landmarks['ibug_face_66'].lms for i in images] 

# centralize shapes
centered_shapes = [Translation(-s.centre()).apply(s) for s in shapes]
# align centralized shape using Procrustes Analysis
gpa = GeneralizedProcrustesAnalysis(centered_shapes)
aligned_shapes = [s.aligned_source() for s in gpa.transforms]

# build shape model
shape_model = PCAModel(aligned_shapes)
In [7]:
from menpofit.visualize import visualize_shape_model

visualize_shape_model(shape_model)

Note that because the shapes were normalized using Procrustes Analysis before we applied PCA, the previous shape model has mainly learned non-rigid facial deformations and lacks the ability to place shapes at arbitrary positions on the image plane.

Luckily, this problem can be solved by composing the model with a 2D similarity transform:

$$\mathbf{x} = s\mathbf{R}\left(\bar{\mathbf{x}} + \sum_{i=1}^{n_s} p_i \mathbf{x}_i\right) + \mathbf{t}$$

where:

$$\mathbf{x} = (x, y)^T \in \mathbb{R}^2$$

$$s \in \mathbb{R}$$

$$\mathbf{R} \in \mathbb{R}^{2 \times 2}$$

$$\mathbf{t} = (t_x, t_y)^T \in \mathbb{R}^2$$
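
As a quick numerical illustration (pure numpy, with toy parameter values of my choosing), applying such a similarity transform to every point of a shape looks like this:

```python
import numpy as np

# toy similarity parameters: scale s, rotation R, translation t
s, theta = 1.2, np.deg2rad(15)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([10.0, -5.0])

# a shape as a (v, 2) array of (x, y) points
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])

# x' = s R x + t, applied to every landmark at once
transformed = s * points @ R.T + t
```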

Fortunately, after some clever reparameterization (Matthews and Baker, 2004), the shape model can still be concisely expressed as before:

$$\mathbf{s} = \bar{\mathbf{s}} + \sum_{i=1}^{4} p^*_i \mathbf{s}^*_i + \sum_{i=1}^{n_s} p_i \mathbf{s}_i = \bar{\mathbf{s}} + \mathbf{S}^* \mathbf{p}^* + \mathbf{S} \mathbf{p} = \bar{\mathbf{s}} + \tilde{\mathbf{S}} \tilde{\mathbf{p}}$$

where:

$$\mathbf{p}^* = (p^*_1, \ldots, p^*_4)^T \in \mathbb{R}^4$$

$$\mathbf{S}^* = (\mathbf{s}^*_1, \ldots, \mathbf{s}^*_4) \in \mathbb{R}^{2v \times 4}$$

$$\mathbf{s}^*_1 = \bar{\mathbf{s}} \in \mathbb{R}^{2v}$$

$$\mathbf{s}^*_2 = (-\bar{y}_1, \bar{x}_1, \ldots, -\bar{y}_v, \bar{x}_v)^T \in \mathbb{R}^{2v}$$

$$\mathbf{s}^*_3 = (1, 0, \ldots, 1, 0)^T \in \mathbb{R}^{2v}$$

$$\mathbf{s}^*_4 = (0, 1, \ldots, 0, 1)^T \in \mathbb{R}^{2v}$$

In [8]:
import numpy as np
from menpo.model import MeanInstanceLinearModel

# get the shape model mean as a numpy vector
shape_vector = shape_model.mean().as_vector()

# initialize S star: one similarity basis vector per row
S_star = np.zeros((4, shape_vector.shape[0]))
# first row: the mean shape itself (scaling)
S_star[0, :] = shape_vector
# second row: the mean shape rotated 90 degrees counter-clockwise (rotation)
rotated_ccw = shape_model.mean().points[:, ::-1].copy()  # flip x,y -> y,x
rotated_ccw[:, 0] = -rotated_ccw[:, 0]  # negate the (old) y coordinate
S_star[1, :] = rotated_ccw.flatten()
# third row: translation along x
S_star[2, ::2] = 1
# fourth row: translation along y
S_star[3, 1::2] = 1

# build the 2D similarity model
sim_2d_model = MeanInstanceLinearModel(S_star, shape_vector, shape_model.mean())

# orthonormalize and compose the 2D similarity model with the original shape model
augmented_sm = shape_model.copy()
augmented_sm.orthonormalize_against_inplace(sim_2d_model)
In [9]:
visualize_shape_model(augmented_sm)  
In [11]:
from menpo.transform import AlignmentSimilarity
from menpofit.modelinstance import OrthoPDM

augmented_sm = OrthoPDM(shape_model, AlignmentSimilarity)
In [12]:
visualize_shape_model(augmented_sm.model)

So far, so good ;-)

Apart from that last bit...

Let us now switch our attention to the appearance model.

We will shortly see how the appearance model is also learned using PCA. However, in order to be able to apply PCA we first need to introduce the motion model and the concept of shape-free textures.

PCA can only be applied in a particular vector space or, in other words, all vectors to which we want to apply PCA must have the same length.

This is clearly not the case for the face images we just loaded:

In [13]:
print('image 0 is: {}'.format(images[0]))
print('image 1 is: {}'.format(images[1]))
image 0 is: 142W x 135H 2D Image with 1 channel
image 1 is: 141W x 140H 2D Image with 1 channel

We could resize all images to a particular resolution, but that is very likely to arbitrarily modify their original aspect ratio and to include a lot of background information (even if the images were tightly cropped).

Instead, the idea is to make use of the annotated landmarks (which are a requirement) to define the face appearance region in each image and map it to the same vector space.

This can be done by:

  • Defining a reference image frame (typically built using the shape model's mean shape).
  • Warping all images to the previous reference frame by using a non-linear warping function.

The non-linear warping function is referred to as the motion model, and typical choices for this function include Piecewise Affine and Thin Plate Spline warps.

Once all images have been warped onto the reference frame they all have the same dimensionality (i.e. they all have the same number of pixels) and we are ready to apply PCA.

Note that, after they have been warped, all images also share the same face shape, hence the name shape-free textures.

A shape-free texture can be mathematically defined using the following expression:

$$\mathrm{vec}(I(\mathcal{W}(\mathbf{x}; \mathbf{p}))) = \mathbf{a}$$
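
The next cell relies on menpo's PiecewiseAffine implementation. To demystify what such a warp does, here is a minimal sketch of its core idea for a single triangle (all coordinates are made up): a point inside a reference triangle keeps the same barycentric coordinates in the corresponding image triangle.

```python
import numpy as np

def barycentric(p, a, b, c):
    # solve [b - a | c - a] [beta, gamma]^T = p - a
    T = np.column_stack((b - a, c - a))
    beta, gamma = np.linalg.solve(T, p - a)
    return 1.0 - beta - gamma, beta, gamma

# one triangle in the reference frame and its counterpart in the image
ref_tri = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
img_tri = [np.array([5.0, 5.0]), np.array([8.0, 6.0]), np.array([4.0, 9.0])]

# map a reference point: same barycentric coordinates, image vertices
p_ref = np.array([0.25, 0.25])
alpha, beta, gamma = barycentric(p_ref, *ref_tri)
p_img = alpha * img_tri[0] + beta * img_tri[1] + gamma * img_tri[2]
```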

In [14]:
from menpo.transform import PiecewiseAffine
from menpofit.aam.builder import build_reference_frame   

# build reference frame
reference_frame = build_reference_frame(shape_model.mean())
reference_shape = reference_frame.landmarks['source'].lms

# build PiecewiseAffine transforms
transforms = [PiecewiseAffine(reference_shape, s) for s in shapes]

# warp images
warped_images = []
for (i, t) in zip(images, transforms):
    wi = i.warp_to_mask(reference_frame.mask, t) 
    wi.landmarks = reference_frame.landmarks
    warped_images.append(wi)
In [15]:
visualize_images(warped_images) 

After defining the motion model and introducing the concept of shape-free textures, we are now ready to formally define the appearance model:

$$\mathbf{a} = \bar{\mathbf{a}} + \sum_{i=1}^{n_a} c_i \mathbf{a}_i = \bar{\mathbf{a}} + \mathbf{A} \mathbf{c}$$

where:

$$\bar{\mathbf{a}} = (\bar{a}_1, \ldots, \bar{a}_d)^T \in \mathbb{R}^d$$

$$\mathbf{c} = (c_1, \ldots, c_{n_a})^T \in \mathbb{R}^{n_a}$$

$$\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_{n_a}) \in \mathbb{R}^{d \times n_a}$$
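
Since PCA produces an orthonormal basis, projecting a shape-free texture onto the appearance model and reconstructing it are both single matrix products. A minimal numpy sketch (function names are mine; a_mean is the mean texture vector and the columns of A are the appearance eigenvectors):

```python
import numpy as np

def project(a, a_mean, A):
    # optimal appearance parameters for a texture vector a
    return A.T @ (a - a_mean)

def reconstruct(c, a_mean, A):
    # texture generated by the model: a = a_mean + A c
    return a_mean + A @ c
```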

In [16]:
appearance_model = PCAModel(warped_images)
In [17]:
from menpofit.visualize import visualize_appearance_model
 
visualize_appearance_model(appearance_model)

Well done!!! We have now covered almost all of the basic concepts defining Active Appearance Models.

Only one bit is missing, and that is how to combine the previous three models (shape, appearance and motion) so that we can effectively generate novel face images using AAMs.

And the answer is:

  • Generate a novel shape instance:

$$\mathbf{s} = \bar{\mathbf{s}} + \mathbf{S} \mathbf{p}$$

  • Generate a novel appearance instance and rearrange it back into shape-free texture form:

$$\mathbf{a} = \bar{\mathbf{a}} + \mathbf{A} \mathbf{c}$$

$$A(\mathbf{x}) = \mathrm{matrix}(\mathbf{a})$$

  • Warp the shape-free appearance instance onto the shape instance using the motion model:

$$A(\mathcal{W}(\mathbf{x}; \mathbf{p})) = I(\mathbf{x})$$

In [18]:
# choose shape parameters at random
p = (np.random.randn(shape_model.n_components) * 
     np.sqrt(shape_model.eigenvalues)) 
# generate shape instance
s = shape_model.instance(p)

# define image frame containing shape instance
I = build_reference_frame(s)
landmarks = I.landmarks['source'].lms 
In [19]:
I.view_landmarks(group='source', render_numbering=False)
Out[19]:
<menpo.visualize.viewmatplotlib.MatplotlibLandmarkViewer2d at 0x11d374c50>
In [20]:
# choose appearance parameters at random
c = (np.random.randn(appearance_model.n_components) * 
     np.sqrt(appearance_model.eigenvalues))
# generate appearance instance
A = appearance_model.instance(c)
In [21]:
A.view() 
Out[21]:
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x11e405ed0>
In [22]:
# compute PiecewiseAffine transform
transform = PiecewiseAffine(landmarks, A.landmarks['source'].lms)

I = A.warp_to_mask(I.mask, transform, warp_landmarks=True) 
In [23]:
I.view_landmarks(group='source', render_numbering=False)
Out[23]:
<menpo.visualize.viewmatplotlib.MatplotlibLandmarkViewer2d at 0x11b4d55d0>

Things to take with you from this first part:

  • AAMs are non-linear, generative, and parametric models of visual phenomena.

  • They consist of three different sub-models:
      • Shape model
      • Appearance model
      • Motion model

  • Shape and appearance models are linear models learned from annotated training data using PCA.

  • The motion model non-linearly relates the shape and appearance models and is itself an essential part of the AAM formulation.

Questions?

## Part 2: Fitting Active Appearance Models

AAMs were originally developed for solving non-rigid object alignment problems, and they remain quite popular to this day in the domains of face alignment and medical image registration.

As we did in Part 1, we will find it useful to restrict the problem to the domain of faces, i.e. we will use AAMs to specifically tackle the face alignment problem.

Let us start by defining the problem:

Fitting an Active Appearance Model consists of finding the optimal parameters for which its shape and appearance models accurately describe the object being modelled in a particular image.

This definition is mine! :-)

Note that the only available information at fitting time is:

  • The input image which we want to fit.
In [24]:
# load image
img = mio.import_image('/Users/joan/PhD/DataBases/faces/lfpw/testset/image_0001.png')

# pre-processing
img.crop_to_landmarks_proportion_inplace(0.5)
img = img.rescale_landmarks_to_diagonal_range(100)
labeller(img, 'PTS', ibug_face_66)
if img.n_channels == 3:
    img = img.as_greyscale(mode='luminosity')
In [25]:
img.view()
Out[25]:
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x10c480510>
  • An initial guess for the face shape.
In [26]:
from menpofit.base import noisy_align

# noisily align the shape model's mean with the ground truth
transform = noisy_align(shape_model.mean(), 
                        img.landmarks['ibug_face_66'].lms)
initial_shape = transform.apply(shape_model.mean())

# add the initial shape as landmarks to the image
img.landmarks['initial_guess'] = initial_shape
In [27]:
img.view_landmarks(group='initial_guess', render_numbering=False);
  • The AAM that we will use to fit the image.
In [29]:
from alabortijcv2015.utils import pickle_load
from menpofit.visualize import visualize_aam

aam = pickle_load('/Users/joan/PhD/Models/aam_int.menpo')

visualize_aam(aam)

The problem of fitting AAMs to input images can be formally defined as:

$$\mathbf{p}_o, \mathbf{c}_o = \operatorname*{arg\,min}_{\mathbf{p}, \mathbf{c}} \frac{1}{2} \left\lVert \mathrm{vec}(I(\mathcal{W}(\mathbf{x}; \mathbf{p}))) - (\bar{\mathbf{a}} + \mathbf{A} \mathbf{c}) \right\rVert^2$$
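
Note that for a fixed p the warped image is just a vector, so the optimal c has a closed-form least-squares solution. A minimal numpy sketch (function and variable names are mine), assuming the columns of A are orthonormal, as PCA produces:

```python
import numpy as np

def optimal_c(i_warped, a_mean, A):
    # i_warped: the input image warped onto the reference frame and
    # flattened into a vector, i.e. vec(I(W(x; p))) for the current p.
    # With orthonormal A, the c minimising ||i_warped - (a_mean + A c)||^2
    # is the projection of the mean-centred vector onto the basis.
    return A.T @ (i_warped - a_mean)
```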