%matplotlib inline
import numpy as np
"Active Apperance Models (AAMs) are non-linear, generative, and parametric models of a certain visual phenomenon".
I. Matthews and S. Baker, 2004.
generative: they generate images of a particular object class (e.g. the human face).
parametric: they are controlled by a set of parameters.
non-linear: they are non-linear in terms of pixel intensities.
from menpofit.visualize import visualize_aam
from alabortcvpr2015.utils import pickle_load
aam = pickle_load('/Users/joan/PhD/Models/aam_int.menpo')
visualize_aam(aam)
A bit of history...
They were originally proposed in 1998 by G. Edwards, C. J. Taylor, and T. F. Cootes from the Department of Medical Biophysics (not Computing!) at the University of Manchester.
Luckily for the original authors quite a lot of research stemmed from the original paper:
although AAMs are (typically) linear in terms of both shape and texture, the image formation process is non-linear in terms of pixel intensities
To get us started with AAMs, we will assume that the object we are interested in modelling is no other than the human face:
The first thing we will need is a large collection of face images:
import menpo.io as mio
from menpo.landmark import labeller, ibug_face_66
images = []
for i in mio.import_images('/Users/joan/PhD/DataBases/faces/lfpw/trainset/',
                           max_images=None, verbose=True):
    i.crop_to_landmarks_proportion_inplace(0.5)
    i = i.rescale_landmarks_to_diagonal_range(100)
    labeller(i, 'PTS', ibug_face_66)
    if i.n_channels == 3:
        i = i.as_greyscale(mode='luminosity')
    images.append(i)
from menpo.visualize import visualize_images
visualize_images(images)
Wait a moment... What are these points?!
Fair enough, I kind of lied before...
What we really need is a large collection of carefully *annotated* face images.
The previous annotations try to encode the notion of face shape...
A shape is the form of an object or its external boundary, outline, or external surface, as opposed to other properties such as colour, texture or material composition.
Wikipedia
...by consistently identifying the positions of a small set of landmarks defining the faces in all images.
In morphometrics, a landmark point, or shortly a landmark, is a point in a shape object in which correspondences between and within the populations of the object are preserved.
Wikipedia
Mathematically, a shape can be defined as the $2v$-dimensional vector obtained by concatenating the image coordinates of its $v$ landmark points:

$$\mathbf{s} = (x_1, y_1, x_2, y_2, \ldots, x_v, y_v)^T$$
from menpo.visualize import visualize_shapes
visualize_shapes([i.landmarks for i in images])
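To make this concrete, here is a minimal check (assuming the images and ibug_face_66 labels loaded above) that a shape really is just the $2v$-dimensional vector of its landmark coordinates:

# a shape is the flattened vector of its v landmark points
s = images[0].landmarks['ibug_face_66'].lms
print 's has', s.n_points, 'landmark points'                  # v = 66
print 'as a vector, its length is', s.as_vector().shape[0]    # 2 * 66 = 132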
In AAMs, images of a particular object are generated by combining linear models describing the shape and texture of the object using a specific motion model (also referred to as the warp).
Let us start by formally defining the shape model:

$$\mathbf{s} = \bar{\mathbf{s}} + \sum_{i=1}^{n} p_i \mathbf{s}_i = \bar{\mathbf{s}} + \mathbf{S} \mathbf{p}$$

where:
- $\bar{\mathbf{s}}$ is the mean shape,
- $\mathbf{S} = (\mathbf{s}_1, \ldots, \mathbf{s}_n)$ is a linear basis of shape variation, and
- $\mathbf{p} = (p_1, \ldots, p_n)^T$ are the shape parameters.
The previous shape model can be learned by applying Principal Component Analysis (PCA) to the set of manually annotated points defining the object shape in the images (usually after registering all shapes using Procrustes Analysis).
from menpo.transform import Translation, GeneralizedProcrustesAnalysis
from menpo.model import PCAModel
# extract shapes from images
shapes = [i.landmarks['ibug_face_66'].lms for i in images]
# centralize shapes
centered_shapes = [Translation(-s.centre()).apply(s) for s in shapes]
# align centralized shape using Procrustes Analysis
gpa = GeneralizedProcrustesAnalysis(centered_shapes)
aligned_shapes = [s.aligned_source() for s in gpa.transforms]
# build shape model
shape_model = PCAModel(aligned_shapes)
from menpofit.visualize import visualize_shape_model
visualize_shape_model(shape_model)
Note that, because the shapes were normalized using Procrustes Analysis before we applied PCA, the previous shape model has mainly learned non-rigid facial deformations and lacks the ability to place shapes at arbitrary positions on the image plane.
Luckily, this problem can be solved by composing the model with a 2D similarity transform:

$$\mathbf{s} = T_{\mathbf{q}}\left(\bar{\mathbf{s}} + \mathbf{S} \mathbf{p}\right)$$

where:
- $T_{\mathbf{q}}$ is a 2D similarity transform (scale, in-plane rotation and translation), and
- $\mathbf{q}$ are its four parameters.
Fortunately, after some clever reparameterization (Matthews and Baker, 2004), the shape model can still be concisely expressed as before:

$$\mathbf{s} = \bar{\mathbf{s}} + \tilde{\mathbf{S}} \tilde{\mathbf{p}}$$

where:
- $\tilde{\mathbf{S}} = (\mathbf{s}^*_1, \ldots, \mathbf{s}^*_4, \mathbf{s}_1, \ldots, \mathbf{s}_n)$ augments the original shape basis with four bases spanning the 2D similarity transforms, and
- $\tilde{\mathbf{p}} = (q_1, \ldots, q_4, p_1, \ldots, p_n)^T$ are the corresponding parameters.
import numpy as np
from menpo.transform import AlignmentSimilarity
from menpo.model import MeanInstanceLinearModel
from menpofit.modelinstance import OrthoPDM
# get shape model mean as numpy array
shape_vector = shape_model.mean().as_vector()
# initialize S star, the 4 similarity bases
S_star = np.zeros((4, shape_vector.shape[0]))
# first basis is the mean shape itself
S_star[0, :] = shape_vector
# second basis is the mean shape rotated by 90 degrees
rotated_ccw = shape_model.mean().points[:, ::-1].copy()  # flip x,y -> y,x
rotated_ccw[:, 0] = -rotated_ccw[:, 0]                   # negate (old) y
S_star[1, :] = rotated_ccw.flatten()
# third and fourth bases are unit translations in x and y
S_star[2, ::2] = 1   # Tx
S_star[3, 1::2] = 1  # Ty
# build 2d similarity model
sim_2d_model = MeanInstanceLinearModel(S_star, shape_vector, shape_model.mean())
# orthogonalize and compose 2d similarity model with original shape model
augmented_sm = shape_model.copy()
augmented_sm.orthonormalize_against_inplace(sim_2d_model)
visualize_shape_model(augmented_sm)
from menpofit.transform import DifferentiableAlignmentSimilarity
from menpofit.modelinstance import OrthoPDM
augmented_sm = OrthoPDM(shape_model, AlignmentSimilarity)
visualize_shape_model(augmented_sm.model)
So far, so good ;-)
Apart from that last bit...
Let us now switch our attention to the appearance model.
We will shortly see how the appearance model is also learned using PCA. However, in order to be able to apply PCA we first need to introduce the motion model and the concept of shape-free textures.
PCA can only be applied in a particular vector space or, in other words, all vectors to which we want to apply PCA must have the same length.
This is clearly not the case for the face images we just loaded:
print 'image 0 is:', images[0]
print 'image 1 is:', images[1]
We could resize all images to a particular resolution, but that is very likely to arbitrarily modify their original aspect ratio and include a lot of background information (even if the images were tightly cropped).
Instead, the idea is to make use of the annotated landmarks (which are a requirement) to define the face appearance region in each image and map it to the same vector space.
This can be done by (i) defining a common reference frame, e.g. one built from the mean shape, and (ii) non-linearly warping every image onto that reference frame using the correspondences between its annotated landmarks and the reference shape.
The non-linear warping function is referred to as the motion model, and typical choices for this function include Piecewise Affine and Thin Plate Spline warps.
Once all images have been warped onto the reference frame they all have the same dimensionality (i.e. they all have the same number of pixels) and we are ready to apply PCA.
Note that, after they have been warped, all images also share the same face shape, hence the name shape-free textures.
A shape-free texture can be mathematically defined using the following expression:

$$\mathbf{i}[\mathbf{p}] = \left( I(\mathcal{W}(\mathbf{x}_1; \mathbf{p})), \ldots, I(\mathcal{W}(\mathbf{x}_F; \mathbf{p})) \right)^T$$

i.e. the vector obtained by sampling the image $I$ at the positions to which the motion model $\mathcal{W}$, parameterized by the shape parameters $\mathbf{p}$, maps each of the $F$ pixels $\mathbf{x}$ of the reference frame.
from menpo.transform import PiecewiseAffine
from menpofit.aam.builder import build_reference_frame
# build reference frame
reference_frame = build_reference_frame(shape_model.mean())
reference_shape = reference_frame.landmarks['source'].lms
# build PiecewiseAffine transforms
transforms = [PiecewiseAffine(reference_shape, s) for s in shapes]
# warp images
warped_images = []
for (i, t) in zip(images, transforms):
    wi = i.warp_to_mask(reference_frame.mask, t)
    wi.landmarks = reference_frame.landmarks
    warped_images.append(wi)
visualize_images(warped_images)
After defining the motion model and introducing the concept of shape-free textures, we are now ready to formally define the appearance model:

$$\mathbf{a} = \bar{\mathbf{a}} + \sum_{i=1}^{m} c_i \mathbf{a}_i = \bar{\mathbf{a}} + \mathbf{A} \mathbf{c}$$

where:
- $\bar{\mathbf{a}}$ is the mean appearance (the mean shape-free texture),
- $\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_m)$ is a linear basis of appearance variation, and
- $\mathbf{c} = (c_1, \ldots, c_m)^T$ are the appearance parameters.
appearance_model = PCAModel(warped_images)
from menpofit.visualize import visualize_appearance_model
visualize_appearance_model(appearance_model)
Well done!!! We have now almost covered the basic concepts defining Active Appearance Models.
Only one bit is missing, and that is how to combine the previous three models (shape, appearance and motion) so that we can effectively generate novel face images using AAMs.
And the answer is:
# choose shape parameters at random
p = (np.random.randn(shape_model.n_components) *
np.sqrt(shape_model.eigenvalues))
# generate shape instance
s = shape_model.instance(p)
# define image frame containing shape instance
I = build_reference_frame(s)
landmarks = I.landmarks['source'].lms
I.view_landmarks(group='source', render_numbering=False)
# choose appearance parameters at random
c = (np.random.randn(appearance_model.n_components) *
np.sqrt(appearance_model.eigenvalues))
# generate appearance instance
A = appearance_model.instance(c)
A.view()
# compute PiecewiseAffine transform
transform = PiecewiseAffine(landmarks, A.landmarks['source'].lms)
I = A.warp_to_mask(I.mask, transform, warp_landmarks=True)
I.view_landmarks(group='source', render_numbering=False)
Things to take with you from this first part:
AAMs are non-linear, generative, and parametric models of visual phenomena.
Shape and appearance models are linear models learned from annotated training data using PCA.
The motion model non-linearly relates the shape and appearance models and is itself an essential part of the AAM formulation.
AAMs were originally developed for solving non-rigid object alignment problems, and to this day they remain quite popular in the domains of face alignment and medical image registration.
As we did in part one, we will find it useful to restrict the problem to the domain of faces, i.e. we will use AAMs to specifically tackle the face alignment problem.
Let us start by defining the problem:
Fitting an Active Appearance Model consists of finding the optimal parameters for which its shape and appearance models accurately describe the object being modelled in a particular image.
This definition is mine! :-)
Note that the only available information at fitting time is the input image itself and a rough initial estimate of the shape (here obtained by noisily aligning the model's mean shape with the ground truth annotations):
# load image
img = mio.import_image('/Users/joan/PhD/DataBases/faces/lfpw/testset/image_0001.png')
# pre-processing
img.crop_to_landmarks_proportion_inplace(0.5)
img = img.rescale_landmarks_to_diagonal_range(100)
labeller(img, 'PTS', ibug_face_66)
if img.n_channels == 3:
img = img.as_greyscale(mode='luminosity')
img.view()
from menpofit.base import noisy_align
# noisily align the shape model's mean with the ground truth
transform = noisy_align(shape_model.mean(),
img.landmarks['ibug_face_66'].lms)
initial_shape = transform.apply(shape_model.mean())
# add the initial shape as landmarks to the image
img.landmarks['initial_guess'] = initial_shape
img.view_landmarks(group='initial_guess', render_numbering=False);
from alabortijcv2015.utils import pickle_load
from menpofit.visualize import visualize_aam
aam = pickle_load('/Users/joan/PhD/Models/aam_int.menpo')
visualize_aam(aam)
The problem of fitting AAMs to input images can be formally defined as:

$$\mathbf{p}^*, \mathbf{c}^* = \underset{\mathbf{p}, \mathbf{c}}{\arg\min} \sum_{\mathbf{x} \in \bar{\mathbf{s}}} \left[ I(\mathcal{W}(\mathbf{x}; \mathbf{p})) - \bar{A}(\mathbf{x}) - \sum_{i=1}^{m} c_i A_i(\mathbf{x}) \right]^2$$

which can be concisely rewritten in vector form as:

$$\mathbf{p}^*, \mathbf{c}^* = \underset{\mathbf{p}, \mathbf{c}}{\arg\min} \left\| \mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}} - \mathbf{A}\mathbf{c} \right\|^2$$
Therefore, fitting an AAM to a novel image consists of finding the optimal shape and appearance parameters for which the previous expression is minimized.
Note that:
Ideally, the reconstruction error should be minimized when the shape parameters correctly place the face landmarks onto the input image.
When that happens, the input image is correctly warped onto the reference frame and its appearance model reconstruction should be optimal (hence, the error minimized).
Let us define the residual:

$$\mathbf{r}(\mathbf{p}, \mathbf{c}) = \mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}} - \mathbf{A}\mathbf{c}$$

Note that the problem can now also be written as:

$$\mathbf{p}^*, \mathbf{c}^* = \underset{\mathbf{p}, \mathbf{c}}{\arg\min} \left\| \mathbf{r}(\mathbf{p}, \mathbf{c}) \right\|^2$$

or

$$\mathbf{p}^*, \mathbf{c}^* = \underset{\mathbf{p}, \mathbf{c}}{\arg\min} \; \mathbf{r}^T \mathbf{r}$$
We can see that the residual is linear with respect to the appearance parameters but nonlinear with respect to shape parameters through the motion model.
If the residual were linear with respect to both shape and appearance parameters, the previous optimization problem would reduce to a linear least squares problem, which has a closed-form solution.
Unfortunately this is not the case, and fitting AAMs involves solving a non-linear least squares problem.
The most common way of solving the previous problem is by using the Gauss-Newton algorithm: the residual is linearized with a first order Taylor expansion around the current parameters, and the resulting linear least squares problem is solved for the parameter increments:

$$\Delta\boldsymbol{\theta} = -\left(\mathbf{J}^T \mathbf{J}\right)^{-1} \mathbf{J}^T \mathbf{r}$$

where $\boldsymbol{\theta} = (\mathbf{p}, \mathbf{c})$ and $\mathbf{J} = \frac{\partial \mathbf{r}}{\partial \boldsymbol{\theta}}$ is the derivative (Jacobian) of the residual with respect to the parameters.
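To make the update rule concrete, here is a minimal, self-contained Gauss-Newton sketch on a toy one-parameter non-linear least squares problem (the toy problem and all names are illustrative, unrelated to the AAM code in this notebook):

import numpy as np

# toy problem: fit y = exp(theta * x) to noisy samples
np.random.seed(0)
x = np.linspace(0, 1, 50)
y = np.exp(0.5 * x) + 0.01 * np.random.randn(50)

theta = 0.0  # initial guess
for _ in range(10):
    # residual and its Jacobian with respect to theta
    r = np.exp(theta * x) - y
    J = (x * np.exp(theta * x))[:, None]
    # Gauss-Newton update: delta = -(J^T J)^{-1} J^T r
    delta = -np.linalg.solve(J.T.dot(J), J.T.dot(r))
    theta = theta + delta[0]
# theta should now be very close to the true value, 0.5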
Note that other algorithms have also been used to fit AAMs:
Their derivations are slightly different but closely related to the Gauss-Newton one. For more details on these algorithms I refer you to the corresponding papers.
Or... we can always try to quickly derive them on the whiteboard...
At this point we will concentrate on the previous Gauss-Newton algorithm and, in particular, look at two different techniques that one can use to compute the derivative of the residual with respect to the parameters.
Note that, in principle, the derivative depends on the parameters and needs to be recomputed at each iteration of the Gauss-Newton algorithm, and so does its pseudo-inverse, $\left(\mathbf{J}^T \mathbf{J}\right)^{-1} \mathbf{J}^T$.
This is the approach proposed in the original AAM paper (Edwards et al., 1998).
Remember the definition of the derivative from those glorious high school days:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

which can be approximated, for a small $h$, by:

$$\frac{df(x)}{dx} \approx \frac{f(x + h) - f(x)}{h}$$

So that is the basis for numerically approximating the derivative of the residual with respect to the parameters: perturb each parameter around its optimal value, recompute the residual, and average the finite differences:

$$\frac{\partial \mathbf{r}}{\partial \theta_j} \approx \frac{1}{K} \sum_{k=1}^{K} \frac{\mathbf{r}(\boldsymbol{\theta}^* + \delta_k \mathbf{e}_j) - \mathbf{r}(\boldsymbol{\theta}^*)}{\delta_k}$$
This is a linear approximation to the derivative learned from training data by perturbing the residuals around the optimal values for the parameters.
In the original paper the authors used a combined-AAM formulation which greatly reduces the total number of parameters for which the derivative needs to be estimated. However the same approximation can also be used for the independent-AAMs described in this tutorial.
The biggest advantage of this approach is the fact that the derivative is only estimated once from training data and considered fixed throughout the optimization. Consequently, in this case, the derivative of the residual with respect to the parameters does not need to be re-estimated at each iteration of the Gauss-Newton algorithm and neither does its pseudo-inverse.
from __future__ import division
from menpofit.transform import (OrthoMDTransform,
                                DifferentiablePiecewiseAffine,
                                DifferentiableAlignmentSimilarity)

# choose shape parameter
param_ind = 2
# initialize residual derivative and counter
dr_db0 = 0
count = 0
for i in images[:200]:
    # build warp transform
    W = OrthoMDTransform(shape_model, DifferentiablePiecewiseAffine,
                         DifferentiableAlignmentSimilarity,
                         source=reference_frame.landmarks['source'].lms)
    # extract image ground truth shape
    shape = i.landmarks['ibug_face_66'].lms
    # set warp transform target to the previous shape
    W.set_target(shape)
    # save the optimal shape parameters for the ground truth shape
    b_star = W.as_vector()
    # warp image to the reference frame
    IWxp = i.warp_to_mask(reference_frame.mask, W)
    # compute residual
    r_star = IWxp.as_vector() - appearance_model.mean().as_vector()
    for delta in np.arange(-10, 10, 1):
        if delta != 0:
            # perturb shape parameter
            b_tilde = b_star.copy()
            b_tilde[param_ind] = b_tilde[param_ind] + delta
            # set transform to the perturbed position
            W.from_vector_inplace(b_tilde)
            # warp image to the reference frame using perturbed transform
            IWxp = i.warp_to_mask(reference_frame.mask, W)
            # compute perturbed residual
            r_tilde = IWxp.as_vector() - appearance_model.mean().as_vector()
            # add to approximation
            dr_db0 += (r_tilde - r_star) / delta
            count += 1
# normalize by the total number of perturbations
dr_db0 = dr_db0 / count

reference_frame.from_vector(dr_db0).view()
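Because this derivative is estimated once from training data and kept fixed, its pseudo-inverse can also be precomputed once. A purely illustrative sketch (a real fitter would stack one estimated column per shape parameter, whereas above we only estimated the column for param_ind):

# stack the per-parameter estimates into an (n_pixels, n_params) matrix;
# here we only have the single column estimated above
dr_db = dr_db0[:, None]
# precompute the fixed pseudo-inverse once, before any fitting takes place
pinv_dr_db = np.linalg.pinv(dr_db)
# at each Gauss-Newton iteration, the parameter update then reduces to a
# single matrix-vector product with the current residual r:
# delta_b = -pinv_dr_db.dot(r)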
The analytical derivative can be obtained by directly applying the chain rule:

$$\frac{\partial \mathbf{r}}{\partial \mathbf{p}} = \nabla \mathbf{i}[\mathbf{p}] \, \frac{\partial \mathcal{W}}{\partial \mathbf{p}}, \qquad \frac{\partial \mathbf{r}}{\partial \mathbf{c}} = -\mathbf{A}$$
Note that the derivative of the residual with respect to the appearance parameters is fixed, i.e. it does not depend on the current shape or appearance parameters. In fact, it is simply the (negated) matrix of appearance model bases.
On the contrary, the derivative of the residual with respect to shape parameters does depend on the current shape parameters and will have to be estimated at every iteration of the algorithm.
from menpo.feature import gradient
# reset the warp transform to the image's initial shape
W.set_target(initial_shape)
# warp image to reference frame
IWxp = img.warp_to_mask(reference_frame.mask, W)
# compute gradient of warped image
VI_image = gradient(IWxp)
VI_image.set_boundary_pixels()
# reshape gradient
VI = VI_image.as_vector().reshape((2, -1))
VI_image.view();
# compute the warp derivative with respect to shape parameters
dW_dp = W.d_dp(reference_frame.mask.true_indices())
dW_dp = np.rollaxis(dW_dp, -1)
# make them menpo images for visualization
template = reference_frame.copy()
template.pixels = np.concatenate((template.pixels, template.pixels), axis=0)
dW_dp_image = [template.from_vector(dW_dp[:, :, p].flatten())
for p in xrange(W.n_parameters)]
visualize_images(dW_dp_image)
# compute steepest descent images
sdi = 0
a = VI[..., None] * dW_dp
for d in a:
    sdi += d
# make them menpo images for visualization
sdi_image = [reference_frame.from_vector(sdi[:, p].flatten())
for p in xrange(W.n_parameters)]
visualize_images(sdi_image)
Iain Matthews and Simon Baker were the first to make use of the analytical computation of the residual derivative. A very detailed explanation of all the terms involved in the derivation is given in their paper (Matthews and Baker, 2003).
The Gauss-Newton algorithm we just described is known as the Simultaneous Forward Additive (SFA) algorithm. It is just one of a whole family of Gauss-Newton algorithms for solving the AAM fitting problem using the analytical derivation of the residual derivative.
Another algorithm of the same family is the Project-Out Inverse Compositional (PIC) algorithm (Matthews and Baker, 2004).
This algorithm divides the original cost function into two different terms:

$$\left\| \mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}} - \mathbf{A}\mathbf{c} \right\|^2 = \left\| \mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}} \right\|^2_{\mathbf{I} - \mathbf{A}\mathbf{A}^T} + \left\| \mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}} - \mathbf{A}\mathbf{c} \right\|^2_{\mathbf{A}\mathbf{A}^T}$$

where $\|\cdot\|^2_{\mathbf{P}}$ denotes the squared norm measured under the projection $\mathbf{P}$. The first term does not depend on the appearance parameters at all, and the second term can always be driven to zero by setting $\mathbf{c} = \mathbf{A}^T(\mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}})$.
How can we compute the orthogonal complement to the appearance subspace? Since the appearance bases $\mathbf{A}$ are orthonormal (PCA guarantees this), projecting onto their orthogonal complement is as simple as applying the operator $\mathbf{I} - \mathbf{A}\mathbf{A}^T$.
A quick example with a one-dimensional subspace proving that the above is true:
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
a = np.array([1.0, 1.0])
a = a / np.sqrt(a.dot(a))              # normalize the basis vector
r = np.array([0.3, 0.7])
# projection operator onto the orthogonal complement of a
a_perp = np.eye(2) - np.outer(a, a)
r_proj = a * a.dot(r)                  # component of r inside the subspace
r_perp = a_perp.dot(r)                 # component in the orthogonal complement
plt.figure()
ax = plt.gca()
ax.quiver([0, 0, 0, r_proj[0]], [0, 0, 0, r_proj[1]],
[a[0], r[0], r_proj[0], r_perp[0]],
[a[1], r[1], r_proj[1], r_perp[1]],
angles='xy', scale_units='xy', scale=1,
color=['k', 'r', 'g', 'b'])
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_aspect('equal')
plt.show()
The optimal value for the shape parameters can then be obtained by solving the following optimization problem:

$$\mathbf{p}^* = \underset{\mathbf{p}}{\arg\min} \left\| \mathbf{i}[\mathbf{p}] - \bar{\mathbf{a}} \right\|^2_{\mathbf{I} - \mathbf{A}\mathbf{A}^T}$$
Note that working on the orthogonal complement of the appearance bases accounts for the Project Out in the algorithm's name.
The Inverse bit comes from the fact that the algorithm switches the roles of the image and the template by introducing an incremental warp on the model's side:

$$\Delta\mathbf{p}^* = \underset{\Delta\mathbf{p}}{\arg\min} \left\| \bar{\mathbf{a}}[\Delta\mathbf{p}] - \mathbf{i}[\mathbf{p}] \right\|^2_{\mathbf{I} - \mathbf{A}\mathbf{A}^T}$$

where $\bar{\mathbf{a}}[\Delta\mathbf{p}]$ denotes the mean appearance (the template) warped by the incremental warp $\mathcal{W}(\Delta\mathbf{p})$.
As before, this new non-linear least squares problem can be solved using the Gauss-Newton algorithm.
In this case, the first order Taylor expansion of the residual leads to:

$$\mathbf{r}(\Delta\mathbf{p}) \approx \mathbf{r}(\mathbf{0}) + \mathbf{J} \Delta\mathbf{p}$$

where $\mathbf{r}(\mathbf{0}) = \bar{\mathbf{a}} - \mathbf{i}[\mathbf{p}]$ and $\mathbf{J} = \nabla\bar{\mathbf{a}} \, \frac{\partial \mathcal{W}}{\partial \mathbf{p}} \Big|_{\mathbf{p} = \mathbf{0}}$. Crucially, both the template gradient $\nabla\bar{\mathbf{a}}$ and the warp Jacobian evaluated at $\mathbf{p} = \mathbf{0}$ are constant, so $\mathbf{J}$ never changes across iterations.
Substituting back into the cost function, we arrive at a very similar expression as before:

$$\Delta\mathbf{p}^* = \underset{\Delta\mathbf{p}}{\arg\min} \left\| \mathbf{r}(\mathbf{0}) + \mathbf{J} \Delta\mathbf{p} \right\|^2_{\mathbf{I} - \mathbf{A}\mathbf{A}^T}$$
And the solution for the increments of the shape parameters is given by the now familiar expression:

$$\Delta\mathbf{p} = -\left(\bar{\mathbf{J}}^T \bar{\mathbf{J}}\right)^{-1} \bar{\mathbf{J}}^T \mathbf{r}(\mathbf{0}), \qquad \text{with} \quad \bar{\mathbf{J}} = \left(\mathbf{I} - \mathbf{A}\mathbf{A}^T\right) \mathbf{J}$$
As noted above, the previous increments on the shape parameters are placed on the model's side.
This effectively means that the problem that we just solved was that of finding the shape parameters that best deform the model so that the difference between the warped image and the model was minimized.
Therefore, we cannot simply add these shape parameter increments to the current ones using an additive update rule.
Instead, what we need to do is invert the incremental warp and compose it with the current warp, giving rise to the following update rule:

$$\mathcal{W}(\mathbf{p}) \leftarrow \mathcal{W}(\mathbf{p}) \circ \mathcal{W}(\Delta\mathbf{p})^{-1}$$

or, slightly abusing notation:

$$\mathbf{p} \leftarrow \mathbf{p} \circ \Delta\mathbf{p}^{-1}$$
And yes, as you have probably guessed by now, this accounts for the Compositional in the algorithm's name.
This algorithm has two main advantages with respect to the SFA algorithm we derived earlier: first, the appearance parameters have been projected out, so only the (far fewer) shape parameters need to be optimized; second, the Jacobian, and consequently its pseudo-inverse, is constant and can be precomputed, as sketched below.
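Putting these pieces together, here is a schematic, numpy-only sketch of the PIC precomputation and per-iteration update. All names here (A, a_bar, J, and the helper functions) are illustrative assumptions, not the menpofit API:

import numpy as np

def pic_precompute(A, J):
    # A: (n_pixels, m) orthonormal appearance bases
    # J: (n_pixels, n) constant template Jacobian, i.e. grad(a_bar) * dW/dp at p = 0
    # project the Jacobian out of the appearance subspace...
    J_po = J - A.dot(A.T.dot(J))
    # ...and precompute its pseudo-inverse once, before fitting starts
    return np.linalg.pinv(J_po)

def pic_iteration(pinv_J_po, a_bar, i_warped):
    # i_warped: current image warped onto the reference frame, as a vector
    r = a_bar - i_warped
    # the increment lives on the model's side: it must be inverted and
    # composed with the current warp, never simply added to it
    return -pinv_J_po.dot(r)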
There are at least two other very well-known and widely used Gauss-Newton algorithms for fitting AAMs, namely:
Alternating Inverse Compositional (AIC) algorithm (Tzimiropoulos et al., 2012)
Fast Simultaneous Inverse Compositional (Fast-SIC) algorithm (Papandreou and Maragos, 2008)
Their derivation is similar to that of the previous SFA and PIC algorithms but, unfortunately, I did not have enough time to prepare a full derivation of these algorithms.
We can still quickly do it on the whiteboard if you are interested... :-)
OK, maths are over!!!
Let's see how each of these algorithms performs!
We will start by loading some test images:
import menpo.io as mio
from menpo.landmark import labeller, ibug_face_66
test_images = []
for i in mio.import_images('/Users/joan/PhD/DataBases/faces/lfpw/testset/',
                           max_images=10, verbose=True):
    i.crop_to_landmarks_proportion_inplace(0.5)
    i = i.rescale_landmarks_to_diagonal_range(200)
    labeller(i, 'PTS', ibug_face_66)
    if i.n_channels == 3:
        i = i.as_greyscale(mode='luminosity')
    test_images.append(i)
Now let's load an AAM and fit it to the previous images using the following algorithms:
Sadly, numerical approximation algorithms for fitting AAMs have not been implemented in menpo yet :-( so the list of algorithms is limited to those using analytical derivations.
# load a pre-trained AAM
aam = pickle_load('/Users/joan/PhD/Models/gaam_int.menpo')
from alabortijcv2015.aam import StandardAAMFitter
from alabortijcv2015.aam.algorithm import PIC, AIC, SIC
# build aam fitter with PIC algorithm
pic_fitter = StandardAAMFitter(aam, algorithm_cls=PIC, n_shape=10, n_appearance=100)
# build aam fitter with AIC algorithm
aic_fitter = StandardAAMFitter(aam, algorithm_cls=AIC, n_shape=10, n_appearance=100)
# build aam fitter with Fast-SIC algorithm
sic_fitter = StandardAAMFitter(aam, algorithm_cls=SIC, n_shape=10, n_appearance=100)
# initialize lists of fitter results
pic_fitter_results = []
aic_fitter_results = []
sic_fitter_results = []
np.random.seed(1)
for j, i in enumerate(test_images):
    # get the ground truth landmarks of the test image
    gt_s = i.landmarks['ibug_face_66'].lms
    # noisy align the model's mean shape to them
    s = pic_fitter.perturb_shape(gt_s, noise_std=0.02)
    # fit with PIC
    pic_fr = pic_fitter.fit(i, s, gt_shape=gt_s, max_iters=20)
    # fit with AIC
    aic_fr = aic_fitter.fit(i, s, gt_shape=gt_s, max_iters=20)
    # fit with Fast-SIC
    sic_fr = sic_fitter.fit(i, s, gt_shape=gt_s, max_iters=20)
    # this nonsense is due to a bug in my code!!!
    pic_fr.downscale = 0.5
    aic_fr.downscale = 0.5
    sic_fr.downscale = 0.5
    # add fitter results to their respective lists
    pic_fitter_results.append(pic_fr)
    aic_fitter_results.append(aic_fr)
    sic_fitter_results.append(sic_fr)
    # print the error per fitting
    print 'Image: ', j
    print 'PIC:'
    print pic_fr
    print 'AIC:'
    print aic_fr
    print 'SIC:'
    print sic_fr
We can also check the results visually:
from menpofit.visualize import visualize_fitting_results
visualize_fitting_results(pic_fitter_results)
from menpofit.visualize import visualize_fitting_results
visualize_fitting_results(aic_fitter_results)
from menpofit.visualize import visualize_fitting_results
visualize_fitting_results(sic_fitter_results)
Finally, let us also quickly compare the speed of each algorithm:
%timeit pic_fitter.fit(i, s, gt_shape=gt_s, max_iters=20)
%timeit aic_fitter.fit(i, s, gt_shape=gt_s, max_iters=20)
%timeit sic_fitter.fit(i, s, gt_shape=gt_s, max_iters=20)
Things to take with you from this second part:
AAM fitting is defined as a non-linear least squares problem.
The derivative of the residual with respect to the parameters can be approximated numerically or derived analytically.
Compositional (rather than additive) updates of the shape parameters, as in the PIC algorithm, allow the expensive quantities to be precomputed.