Versatile expressions might carry 3D-generated faces out of the uncanny valley


3D-rendered faces are an important part of any big movie or game these days, but the task of capturing and animating them naturally can be tricky. Disney Research is working on ways to streamline this process, including a machine learning tool that makes it much easier to generate and manipulate 3D faces without dipping into the uncanny valley.

Of course, the technology has come a long way from the wooden expressions and limited detail of earlier years. High-resolution, convincing 3D faces can now be animated quickly and well, but the subtleties of human expression are not only endlessly varied, they are also easy to get wrong.

Think how a person's entire face changes when they smile – it's different for everyone, but there are enough similarities that we believe we can tell when someone is “really” smiling or just faking it. How can you achieve this level of detail in an artificial face?

Existing "linear" models simplify the subtlety of expression, making "happiness" or "anger" finely adjustable, but at the cost of accuracy – they cannot represent every possible face, yet can easily produce impossible ones. Newer neural models learn that complexity by observing how expressions are interconnected, but like other such models their workings are opaque and difficult to control, and they may not generalize beyond the faces they learned from. They either fail to give an artist working on a movie or game the control they need, or they produce faces that (people are remarkably good at spotting this) are somehow off.
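To see why a linear model can produce "impossible faces," here is a minimal sketch of a linear blendshape model, the standard technique behind such systems. The geometry, expression offsets, and numbers below are invented for illustration and are not taken from Disney's paper.

```python
import numpy as np

# Toy face: 4 vertices in 3D. A real model has thousands of vertices.
neutral = np.zeros((4, 3))

# Per-vertex offsets that deform the neutral face into each expression.
smile = np.array([[ 0.0, 0.1, 0.0],
                  [ 0.0, 0.2, 0.0],
                  [ 0.1, 0.0, 0.0],
                  [-0.1, 0.0, 0.0]])
frown = -smile  # a second, invented expression basis

def blend(w_smile, w_frown):
    """Linear blendshape: face = neutral + sum of weighted offsets."""
    return neutral + w_smile * smile + w_frown * frown

# Dialing a smile from 0 to 1 behaves as expected...
half_smile = blend(0.5, 0.0)

# ...but nothing constrains the weights, so extrapolated or conflicting
# combinations yield geometry no human face could make - the
# "impossible faces" described above.
impossible = blend(3.0, 2.0)
```

Because each expression is just an additive offset, expressions cannot interact: the mouth corners move identically whether the eyes are open or squeezed shut, which is part of why linear models miss the subtlety the article describes.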

A team at Disney Research is proposing a new model that combines the best of both worlds – what it calls a "semantic deep face model." Without getting into the exact technical execution, the fundamental improvement is that it is a neural model that learns how an expression affects the entire face, yet is not specific to a single face – and it is also non-linear, allowing flexibility in how expressions interact with a face's geometry and with each other.

Think of it this way: with a linear model, you can dial an expression (such as a smile or a kiss) from 0 to 100 on any 3D face, but the results may be unrealistic. With a neural model, you can realistically dial a learned expression from 0 to 100, but only on the face it learned from. This model can realistically dial an expression from 0 to 100 on any 3D face. That's an oversimplification, but you get the idea.
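The key idea above can be sketched as a decoder that takes separate identity and expression codes, so one expression code can drive any face. The toy network below (random weights, invented sizes) is only a stand-in for that structure, not a reproduction of Disney's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decoder weights: (identity code + expression code) -> hidden -> vertices.
# Sizes are invented for illustration; a real model is far larger and trained.
W1 = rng.normal(size=(8, 16))   # 4-dim identity + 4-dim expression -> hidden
W2 = rng.normal(size=(16, 12))  # hidden -> 4 vertices * 3 coordinates

def decode(identity_code, expression_code):
    """Map separate identity and expression codes to face geometry."""
    z = np.concatenate([identity_code, expression_code])  # shape (8,)
    h = np.tanh(z @ W1)  # the non-linearity lets expression and geometry interact
    return (h @ W2).reshape(4, 3)

# The SAME expression code applied to two different identities:
alice = rng.normal(size=4)
bob = rng.normal(size=4)
smile_code = rng.normal(size=4)

face_a = decode(alice, smile_code)
face_b = decode(bob, smile_code)
```

Because identity and expression enter as separate inputs, the expression transfers across faces, while the `tanh` non-linearity means the result is not just a fixed offset added to every face – the two properties the article attributes to the new model.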

Credit: Disney Research

The result is impressive: you can create a thousand faces with different shapes and skin tones and then animate all of them with the same expressions with no extra effort. Think of how this could lead to diverse CG crowds you can conjure up with a few clicks, or game characters with realistic facial expressions, whether or not they are handcrafted.

It's not a silver bullet, and it's just one of a multitude of improvements that artists and engineers are making across the industries that use this technology – markerless face tracking, better skin deformation, realistic eye movement, and dozens of other areas of interest are all important parts of this process.

The Disney Research paper was presented at the International Conference on 3D Vision. You can read the whole thing here.