The human population is aging rapidly. In 2020, there were an estimated 727 million persons 65 years of age or over worldwide, a figure projected to more than double by 2050, reaching over 1.5 billion persons. By 2050, one in six people globally will be 65 years of age or older. Moreover, because women tend to live longer than men, they accounted for 55% of the global population 65 years of age or over in 2020 and 62% of those 80 years of age or over.1
Intrinsic (chronological) and extrinsic (incremental and environmental) factors drive a year-on-year, progressive, cumulative change in human facial structure and appearance through a range of modifications to underlying bone, muscle, fat, and cutaneous tissue.2 The field of evolutionary psychology provides empirical evidence that human physical appearance (signaling age, health and, therefore, “attractiveness”3) has significant influence on social interaction and mating behavior and that perception of facial attractiveness seems to be remarkably consistent, regardless of ethnicity, nationality, or age.4–6 A significant body of work has highlighted the role of homogeneity in visual cues of youth and health, whether in shape (symmetry and averageness), color (the homogeneous distribution of chromophores), or topography (the isotropic distribution of specular highlight and shadow). This profound legacy of evolutionary pressure, therefore, is at the heart of our continuing obsession with appearance, even as the human population ages.7
For decades, there has been significant interest in methods to predict and generate future facial appearance, for needs including entertainment and film making, the synthesis of a reliable current visible likeness for missing-person investigations, and various other forensic, law-enforcement, and national-security applications.8,9 Whereas these efforts have centered mostly on analog approaches, the more recent aesthetic revolution described above has driven significant activity using digital techniques in other academic and industry sectors, including reconstructive/plastic surgery (eg, for the generation of age-appropriate target parameters) and the cosmetics industry (eg, for the generation of visualizations driving awareness of aging issues/product trial and compliance).
Examples of digital approaches include the synthesis of average faces (with the use of landmarking),10 so-called cut-and-paste methods,11 and aging using deep-learning approaches such as generative adversarial networks.12
Although these approaches certainly improve on previous analog attempts, they all suffer from significant limitations, including low resolution and discontinuous output (ie, simply modeling an “older” appearance rather than a specific target age with visual continuity between the intermediate ages).
Here, we describe a unique approach to generate an age appearance simulation that overcomes these limitations, based on the modeling of a large whole-head 3D dataset across five ethnicities.13,14
Five ethnic groups were chosen for this research, which we titled “Caucasian,” “Chinese,” “African,” “Indian,” and “Latino.” A total of 1250 female subjects were recruited, 250 per ethnic group, 10–80 years of age, in equal 10-year cohorts (yielding approximately 36 subjects per decade, per ethnic group).
Because of the logistical challenges of recruitment and data capture in the native geographies of some of these ethnic groups, first-generation immigrant “African” subjects (from Sub-Saharan West Africa, reflecting the large Nigerian population in Houston), “Indian” subjects (from the Indian subcontinent), and “Latino” subjects (from Mexico, reflecting the fact that >90% of Latinos in North America are Mexican) were recruited and studied in the highly cosmopolitan North American city of Houston, Texas (approximately 6 million people live in Greater Houston, comprising 37% Latino, 25% African, and 6% Indian ethnicities). “Caucasian” subjects were recruited and measured in Minneapolis, Minnesota. “Chinese” subjects were recruited and measured in Beijing.
Subjects were equilibrated in a controlled temperature/relative humidity chamber (20 ± 1°C; 50 ± 10% relative humidity) for 30 minutes before data capture.
Whole-face, high-resolution 3D models of each subject were acquired using a VECTRA M3 (Canfield Scientific Inc., Parsippany, N.J.) system, which operates on the principle of stereophotogrammetry (Fig. 1). A 1.2 mm resolution (triangle edge-length) geometry across the measurement field was acquired in 3.5 ms, using a combination of three modular stereo pods, each containing two high-resolution DSLR cameras and flash units.
To ensure consistency, all VECTRA M3 subject face models were prepared for processing by first removing any nonfacial geometry and then reorienting the 3D models to face forward.
The following craniofacial anatomical landmarks (23 in total, some used twice to include left and right sides of the face) were then placed on each 3D model, using a combination of both automated and manual procedures: nasal tip, philtral crest, labrale inferius, subnasale, medial canthus, lateral canthus, alar, radix, glabella, columella, menton, pogonion, oral commissure, prejowl sulcus, ear lobe attachment, and mandibular angle (Fig. 2).
In preparation for building a front-view age appearance model, 3D landmarks and images were projected into 2D space, followed by generalized Procrustes analysis15 to correct for any shift or rotation in the faces, thus aligning these landmarks precisely between all subjects within each ethnic group. Because the 3D system is calibrated, scale correction was not needed.
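As an illustration only, a translation-and-rotation alignment of this kind (no scaling, as stated above) might be sketched in Python with NumPy as follows. The function names, the SVD-based rotation solution, and the iteration scheme are assumptions made for this sketch; they are not taken from the software used in the study.

```python
import numpy as np

def align_rigid(shape, reference):
    """Rotate and translate `shape` (N x 2) onto `reference`; no scaling."""
    shape_c = shape - shape.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    # Optimal rotation from the SVD of the 2 x 2 cross-covariance matrix.
    u, _, vt = np.linalg.svd(shape_c.T @ ref_c)
    # Guard against reflections so that only a proper rotation is applied.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rotation = u @ np.diag([1.0, d]) @ vt
    return shape_c @ rotation

def generalized_procrustes(shapes, n_iter=10, tol=1e-10):
    """Iteratively align all landmark sets (S x N x 2) to their evolving mean."""
    shapes = np.asarray(shapes, dtype=float)
    aligned = shapes - shapes.mean(axis=1, keepdims=True)
    mean = aligned[0]  # assumption: start from the first subject
    for _ in range(n_iter):
        aligned = np.stack([align_rigid(s, mean) for s in aligned])
        new_mean = aligned.mean(axis=0)
        if np.linalg.norm(new_mean - mean) < tol:
            break
        mean = new_mean
    return aligned, mean
```

Calling `generalized_procrustes(landmark_sets)` on the projected 2D landmarks of one group would return the aligned sets, centered on the origin, together with their mean configuration.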
In a final alignment step, the projected images and landmarks were shifted to achieve a common average vertical position of the eye landmarks, thus setting the eye as an anchored point of reference.
Age Appearance Model
The age appearance model was built separately for each of the five ethnicities and comprised three submodels: shape, color, and topography.
The shape model was built by first symmetrically averaging the XY location of landmarks along the central vertical axis of the face for each subject. A regression was then fitted to each XY coordinate versus age, providing the spatial trajectory of changes in landmark location as a function of age.
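A minimal sketch of such a shape model is given below, under stated assumptions: the regression is taken to be a low-order polynomial (linear by default), the facial midline is placed at x = 0, and the function names and the `pairs` argument (indices of left/right landmark pairs) are hypothetical. The form of regression actually used is not specified here.

```python
import numpy as np

def symmetrize(landmarks, pairs, midline_x=0.0):
    """Average each left/right pair after mirroring across x = midline_x."""
    out = np.asarray(landmarks, dtype=float).copy()
    for left, right in pairs:
        mirrored_right = np.array([2 * midline_x - landmarks[right][0],
                                   landmarks[right][1]])
        avg_left = (landmarks[left] + mirrored_right) / 2.0
        out[left] = avg_left
        out[right] = [2 * midline_x - avg_left[0], avg_left[1]]
    return out

def fit_shape_model(ages, shapes, degree=1):
    """Fit one polynomial per X and Y coordinate versus age.

    shapes: array (S, N, 2). Returns coefficients of shape (degree + 1, N * 2),
    highest power first (the numpy.polyfit convention).
    """
    shapes = np.asarray(shapes, dtype=float)
    flat = shapes.reshape(len(shapes), -1)
    return np.polyfit(np.asarray(ages, dtype=float), flat, degree)

def predict_shape(coeffs, age):
    """Evaluate the fitted trajectories at one age; returns an (N, 2) array."""
    powers = float(age) ** np.arange(coeffs.shape[0] - 1, -1, -1)
    return (powers @ coeffs).reshape(-1, 2)
```

Evaluating `predict_shape` at two ages and subtracting gives the expected landmark displacement between those ages.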
The color model was built by performing a regression on the image pixel RGB color values after warping each image to a common face shape, yielding a model that predicts the pixel values making up the facial image for a given age, including overall tone and shadowing.
The topography model was built by warping and decomposing each subject image into a smoothed wavelet pyramid.16 A regression was then performed for each coefficient at each wavelet pyramid level, to yield a model predicting wavelet intensity as a function of age. This encapsulates finer textural details such as wrinkles.
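The idea of regressing every coefficient of a multiscale decomposition against age can be sketched as follows. Note that this sketch substitutes a simple block-average band-pass pyramid for the smoothed wavelet pyramid of reference 16, works on single-channel images whose sides are divisible by 2 to the power of `levels`, and assumes a linear regression; all function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2 x 2 blocks (sides must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def decompose(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, residual], finest first."""
    bands, current = [], np.asarray(img, dtype=float)
    for _ in range(levels):
        coarse = downsample(current)
        bands.append(current - upsample(coarse))  # detail lost by coarsening
        current = coarse
    bands.append(current)
    return bands

def reconstruct(bands):
    """Invert `decompose` exactly by adding details back, coarse to fine."""
    current = bands[-1]
    for detail in reversed(bands[:-1]):
        current = upsample(current) + detail
    return current

def fit_band_models(ages, images, levels):
    """Fit (slope, intercept) versus age for every coefficient at every level."""
    ages = np.asarray(ages, dtype=float)
    pyramids = [decompose(img, levels) for img in images]
    models = []
    for level in range(levels + 1):
        stack = np.stack([p[level] for p in pyramids])
        slope, intercept = np.polyfit(ages, stack.reshape(len(images), -1), 1)
        models.append((slope.reshape(stack.shape[1:]),
                       intercept.reshape(stack.shape[1:])))
    return models
```

In this simplified form, the fine detail bands play the role of the wavelet coefficients that carry wrinkle-scale texture.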
Figure 3 shows the result of these models when used to predict the appearance of the average face, at each decade, 10–80 years.
Age Simulation Process
With the three models for a given ethnicity, a 2D facial image of a subject with a known chronological age can now be transformed to simulate a target age, whether older or younger, in the following manner (Fig. 4).
First, anatomical landmarks are identified automatically across the subject’s face using a custom, trained convolutional neural network. Next, the expected shape change between the original and target age is determined, using the shape regression model. This difference is applied to the face in the image through thin plate spline warping to yield the target aged face shape. The expected pixel color difference between the original age and the target age, based on the color model, is then added to the shape-warped image to produce a target age image with color delta applied. Finally, the wavelet pyramid regression model is employed to enhance the finer texture at the target age based on the predicted difference between wavelet coefficients, at all levels of the wavelet pyramid. This difference is applied to the decomposed image in the wavelet domain, after which the final image is reconstructed. This final simulated image thus contains the expected shape, color, and topographical changes associated with aging or de-aging.
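The common principle in each of these steps is that the model's predicted difference between the original and target ages, rather than the predicted target appearance itself, is added to the subject's own data. A minimal sketch of this delta logic follows, assuming linear models and using hypothetical function names; the thin plate spline warp and the wavelet-domain reconstruction are not shown.

```python
import numpy as np

def linear_model(slope, intercept):
    """Return a function age -> predicted array for a fitted linear model."""
    return lambda age: slope * age + intercept

def apply_age_delta(subject_values, model, original_age, target_age):
    """Add the predicted change between two ages to the subject's own values.

    `subject_values` may be landmark coordinates, shape-warped pixel values,
    or pyramid coefficients, as long as `model` predicts the same quantity.
    """
    delta = model(target_age) - model(original_age)
    return subject_values + delta

def simulate_series(subject_values, model, original_age, target_ages):
    """Apply the delta for each target age, giving one result per age."""
    return {age: apply_age_delta(subject_values, model, original_age, age)
            for age in target_ages}
```

Because only the difference is applied, a target age equal to the original age leaves the subject's data unchanged, and features specific to the individual are carried through to every simulated age.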
Initial experiments with this procedure yielded simulated images that displayed “visual dissonance” (a psychological tension where one experiences a disparity between what one expects to see and what one actually sees). This was because a highly realistic age simulation had been applied to the skin within the image, but hair was unmodified. A convolutional neural network-based skin and hair detection approach was used, therefore, to isolate the hair mass, to which changes were made, commensurate with the target age of the subject (eg, hair graying applied, for realism, to certain older target ages).
When the aging/de-aging simulation is applied to all years within a certain range (eg, aging the facial image of a 30-year-old woman to all ages between 31 and 60 years), the resulting series of images can be combined into an animation or an interactive slider showing a continuous aging/de-aging simulation, starting from the subject’s actual chronological age. An example of such a simulation is shown in Figure 5 and the accompanying video (see Video [online], which shows the simulated appearance at all ages between 20 and 80 years for the same “African” subject with an original, chronological age of 41).
The realism and accuracy of the aging models were evaluated using a state-of-the-art, cloud-based AI algorithm to detect human faces in images and compute an age estimate for each (Microsoft Azure Face API17).
Facial images of 194 individuals across the five ethnicities studied, with actual chronological ages between 35 and 44 years, were processed by the models to simulate a change in age corresponding to −20, −10, +10, +20, and +30 years relative to chronological age (thus covering an aggregate simulated age range of 15–74 years). The appropriate ethnicity models were used for each subject. Ages were binned into five age ranges: 15–24, 25–34, 45–54, 55–64, and 65–74. For comparison, additional images of other subjects whose actual, chronological age was within these same age ranges were also included (n = 185 within the 15–24 age group, n = 178 within 25–34, n = 206 within 45–54, n = 207 within 55–64, and n = 130 within 65–74). All images were processed by the cloud AI algorithm to obtain an estimated age for each face.
Figure 6 shows the comparison between the average age estimate for both real and simulated faces in each age group. The estimated ages of the simulations were very similar to the estimated ages of real images for the same age group, with slightly larger differences seen toward the larger simulated age deltas, older and younger (a difference of +1.7 and –2.7 years for the –20 and +30 year simulations, respectively, versus –1.2 for –10 years, +1.5 for +10 years, and –1.4 for +20 years).
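The per-group comparison can be sketched in plain Python as below. The function names are hypothetical, the sign convention (simulated minus real) is an assumption of this sketch, and the call to the cloud age-estimation service is deliberately omitted; each record is simply a pair of nominal age (chronological age for real images, target age for simulations) and estimated age.

```python
from statistics import mean

# Age ranges used for binning, as listed in the text.
AGE_BINS = [(15, 24), (25, 34), (45, 54), (55, 64), (65, 74)]

def bin_for(age):
    """Return the (low, high) bin containing `age`, or None if outside all bins."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return (low, high)
    return None

def mean_estimate_by_bin(records):
    """records: iterable of (nominal_age, estimated_age) pairs."""
    grouped = {}
    for nominal_age, estimated_age in records:
        key = bin_for(nominal_age)
        if key is not None:
            grouped.setdefault(key, []).append(estimated_age)
    return {key: mean(values) for key, values in grouped.items()}

def estimate_gap(simulated, real):
    """Mean estimated age of simulated minus real images, per shared bin."""
    sim_means = mean_estimate_by_bin(simulated)
    real_means = mean_estimate_by_bin(real)
    return {key: sim_means[key] - real_means[key]
            for key in sim_means if key in real_means}
```

A gap close to zero in a given bin indicates that the estimator treats simulated faces of that nominal age much as it treats real faces of the same age.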
These results demonstrate that the simulated facial appearances are both accurate and have a high degree of realism, as age estimates correspond closely to those returned for native, un-simulated faces by an AI algorithm trained on a very large, diverse set of native, unprocessed human facial images with inherently natural combinations of shape, topography, and coloration.
In contrast to previous attempts to model and simulate facial aging, this current approach is based entirely on a large quantity of accurate data (high-resolution whole-face 3D data, captured from 1250 subjects) with a wide dynamic range (subjects 10–80 years of age). Moreover, whereas previous attempts have applied a “one size fits all” approach to ethnicity, this current approach uses data acquired within specific ethnic groups (five major ethnic groups, comprising 250 subjects each), yielding separate robust models and simulations for each.
Further notable improvements in this current approach include the separate modeling of the three vectors that drive appearance (shape, color, and topography) and continuous output.
Considered together, therefore, the data modeling and simulation approach described here provides a very high level of confidence in both the accuracy and realism of derived simulations. “Accuracy” (ie, the degree to which a measurement, calculation, or specification conforms to the correct value) is critical in a true, quantitative modeling and simulation approach such as this. However, in terms of the presentation of simulated facial aging models to a human audience, “realism” is equally important. The “Uncanny Valley” is a concept first hypothesized by Professor Masahiro Mori in 1970,18 describing the relationship between the degree of an object’s resemblance to a human being and one’s emotional response to such an object. He hypothesized that humanoid renderings or objects that imperfectly resemble living human beings provoke “uncanny” feelings of eeriness and revulsion in human viewers. The majority of attempts to model facial aging result in output that falls into the Uncanny Valley. This is often because, for example, a global aging “filter” is applied to the skin of a subject’s face, but no attempt is made to include a change in shape characteristics, which represent critical visual cues of age, health, and attractiveness.
For example, in this study, we observed consistent, progressive, and cumulative change in a variety of shape-related endpoints, across all ethnicities, including (among others) (1) a significant (P < 0.05) widening of the lower face; (2) a significant (P < 0.05) thinning and widening of the lips; and (3) a significant (P < 0.05) increase in the length of the philtrum (the vertical indentation in the middle area of the upper lip, extending in humans from the nasal septum to the tubercle of the upper lip). Full quantitative analyses of the changes in these endpoints will be reported elsewhere.
Omitting these important shape-related changes from facial age simulations results in output that is both inaccurate and unacceptable to human viewers. We may take the observed widening of the lower face as an example here. The human face displays clear sexual dimorphism in shape (from exposure to sex hormones in utero or during puberty), with males presenting more masculine “robust” features and females, more feminine “gracile” features.4 Importantly, a gracile lower face is thinner and more pointed, whereas a robust lower face is wider and squarer. As we have quantified in this present study, as the female lower face becomes wider and less pointed, it starts to present visual cues related more to male, rather than female, sexual dimorphic characteristics. If this insight is omitted or modeled incorrectly when simulating facial aging, there are serious consequences for both the accuracy and realism of the final output. This principle applies to other characteristics also.
Finally, whereas the current modeling and simulation relates to change as the result of both intrinsic aging (chronological) and extrinsic aging (the overlay of incremental effects as the result of environment, lifestyle, etc.), it can be foreseen how different aging “trajectories” could be modeled and simulated with different input (eg, the effect of chronic exposure to solar ultraviolet radiation).
The subject provided written consent for the use of her images.
1. Population Division of the United Nations Department of Economic and Social Affairs. World population ageing 2020 highlights. Available at https://www.un.org/development/desa/pd/. Accessed May 8, 2020.
2. Matts PJ, Fink B. Chronic sun damage and the perception of age, health and attractiveness. Photochem Photobiol Sci. 2010;9:421–431.
3. Foo YZ, Simmons LW, Rhodes G. Predictors of facial attractiveness and health in humans. Sci Rep. 2017;7:39731.
4. Rhodes G. The evolutionary psychology of facial beauty. Annu Rev Psychol. 2006;57:199–226.
5. Fink B, Grammer K, Matts PJ. Visible skin colour distribution plays a major role in the perception of age, attractiveness and health in female faces. Evol Hum Behav. 2006;27:433–442.
6. Samson N, Fink B, Matts PJ. Interaction of skin colour distribution and skin surface topography cues in the perception of female facial age and health. J Cosmet Dermatol. 2011;10:78–84.
7. Samson N, Fink B, Matts PJ. Visible skin condition and perception of human facial appearance. Int J Cosmet Sci. 2010;32:167–184.
8. Evison MP, Iwamura ESM, Guimarães MAG. Forensic facial reconstruction and its contribution to identification in missing person cases. In: Morewitz SJ, Sturdy Colls C, eds. Handbook of Missing Persons. New York: Springer; 2016:427–441.
9. Fu Y, Guo G, Huang TS. Age synthesis and estimation via faces: a survey. IEEE Trans Pattern Anal Mach Intell. 2010;32:1955–1976.
10. Rhodes G, Tremewan T. Averageness, exaggeration, and facial attractiveness. Psychol Sci. 1996;7:105–110.
11. Suo JL, Min F, Zhu S, Shan S, Chen X. A multi-resolution dynamic model for face aging simulation. Proceedings/CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007:1–7.
12. Yang H, Huang D, Wang Y, Jain AK. Learning face age progression: a pyramid architecture of GANs. Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018;31–39.
13. D’Alessandro BM, Matts PJ, inventors. Methods and apparatuses for age appearance simulation. US patent 10,614,623. April 7, 2020.
14. Matts PJ, D’Alessandro BM, inventors. Methods for age appearance simulation. US patent 10,621,771. April 14, 2020.
15. Gower JC. Generalized Procrustes analysis. Psychometrika. 1975;40:33–51.
16. Tiddeman B, Burt M, Perrett D. Prototyping and transforming facial textures for perception research. IEEE Comput Graph Appl. 2001;21:42–50.
17. Microsoft Corp. Microsoft Azure Cognitive Services Face API. Available at https://azure.microsoft.com/en-us/services/cognitive-services/face/. Accessed May 10, 2021.
18. Mori M. The uncanny valley. Energy. 1970;7:33–35.