Deep learning–guided postoperative pain assessment in children : PAIN


Research Paper

Deep learning–guided postoperative pain assessment in children

Fang, Jihong (a); Wu, Wei (b); Liu, Jiawei (b); Zhang, Sicheng (a,*)

PAIN 164(9):p 2029-2035, September 2023. | DOI: 10.1097/j.pain.0000000000002900

Current automated pain assessment methods focus only on infants or youth. They are less practical because the children who suffer from postoperative pain in clinical scenarios span a wider range of ages. In this article, we present a large-scale Clinical Pain Expression of Children (CPEC) dataset for postoperative pain assessment in children. It contains 4104 preoperative videos and 4865 postoperative videos of 4104 children (from 0 to 14 years of age), collected from January 2020 to December 2020 in Anhui Provincial Children's Hospital. Moreover, inspired by the highly successful applications of deep learning in medical image analysis and emotion recognition, we develop a novel deep learning–based framework, namely the Children Pain Assessment Neural Network (CPANN), to automatically assess postoperative pain according to the facial expression of children. We train and evaluate the CPANN with the CPEC dataset. The performance of the framework is measured by accuracy and macro-F1 score metrics. The CPANN achieves 82.1% accuracy and 73.9% macro-F1 score on the testing set of CPEC. The CPANN is faster, more convenient, and more objective than selecting and applying pain scales according to the specific type of pain or the children's condition. This study demonstrates the effectiveness of a deep learning–based method for automated pain assessment in children.

1. Introduction

Pain is a major vital sign following body temperature, breathing, pulse, and blood sugar. Seventy-seven percent of children suffer from different levels of pain during their hospitalization, among whom 23% to 40% undergo severe pain.23 Up to 60% of hospitalized school-age children suffer from pain, and the incidence of postoperative pain in infants and young children is 61%.22 The pain of children tends to be underestimated. Nearly 50% of hospitalized children have severe pain that cannot be effectively controlled.3 If the pain is not well treated, it may lead to several complications, such as postoperative anxiety, disturbed sleep, increased catabolism, and hemodynamic instability.24 Thus, a quick and accurate pain assessment method for children of all ages is urgently needed.

Currently, caregivers mainly assess the pain in children with scales,12,20,29 which require the caregivers to observe the behavioral and physiological changes of children or the children to provide self-report of pain. However, the caregivers' cognitive bias, identity,21 culture,28 and sex1 may result in inconsistent pain assessment. Moreover, it is complex and time-consuming for caregivers to grasp different pain assessment scales and apply them in specific scenarios.35 Self-report of pain intensity is the current clinical gold standard and the most used method of assessing clinical pain in children. But self-report requires substantial cognitive, linguistic, and social competencies.27 Unfortunately, some patients cannot provide a self-report of pain verbally.10

In the past few years, automated pain assessment has drawn increasing attention.13,19,33 It can quickly analyze the behavior and continuously monitor the pain of children. Facial expression is a sensitive biomarker that can indicate the severity of pain.31 Therefore, most of the existing works4,7,34 achieve automated pain assessment according to the facial expression of children. However, existing methods mainly focus on automated pain assessment in children of a specific age, eg, infants (younger than 1 year)7,34 or youth.26 In clinical scenarios, children who suffer from postoperative pain are of all ages.3,22 Approaches that cover only a subset of these children are less practical and less efficient. In this study, we focus on a more practical task: assessment of pain in children of all ages. This task is more challenging than automatic pain assessment in infants or youth owing to the lack of suitable datasets, inconsistent individual representations, and large interindividual divergence.

To address these challenges, we build a large-scale dataset named Clinical Pain Expression of Children (CPEC). It contains preoperative and postoperative videos of 4104 identities (from 0 to 14 years of age) in clinical scenarios. Furthermore, we propose a deep learning–based framework called the Children Pain Assessment Neural Network (CPANN). The objective of this framework is to extract robust pain feature representations from the facial expression of children and achieve quick and reliable postoperative pain assessment in children of all ages.

2. Method

The research protocol is approved by the Biomedical Ethics Committee of Anhui Provincial Children's Hospital (approval no. 20190022). In this study, we build the CPEC dataset for postoperative pain assessment in children of all ages. And we further propose a novel deep learning–based framework to automatically achieve pain assessment for children on the CPEC dataset.

2.1. Clinical pain expression of children dataset

2.1.1. Participants

A total of 4572 children, aged 1 month to 14 years, are recruited at the Anhui Provincial Children's Hospital from January 2020 to December 2020 as a part of the study on postoperative pain assessment in children. These children undergo at least one clinical treatment in Anhui Provincial Children's Hospital, including debridement and suture, dressing, fracture reduction, intravenous indwelling needle puncture, intramuscular injection, skin test, skin traction, incision and drainage, and removal of nail. Children with facial abnormalities are excluded as shown in Figure 1. The parents of all enrolled children sign the informed consent forms before recording the videos.

Figure 1.:
Data processing flowchart. Flow diagram of the collection and processing of the CPEC dataset. CPEC, Clinical Pain Expression of Children.

2.1.2. Data collection

We use a SONY FDR-AX60 digital 4K camcorder to record the facial expression of the children. All videos are recorded in the normal clinical environment. Concretely, for children who undergo surgery, the videos are recorded during 2 periods: (1) 8 to 9 am on the day of surgery and (2) 6 hours after surgery, which represent preoperative and postoperative data, respectively. For others, we record their facial expressions before the clinical treatments as preoperative data. The postoperative data are collected immediately after the treatments. If the pain intensity changes clearly (becoming more severe or milder) within 3 minutes, we record one more postoperative video to enrich our dataset. Each video is about 15 to 20 seconds in length. During the shooting process, we adjust the position of the camera to follow the facial movement of the children, and the distance between the lens and the face is also adjusted appropriately to reduce the noise caused by the photographer and children.

2.1.3. Data annotation

We assume that all the children do not suffer from any pain caused by the treatment in the preoperative period. Therefore, we label the preoperative data with −1 to distinguish them from the no pain data in the postoperative videos. Then, we annotate the postoperative video data at coarse-grained and fine-grained levels with 2 steps.

First, we employ 2 nursing staff: one staff provides a coarse-grained pain assessment (no pain = 0, mild pain = 1, moderate pain = 2, and severe pain = 3) for each child, whereas the other staff records the corresponding video data.

Second, 2 additional experienced nursing experts carefully annotate each postoperative video with a manual pain assessment scale, chosen according to the child's condition, to generate a fine-grained pain label. Specifically, we adopt the FLACC scale29 for children younger than 7 years. The FLACC scale is scored in a range of 0 to 10, with 0 representing “no pain.” This scale contains 5 criteria: face, legs, activity, cry, and consolability. Each of them is assigned a score of 0, 1, or 2, and the 5 scores are summed to give the final score. For children older than 7 years, we label the video data according to the self-report of the children guided by the FPS-R scale.11 The FPS-R is a self-report measure of pain. It consists of 6 drawings of faces depicting how much something hurts, where 0 equals “no pain” and 10 equals “very much pain.” The annotations obtained at this step are used as fine-grained labels.
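The FLACC total described above is a simple sum of 5 criterion scores. The following is a minimal sketch of that arithmetic only; the function name `flacc_total` and the dictionary layout are illustrative assumptions, not part of the annotation protocol used in the study.

```python
# Illustrative sketch: names and data layout are assumptions, not the study's tooling.
FLACC_CRITERIA = ("face", "legs", "activity", "cry", "consolability")


def flacc_total(scores):
    """Sum the five FLACC criteria (each scored 0, 1, or 2) into a 0-10 total."""
    total = 0
    for name in FLACC_CRITERIA:
        value = scores[name]
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")
        total += value
    return total


# Hypothetical observation of one child.
example = {"face": 2, "legs": 1, "activity": 1, "cry": 2, "consolability": 0}
print(flacc_total(example))
```

A total of 0 corresponds to “no pain,” and the maximum possible total is 10.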

2.1.4. Data exclusion

Even though the coarse- and fine-grained labels are obtained in different ways, we assume that there is a hierarchical relationship between them: a fine-grained label belongs to a coarse-grained label category. We then set17 a fine-grained score of 0 as no pain, 1 to 3 as mild pain, 4 to 6 as moderate pain, and 7 to 10 as severe pain and remove the samples that have unmatched labels at coarse-grained and fine-grained levels. A total of 259 postoperative videos of 244 participants are first excluded at this step. Moreover, we organize an expert committee, composed of the project manager and 2 pediatric deputy chief nurses, who have more than 15 years of pediatric clinical work experience. All the videos and pain annotations are carefully reviewed by the expert committee, and the instances of disagreement are also excluded. The objective of this process is to reduce the influence of self-report and observer-report bias on the reliability of the labels. A total of 173 postoperative videos of 171 participants are further excluded in this process.
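The hierarchical mapping and the first exclusion step can be sketched as follows. This is a minimal illustration under the thresholds stated above; the function names and the tuple layout of `samples` are hypothetical and the sample values are made up for the example.

```python
def coarse_from_fine(fine_score):
    """Map a fine-grained score (0-10) to a coarse label (0-3)."""
    if not 0 <= fine_score <= 10:
        raise ValueError(f"fine-grained score out of range: {fine_score!r}")
    if fine_score == 0:
        return 0  # no pain
    if fine_score <= 3:
        return 1  # mild pain
    if fine_score <= 6:
        return 2  # moderate pain
    return 3      # severe pain


def labels_match(coarse_label, fine_score):
    """True when the two independently obtained labels are consistent."""
    return coarse_from_fine(fine_score) == coarse_label


# Hypothetical (video id, coarse label, fine-grained score) records.
samples = [("v1", 1, 2), ("v2", 2, 3), ("v3", 3, 8)]
kept = [video for video, coarse, fine in samples if labels_match(coarse, fine)]
print(kept)
```

Here the second record is dropped because a fine-grained score of 3 falls in the mild category while its coarse label is moderate.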

2.1.5. Data processing

The usage of raw videos has several limitations. First, the size of the raw videos is too large (3840 × 2160), which is unfriendly for data storage and data processing. Second, the raw videos contain abundant frames. Third, there is much background noise in the raw videos, even though the photographer adjusts the position and direction of the camera carefully. Therefore, we remove every other frame from each video to reduce data redundancy. All frames are resized to 640 × 360 by using the bicubic interpolation method. Moreover, we apply Dual Shot Face Detector (DSFD)15 on each frame to detect the children's faces with the bounding boxes. We then crop the frames according to the bounding boxes to obtain compact facial regions and further eliminate the background noise in the videos.
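The frame-level bookkeeping in this step can be sketched in plain Python. The sketch below only illustrates frame subsampling and cropping to a bounding box; it does not implement bicubic interpolation or DSFD, and it assumes that "every other frame" means keeping the even-indexed frames and that a box is given as (x1, y1, x2, y2) in pixel coordinates. All names are illustrative.

```python
RAW_SIZE = (3840, 2160)    # width, height of the raw 4K frames
TARGET_SIZE = (640, 360)   # width, height after resizing

# Both dimensions shrink by the same factor, so the aspect ratio is preserved.
DOWNSCALE = (RAW_SIZE[0] / TARGET_SIZE[0], RAW_SIZE[1] / TARGET_SIZE[1])


def keep_every_other(frames):
    """Drop every other frame (assumption: the even-indexed frames are kept)."""
    return frames[::2]


def crop_face(frame, box):
    """Crop a frame (a list of pixel rows) to a box clamped to the frame bounds."""
    height = len(frame)
    width = len(frame[0]) if frame else 0
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in frame[y1:y2]]


# Toy 6 x 8 "frame" whose pixels record their own (row, column) position.
frame = [[(r, c) for c in range(8)] for r in range(6)]
face = crop_face(frame, (2, 1, 5, 4))
print(DOWNSCALE, len(face), len(face[0]))
```

In a real pipeline the box would come from the face detector run on the resized frame; here it is supplied by hand.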

The overall collection process is illustrated in Figure 1. The final dataset contains 8969 videos of 4104 children. The dataset is randomly divided into a training set and a testing set, which contain 3324 and 780 identities, respectively. Specifically, the training set is used to optimize the automated pain assessment framework. The testing set is used to evaluate the effectiveness and accuracy of the trained framework. Considering that the scale of the dataset is large, we fix the partitioning of the training and testing sets. The CPEC dataset is quite challenging for several reasons: (1) it contains a group of children with large interindividual divergence; (2) all data are collected before and after the treatment in the real clinical environment; and (3) the identities in the training and testing sets are nonoverlapping.
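An identity-disjoint split of this kind can be sketched as follows. This is an illustrative assumption about how such a split might be implemented, not the procedure used to build CPEC; the function name, the `video_to_child` mapping, and the fixed seed are all hypothetical.

```python
import random


def split_by_identity(video_to_child, n_test_children, seed=0):
    """Split videos so that no child appears in both the training and testing sets."""
    children = sorted(set(video_to_child.values()))
    rng = random.Random(seed)  # a fixed seed keeps the partition reproducible
    rng.shuffle(children)
    test_children = set(children[:n_test_children])
    train_videos, test_videos = [], []
    for video, child in video_to_child.items():
        if child in test_children:
            test_videos.append(video)
        else:
            train_videos.append(video)
    return train_videos, test_videos


# Toy data: 10 children, each with one preoperative and one postoperative video.
video_to_child = {f"c{i}_{phase}": f"c{i}" for i in range(10) for phase in ("pre", "post")}
train_videos, test_videos = split_by_identity(video_to_child, n_test_children=3)
print(len(train_videos), len(test_videos))
```

Splitting at the level of children rather than videos is what guarantees that the preoperative and postoperative videos of one child always land in the same set.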

2.2. Pain assessment framework

We propose a deep learning–based pain assessment framework, namely CPANN. It is designed and trained to estimate postoperative pain scores based on the facial expression of children in preoperative and postoperative videos. In this section, we will introduce the architecture of the CPANN in detail.

2.2.1. Network architecture

The overall architecture of CPANN is shown in Figure 2. It mainly consists of 5 modules: a Convolutional Neural Network (CNN) backbone, 2 semantic attention modules (SAMs), a disparity attention module (DAM), and a select block. The 2 SAMs share parameters. The detailed structure is provided in the supplemental file (available as supplemental digital content at

Figure 2.:
Architecture of the CPANN. The proposed CPANN mainly consists of a CNN backbone, 2 semantic attention modules (SAMs), a disparity attention module (DAM), and a select block. CNN, convolutional neural network; CPANN, Children Pain Assessment Neural Network.

2.2.2. Implementation details

The training and testing procedure of CPANN is implemented with the PyTorch framework (Facebook AI Research, Menlo Park, CA). The CNN backbone based on MobileNet V225 is pretrained on the ImageNet dataset, and the BiSeNet32 in the semantic attention modules is pretrained on the CelebAMask-HQ dataset.14 In this study, we use the training set and testing set of the CPEC dataset to train and evaluate the CPANN. Because we use 2 kinds of manual pain assessment tools to generate pain labels in the CPEC dataset, the number of classification layers in the select block is set as 2, one for the samples labeled with each manual pain assessment tool (FLACC scale or FPS-R scale). In addition, we only use the coarse-grained pain labels in the training and testing procedure. The fine-grained pain labels are left out because the quantity of data for each fine-grained class in the current dataset is heavily unbalanced (Section 3.1 indicates that there are 17.7% data labeled as “1,” but only 0.1% data labeled as “10” in the entire dataset). Training on these labels would be extremely imbalanced and would lead to poor predictive performance.

We randomly sample 6 frames from variable-length video sequences as the input data. All frames of the input video are resized to 224 × 224 pixels, and pixel values are scaled by 1.0/255. The data augmentation during the training stage includes random horizontal flipping and random rotation. Note that we do not apply any data augmentation method during the testing stage. We use the Adam optimizer for training. The CPANN is trained for 30,000 iterations, and each batch contains the postoperative and preoperative video pairs of 4 identities. The initial learning rate is set as 0.0003, and the cosine annealing schedule16 is adopted from 15,000 iterations. The CPANN is trained for around 8 hours on an Ubuntu workstation (Canonical, London, United Kingdom) with an NVIDIA GeForce GTX 1080 Ti GPU (Santa Clara, CA).
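The frame sampling and learning-rate schedule described above can be sketched in a few lines. This is not the training code of the study: it assumes that the learning rate stays constant for the first 15,000 iterations, that a single cosine half-cycle without restarts then decays it to a minimum of 0 at iteration 30,000, and that clips shorter than 6 frames are sampled with replacement. All function and parameter names are illustrative.

```python
import math
import random


def sample_frame_indices(num_frames, k=6, rng=random):
    """Pick k frame indices at random from a clip, returned in temporal order."""
    if num_frames < 1:
        raise ValueError("the clip must contain at least one frame")
    if num_frames >= k:
        return sorted(rng.sample(range(num_frames), k))
    # Assumption: clips shorter than k frames are sampled with replacement.
    return sorted(rng.choices(range(num_frames), k=k))


def learning_rate(step, base_lr=3e-4, start=15_000, total=30_000, min_lr=0.0):
    """Constant rate before `start`, then cosine annealing down to `min_lr`."""
    if step < start:
        return base_lr
    progress = min(1.0, (step - start) / (total - start))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print(sample_frame_indices(450, rng=random.Random(0)))
for step in (0, 15_000, 22_500, 30_000):
    print(step, learning_rate(step))
```

Under these assumptions the rate is halved midway through the annealing phase (iteration 22,500) and reaches the minimum at the final iteration.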

2.3. Statistical analysis

The statistical analysis of the dataset is performed with Python v. 3.7 (Python Software Foundation). Percentages and quantities are adopted to describe the children's characteristics in the CPEC dataset. We adopt accuracy, F1 score, and macro-F1 score metrics to report the pain assessment performance of the proposed CPANN. Concretely, the accuracy metric measures the percentage of correct pain predictions in the total number of predictions. The F1 score metric is generally used to evaluate binary classification systems. It combines the precision and recall (sensitivity) of a classifier into a single metric by taking their harmonic mean. The precision is the fraction of true-positive examples among the examples that the model classified as positive. The recall, also known as sensitivity, is the fraction of true-positive examples among all examples that are actually positive. In the case of multi-class classification, there are several averaging methods for the F1 score, resulting in a set of different scores (macro-F1 score, weighted F1 score, micro-F1 score) in the classification report. Here, we adopt the macro-F1 score, which gives the same importance to each class. We first calculate the F1 score for each class separately. Then the macro-F1 score is computed as the arithmetic mean of all the per-class F1 scores.
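The metrics defined above can be computed directly from the true and predicted labels. The following is a minimal sketch with made-up labels; it assumes the convention that a class with undefined precision or recall receives an F1 score of 0, which the text does not specify.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 score for each label."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Assumption: F1 is taken as 0 when precision + recall is 0.
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Made-up coarse-grained labels (0 = no pain ... 3 = severe pain).
y_true = [0, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 2, 3, 3]
labels = [0, 1, 2, 3]
print(accuracy(y_true, y_pred), per_class_f1(y_true, y_pred, labels), macro_f1(y_true, y_pred, labels))
```

Because every class contributes equally to the mean, a rare class that is never predicted correctly lowers the macro-F1 score considerably even when accuracy is high.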

3. Results

3.1. Children characteristics

There are 4104 children involved in the CPEC dataset, including 1524 girls and 2580 boys with a median age of 48 months. The entire dataset is randomly split into a training set and a testing set. We present the clinical characteristics of children in the training set, testing set, and entire dataset in Table 1. The percentages of all characteristics in the training and testing sets are similar to those of the entire dataset, which supports the rationality of the split. Approximately 91.7% of the children in the entire set suffer pain caused by an intravenous indwelling needle puncture. In addition, the fine-grained pain labels generated with FLACC account for approximately 80.9% of the total labels.

Table 1 - Clinical characteristics of the children.
Characteristic | Entire set (n = 4104) | Training set (n = 3324) | Testing set (n = 780)
No. of videos | 8969 | 7330 | 1639
Sex, F:M | 1524:2580 | 1224:2100 | 300:480
Median age (mo) | 48 | 48 | 48
Age, n (%) | | |
 0-1 year | 522 (12.7) | 426 (12.8) | 96 (12.3)
 1-3 years | 1133 (27.6) | 914 (27.5) | 219 (28.1)
 3-6 years | 1305 (31.8) | 1055 (31.7) | 250 (32.1)
 ≥6 years | 1144 (27.9) | 929 (27.9) | 215 (27.6)
Treatment, n (%) | | |
 Debridement and suture | 4 (0.1) | 3 (0.1) | 1 (0.1)
 Dressing | 13 (0.3) | 10 (0.3) | 3 (0.4)
 Fracture reduction | 16 (0.4) | 14 (0.4) | 2 (0.3)
 Intravenous indwelling needle puncture | 3764 (91.7) | 3042 (91.5) | 722 (92.6)
 Intramuscular injection | 165 (4.0) | 134 (4.0) | 31 (4.0)
 Skin test | 110 (2.7) | 93 (2.8) | 17 (2.2)
 Skin traction | 2 (0.0) | 2 (0.1) | 0 (0.0)
 Incision and drainage | 1 (0.0) | 1 (0.0) | 0 (0.0)
 Removal of nail | 29 (0.7) | 25 (0.8) | 4 (0.5)
Type of scale, n (%) | | |
 FLACC | 3321 (80.9) | 2691 (81.0) | 630 (80.8)
 FPS-R | 783 (19.1) | 633 (19.0) | 150 (19.2)
FLACC, Face, Legs, Activity, Cry, Consolability scale; FPS-R, Faces Pain Scale–Revised.

Because there is only one preoperative video for each child in the CPEC dataset, we only analyze the pain characteristics in postoperative videos, as illustrated in Figure 3. The horizontal axis in each subfigure denotes the pain level, whereas the vertical axis represents the percentage of videos at that level. The pain characteristics in the training and testing sets are similar to those in the entire dataset on fine-grained and coarse-grained levels. For fine-grained labels, the pain grades range from “0” to “10.” The mean and median pain levels for the entire set, training set, and testing set are 3.3 and 3, respectively. Among the entire dataset, 17.7% of postoperative video data are labeled as 2, whereas only 0.2% videos are labeled as 10. For coarse-grained labels, the pain grades range from “0” to “3.” The mean and median pain levels for the entire set, training set, and testing set are 1.5 and 1, respectively. The majority of the pain video data (51.8%) in the entire dataset are labeled as 1 (mild pain). Label 0 (no pain) is the least frequent, with 7.0% of the samples.

Figure 3.:
Statistics of pain characteristics. Pain characteristics of the CPEC dataset. The first row demonstrates the pain characteristics on fine-grained level in the entire set (A), training set (B), and testing set (C); the second row shows the pain characteristics on coarse-grained level in the entire set (D), training set (E), and testing set (F). For coarse-grained pain labels, 0, 1, 2, and 3 indicate no pain, mild pain, moderate pain and severe pain, respectively. CPEC, Clinical Pain Expression of Children.

3.2. Experimental results

The testing set of CPEC contains 1639 videos of 780 identities, and there are 859 postoperative videos among them. We report the performance of the proposed CPANN on the CPEC dataset in Table 2. The CPANN achieves 82.1% accuracy and 73.9% macro-F1 score on the total testing set. The F1 scores of CPANN for no pain (label = “0”), mild pain (label = “1”), moderate pain (label = “2”), and severe pain (label = “3”) are 43.0%, 86.6%, 78.6%, and 87.2%, respectively. The disparity may result from the quantity imbalance between pain categories.

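Table 2 - Performance of Children Pain Assessment Neural Network on the clinical pain expression of children dataset.
Model | Age | Accuracy (%) | F1 score (%), label 0 | F1 score (%), label 1 | F1 score (%), label 2 | F1 score (%), label 3 | Macro-F1 score (%)
Baseline | All | 76.8 | 40.9 | 83.9 | 66.5 | 73.2 | 66.1
Baseline | 0-1 year | 72.5 | 0.0 | 70.3 | 71.1 | 76.3 | 54.4
Baseline | 1-3 years | 77.3 | 0.0 | 76.7 | 70.9 | 75.2 | 55.7
Baseline | 3-6 years | 78.1 | 51.4 | 85.5 | 63.3 | 68.9 | 67.3
Baseline | ≥6 years | 76.1 | 0.0 | 87.1 | 35.7 | 57.1 | 45.0
CPANN | All | 82.1 | 43.0 | 86.6 | 78.6 | 87.2 | 73.9
CPANN | 0-1 year | 79.4 | 0.0 | 71.8 | 78.7 | 86.5 | 59.3
CPANN | 1-3 years | 82.5 | 50.0 | 80.9 | 82.1 | 90.1 | 75.8
CPANN | 3-6 years | 83.2 | 51.2 | 88.6 | 77.3 | 85.2 | 75.6
CPANN | ≥6 years | 81.4 | 0.0 | 88.8 | 56.0 | 75.0 | 55.0
CPANN, Children Pain Assessment Neural Network.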

Moreover, we compare the performance of CPANN with the baseline model on the CPEC dataset in Table 2. In the baseline model, only one classification layer is used to assess the pain with the final features, which are obtained by concatenating the pain feature, age feature, and gender feature. The CPANN surpasses the baseline model by 5.3% accuracy and 7.8% macro-F1 score on the total testing set. Additionally, for each age group, the CPANN achieves better accuracy and macro-F1 score than the baseline model. These experimental results support the effectiveness of the CPANN.
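The reported gains can be checked against the "All" rows of Table 2 with a few lines of arithmetic. The sketch below only re-derives numbers already given in the table; the dictionary layout and variable names are illustrative.

```python
# "All" rows of Table 2 (values in percent); per-class F1 listed for labels 0-3.
baseline = {"accuracy": 76.8, "f1": [40.9, 83.9, 66.5, 73.2], "macro_f1": 66.1}
cpann = {"accuracy": 82.1, "f1": [43.0, 86.6, 78.6, 87.2], "macro_f1": 73.9}


def mean(values):
    return sum(values) / len(values)


# Macro-F1 is the unweighted mean of the four per-class F1 scores.
baseline_gap = abs(mean(baseline["f1"]) - baseline["macro_f1"])
cpann_gap = abs(mean(cpann["f1"]) - cpann["macro_f1"])

# Differences between CPANN and the baseline, in percentage points.
accuracy_gain = cpann["accuracy"] - baseline["accuracy"]
macro_f1_gain = cpann["macro_f1"] - baseline["macro_f1"]

print(baseline_gap, cpann_gap, accuracy_gain, macro_f1_gain)
```

The per-class means agree with the reported macro-F1 scores to within rounding to one decimal place, and the two gains are differences in percentage points.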

4. Discussion

Currently, self-report of children is considered the most accurate and reliable measure of pain because pain is an internal phenomenon.18 Researchers review numerous self-report scales and demonstrate the effectiveness of some self-report scales for acute pain in children older than 6 years.2 However, self-report of pain in children may be hindered both by developmental factors and by certain features of pain assessment tools.8 Besides, existing self-report scales are not well-established for all children: there is no recommended self-report scale for children younger than 6 years.2 An accurate and fast automatic pain assessment platform for children of all ages may complement current assessment methods.30 It can provide reliable preliminary pain assessment to help nurses quickly find children suffering from severe pain, which ensures that these children receive pain management as a high priority. Moreover, in contrast to traditional assessment tools, it could monitor pain continuously, which is more efficient and practical in clinical scenarios. For example, during a patient's postoperative hospitalization, the automated pain assessment platform can achieve real-time online monitoring of the pain of the patient to improve clinical outcomes and reduce the workload of nurses. Moreover, in some low-income countries where medical resources are scarce, the medical staff has fewer opportunities to be trained for pain assessment. Under this circumstance, an accurate automatic pain assessment method is more feasible than manual pain assessment scales.

Automated pain assessment methods can be mainly divided into 2 categories: hand-crafted–based methods and deep learning–based methods. The former uses predesigned algorithms to extract hand-crafted features from the data. For example, Brahnam et al.4 proposed the first hand-crafted–based methods, which use principal component analysis and linear discriminant analysis to learn the pain representations and use distance-based classifiers and support vector machine to classify the data into pain or no pain. However, feature representations of these methods are limited because the algorithms are designed subjectively by humans. The latter generally uses CNNs to automatically mine deep discriminative features of pain. Most of the existing deep learning–based pain assessment methods focus on infants (younger than 1 year). Celona et al.7 used pretrained convolutional neural network architecture to extract deep feature representations from static images of infants for classifying them into pain or no-pain images. Zamzmi et al.34 proposed a Neonatal Convolutional Neural Network to achieve end-to-end neonatal pain assessment. These studies suggest that automatic pain assessment in neonates is viable and more efficient than the current standard. However, these methods only estimate whether the infant has pain, ignoring the pain intensity. For children older than 1 year, there are few studies on deep learning–based automatic pain assessment methods. Sikka et al.26 proposed a model based on computer vision to measure the pain intensity according to the facial expression of 50 neurotypical youth aged 5 to 18 years. This study demonstrates the preliminary effectiveness of the algorithms based on computer vision for automatic pain assessment in children older than 5 years. 
However, all the aforementioned methods only focus on a specific age group of children, ie, infants7,34 or youth.26 In the clinical scenario, children who suffer from postoperative pain are in a broader age range. Thus, the existing automated pain assessment approaches built for a specific age group are less practical. Moreover, deep learning is a data-driven approach, which means that the training procedure of a deep learning–based framework relies heavily on the scale of the dataset. Existing publicly available datasets for pain assessment in children are mainly built for infants.5,9 These datasets are limited in scale: they only contain approximately 100 videos or images. Algorithms that fully exploit data richness are unlikely to be developed with these datasets. Furthermore, most of the datasets are collected in experimental environments and therefore lack samples from real clinical scenarios.

In this work, we propose a large-scale dataset, namely the CPEC dataset, which covers children from age 1 month to 14 years. Based on the CPEC dataset, we propose a novel CPANN to assess postoperative pain in children. It achieves 82.1% accuracy and 73.9% macro-F1 score on the test set of CPEC. To analyze the performance of CPANN on different age groups of children, we further divide the dataset into 4 groups according to age: 1 month to 1 year, 1 to 3 years, 3 to 6 years, and older than 6 years. For infants (younger than 1 year), the accuracy and macro-F1 score of CPANN are 79.4% and 59.3%, respectively. In the group of children aged between 1 and 3 years, the accuracy and macro-F1 score are 82.5% and 75.8%, respectively. For preschool-age children (aged 3-6 years), the accuracy and macro-F1 score are 83.2% and 75.6%, respectively. For school-age children (older than 6 years), the accuracy and macro-F1 score are 81.4% and 55.0%, respectively. The CPANN achieves the highest accuracy with the group of children aged 3 to 6 years, whereas the best macro-F1 score is obtained with the group of children aged 1 to 3 years. The disparity in the performance of CPANN among different age groups of children may be caused by several reasons. First, the number of enrolled infants (aged 1 month to 1 year) in the CPEC dataset is only 522, which is much fewer than in other age groups. Second, the individual differences among infants are larger than in other age groups; thus, the pain assessment for them is more difficult than for others.6 Moreover, we find that the CPANN achieves a better F1 score on significant pain (moderate to severe) in children in the age groups of 1 month to 1 year and 1 to 3 years. We deem the change in facial expression of these groups of children to be more significant when they suffer from significant pain, which means it is easier for CPANN to mine pain representation from the videos for classification.
For children in the age groups of 3 to 6 years and older than 6 years, the best F1 scores are obtained when detecting mild pain. We think this is because far more data are labeled as mild pain than as any other class in the CPEC dataset, which gives the CPANN a better ability to recognize mild pain.

The performance of the CPANN demonstrates the feasibility of automatic pain assessment for children in a broader age range. The CPANN is advantageous in both speed and convenience. The time taken by CPANN to process a video clip is 0.07 seconds, which is undoubtedly much shorter than applying manual scales. Besides, CPANN is beneficial to the standardization of pain assessment across clinics and geographic areas. Children Pain Assessment Neural Network does not require specific knowledge of pain assessment scales for usage and can provide consistent pain estimations, which allows it to be used in different scenarios. Children Pain Assessment Neural Network is also conducive to enhancing hospital pain management initiatives. It can be used to evaluate and track patient care in real-time, helping caregivers implement timely pain management in case of emergency.

This study has several limitations that also need to be considered. First, the performance of the CPANN still needs to be improved, and we will continue to study it. Second, there exists a heavy data imbalance among fine-grained pain labels in the CPEC dataset. Additionally, the number of postoperative and preoperative data for infants (0-1 year) is limited compared with other age groups. Third, the annotations of pain levels are generated by different pain assessment scales. Finally, the CPEC dataset is a single-modality dataset. The deep learning framework can only extract visual features from the CPEC dataset. However, sound information and physiological indexes of children can also provide reliable clues for pain assessment. A single-modality dataset may limit the potential of deep learning–guided pain assessment frameworks and make pain assessment in children more challenging. Thus, we plan to develop and expand our CPEC dataset, as appropriate, in future studies. In addition, the approach warrants further investigation with other kinds of clinical pain and in other patient populations, eg, adults.

In conclusion, this study presents a novel large-scale CPEC dataset for postoperative pain assessment in children, which addresses the lack of suitable datasets and provides a solid foundation for developing an automated assessment of pain in children. The performance of the CPANN on this dataset demonstrates the feasibility of using a deep learning algorithm as a quick and objective pain assessment method for children.

Conflict of interest statement

The authors have no conflicts of interest to declare.

Appendix A. Supplemental digital content

Supplemental digital content associated with this article can be found online at


The authors thank all children and their parents who participated in the study. They are also thankful to all nursing researchers and experts who participated in the data collection and annotation process of the CPEC dataset. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 67976008 and U19A2057, the Fundamental Research Funds for the Central Universities under Grant WK2100000021, and the Natural Science Foundation of Anhui Province under Grant 2208085QH241. The CPEC dataset and project code will be available at


[1]. Adde L, Helbostad JL, Jensenius AR, Taraldsen G, Støen R. Using computer-based video analysis in the study of fidgety movements. Early Hum Dev 2009;85:541–7.
[2]. Birnie KA, Hundert AS, Lalloo C, Nguyen C, Stinson JN. Recommendations for selection of self-report pain intensity measures in children and adolescents: a systematic review and quality assessment of measurement properties. PAIN 2019;160:5–18.
[3]. Brudvik C, Moutte SD, Baste V, Morken T. A comparison of pain assessment by physicians, parents and children in an outpatient setting. Emerg Med J 2017;34:138–44.
[4]. Brahnam S, Chuang CF, Sexton RS, Shih FY. Machine assessment of neonatal facial expressions of acute pain. Decis Support Syst 2007;43:1242–54.
[5]. Brahnam S, Nanni L, Sexton R. Introduction to neonatal facial pain detection using common and advanced face classification techniques. Advanced computational intelligence paradigms in healthcare–1. Berlin, Heidelberg: Springer, 2007. p. 225–53.
[6]. Brasher C, Gafsous B, Dugue S, Thiollier A, Kinderf J, Nivoche Y, Grace R, Dahmani S. Postoperative pain management in children and infants: an update. Pediatr Drugs 2014;16:129–40.
[7]. Celona L, Manoni L. Neonatal facial pain assessment combining hand-crafted and deep features. International conference on image analysis and processing. Cham: Springer, 2017. p. 197–204.
[8]. Freund D, Bolick BN. CE: assessing a child's pain. Am J Nurs 2019;119:34–41.
[9]. Harrison D, Sampson M, Reszel J, Abdulla K, Barrowman N, Cumber J, Fuller A, Li C, Nicholls S, Pound CM. Too many crying babies: a systematic review of pain management practices during immunizations on YouTube. BMC Pediatr 2014;14:134–8.
[10]. Herr K, Coyne PJ, Ely E, Gélinas C, Manworren RC. Pain assessment in the patient unable to self-report: clinical practice recommendations in support of the ASPMN 2019 position statement. Pain Manag Nurs 2019;20:404–17.
[11]. Hicks CL, von Baeyer CL, Spafford PA, van Korlaar I, Goodenough B. The Faces Pain Scale–Revised: toward a common metric in pediatric pain measurement. PAIN 2001;93:173–83.
[12]. Hudson-Barr D, Capper-Michel B, Lambert S, Mizell Palermo T, Morbeto K, Lombardo S. Validation of the pain assessment in neonates (PAIN) scale with the neonatal infant pain scale (NIPS). Neonatal Netw 2002;21:15–21.
[13]. Kaltwang S, Rudovic O, Pantic M. Continuous pain intensity estimation from facial expressions. International symposium on visual computing. Berlin, Heidelberg: Springer, 2012. p. 368–77.
[14]. Lee CH, Liu Z, Wu L, Luo P. MaskGAN: towards diverse and interactive facial image manipulation. Proc IEEE Conf Comput Vis Pattern Recogn 2020:5549–58.
[15]. Li J, Wang Y, Wang C, Tai Y, Qian J, Yang J, Wang C, Li J, Huang F. DSFD: dual shot face detector. Proc IEEE Conf Comput Vis Pattern Recogn 2019:5060–9.
[16]. Loshchilov I, Hutter F. SGDR: stochastic gradient descent with warm restarts, 2016. arXiv preprint arXiv:1608.03983.
[17]. Malviya S, Voepel‐Lewis T, Burke C, Merkel S, Tait AR. The revised FLACC observational pain tool: improved reliability and validity for pain assessment in children with cognitive impairment. Pediatr Anesth 2006;16:258–65.
[18]. Max MB, Payne R, Edwards WT, Sunshine A, Inturrisi CE, Miser A. Principles of analgesic use in the treatment of acute pain and cancer pain. Glenview, IL: American Pain Society, 1999. p. 1–45.
[19]. Pai CY. Automatic pain assessment from infants' crying sounds. University of South Florida, 2016.
[20]. Peters JWB, Koot HM, Grunau RE, de Boer J, van Druenen MJ, Tibboel D, Duivenvoorden HJ. Neonatal facial coding system for assessing postoperative pain in infants: item reduction is valid and feasible. Clin J Pain 2003;19:353–63.
[21]. Pillai Riddell RR, Badali MA, Craig KD. Parental judgments of infant pain: importance of perceived cognitive abilities, behavioural cues and contextual cues. Pain Res Manag 2004;9:73–80.
[22]. Relland LM, Gehred A, Maitre NL. Behavioral and physiological signs for pain assessment in preterm and term neonates during a nociception-specific response: a systematic review. Pediatr Neurol 2019;90:13–23.
[23]. Roofthooft DWE, Simons SHP, Anand KJS, Tibboel D, van Dijk M. Eight years later, are we still hurting newborn infants? Neonatology 2014;105:218–26.
[24]. Shum S, Lim J, Page T, Lamb E, Gow J, Ansermino JM, Lauder G. An audit of pain management following pediatric day surgery at British Columbia Children's Hospital. Pain Res Manag 2012;17:328–34.
[25]. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. MobileNetV2: inverted residuals and linear bottlenecks. Proc IEEE Conf Comput Vis Pattern Recogn 2018:4510–20.
[26]. Sikka K, Ahmed AA, Diaz D, Goodwin MS, Craig KD, Bartlett MS, Huang JS. Automated assessment of children's postoperative pain using computer vision. Pediatrics 2015;136:e124–31.
[27]. Susam BT, Riek NT, Akcakaya M, Xu X, de Sa VR, Nezamfar H, Diaz D, Craig KD, Goodwin MS, Huang JS. Automated pain assessment in children using electrodermal activity and video data fusion via machine learning. IEEE Trans Biomed Eng 2022;69:422–31.
[28]. Varallyay GJ, Benyó Z, Illényi A, Farkas Z, Kovács L. Acoustic analysis of the infant cry: classical and new methods. In: The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Vol. 1. San Francisco, CA: IEEE, 2004. p. 313–6.
[29]. Voepel-Lewis T, Shayevitz JR, Malviya S. The FLACC: a behavioral scale for scoring postoperative pain in young children. Pediatr Nurs 1997;23:293–7.
[30]. Werner P, Lopez-Martinez D, Walter S, Al-Hamadi A, Gruss S, Picard RW. Automatic recognition methods supporting pain assessment: a survey. IEEE Trans Affect Comput 2019:123.
[31]. Williams ACdC. Facial expression of pain: an evolutionary account. Behav Brain Sci 2002;25:439–55; discussion 455–88.
[32]. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N. BiSeNet: bilateral segmentation network for real-time semantic segmentation. Proc Eur Conf Comput Vis 2018:325–41.
[33]. Zamzami G, Ruiz G, Goldgof D, Kasturi R, Sun Y, Ashmeade T. Pain assessment in infants: towards spotting pain expression based on infants' facial strain. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition. IEEE 2015;5:1–5.
[34]. Zamzmi G, Paul R, Goldgof D, Kasturi R, Sun Y. Pain assessment from facial expression: Neonatal convolutional neural network (N-CNN). 2019 International Joint Conference on Neural Networks (IJCNN). IEEE 2019:1–7.
[35]. Zamzmi G, Kasturi R, Goldgof D, Zhi R, Ashmeade T, Sun Y. A review of automated pain assessment in infants: features, classification tasks, and databases. IEEE Rev Biomed Eng 2018;11:77–96.

Keywords: Pain assessment; Facial expression; Deep learning

Copyright © 2023 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the International Association for the Study of Pain.