Is there agreement between evaluators that used two scoring systems to measure acute radiation dermatitis?

Fuzissaki, Marceila de Andrade, PhDa; Paiva, Carlos Eduardo, PhDb; Gozzo, Thais de Oliveira, PhDc; Maia, Marcelo de Almeida, PhDd; Canto, Paula Philbert Lajolo, PhDe; Maia, Yara Cristina de Paiva, PhDa,*

Section Editor(s): Ding, Jianxun

doi: 10.1097/MD.0000000000014917
Research Article: Observational Study

To analyze the agreement between nurses who evaluated radiodermatitis using the Radiation Therapy Oncology Group (RTOG) and World Health Organization (WHO) scales.

A prospective, longitudinal study was conducted from 2016 to 2017 in a university hospital. We analyzed 855 images of irradiated sites from 100 women with breast cancer undergoing radiotherapy. To evaluate the agreement between the 3 observers who evaluated these irradiated sites, Krippendorff's alpha and weighted kappa were calculated and analyzed.

The pairwise agreement among the evaluators was fair to moderate (RTOG scale: 0.408, 95% confidence interval [CI] 0.370–0.431; WHO scale: 0.559, 95% CI 0.529–0.590). In addition, the general agreement rates were 10.2% and 29.2%, respectively. When the overall absolute agreement between the evaluators was assessed according to phototype and type of surgery, agreement was fair on the RTOG scale for patients with phototype V or VI and for patients who underwent mastectomy (3.7% and 8.8%, respectively).

The RTOG and WHO scales should be used with caution in clinical practice to identify the prevalence and severity of radiodermatitis. Another point of caution is that skin phototype and type of surgery may influence the outcome of the analysis. Our group designed and proposed an illustrative scale aimed at improving accuracy and agreement between evaluators; it will be tested in subsequent clinical studies.

aGraduate Program in Health Sciences, Medicine School, Federal University of Uberlandia, Uberlandia, Minas Gerais

bDepartment of Clinical Oncology, Division of Breast and Gynecology, Barretos Cancer Hospital, Barretos-SP

cDepartment of Maternal-Child Nursing and Public Health, Ribeirão Preto College of Nursing, University of São Paulo

dFaculty of Computing

eDepartment of Clinical Oncology, Clinic's Hospital, Federal University of Uberlandia, Uberlandia, Minas Gerais, Brazil.

Correspondence: Yara Cristina de Paiva Maia, Avenida Pará, 1720. Bloco 2U, Campus Umuarama, Uberlandia, Minas Gerais, CEP 38400-902, Brazil (e-mail:

Abbreviations: AC = Adriamycin + cyclophosphamide, ACTH = Adriamycin + cyclophosphamide followed by paclitaxel and trastuzumab, BC = breast cancer, CI = confidence interval, CK = cytokeratin, CMF = cyclophosphamide, methotrexate, and 5-fluorouracil, EGFR = epidermal growth factor receptor, HER2 = human epidermal growth factor receptor 2, K = kappa, Ki 67 = antigen Ki 67, NR = not registered, RE = estrogen receptor, RP = progesterone receptor, RT = radiation therapy, RTOG = Radiation Therapy Oncology Group, SD = standard deviation, TC = cyclophosphamide and docetaxel, WHO = World Health Organization.

This work was supported by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and CAPES. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This study was approved by the Human Research Ethics Committee of Federal University of Uberlandia (CEP/UFU) under protocol number (1348706/15) and all participants signed a free and informed consent form.

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

The authors have full control over the primary data and agree to allow the journal to review the data if requested.

The authors have no conflicts of interest to disclose.

This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial License 4.0 (CCBY-NC), where it is permissible to download, share, remix, transform, and build upon the work provided it is properly cited. The work cannot be used commercially without permission from the journal.

Received September 17, 2018

Received in revised form January 11, 2019

Accepted February 26, 2019

1 Introduction

One of the most common adverse events of external radiation therapy (RT) is radiodermatitis, also called radiation-induced skin reaction or cutaneous toxicity, which affects more than 95% of patients.[1] Such events may be associated with fatigue, body image and sleep disorders, and emotional disturbances, and may cause pain, stress, and impaired quality of life.[2,3] Depending on its severity, radiodermatitis may also lead to postponement of treatment or limitation of the administered dose.[3,4]

In clinical practice, accurate assessment and grading of radiodermatitis are essential for monitoring and documentation.[5] Existing tools and scales provide a structured description and classification of severity,[5] allowing healthcare professionals to identify, evaluate, and grade this adverse event in order to manage it appropriately and to promote comfort and healing.[4] Traditionally, the severity of acute radiodermatitis is evaluated and graded subjectively by the practitioner using various clinical scales,[3] such as the Common Terminology Criteria for Adverse Events, Radiation Therapy Oncology Group (RTOG), Late Effect on Normal Tissue, Symptom Objective Measures Management Assessment,[5] and the World Health Organization (WHO) criteria.[3] Although these tools are widely used, data on their reliability and validation are scarce.[2,5]

The present study aims to help fill this gap in the literature and to highlight the importance of using such methods in clinical practice. The focus of our study is to analyze the agreement between evaluators using the RTOG and WHO scales. We hypothesize that agreement on both scales will not be satisfactory.

2 Methods

2.1 Ethical aspects

A prospective study was conducted in a university hospital with breast cancer (BC) patients during RT, from April 2016 to June 2017. This study was approved by the Human Research Ethics Committee (protocol number: 1348706/15) and it was based on the standards of the Helsinki Declaration. All participants signed a free and informed consent form.

2.2 Eligibility criteria

The study included photos of women over the age of 18 years, of any race or color, who were diagnosed with BC and underwent RT at Uberlandia Cancer Hospital. Women were excluded if they presented ulceration, a wound, or a skin tumor in the treated area; had a history of systemic lupus erythematosus, rheumatoid arthritis, ataxia telangiectasia, or other hereditary diseases involving the skin; or had a history of previous RT or were already undergoing RT.

The sample size calculation was based on Cox regression, fixed models, with an expected effect size of 0.15, an alpha level of 0.05, and 93% power, using G*Power software (Department of Psychology, Germany), version 3.0;[6] 100 women were required. Figure 1 depicts a flowchart of the selection of participants.

Figure 1

2.3 Measurements

Data were obtained by inspecting the active medical records of the patients and through a semi-structured interview, including questions related to sociodemographic, clinical, RT, and concomitant treatment characteristics. The RTOG and WHO scales were used to evaluate the degree of the acute skin reactions.

In 1985, the RTOG developed the Acute Radiation Morbidity Scoring Criteria to classify the effects of RT, including skin reactions. The scale stands out for its extensive use for more than 25 years and for being accepted and recommended by the medical and nursing communities.[7] The WHO criteria scale was also chosen because it is easy to use and its descriptions are well defined and differ from those of the RTOG scale, although it uses the same scores (grade 0–4) (Fig. 2).

Figure 2

The irradiated site was evaluated by evaluator A over 9 weeks on average: before RT, weekly during treatment, and after the end of treatment. After evaluator A had evaluated the site and recorded the data in real time, the irradiated area was photographed (ph) using a Canon EOS Rebel T5i camera (Canon, Tokyo, Japan) with an 18 to 55 mm lens and a resolution of 18 MP. The women were evaluated in the same room, under the same environmental and lighting conditions. The photos were taken so as to record all possible sites of radiodermatitis. First, the patient's chest was photographed, encompassing both breasts. Next, the patient was asked to raise the arm on the irradiated side so that the axillary region could be captured. Finally, the inframammary region was photographed, regardless of whether the patient had undergone total mastectomy. The evaluator kept the same distance and position for all photographic records. Several photos were taken of each patient; however, care was taken to maintain confidentiality. The quality of the photos was evaluated, and those with the best positioning, clarity, and focus were chosen.

Before the photo evaluations, the responsible researcher invited all 8 healthcare professionals (radiotherapists and nurses) from the RT sector to attend a meeting. In this meeting, the researchers presented aspects related to the evaluation and description of radiodermatitis, the scales available in the literature, and discussed cases based on photos of patients with different types of cancer, including BC. Of the 8 professionals, 3 nurses agreed to participate.

The recorded photos were independently evaluated by nurses A, B, and C, whose evaluations will be referred to as Aph, Bph, and Cph, respectively. Nurse Aph is the same nurse who evaluated the irradiated site in vivo. The 3 professionals had the same role and were experienced in the radiotherapy field (years of clinical practice: Aph = 6 years, Bph = 5 years, and Cph = 3 years). Each evaluator completed an instrument containing the RTOG scale and the WHO scale, the evaluation date, and the identification numbers of the evaluator and the patient. A laminated copy of the scale with the score descriptions was delivered together with the evaluation instrument, a USB flash drive containing all the recorded photos, the free and informed consent form, and a confidentiality and commitment agreement, as required by the ethics committee. The latter 2 documents were signed, with 1 copy kept by the evaluator and another by the researcher.

2.4 Treatment characteristics

Regarding RT, 95% of patients were treated on a Varian linear accelerator, Clinac 600c model (Varian Medical Systems, USA), and 5% on an Elekta linear accelerator, Precise model (England, United Kingdom). Simple planning was performed in 92% of patients and three-dimensional planning in 8%. The energy of the radiation was 6 MeV in 98% of patients. The total dose to the chest wall was 50.22 Gy (95% confidence interval, CI 50.06–50.38), and 93.8% of the patients undergoing breast conserving therapy received a sequential boost of 5 to 10 Gy. In patients receiving postmastectomy radiotherapy, the total dose to the chest wall was 50.14 Gy (95% CI 50.07–50.20), and 32.4% of them received a sequential boost of 9 to 16 Gy. The daily dose was 1.8 Gy in 36.9% and 35.1%, and 2 Gy in 63.1% and 64.9%, of patients undergoing breast conserving therapy and postmastectomy radiotherapy, respectively. Fourteen percent of patients needed to interrupt treatment because of radiodermatitis (minimum = 7 days, maximum = 17 days). The mean treatment time, defined as the period in days between the start and end dates of radiotherapy, including weekends and days without treatment, was 36.9 days (95% CI 35.70–37.88).

2.5 Statistical analyses

To characterize the study sample, a descriptive analysis was performed using means, medians, and frequencies, according to the type of variable.

To evaluate agreement and disagreement among the evaluators, with the respective CI, we used Krippendorff's alpha for ordinal data because there were more than 2 evaluators. The strength of agreement was interpreted as follows: <0.00 = poor, 0.00 to 0.20 = slight, 0.21 to 0.40 = fair, 0.41 to 0.60 = moderate, 0.61 to 0.80 = substantial, and 0.81 to 1.00 = nearly perfect.[8] The weighted kappa (K) was used to identify which pairs of evaluators had the best and worst agreement, and the respective P values were calculated. For the weighted kappa, we evaluated the paired agreements and disagreements with linear weighting of the differences between grades; that is, we considered the difference between grades 1 and 2 to be equal to the difference between grades 2 and 3. The strength of agreement for K was interpreted as follows: <0.20 = poor, 0.21 to 0.40 = fair, 0.41 to 0.60 = moderate, 0.61 to 0.80 = good, and 0.81 to 1.00 = very good.[9] The statistical software used was R version 3.2.5 (R Core Team, Vienna, Austria).
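The linear weighting described above can be illustrated with a short sketch. The analysis in this study was run in R; the Python below is only an illustration under stated assumptions: the function names, the coding of grades as integers 0 to 4, the handling of band boundaries, and the toy ratings are hypothetical and are not the authors' code or data.

```python
def linear_weighted_kappa(rater_a, rater_b, n_grades=5):
    """Cohen's kappa with linear disagreement weights |i - j| / (n_grades - 1).

    rater_a, rater_b: equal-length lists of integer grades in 0..n_grades-1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same, non-empty set of images")
    n = len(rater_a)
    # Observed joint proportions of (grade given by A, grade given by B).
    observed = [[0.0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_grades)]
    col = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            # A one-grade gap costs the same anywhere on the scale.
            weight = abs(i - j) / (n_grades - 1)
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1.0 - weighted_observed / weighted_expected


def altman_label(kappa):
    """Map kappa to the bands quoted in the text (boundary handling is assumed)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical grades from two evaluators for six photos (not study data).
grades_a = [0, 1, 1, 2, 2, 3]
grades_b = [0, 1, 2, 2, 3, 3]
kappa = linear_weighted_kappa(grades_a, grades_b)
print(round(kappa, 3), altman_label(kappa))
```

With linear weights, a disagreement of two grades counts twice as much as a disagreement of one grade, which matches the assumption stated above that adjacent grades are equally spaced.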

The general concordance analysis was also performed according to cutaneous phototype and type of surgery to evaluate whether these variables could be confounding factors in the evaluation of the irradiated site. The cutaneous phototype was graded according to the Fitzpatrick classification[10] and divided into 3 categories: type II or III (white or light brown), type IV (moderate brown), and type V or VI (dark brown or black). Regarding the type of surgery, breast conserving surgery and mastectomy were considered.
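The stratified analysis can be sketched as a simple tally of the images on which every evaluator gave the same grade, computed separately for each subgroup. This is a minimal sketch, not the authors' code: the function name, the record layout, the rule of skipping images with a missing grade, and the example records are all assumptions made for illustration.

```python
from collections import defaultdict


def absolute_agreement_by_stratum(records):
    """Return {stratum: (n_agree, n_total, percent)}.

    records: iterable of (stratum, grades), where grades holds one grade per
    evaluator. Images with any missing grade (None) are skipped (assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_agree, n_total]
    for stratum, grades in records:
        if any(g is None for g in grades):
            continue
        counts[stratum][1] += 1
        if len(set(grades)) == 1:  # all evaluators chose the same grade
            counts[stratum][0] += 1
    return {
        stratum: (agree, total, round(100.0 * agree / total, 1))
        for stratum, (agree, total) in counts.items()
    }


# Hypothetical records: (type of surgery, grades from 4 evaluations).
records = [
    ("mastectomy", (1, 1, 1, 1)),
    ("mastectomy", (1, 2, 1, 1)),
    ("conserving", (0, 0, 0, 0)),
    ("conserving", (2, 2, 2, 2)),
    ("conserving", (1, None, 1, 1)),  # skipped: one grade is missing
]
result = absolute_agreement_by_stratum(records)
print(result["mastectomy"])
print(result["conserving"])
```

The same tally can be keyed on phototype category instead of type of surgery by changing the first element of each record.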

3 Results

The clinical and treatment characteristics are given in Table 1.

Table 1

The overall absolute agreement between the 4 observers was considered fair or moderate on both scales: 10.2% (87 of 850) for the RTOG scale, with a Krippendorff's alpha of 0.408 (CI 0.370–0.431), and 29.2% (249 of 853) for the WHO scale, with a Krippendorff's alpha of 0.559 (CI 0.529–0.590) (Table 2).

Table 2

When the overall absolute agreement between the evaluators was assessed by phototype and type of surgery, agreement on the RTOG scale was fair for patients with phototype V or VI (3.7%, 3 of 82; Krippendorff's alpha 0.278, CI 0.187–0.376) and for patients who underwent mastectomy (8.8%, 26 of 295; Krippendorff's alpha 0.340, CI 0.299–0.389). On the WHO scale, the overall absolute agreement for patients who underwent mastectomy was 29.9% (88 of 294), with a Krippendorff's alpha of 0.409 (CI 0.356–0.472), that is, a fair agreement (Table 3).

Table 3

It is interesting to note that, on the WHO scale, although the overall absolute agreement is marginally higher for mastectomy than for conservative surgery, Krippendorff's alpha shows only fair agreement for mastectomy but substantial agreement for conservative surgery. This can be explained by the fact that, in most evaluations, Cph disagreed only slightly with the 3 other evaluators (grade 1 against grade 0). Despite this disagreement, Krippendorff's alpha can still indicate substantial agreement when 3 of the 4 evaluators coincide and only 1 evaluation is in slight disagreement.
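The behaviour described above follows from the way the ordinal form of Krippendorff's alpha weights each disagreement by its distance on the grade scale, so that a 1 against 0 costs far less than a 4 against 0. The sketch below shows one way the coefficient can be computed from a coincidence matrix. It is an illustration only: the function name, the coding of grades as integers 0 to 4, and the ratings are assumptions and are not the study data or the authors' R code, and it does not compute the confidence intervals reported above.

```python
from itertools import permutations


def krippendorff_alpha_ordinal(units, n_grades):
    """Krippendorff's alpha with the ordinal distance metric.

    units: list of images, each a list with one grade per evaluator
    (integers 0..n_grades-1, or None when an evaluation is missing).
    """
    coincidence = [[0.0] * n_grades for _ in range(n_grades)]
    for grades in units:
        values = [g for g in grades if g is not None]
        m = len(values)
        if m < 2:
            continue  # an image graded by fewer than 2 evaluators is unpairable
        # Every ordered pair of grades from different evaluators counts 1/(m-1).
        for a, b in permutations(values, 2):
            coincidence[a][b] += 1.0 / (m - 1)
    totals = [sum(row) for row in coincidence]  # usage count of each grade
    n = sum(totals)

    def delta_squared(c, k):
        # Ordinal distance: grades lying between c and k, counted by frequency.
        lo, hi = min(c, k), max(c, k)
        return (sum(totals[lo:hi + 1]) - (totals[lo] + totals[hi]) / 2.0) ** 2

    pairs = [(c, k) for c in range(n_grades) for k in range(n_grades)]
    observed = sum(coincidence[c][k] * delta_squared(c, k) for c, k in pairs)
    expected = sum(totals[c] * totals[k] * delta_squared(c, k) for c, k in pairs)
    if expected == 0.0:
        raise ValueError("alpha is undefined when only one grade is ever used")
    return 1.0 - (n - 1) * observed / expected


# Illustrative ratings (rows = images, columns = 4 evaluators, None = missing).
ratings = [
    [0, 0, None, 0],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
    [1, 1, 1, 1],
    [0, 1, 2, 3],
    [3, 3, 3, 3],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [None, 4, 4, 4],
    [None, None, 0, 0],
    [None, 2, None, None],
]
alpha = krippendorff_alpha_ordinal(ratings, n_grades=5)
print(round(alpha, 3))  # 0.815
```

In this illustrative data set only 3 of the 11 pairable images show any disagreement, and two of those involve a single evaluator who is one grade away from the others, so the coefficient stays high.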

In the pairwise analysis, agreement was poor between Cph and Aph on the RTOG scale (κ = 0.162) and between Cph and Bph on the WHO and RTOG scales (κ = 0.197 and κ = 0.183, respectively). However, agreement between A and Aph was good on the WHO and RTOG scales (κ = 0.761 and κ = 0.740, respectively), and this was statistically significant (P < .001) (Table 4).

Table 4

4 Discussion

The results of this prospective study, which analyzed 100 women with BC during RT, support the hypothesis that overall agreement between evaluators is very low. However, good agreement was found between A and Aph on both scales. Our study stands out as the first to apply the scales to evaluate the irradiated skin weekly throughout RT and between 1 and 3 months after the end of treatment, totaling 855 evaluations.

One study suggested that radiodermatitis should be measured and recorded with the same precision and care given to tumor control, as in the internationally agreed systems for tumor staging and for the evaluation of tumor control, which help to impede tumor progression and improve the quality of care and, therefore, the patients’ quality of life.[11] Although tools that provide specific criteria for grading skin toxicity exist and are widely used, data on their reliability and validation are scarce,[2,5] mainly in studies on inter- and intra-observer agreement.[11] Other studies generally evaluate objective methods for assessing the skin, comparing them with subjective measurement tools.[3,12]

The results of our study showed low overall agreement between the evaluators on both scales. One study reported results different from ours, with a general agreement of 79% (WHO) and 68% (RTOG) among independent evaluators and adjusted K of 0.64 (99% CI 0.43–0.84) and 0.53 (99% CI 0.32–0.72), respectively.[13] Another study[14] identified greater agreement among radiotherapy technicians than among radiation oncologists (W [Kendall's coefficient] = 0.6866 at time 1 and 0.6981 at time 2, vs 0.6517). However, the first study included only evaluations performed at 1 time point by 2 evaluators, and the second, although it included 29 evaluators, evaluated only 9 patients at 3 different times.

Another aspect concerns the general agreement according to cutaneous phototype and type of surgery. On the RTOG scale, overall agreement was lower in the evaluations of patients with cutaneous phototype V or VI (dark brown or black) than in those of patients with other phototypes, and lower in patients who underwent mastectomy than in those who underwent conservative surgery, with a statistically significant difference. In addition, overall agreement was greater on the WHO scale than on the RTOG scale, suggesting that cutaneous phototype and type of surgery may make evaluation with the RTOG scale more difficult and that this scale is more sensitive to these variables than the WHO scale. Finally, we found no study in the literature that analyzed the agreement between scales while considering cutaneous phototype and type of surgery.

In our study, we also found poor agreement between Cph and Aph (on the RTOG scale) and between Cph and Bph (on both the RTOG and WHO scales). The discrepancy between the assessments of Cph and those of Aph and Bph may be attributed to education and length of professional experience, as seen in the study[13] that identified high inter-observer reliability among experienced radiotherapy technicians compared with other professionals.

Another important result of the present study was the good agreement between the in vivo records (A) and the photographic records (Aph) on both scales, which indicates the usefulness of photographic records. The photographic record of acute radiodermatitis, whose test–retest reliability has been demonstrated,[15] is important with regard to the likelihood of late lesion development and because it provides better local control of this adverse event. In addition, records of the degree and time of onset of radiodermatitis allow comparison of multicenter studies and/or different treatment protocols where there is a difficulty of screening observers for the type of care applied.[11,16]

Given the results of the present study, and in order to improve agreement among evaluators and the precision with which the scale is used, our group constructed an illustrative RTOG scale (Fig. 3) that takes the Fitzpatrick classification into account.[10] The Fitzpatrick classification was used because we observed in practice that the presentation of each degree of radiodermatitis differs according to skin type. We intend to validate this illustrative scale so that it can be used in clinical practice. We chose the RTOG scale because it identified a greater proportion of severe radiodermatitis and encompassed its different degrees. In addition, this scale correlated more strongly with changes in cutaneous blood flow than the WHO scale[3] and presented lower overall agreement in the present study.

Figure 3

Among the strengths of our study are the number of photos evaluated (855) and the fact that patients were evaluated throughout RT, which provided records of all possible degrees of radiodermatitis. The application of 2 scales that are easy to use in clinical practice also stands out. One limitation of our study is that the evaluations were carried out by only 3 professionals. Another is the heterogeneity of the sample, which was mitigated by investigating the influence of skin phototype and type of surgery in the overall agreement analysis.

5 Conclusion

The RTOG and WHO scales should be used with caution in clinical practice to identify the prevalence and severity of radiodermatitis, particularly when the evaluation is made by inexperienced professionals. Another point of caution is that darker skin and mastectomy could make the analysis more difficult. Our group designed an illustrative scale aimed at improving accuracy and agreement between evaluators, and it will be validated in subsequent clinical studies. In addition, because it includes the skin phototypes, the proposed scale could help in training inexperienced personnel and could allow for later evaluations once experience has been acquired.

Author contributions

All authors made substantial contributions to all of the following: the conception and design of the study, or acquisition of data, or analysis and interpretation of data; drafting the article or revising it critically for important intellectual content; and final approval of the version to be submitted.

Conceptualization: Marceila de Andrade Fuzissaki, Paula Philbert Canto, Yara Cristina de Paiva Maia.

Data curation: Marceila de Andrade Fuzissaki.

Formal analysis: Marceila de Andrade Fuzissaki, Carlos Eduardo Paiva, Yara Cristina de Paiva Maia.

Investigation: Yara Cristina de Paiva Maia.

Methodology: Marceila de Andrade Fuzissaki, Carlos Eduardo Paiva, Marcelo de Almeida Maia, Yara Cristina de Paiva Maia.

Project administration: Paula Philbert Canto.

Software: Marcelo de Almeida Maia.

Supervision: Yara Cristina de Paiva Maia.

Validation: Marcelo de Almeida Maia, Yara Cristina de Paiva Maia.

Visualization: Thais de Oliveira Gozzo.

Writing – original draft: Marceila de Andrade Fuzissaki, Yara Cristina de Paiva Maia.

Writing – review & editing: Marceila de Andrade Fuzissaki, Carlos Eduardo Paiva, Thais de Oliveira Gozzo, Marcelo de Almeida Maia, Paula Philbert Canto, Yara Cristina de Paiva Maia.

Marceila de Andrade Fuzissaki orcid: 0000-0001-7091-0278.


[1]. Singh M, Alavi A, Wong R, Akita S. Radiodermatitis: a review of our current understanding. Am J Clin Dermatol 2016;17:277–92.
[2]. Schnur JB, Love B, Scheckner BL, et al. A systematic review of patient-rated measures of radiodermatitis in breast cancer radiotherapy. Am J Clin Oncol 2011;34:529–36.
[3]. Huang CJ, Hou MF, Luo KH, et al. RTOG, CTCAE and WHO criteria for acute radiation dermatitis correlate with cutaneous blood flow measurements. Breast 2015;24:230–6.
[4]. Bostock S, Bryan J. Radiotherapy-induced skin reactions: assessment and management. Br J Nurs 2016;25:S18–24.
[5]. Wong RK, Bensadoun RJ, Boers-Doets CB, et al. Clinical practice guidelines for the prevention and treatment of acute and late radiation reactions from the MASCC Skin Toxicity Study Group. Support Care Cancer 2013;21:2933–48.
[6]. Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 2007;39:175–91.
[7]. Cox JD, Stetz J, Pajak TF. Toxicity criteria of the Radiation Therapy Oncology Group (RTOG) and the European Organization for Research and Treatment of Cancer (EORTC). Int J Radiat Oncol Biol Phys 1995;31:1341–6.
[8]. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
[9]. Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall; 1991.
[10]. Fitzpatrick TB. The validity and practicality of sun-reactive skin types I through VI. Arch Dermatol 1988;124:869–71.
[11]. Lopez E, Nunez MI, Guerrero MR, et al. Breast cancer acute radiotherapy morbidity evaluated by different scoring systems. Breast Cancer Res Treat 2002;73:127–34.
[12]. Gonzalez Sanchis A, Brualla Gonzalez L, Sanchez Carazo JL, et al. Evaluation of acute skin toxicity in breast radiotherapy with a new quantitative approach. Radiother Oncol 2017;122:54–9.
[13]. Sharp L, Johansson H, Landin Y, et al. Frequency and severity of skin reactions in patients with breast cancer undergoing adjuvant radiotherapy, the usefulness of two assessment instruments: a pilot study. Eur J Cancer 2011;47:2665–72.
[14]. Acharya U, Cox J, Rinks M, et al. Ability of radiation therapists to assess radiation-induced skin toxicity. J Med Imaging Radiat Oncol 2013;57:373–7.
[15]. Wengstrom Y, Forsberg C, Naslund I, et al. Quantitative assessment of skin erythema due to radiotherapy: evaluation of different measurements. Radiother Oncol 2004;72:191–7.
[16]. Graham PH, Plant NA, Graham JL, et al. Digital photography as source documentation of skin toxicity: an analysis from the Trans Tasman Radiation Oncology Group (TROG) 04.01 post-mastectomy radiation skin care trial. J Med Imaging Radiat Oncol 2012;56:458–63.

Keywords: breast cancer; radiodermatitis; radiotherapy; weights and measures

Copyright © 2019 The Authors. Published by Wolters Kluwer Health, Inc. All rights reserved.