As a result of recent improvements in immunosuppressive therapy, which have drastically decreased the incidence of acute rejection, chronic allograft injury (CAI) now constitutes the primary cause of late renal allograft failure and is currently the greatest limiting factor in renal transplantation. CAI is clinically defined by a progressive and irreversible decrease of renal function, combined with proteinuria and progressing hypertension. CAI is the consequence of multifactorial processes involving both alloantigen-dependent and alloantigen-independent factors. The end result is renal change characterized by vascular intimal hyperplasia, interstitial fibrosis (IF), tubular atrophy (TA), and transplant glomerulopathy (1).
CAI can be diagnosed early, at a preclinical stage, by histopathological changes, before the occurrence of clinical symptoms (1–4). Biopsies performed soon after transplantation have revealed that IF/TA is present early after transplantation, after an intense period of tubulointerstitial injury (5), with lesions present at 3 months (2, 6, 7). Studies using surveillance biopsies have consistently shown that IF correlates with renal graft survival and with long-term graft function (2, 4, 6–10). This suggests that early histologic detection of IF might be a predictive index of subsequent deterioration of graft function and, potentially, a surrogate marker to assess the efficacy of specific therapeutic interventions (11).
The Banff classification was developed to standardize interpretation of renal allograft biopsies and to establish objective end points for clinical trials, and histopathological samples are usually graded using the Banff 2007 classification (12). Although this scoring system can be used to predict long-term renal allograft outcome, there are several limitations to its use. First, the IF Banff grading is a coarse quantification, not sensitive enough to detect small changes in fibrosis. Indeed, because this grading is not continuous, biopsies may be classified in different categories even though the actual difference is small. Furthermore, the use of a small number of grades to describe the severity of individual histological abnormalities may lack accuracy, for example when assessing sequential biopsies or during the early stages of IF/TA, when intervention is most likely to preserve graft function (13). Second, because the Banff classification is semiquantitative and depends to a certain extent on the skills of the observer, its reproducibility is limited by interobserver variations that make cross-center comparisons inaccurate (11, 13–15).
For these reasons, it was important to develop accurate and reproducible methods to quantify the histological abnormalities of CAI, particularly IF. Several computerized approaches have been used to quantify interstitial chronic renal damage (8, 9, 16, 17). Early work, which relied on manual point counting or semiautomatic computerized measurements (16, 18), is tedious and not applicable in routine practice. These studies also raise concerns regarding variability between different methods and a lack of correlation with conventional histologic analysis (18). We have therefore developed a new segmentation-based method of automatic color image analysis to quantify IF. This method follows the specifications of the Banff classification on IF and can be applied for quantification of the main component of CAI in routine use. The aims of the present study are to (1) detail the technical aspects of this new computerized method to quantify IF, (2) compare the results with those obtained by human pathological assessment, and (3) demonstrate its robustness and reproducibility.
Segmentation-Based Automatic Quantification of IF
We applied the automatic color image analysis to the reference set of Masson's trichrome (MT) biopsy sections, taken from normal kidneys (Fig. 1A) and from grafted kidneys with various grades of IF and TA (IFTA, from ci1 to ci3 in the Banff 2007 classification), which is the histologic feature of CAI (Fig. 1B–D). It takes 5 min to create the image mosaic of each specimen (typical size in the range of 680×3020) and less than 1 min to process the image. For each section, the original picture, the mask of green pixels, and the mask of IF are shown. The IF score ranges from 1.36%±0.87% in normal child kidney and 8.1%±7.0% in normal adult kidney to 20.35%±5.69% (ci1), 31.93%±3.26% (ci2), and 49.56%±5.38% (ci3). These results show that IF increases with age and with the evolution of CAI.
Validation and Reliability Tests
Color and Geometric Variations
To validate the method, we tested the IF quantification against the main sources of variation: staining variation, color variation, acquisition parameters, and operator subjectivity. Color may vary across biopsy images because of the histologic section, the quality of staining, and the image acquisition parameters. Two kinds of color variation were tested: staining and illumination. Figure 2(A) shows the images of four serial biopsy sections stained on three different days, and Table 1S (http://links.lww.com/TP/A493) gives the computerized quantification of IF. These results show that the quantification is independent of staining variations. Figure 2(B) shows three biopsies with various colors and the result given by the system, which adapts the color segmentation to each image. The other varying parameters, namely rotation (Fig. 2C), cropping (Fig. 2D), magnification (Fig. 2E), and acquisition system (Fig. 2F), did not affect IF quantification.
Inter- and Intraobserver Variations
The reliability of automatic image analysis to quantify IF was evaluated by estimating intraobserver and interobserver variations. The same MT-stained sections, randomly selected from a panel of biopsies chosen from pathology reports to represent CAI at different stages, were imaged 10 times by the same operator, each time on a different day. In addition, 10 operators imaged these sections. The percentage of IF surface was assessed by the proposed method, and a reliability analysis was performed using an analysis of variance (ANOVA) test (P<0.003) and the intraclass correlation coefficient (ICC). The intraoperator ICC is 0.90, with a 95% confidence interval (CI) of [0.68–0.99], and the interoperator ICC is 0.88 [0.631–0.99]. From these analyses, we conclude that no significant differences between the mean IF scores are found when they are measured with the program 10 times by the same observer or by 10 different observers, showing that the method is robust against operator variations. Inter- and intraoperator standard deviations were 4.07% and 3.89%, respectively (see Figure 1S, SDC, http://links.lww.com/TP/A493).
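For illustration, an agreement statistic of this kind can be sketched as a one-way random-effects intraclass correlation, ICC(1,1), derived from the one-way ANOVA mean squares. The exact ICC form used in this study is not stated in the text, so the choice of model, the function name `icc_oneway`, and the example scores below are assumptions made for this sketch only.

```python
from statistics import mean


def icc_oneway(scores):
    """ICC(1,1) for a table with one row per section and one column per repeat.

    Assumption: one-way random-effects model; the study may have used
    another ICC form.
    """
    n = len(scores)       # number of biopsy sections
    k = len(scores[0])    # repeated IF measurements per section
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    # Between-section and within-section mean squares (one-way ANOVA).
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((v - m) ** 2
              for row, m in zip(scores, row_means)
              for v in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical IF scores (%) for three sections, each imaged twice.
example = [[20, 22], [35, 33], [50, 49]]
print(round(icc_oneway(example), 3))
```

An ICC close to 1 means that most of the variance comes from true differences between sections rather than from repeated acquisition of the same section.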
Comparison of Automatic Analysis and Semiquantitative Analysis by an Expert
We performed a side-by-side comparison of our method against a semiquantitative method on a different set of 90 biopsies to test the concordance with scoring by an expert. A kappa test was applied to quantify and evaluate the results (Fig. 3). The IF score increased across the three groups defined by the Banff classification: the mean scores were 0.21, 0.35, and 0.48, and the median scores were 0.20, 0.34, and 0.47, in the ci1, ci2, and ci3 groups, respectively. The kappa value was 0.68 (P<0.001) with 95% CI [0.55–0.79], and the weighted kappa was equal to 0.75. Moreover, when the threshold values of the semiquantitative grades were changed to 27% and 42%, the kappa value increased to 0.81 (P<0.001) with 95% CI [0.71–0.91]. Altogether, these results demonstrate a good agreement between the two methods.
Correlation With Renal Function
Finally, IF quantification was correlated with renal function, assessed by the estimated glomerular filtration rate using the Modification of Diet in Renal Disease formula, in another set of 90 biopsies (Fig. 4). We observed a significant inverse correlation between IF quantification and renal function (P=0.0003, R=−0.38).
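As a minimal sketch of the arithmetic behind such a correlation, the following computes the Pearson coefficient and the associated t statistic, t = r·sqrt((n−2)/(1−r²)). The function names are hypothetical and no patient data are reproduced. Plugging in the reported R=−0.38 with n=90 shows why a modest coefficient can still be statistically significant, while r² (about 0.14) is the share of variance in estimated glomerular filtration rate associated with IF.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * sqrt((n - 2) / (1 - r * r))


# Values reported in the text: R = -0.38 on 90 biopsies.
t = t_statistic(-0.38, 90)
print(round(t, 2), round(0.38 ** 2, 2))
```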
IF is a morphological hallmark of chronic renal disease and one of the main histologic components of CAI. It is also the least sensitive feature in biopsy sampling (11). For these reasons, several morphometric techniques to quantify IF have been developed over approximately the past 20 years, and their characteristics are summarized in Table 1.
Following Banff specifications, we have designed a new algorithm to analyze MT-stained biopsies and quantify IF under specific constraints: it must be applicable in routine practice, accurate, and reproducible. We have chosen MT instead of Sirius Red (SR) staining for several reasons: (1) the Banff classification uses MT staining for IF grading; (2) even if some methods use SR-stained tissue under polarized (8, 9, 17–19) or nonpolarized light (11, 18, 19), or collagen III staining (19, 20), to quantify IF, most renal pathologists use trichrome staining, which is more readily available than SR staining; and (3) as discussed in (8, 17), inflammation may cause understaining of collagen with SR. In a recent publication (19), the authors compared several biopsy stainings (collagen III, trichrome-periodic acid-Schiff, SR under unpolarized light, and SR under polarized light) and concluded that the best correlation with renal function was obtained with collagen III and unpolarized SR. However, neither technique is available in routine practice.
The main challenge of computerized image analysis is image variability, which comes from the sampling, the quality of staining, and the acquisition parameters, and which makes methods based on gray level thresholding (8, 21) or texture analysis (22, 23) less efficient and time consuming (19). Hence, we have chosen clustering techniques, particularly color image quantization, followed by segmentation combining color, spatial, and shape features, because they adapt the analysis to the variations of each image. The trichrome hue depends on staining and may vary from green to blue. In our center, the variation in MT staining is drastically reduced thanks to the use of a staining machine; however, our color image analysis adapts automatically to hue variation, which enables multicenter studies. The experimental results in the Color and Geometric Variations section have shown that our method is equally robust against magnification, cropping, rotation, and setup variations. Moreover, the ANOVA test on intra- and interobserver measurements shows the good reproducibility of the proposed method. In addition, compared with others (8, 18, 20, 22 and Table 1), the presented method can be applied rapidly, thus allowing for routine work. Indeed, it automatically eliminates the capsule, normal basement membrane, vessels, and normal and sclerotic glomeruli and only requires limited operator intervention to exclude the medulla, thus reducing the output variations of the system and interoperator variations. The intraobserver standard deviation for the same image is reduced from 3.89 to 1.14.
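To illustrate why clustering adapts to each image where a fixed threshold does not, the following is a generic k-means color quantization on RGB pixels. It is not the proprietary algorithm described in this article: the function names, the initial centroids, the toy pixel values, and the rule used to pick the collagen cluster (largest green-minus-red difference of the centroid) are all assumptions made for this sketch.

```python
def kmeans_colors(pixels, centroids, iterations=10):
    """Plain k-means on RGB tuples; `centroids` are the initial cluster centers."""
    centroids = [tuple(map(float, c)) for c in centroids]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid in RGB space.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(px, centroids[k])))
            for px in pixels
        ]
        # Update step: each centroid moves to the mean color of its members.
        for k in range(len(centroids)):
            members = [px for px, lab in zip(pixels, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, labels


def greenest_cluster(centroids):
    """Index of the centroid with the largest green-minus-red difference (assumed rule)."""
    return max(range(len(centroids)), key=lambda k: centroids[k][1] - centroids[k][0])


# Toy image: two reddish, two greenish, and two background pixels (hypothetical).
pixels = [(200, 60, 70), (210, 50, 80), (60, 170, 120),
          (50, 160, 130), (240, 240, 240), (250, 250, 250)]
centroids, labels = kmeans_colors(pixels, [(255, 0, 0), (0, 255, 0), (255, 255, 255)])
collagen = greenest_cluster(centroids)
green_mask = [lab == collagen for lab in labels]
```

Because the centroids are re-estimated from the pixels of each image, a section stained bluish-green and one stained yellowish-green both end up with a collagen cluster centered on their own hue.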
We have also tested our method against expert Banff scoring, and the kappa value was 0.68, which indicates a substantial agreement and is better than the value of 0.39 reported by Furness and Taub (11). In the recent comparison study (19) between the various methods of fibrosis analysis, reproducibility between pathologists was good. However, the assessment had been performed on only 15 samples by three pathologists who work together. We have found that experts tend to overestimate grade 3 and underestimate grade 1 (Fig. 3). This may be explained by the fact that the pathologist defines the extent of the fibrotic interstitial tissue, whereas the system counts the green pixels only. Furthermore, when there are many sclerotic glomeruli, the expert tends to classify a biopsy in grade 3, while the system excludes glomeruli. For ci1, the pathologist removes all thin structures, but this thickness depends on the biopsy, whereas we have defined a fixed thickness value for normal basement membrane. In addition, we have noticed that (1) the weighted kappa is higher than the kappa and (2) a modification of the threshold values of the IF Banff grades results in an increased kappa value. This means, first, that some disagreements are due to values near the grade limits and, second, that the threshold values arbitrarily defined by the Banff classification might not be adapted to image analysis quantification.
Importantly, computerized IF was also correlated with renal function, as previously reported in a clinical trial that used our quantification method (24). As our technique is reproducible, simple to use, and adapted to various color and image changes, it has been used in centralized analysis of samples from two clinical trial centers (24, 25). These trials (24, 25) confirmed that early precise quantification of IF by computerized image analysis provides a reliable surrogate marker for the evolution of graft function and that our method is appropriate for use in multicenter trials.
A crucial question is whether IF lesions assessed by an automated method on renal biopsy are predictive of subsequent graft function. This was analyzed in a longitudinal study of renal biopsies in 140 renal transplant recipients (26).
IF at M3 was correlated with graft function at M3, M12, and M24, and IF at M12 with estimated glomerular filtration rate at M12 and M48. We have also shown that having more than 30.4% of IF at M12 was the best predictor of glomerular filtration rate worsening by more than 15% between M12 and M36. Furthermore, IF evolution between D0 and M3 was correlated with renal function at M24, M36, and M48. It thus seems that IF, quantified by our method, is able to predict future allograft function to some extent.
Several limitations to the use of the IF score in transplantation have to be acknowledged: (1) IF is a nonspecific parameter with numerous etiologies. Therefore, physicians have to take into account many other parameters for clinical decisions, such as trough levels of calcineurin inhibitors, donor criteria, and interstitial inflammation. However, we believe that with the development of nonnephrotoxic immunosuppressive drugs and with the widespread use of surveillance biopsies in the monitoring of renal transplantation, computerized IF could become a key component to monitor and personalize immunosuppression in the future. (2) Another issue that might hinder the correct quantification of IF is its potentially focal nature and thus the sampling error that might be introduced when using core biopsies. The new technique cannot compensate for this source of error, but it might help to estimate its magnitude by adequately evaluating the spatial distribution of IF in the kidney.
In summary, our method is the first validated method of color quantification, correlated with the Banff classification and with renal function, that can adapt to color variation in routine practice. Moreover, IF quantified by this method may be able to predict future allograft function. For these reasons, it can be used at the patient level as a new tool for posttransplant monitoring and at the population level in clinical trials, as already demonstrated (24, 25).
Similar studies on nontransplanted kidneys (27) suggest that the proposed technique could be easily adapted to different types of fibrosis studies in other organs such as liver and lung. Furthermore, as recent reports have revealed clinical significance of inflammatory fibrosis (28), our software could also be adapted to automatically differentiate inflammatory versus noninflammatory fibrosis.
MATERIALS AND METHODS
Protocol graft biopsies, obtained after informed consent, have been performed at Necker Hospital for more than 30 years. We selected a sample of these biopsies for the present study. Tissue was fixed in alcohol formalin acetic solution, embedded in paraffin, cut in 2-μm-thick sections, and stained with MT using the Leica Autostainer XL (Leica, Germany). Allograft biopsies were scored according to Banff criteria (12). All specimens revealed sufficient cortical area for the evaluation of IF and TA, with seven or more glomeruli. To design and test the color image analysis method, representative samples of MT-stained renal biopsies were chosen on the basis of pathology reports as follows: normal renal biopsies of children (n=3) and adults (n=4), and 60 renal transplant biopsies selected according to the percentage of IF defined by an expert pathologist: from 6% to 25% (n=20), grade 1 (ci1); from 26% to 50% (n=20), grade 2 (ci2); and more than 50% (n=20), grade 3 (ci3).
To compare the fully quantitative automatic method to standard semiquantitative methods of analysis, 90 other MT-stained transplant biopsies at various IF stages were randomly selected: 30 ci1 biopsies, 30 ci2 biopsies, and 30 ci3 biopsies. For each biopsy, one section was randomly selected and imaged. IF score was assessed by color image analysis and compared with that obtained blindly by a renal pathology expert, who reviewed glass slides for the determination of IF by semiquantitative assessment, according to the IF criteria defined by the Banff 2007 classification. Finally, another set of 90 biopsies was used to analyze correlation between IF score and renal function.
Image Acquisition Protocol
Images of the entire biopsy cortex were captured by either of two possible acquisition setups: (1) a Nikon DXM1200 color camera mounted on a Nikon Eclipse E1000 M microscope (Nikon, Japan). The images are captured using a 4× objective, NA=0.2, for a total magnification of 40×. The microscope stage is remotely controlled by the Lucia software (Nikon, Japan), which allows capturing the entire biopsy by the concatenation of every field, thus avoiding field overlaps. (2) A Zeiss Mirax Scan slide scanner (Zeiss, Germany), with a 20× objective, NA=0.8, and a CCD Marlin camera. This setup is a virtual microscope, and images are exported to TIFF format at a scale of 1:8.
For each biopsy, a cortical section from one MT slide was captured. The cortex was defined as the part inside the renal capsule and outside the medulla. The operator eliminated the medulla at the acquisition phase or during the analysis step.
Color Image Analysis
The protocol for quantification consists of two steps: (1) segmentation to extract green areas and the biopsy area; (2) postprocessing to compute the IF index, which is detailed in the SDC file (http://links.lww.com/TP/A493).
At the end of the process, the IF surface area is defined as the surface of green pixels minus basement membranes, capsule, glomeruli, and vessels. The tissular surface is defined as the number of pixels in the original biopsy minus the capsule. The index of the IF surface is defined as the ratio between the IF surface area and the total surface of the cortex area in the biopsy.
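The definitions above amount to a ratio of pixel counts, which can be sketched as follows. This is an illustration under stated assumptions, not the software used in this study: the four binary masks and the function name `if_index` are hypothetical, the segmentation that would produce the masks is not shown, and the denominator is taken to be the tissular surface (biopsy pixels minus capsule).

```python
def if_index(green, excluded, tissue, capsule):
    """IF surface index as a fraction of the tissular surface.

    Each argument is a 2D list of 0/1 flags with identical shape (hypothetical masks):
      green    - pixels segmented as green in the trichrome image
      excluded - pixels in basement membranes, capsule, glomeruli, or vessels
      tissue   - pixels belonging to the biopsy
      capsule  - pixels belonging to the renal capsule
    """
    if_area = 0
    tissue_area = 0
    for g_row, e_row, t_row, c_row in zip(green, excluded, tissue, capsule):
        for g, e, t, c in zip(g_row, e_row, t_row, c_row):
            if t and not c:
                tissue_area += 1      # tissular surface: biopsy minus capsule
                if g and not e:
                    if_area += 1      # green pixel not in an excluded structure
    if tissue_area == 0:
        raise ValueError("empty tissue mask")
    return if_area / tissue_area


# Toy 2x4 masks (hypothetical values).
tissue = [[1, 1, 1, 1], [1, 1, 1, 0]]
capsule = [[1, 0, 0, 0], [0, 0, 0, 0]]
green = [[1, 1, 0, 0], [1, 1, 0, 0]]
excluded = [[1, 0, 0, 0], [0, 1, 0, 0]]
score = if_index(green, excluded, tissue, capsule)
print(f"{100 * score:.1f}%")
```

Counting pixels rather than estimating an area by eye is what makes the score continuous, in contrast to the four Banff grades.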
Currently, all quantifications are performed using proprietary software developed at the Institut Pasteur [Patent EU number 2004292513-1]. The algorithm and software will be made available to educational and research institutes on request. Until then, it is possible for collaborators to send us the samples for quantification as already performed in previously reported multicenter trials (24, 25).
Results are expressed as mean±SD. Validation against variations of acquisition parameters is assessed by a correlation test: Pearson's correlation coefficient and its P value are given. Intra- and interobserver reproducibility is reported as the ICC with its 95% CI, together with the P value of a one-way ANOVA test and the intra- and interoperator standard deviations. The comparison between our method and the semiquantitative Banff scoring is given by the kappa test with its P value and 95% CI. The continuous IF data are converted into categorical data using the threshold values (25%, 50%) of the Banff classification. The interrater agreement is expressed by the Cohen kappa value and the weighted kappa value. For each IF group, the mean and median are given. We use the Pearson correlation test to correlate the IF score with renal function. All statistical analyses are performed using R, version 2.8.1 (http://cran.r-project.org).
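The conversion of continuous IF scores into grades and the kappa statistics can be sketched as below. The analyses in this study were run in R; this Python version is only an illustration. The function names, the linear weighting scheme for the weighted kappa (the text does not state which weights were used), the restriction to grades ci1 to ci3, and the six example biopsies are all assumptions.

```python
def banff_grade(if_percent, thresholds=(25, 50)):
    """Map a continuous IF percentage to grade 1, 2, or 3 (ci1 to ci3)."""
    return 1 + sum(if_percent > t for t in thresholds)


def cohen_kappa(a, b, categories, weighted=False):
    """Cohen kappa between two raters; linear disagreement weights if weighted."""
    n, k = len(a), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    obs = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1
    row = [sum(r) for r in obs]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # 0/1 disagreement for plain kappa, scaled grade distance otherwise.
        return abs(i - j) / (k - 1) if weighted else float(i != j)

    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k)) / n
    d_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k)) / n ** 2
    if d_exp == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1 - d_obs / d_exp


expert = [1, 1, 2, 2, 3, 3]              # hypothetical expert grades
auto_pct = [20, 30, 35, 40, 60, 45]      # hypothetical automatic IF scores (%)
auto = [banff_grade(p) for p in auto_pct]
kappa = cohen_kappa(expert, auto, [1, 2, 3])
kappa_w = cohen_kappa(expert, auto, [1, 2, 3], weighted=True)
auto_alt = [banff_grade(p, thresholds=(27, 42)) for p in auto_pct]
kappa_alt = cohen_kappa(expert, auto_alt, [1, 2, 3])
```

In this toy example the weighted kappa exceeds the plain kappa because every disagreement is between adjacent grades, and moving the thresholds to (27%, 42%) removes a disagreement that sat near a grade limit.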
The authors thank Alexandre Dufour for valuable discussions and Martha Melter for revising the English style of the manuscript.
1. Nankivell BJ, Borrows RJ, Fung CL, et al. The natural history of chronic allograft nephropathy. N Engl J Med 2003; 349: 2326.
2. Legendre C, Thervet E, Skhiri H, et al. Histologic features of chronic allograft nephropathy revealed by protocol biopsies in kidney transplant recipients. Transplantation 1998; 65: 1506.
3. Seron D, Moreso F, Ramon JM, et al. Protocol renal allograft biopsies and the design of clinical trials aimed to prevent or treat chronic allograft nephropathy. 2000; 69: 1849.
4. Yilmaz S, Tomlanovich S, Mathew T, et al. Protocol core needle biopsy and histologic Chronic Allograft Damage Index (CADI) as surrogate end point for long-term graft survival in multicenter studies. J Am Soc Nephrol 2003; 14: 773.
5. Nankivell BJ, Borrows RJ, Fung CL, et al. Delta analysis of posttransplantation tubulointerstitial damage. Transplantation 2004; 78: 434.
6. Chapman JR. Longitudinal analysis of chronic allograft nephropathy: Clinicopathologic correlations. Kidney Int 2005; 68: 108.
7. Nankivell BJ, Fenton-Lee CA, Kuypers DRJ, et al. Effect of histological damage on long-term kidney transplant outcome. Transplantation 2001; 71: 515.
8. Grimm PC, Nickerson P, Gough J, et al. Computerized image analysis of Sirius red-stained renal allograft biopsies as a surrogate marker to predict long-term allograft function. J Am Soc Nephrol 2003; 14: 1662.
9. Pape L, Henne T, Offner G, et al. Computer-assisted quantification of fibrosis in chronic allograft nephropathy by picosirius red-staining: A new tool for predicting long-term graft function. Transplantation 2003; 76: 955.
10. Seron D, Moreso F, Bover J, et al. Early protocol renal allograft biopsies and graft outcome. Kidney Int 1997; 51: 310.
11. Furness PN, Taub N. International variation in the interpretation of renal transplant biopsies: Report of the CERTPAP Project. Kidney Int 2001; 60: 1998.
12. Solez K, Colvin RB, Racusen LC, et al. Banff 07 classification of renal allograft pathology: Updates and future directions. Am J Transplant 2008; 8: 753.
13. Seron D, Moreso F, Fulladosa X, et al. Reliability of chronic allograft nephropathy diagnosis in sequential protocol biopsies. Kidney Int 2002; 61: 727.
14. Gough J, Rush D, Jeffery J, et al. Reproducibility of the Banff schema in reporting protocol biopsies of stable renal allografts. Nephrol Dial Transplant 2002; 17: 1081.
15. Marcussen N, Olsen TS, Benediktsson H, et al. Reproducibility of the Banff classification of renal allograft pathology. Inter- and intraobserver variation. Transplantation 1995; 60: 1083.
16. Nicholson ML, McCulloch TA, Harper SJ, et al. Early measurement of interstitial fibrosis predicts long-term renal function and graft survival in renal transplantation. Br J Surg 1996; 83: 1082.
17. Sund S, Grimm P, Reisaeter AV, et al. Computerized image analysis vs. semiquantitative scoring in evaluation of kidney allograft fibrosis and prognosis. Nephrol Dial Transplant 2004; 19: 2838.
18. Diaz Encarnacion MM, Griffin MD, Slezak JM, et al. Correlation of quantitative digital image analysis with the glomerular filtration rate in chronic allograft nephropathy. Am J Transplant 2003; 4: 248.
19. Farris AB, Adams CD, Brousaides N, et al. Morphometric and visual evaluation of fibrosis in renal biopsies. J Am Soc Nephrol 2011; 22: 176.
20. Nicholson ML, Bailey E, Williams S, et al. Computerized histomorphometric assessment of protocol renal transplant biopsy specimens for surrogate markers of chronic rejection. Transplantation 1999; 68: 236.
21. Seron D, Carrera M, Grino JM, et al. Relationship between donor renal interstitial surface and post-transplant function. Nephrol Dial Transplant 1993; 8: 539.
22. Moreso F, Lopez M, Vallejos A, et al. Serial protocol biopsies to quantify the progression of chronic transplant nephropathy in stable renal allografts. Am J Transplant 2001; 1: 82.
23. Moreso F, Seron D, Vitria J, et al. Quantification of interstitial chronic renal damage by means of texture analysis. Kidney Int 1994; 46: 1721.
24. Servais A, Meas-Yedid V, Toupance O, et al. Interstitial fibrosis quantification in renal transplant recipients randomized to continue cyclosporine or convert to sirolimus. Am J Transplant 2009; 9: 2552.
25. Servais A, Meas-Yedid V, Buchler M, et al. Quantification of interstitial fibrosis by image analysis on routine renal biopsy in patients receiving cyclosporine. Transplantation 2007; 84: 1595.
26. Servais A, Meas-Yedid V, Noel LH, et al. Interstitial fibrosis evolution on early sequential screening renal allograft biopsies using quantitative image analysis. Am J Transplant 2011; 11: 1456.
27. Ghoul BE, Squalli T, Servais A, et al. Urinary procollagen III aminoterminal propeptide (PIIINP): A fibrotest for the nephrologist. Clin J Am Soc Nephrol 2010; 5: 205.
28. Mengel M, Reeve J, Bunnag S, et al. Scoring total inflammation is superior to the current Banff inflammation score in predicting outcome and the degree of molecular disturbance in renal allografts. Am J Transplant 2009; 9: 1859.