ORIGINAL ARTICLES

International Variation in Histologic Grading Is Large, and Persistent Feedback Does Not Improve Reproducibility

Furness, Peter N. B.M., B.Ch., Ph.D., F.R.C.Path.; Taub, Nicholas M.Sc.; Assmann, Karel J. M. M.D., Ph.D.; Banfi, Giovanni M.D.; Cosyns, Jean-Pierre M.D., Ph.D.; Dorman, Anthony M. M.B., B.Ch., F.R.C.Path.; Hill, Claire M. M.D., F.R.C.Path., F.R.C.P.I.; Kapper, Silke K. M.D.; Waldherr, Rudiger M.D.; Laurinavicius, Aryvdas M.D.; Marcussen, Niels M.D., D.M.Sc.; Martins, Anna Paula M.D.; Nogueira, Malfada M.D.; Regele, Heinz M.D.; Seron, Daniel M.D.; Carrera, Marta M.D.; Sund, Ståle M.D.; Taskinen, Eero I. M.D.; Paavonen, Timo M.D.; Tihomirova, Tatjana M.D.; Rosenthal, Rafail M.D., Ph.D.

From Leicester (P.N.F., N.T.), U.K.; Nijmegen (K.J.M.A.), The Netherlands; Milan (G.B.), Italy; Brussels (J.-P.C.), Belgium; Dublin (A.M.D.), Ireland; Belfast (C.M.H.), Northern Ireland; Mannheim (S.K.K., R.W.), Germany; Vilnius (A.L.), Lithuania; Aarhus (N.M.), Denmark; Lisbon (A.P.M., M.N.), Portugal; Vienna (H.R.), Austria; Barcelona (D.S., M.C.), Spain; Oslo (S.S.), Norway; Helsinki (E.I.T., T.P.), Finland; and Riga (T.T., R.R.), Latvia.

Supported by a grant from the European Union (Standards Measurement and Testing Programme, Contract no. SMT4-CT98–7514).

Address correspondence and reprint requests to Professor Peter N. Furness, Clinical Sciences Laboratories, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, U.K.; e-mail: [email protected]

The American Journal of Surgical Pathology: June 2003 - Volume 27 - Issue 6 - p 805-810

Abstract

Histologic grading systems are used to guide diagnosis, therapy, and audit on an international basis. The reproducibility of grading systems is usually tested within small groups of pathologists who have previously worked or trained together. This may underestimate the international variation of scoring systems.
We therefore evaluated the reproducibility of an established system, the Banff classification of renal allograft pathology, throughout Europe. We also sought to improve reproducibility by providing individual feedback after each of 14 small groups of cases. Kappa values for all features studied were lower than any previously published, confirming that international variation is greater than interobserver variation as previously assessed. A prolonged attempt to improve reproducibility, using numeric or graphical feedback, failed to produce any detectable improvement. We then asked participants to grade selected photographs, to eliminate variation induced by pathologists viewing different areas of the slide. This produced improved kappa values only for some features. Improvement was influenced by the nature of the grade definitions. Definitions based on “area affected” by a process were not improved. The results indicate the danger of basing decisions on grading systems that may be applied very differently in different institutions.

© 2003 Lippincott Williams & Wilkins, Inc.
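
The kappa values mentioned in the abstract are chance-corrected measures of agreement between observers. The abstract does not say which kappa variant was used, so the sketch below assumes Fleiss' kappa, a common choice when more than two raters grade the same cases. The function name `fleiss_kappa` and the example grades are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases.

    `ratings` is a list with one entry per case; each entry is the list of
    grades given to that case by the raters. Every case must be graded by
    the same number of raters (at least two).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]
    categories = {grade for case in ratings for grade in case}

    # Observed agreement: for each case, the proportion of rater pairs
    # that assigned the same grade, averaged over all cases.
    per_case = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Expected agreement by chance, from the overall share of each grade.
    total = n_cases * n_raters
    p_expected = sum(
        (sum(c[k] for c in counts) / total) ** 2 for k in categories
    )

    if p_expected == 1.0:
        # Every rating fell in one category; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up grades (0 to 3) from four hypothetical raters on five cases.
example = [
    [0, 0, 0, 1],
    [2, 2, 3, 2],
    [1, 1, 1, 1],
    [3, 2, 3, 3],
    [0, 1, 0, 0],
]
print(round(fleiss_kappa(example), 3))
```

A value of 1 means complete agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.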