The updated ESPGHAN guidance (1) on coeliac disease has been issued and is lauded for its efforts to find a strategy to avoid the need for diagnostic biopsy in children. The main body of the text is admirable in its clarity and statements on the relative efficacy of diagnostic strategies. We totally endorse the proposal that higher titres of antibody in most assays are likely to have higher positive predictive values (PPVs) and combination testing with a more specific test is likely to enable avoidance of a requirement to biopsy. ESPGHAN acknowledge the interlaboratory variability between test performances and the potential for considerable batch-to-batch variability within commercial IgA anti-tissue transglutaminase antibodies (TG2) assays, which needs to be monitored by the use of independent quality control material and external quality assessment.
However, the use of common upper limit of normal (ULN) thresholds is an inappropriate generalisation of the evidence on the use of titres in its diagnostic algorithms and conflicts with the external quality assessment (EQA) data used. Lack of standardisation (full metrological traceability) of assays has not been ignored, but unfortunately the wrong approach has been used to try to compensate by harmonisation (by using the same units even where there is little metrological traceability).
Use of a common ULN risks different centres getting very different screening results when using different assays, and therefore following a different pathway through the algorithm for the same patient. This has the potential to lead to some centres making different biopsy requests depending on the assay used, despite the guideline's intention of avoiding biopsy in those who screen strongly positive for TG2. Furthermore, it is not yet clear that the PPV of high titres is the same for all assays even where they produce similar apparent results for the mean or median ULN. There is variability in the performance characteristics of different commercially available TG2 assays. The NICE (2) coeliac working group reviewed the available evidence and found sensitivity ranging from 89% to 100% and specificity ranging from 25% to 100%. PPV of “high titres” >10xULN in our screened UK cohort is good, but false positives still occur. Forty percent of the coeliac disease patients in our screened population had low or medium TG2 titres (unpublished audit data, submitted) and 60% had high titres. But around 40% of the high titres were false positive. Most coeliac cases in our cohort are IgA endomysial antibody (EMA) positive at all levels of TG2. Therefore, the use of high titres alone to identify cases and avoid biopsy is potentially suboptimal with some test combinations and could be improved by the use of 2 tests. A high TG2 titre has been shown to occur in patients with liver disease, diabetes, and in up to 40% of patients with end-stage heart failure (3).
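The potential gain from a second, more specific test can be sketched numerically. The following is a minimal illustration only, assuming that a case is called positive when both tests are positive and that the two tests err independently given disease status. That independence assumption is unlikely to hold exactly for TG2 and EMA, and the function name and input figures are hypothetical, not audit data.

```python
def serial_both_positive(sens1, spec1, sens2, spec2):
    """Combined sensitivity and specificity when BOTH tests must be positive.

    Assumes the two tests are conditionally independent given disease
    status (an assumption made for illustration, not an established fact).
    """
    combined_sens = sens1 * sens2                     # both must detect the case
    combined_spec = 1 - (1 - spec1) * (1 - spec2)     # both must be falsely positive
    return combined_sens, combined_spec


# Hypothetical figures for a screening test followed by a more specific test.
sens, spec = serial_both_positive(0.90, 0.80, 0.90, 0.90)
print(f"combined sensitivity={sens:.2f}, combined specificity={spec:.2f}")
```

Under these assumptions the combination trades some sensitivity for a marked gain in specificity, which is the direction of effect argued for above. The size of the effect in practice can only be established by audit.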
In screening symptomatic cases, the division of high levels of antibody from other levels is obviously potentially useful (4,5), but requires local thresholds to be established by audit for each combination of assay and test population. It may be possible to validate a single threshold with high predictive value for a single assay even across multiple centres, but it will not be possible to use a single threshold for all assays. It is an inevitable characteristic of most assays that higher titres have better predictive values and the results of limited comparator studies are not surprising (4,5) (especially when followed by EMA testing). The really important issues are how to deal with the moderate and low titres and what percentage of the affected population lies in that range; dealing with these titres usually requires use of a more specific test (duodenal biopsy if following the ESPGHAN algorithm, or EMA). A surrogate ULN cannot be imposed using the same multiples of the ULN for different tests in the absence of standardisation. It does not “harmonise” performance. It will not make all of the tests perform in a similar manner. Each centre should have an understanding of their potential false-positive rate at different levels of positivity (even at very high titres) to avoid inappropriate management.
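A local audit of this kind reduces to a simple tabulation. The sketch below assumes each audited sample is recorded as a pair of TG2 result (as a multiple of the ULN) and confirmed diagnosis. The records shown are invented purely for illustration and are not our audit data, and the function name is hypothetical.

```python
def audit_threshold(records, threshold):
    """Summarise a candidate 'high titre' threshold from local audit records.

    records: iterable of (multiple_of_uln, confirmed_coeliac) pairs.
    Returns (PPV at or above the threshold, fraction of confirmed cases
    falling below the threshold); either is None if it cannot be computed.
    """
    records = list(records)
    above = [confirmed for multiple, confirmed in records if multiple >= threshold]
    case_titres = [multiple for multiple, confirmed in records if confirmed]
    ppv = sum(above) / len(above) if above else None
    cases_below = (
        sum(1 for m in case_titres if m < threshold) / len(case_titres)
        if case_titres else None
    )
    return ppv, cases_below


# Invented example records: (TG2 result in xULN, biopsy-confirmed coeliac disease).
example_records = [
    (1.5, False), (2.0, True), (4.0, False), (6.0, True), (8.0, True),
    (11.0, False), (12.0, True), (15.0, True), (20.0, True), (30.0, True),
]

for threshold in (3.0, 10.0):
    ppv, cases_below = audit_threshold(example_records, threshold)
    print(f"threshold={threshold}xULN  PPV={ppv:.2f}  cases below={cases_below:.2f}")
```

The second figure returned is the proportion of affected patients who would still need a more specific test, which is the quantity each centre needs to know for its own assay and population.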
The guidance has attempted to compensate for the lack of standardisation between TG2 assays consequent on the lack of an international reference preparation (there are currently more than 20 manufacturers of TG2 assays using various reference materials), by using a common decision threshold based on the upper limit of “normal” quoted by the manufacturers of each kit (xULN). Such an arbitrary cutoff cannot be generalised as a statement which is valid for all kits on the market in all screened cohorts, even if published studies are internally valid. Pretest probability affects screening outcomes markedly. ULN thresholds should not be adopted by any clinician on the basis of current evidence, unless local audit has demonstrated and evaluated a PPV and NPV for a defined diagnostic high-level cutoff in their assay, in their cohort (this is acknowledged best laboratory practice, is the main function of audit of assay performance, and is often needed to establish and validate local interpretive thresholds for the many assays where lack of standardisation prevails, such as autoantibody assays and anticardiolipin antibodies). Where this ULN threshold will lie to produce a very high PPV may vary widely for different assays, for different centres or even for the same assay used in different centres with different referral or testing criteria. You cannot generalise this threshold from the existing published data and definitely not from EQA results. Generalised recommendations about fixed threshold decision points (even those based on 3x and 10xULN) only make sense if there is good comparability of dose responses between assays, ideally with traceability to a reference preparation. Neither currently exists.
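The dependence of predictive values on pretest probability follows directly from Bayes' theorem. As a minimal sketch, assuming a single binary test with fixed sensitivity and specificity (the 95% figures and the prevalences below are hypothetical and chosen only to show the effect):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test at a given pretest probability."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test applied to cohorts with different pretest probability.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

The same assay and cutoff therefore yield very different PPVs in a low-prevalence screened cohort and in a high-prevalence referral cohort, which is why a threshold validated in one setting cannot simply be transferred to another.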
Another major barrier is the spread of results seen in different users of the same assay. The suggestion that coeliac disease cases will be concentrated in the higher titre cohort is undoubtedly true, but it does not follow that the percentage of cases present in the higher titre cohort will be the same even when the same assay is used to measure the same samples. Quoting a common putative xULN does not fix this problem and ignores the fact that many of the assays have high variability such that they cannot reliably distinguish 6xULN from 14xULN with a high degree of certainty in multiple laboratories using the same assay kit and sample. We therefore re-present some of the EQA data used in the guidelines to illustrate this. Figure 1 shows the ULN data for the EQA distributions used for the 6 methods with the highest user numbers, including the effects of assay variability on the ULN as 1 SD error bars, on the same dataset as analysed in Appendix II of the guideline (1). The lack of consideration of assay variability in real-life performance is thus a major omission in drawing the conclusions and recommendations in the guideline. It is obvious that the tests are not currently able to deliver what is being asked of them in the algorithms in the guideline.
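The effect of assay variability on a fixed 10xULN decision point can be illustrated with a simple model. The sketch below assumes normally distributed measurement error with a constant coefficient of variation (CV). Both the normal model and the CV of 30% are assumptions made for illustration, not figures taken from the EQA data, and real assay error may well follow a different distribution.

```python
from statistics import NormalDist


def prob_reported_above(true_multiple, cv, threshold=10.0):
    """Probability that a sample is REPORTED above the threshold.

    true_multiple: the sample's underlying value in multiples of the ULN.
    cv: assumed coefficient of variation of the reported result (must be > 0).
    """
    spread = cv * true_multiple
    return 1.0 - NormalDist(mu=true_multiple, sigma=spread).cdf(threshold)


# Hypothetical CV of 30% across laboratories using the same kit.
for true_multiple in (6.0, 8.0, 10.0, 12.0, 14.0):
    p = prob_reported_above(true_multiple, cv=0.30)
    print(f"true value={true_multiple:>4}xULN  P(reported >10xULN)={p:.2f}")
```

Under this model a sample sitting exactly at the threshold is reported as “high” in only half of laboratories, and samples some way either side of it are frequently misclassified, so the pathway followed through the algorithm depends on where the sample happened to be tested.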
In all immunoassays, biological variability is a potential confounding factor because you are measuring a complex mix of antibodies with different properties, not a monomorphic protein. Thus, one cannot assume that the dose-response relation on the mean values seen here represents the behaviour of all samples from different individuals in all assays. UK NEQAS data over time clearly appear to show that different samples with similar mean titres can give different relative signals across the assay groups, as is true for many immunoassays. It is not a simple matter of measuring the amount of a single antibody molecule. Mean or median data (such as shown in Appendix II of the guideline (1)) can be equally misleading.
The guidelines are really trying to define a “high” level of antibody to focus on the samples with a higher predictive value for coeliac disease. Clinicians and laboratories should beware of marketing claims of superior diagnostic performance solely on the basis of apparent signal intensity. Screening performance and dose-response relations are not necessarily linked. Possibly some will assume that because their assays produce high signal ratios in positive samples, they will perform better in meeting the recommendations of the guideline, but that does not necessarily follow. Only local audit data or large multicentre studies will determine performance characteristics and optimal thresholds which take account of assay variability. Clearly, users of the assay that provided a mean method ULN of 2.4xULN will not have the same diagnostic performance as those with a mean method ULN of 13.6xULN if a common 10x threshold is used. It does not necessarily mean that the assay is inferior in diagnostic performance, just that the assay ULN threshold must be different.
All of these assays have been validated on positive/negative criteria on 2×2 tables on an ULN cutoff, not using multiple higher titre thresholds. The former approach minimises the effect of assay variability in calibration and dose-response relations at other levels of signal. The distinction between samples with 1–3xULN and 3–10xULN in the algorithm for asymptomatic children is less achievable than distinguishing “high” titres from the others and the data presented would suggest that it is not appropriate to compare the signal intensity for many assays.
Guidelines can only be universally applied if they are truly generalisable and this depends on true comparability between the diagnostic tests used, consequent on standardisation. There is no justification for attempting to do anything more than recognise locally “high” levels interpreted against an audit-derived PPV/NPV. Concordance between TG2 assays is not sufficient to make general statements about a common threshold. The guidance should be amended to reflect this.
Finally, in the ESPGHAN recommendations for screening asymptomatic individuals, the least specific and most expensive test (HLA DQ) has been placed at the front of the algorithm. This is not usual practice in such screening strategies and could be criticised. Clinicians would be well advised to consider whether using a cheaper, equally sensitive, and more specific test first (TG2 or EMA), then utilising HLA typing where uncertainty remains, would make more cost-effective sense because published performance characteristics would potentially suggest a similar diagnostic outcome. It is assumed that the HLA-first approach may be less expensive than what happens currently in children, where the cost and risk of anaesthesia for endoscopic biopsy are incurred, but this ignores the fact that the same outcome could probably be achieved for an even greater saving by utilising a different configuration of the existing serological tests (4,5). Currently, HLA typing in the UK is much more expensive than serology, but that may change in time. The quality assurance issues for HLA testing are different from those of the serological assays and have not been addressed here.
Guideline groups really need to make better utilisation of quality assurance programme data before producing guidelines on the use of tests, to ensure that the assays are capable of delivering the required performance characteristics. These are data on the real performance of assays in the clinical community, and can ensure that their recommendations are achievable and workable across large networks or populations. EQA providers would be only too pleased to formally participate in guideline development if requested, by providing evidence and expertise to test the assumptions about performance and the effects of recommendation in the light of an overarching perspective on assay performance.
- Performance characteristics of different TG2 assays are different.
- Common multiples of ULN should not be used.
- Harmonised multiple and stratified thresholds cannot be used unless true comparability between the tests exists.
- An international standard for TG2/EMA is needed.
- Guidelines should test their assumptions rigorously against real-life performance in EQA and utilise the expertise of EQA providers.
- Recommendations should be generalisable and translatable for use in all centres.
1. Husby S, Koletzko S, Korponay-Szabo IR, et al. European Society for Pediatric Gastroenterology, Hepatology, and Nutrition guidelines for the diagnosis of coeliac disease. J Pediatr Gastroenterol Nutr
2. NICE clinical guideline 86: coeliac disease: recognition and assessment of coeliac disease 2009. http://guidance.nice.org.uk/CG86/Guidance. Accessed October 26, 2012.
3. Sanders DS, Hopper AD, Azmy IA, et al. Association of adult celiac disease with surgical abdominal pain: a case–control study in patients referred to secondary care. Ann Surg
4. Kurppa K, Salminiemi J, Ukkola A, et al. Utility of the New ESPGHAN criteria for the diagnosis of celiac disease in at-risk groups. J Pediatr Gastroenterol Nutr
5. Alessio M, Tonutti E, Brusca I, et al. Correlation between IgA tissue transglutaminase antibody ratio and histological finding in celiac disease: a multicentre study. J Pediatr Gastroenterol Nutr