Reflux diagnosis with MII is more difficult than with pH monitoring; an investigator experienced in this specific pattern recognition is mandatory. Depending on signal quality, this can sometimes be a challenging task. To use this method in clinical practice and to obtain reproducible study results, it is important to know the agreement between independent investigators.
To our knowledge, this is the first study on inter- and intraobserver agreement for MII performed between different centers. The investigators from the different centers acquired their analysis skills independently of each other.
Two other published studies addressing a similar issue, 1 by Peter et al (3) and 1 by Dalby et al (4), worked with observers from the same institution, who were presumably trained by the same person. This may lead to the same analysis “style” and therefore to better agreement results.
Looking at the Cohen kappa values we found substantial (9 of 24) to perfect (13 of 24) agreement in most of the measurements of our study. Only 1 measurement showed a moderate and only 1 a fair agreement, reflected by a median kappa value of 0.83.
The time frame of 2 minutes for the “negative event” box in the Cohen kappa calculation was chosen arbitrarily, which may influence the kappa value. However, the kappa values did not change when a different time window, for example, 1 or 4 minutes, was used to define the “number of episodes judged negative by both observers,” as long as that number exceeded 100 episodes. We therefore retained the 2-minute time window used in the analysis.
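As a rough illustration of this sensitivity check, the following Python sketch computes the Cohen kappa from a 2 × 2 table and varies only the cell “judged negative by both observers.” The function name `cohen_kappa` and the negative counts (150, 300, 600) are hypothetical and are not taken from the study data. The positive counts are those reported for patient A1 in Table 1 (32 congruent episodes, 5 marked only by observer 1, 7 marked only by observer 2).

```python
def cohen_kappa(both_pos, only_obs1, only_obs2, both_neg):
    """Cohen kappa for two observers from the four cells of a 2 x 2 table."""
    n = both_pos + only_obs1 + only_obs2 + both_neg
    if n == 0:
        raise ValueError("empty table")
    # Observed agreement: both say "reflux" or both say "no reflux".
    p_observed = (both_pos + both_neg) / n
    # Marginal totals per observer.
    obs1_pos = both_pos + only_obs1
    obs2_pos = both_pos + only_obs2
    obs1_neg = n - obs1_pos
    obs2_neg = n - obs2_pos
    # Agreement expected by chance from the marginals.
    p_chance = (obs1_pos * obs2_pos + obs1_neg * obs2_neg) / n**2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical negative counts, standing in for different time windows.
for both_neg in (150, 300, 600):
    kappa = cohen_kappa(32, 5, 7, both_neg)
    print(f"negatives={both_neg}: kappa={kappa:.2f}")
```

Under these assumed counts, the kappa value rises only slightly as the negative cell grows, which is in line with the observation that the choice of time window had little influence once the number of negative episodes exceeded 100.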
Peter et al (3) compared the interobserver agreement in twenty 3- to 6-hour recordings between 3 investigators from the same institution. They found median kappa values of 0.79, 0.83, and 0.83 for the 3 pairs of investigators. In contrast to our study, they did not have outliers with only fair to moderate agreement. Because we compared the results of investigators from different centers and used longer recording times, our study gives a more valid picture of interobserver agreement.
Dalby et al (4) compared the analysis results of 2 investigators in thirty 24-hour MII/pH measurements using the Bland-Altman plot. They found a low variability between investigators. The Bland-Altman plot takes the total number of reflux episodes into account, but not whether the events marked by both investigators are congruent. For example, in our patient A1 (Table 1), observer 1 marked 37 reflux episodes and observer 2 marked 39. A Bland-Altman plot would have indicated a low variability between the investigators. As shown in Table 1, column 4, however, only 32 episodes were congruent (73% of all marked events). Therefore, the Cohen kappa coefficient gives a more precise picture of interobserver agreement.
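The arithmetic behind the A1 example can be sketched in a few lines of Python. The function name `percent_agreement` is hypothetical, and the sketch assumes that “all marked events” means the episodes marked by at least 1 observer (the sum of both totals minus the congruent episodes), which is consistent with the 73% given above.

```python
def percent_agreement(n_obs1, n_obs2, n_congruent):
    """Congruent episodes as a percentage of all distinct marked events."""
    # Events marked by at least one observer (congruent ones counted once).
    all_marked = n_obs1 + n_obs2 - n_congruent
    if all_marked == 0:
        raise ValueError("no events marked by either observer")
    return 100.0 * n_congruent / all_marked


# Patient A1 (Table 1): totals per observer and congruent episodes.
n_obs1, n_obs2, n_congruent = 37, 39, 32
print("difference in totals:", abs(n_obs1 - n_obs2))  # 2 episodes
print(f"agreement: {percent_agreement(n_obs1, n_obs2, n_congruent):.0f}%")  # 73%
```

The totals differ by only 2 episodes, which is the quantity a comparison of totals would see, whereas 12 of the 44 distinct marked events were seen by only 1 of the 2 observers.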
Using the Cohen kappa, the number of no-reflux events also comes into account. Because these episodes outnumber the reflux events by 6- to 12-fold, this may lead to better kappa values and hence to a supposedly better agreement between investigators. This is the reason why results are also presented as percentage of agreed events between the 2 analyses from the sum of all marked events (Fig. 1). Here, we found a more heterogeneous picture. Most measurements showed good to perfect agreement (7 measurements 70%–79% agreement, 7 measurements 80%–99% agreement), but there also were some measurements with only fair to moderate agreement (4 measurements with 50%–60% agreement, 3 with <40% agreement).
The problem with this approach may be that it does not take the total number of events into account. For example, in measurement B2 only 4 reflux events were detected by 1 investigator, whereas the other investigator did not find any event. Although the difference between the 2 investigators represents only 4 episodes, the percentage of agreement is 0%. In this case, the kappa value gives a more valid picture of the true agreement, which is why both approaches were chosen and presented here.
The kappa coefficient of the intraobserver agreement (0.88) was, as one may have expected, slightly higher than that of the interobserver agreement (kappa coefficient 0.83). All except 1 measurement showed perfect results. Looking at the percentage (Fig. 2) we again found a more heterogeneous picture (agreement between 62% and 100%).
Both statistical approaches only judge interrater agreement. The problem with 24-hour pH impedance monitoring is that there is no way to calculate the true number of retrograde bolus movements in the measurement. This is the reason why we cannot comment on the quality of the observer's measurement interpretation.
One question that arises is why some analyses show only fair agreement, whereas other measurements show good to perfect agreement. This does not seem to depend on the investigator. For example, measurements A5, A6, B1, and B2 were analyzed by the same observer pair, but led to totally different results. The heterogeneous intraobserver results also lead to the presumption that the difference is the result of variable signal quality.
Peter et al (3) also found consistently low agreement in some measurements and consistently high agreement in others. They also presumed that this may have been the result of signal quality. As with most physiological signals, it is conceivable that this may be a particular problem in children, especially in infants. Additionally, pediatric patients with GERD often have other underlying conditions, for example, neurologically impaired children with impaired esophageal motility, which may lead to a different signal quality with more oscillations in the impedance channels.
Fröhlich et al (5) performed a swallowing test with liquid or viscous fluid under MII recording in 5 patients with surgically corrected esophageal atresia and in a cohort of 6 patients with GER symptoms but without any previous surgery. They found that patients with esophageal atresia showed mostly uncoordinated and often hardly recognizable patterns of bolus entry and exit at the different impedance channels during the swallows, whereas almost all of the individuals from the nonoperated reference group showed a normal complete bolus transit. These data support the presumption that in some patient groups a poorer tracing quality can be expected.
Although good to excellent agreement was achieved in most measurements, excellent inter- and intraobserver agreement in all of the measurements should be the goal, especially in tracings with lower signal quality. Therefore, analysis standards need to be developed. A continuous exchange and consensus finding between experienced investigators from different institutions is needed, which is 1 major aim of the German Pediatric Impedance Group. Less-experienced and new users of the method should be trained by experienced investigators. In addition, sufficient analysis quality of a new investigator should be confirmed on a regular basis.
In a study by van Wijk et al (6), an esophageal MII-manometry catheter was combined with videofluoroscopic images and used for characterization of intraluminal impedance patterns associated with gas reflux. This method could also be used to learn more about bolus movement patterns in low-quality tracings.
The identification of patients with low-quality tracings before beginning the analysis would also help the investigator in the interpretation of measurement results. An approach could be performing a standardized swallow test as used by Fröhlich et al (5). This may help identify patient groups in which a low signal quality must be expected beforehand, for example, patients with esophageal atresia.
A fairly new software tool displaying bolus movement in spatiotemporal color plots (ContourVIEW, Sandhill Scientific) simplifies the recognition of reflux patterns in MII and could also be helpful (7). It was not used in this study but may enhance inter- and intraobserver agreement.
In this study we found good to perfect intra- and interobserver agreement in most measurements; however, in a few tracings there was only fair to moderate agreement. We assume that these heterogeneous results are caused by variable tracing quality. An improvement of analysis results may be achieved by developing a standard analysis protocol and a standardized method for judging tracing quality. More interchange between experienced analysts and better training options for new users of the method, with validation of analysis quality, should be encouraged, which is what the German Pediatric Impedance Group stands for (8).
The authors thank Dipl-Math Thorsten Reineke for statistical advice.
1. Silny J. Intraluminal multiple electric impedance procedure for measurement of gastrointestinal motility. J Gastrointest Motil
2. Vandenplas Y, Rudolph CD, Di Lorenzo C, et al. Pediatric gastroesophageal reflux clinical practice guidelines: joint recommendations of the North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition (NASPGHAN) and the European Society for Pediatric Gastroenterology, Hepatology, and Nutrition (ESPGHAN). J Pediatr Gastroenterol Nutr
3. Peter CS, Sprodowski N, Ahlborn V, et al. Inter- and intraobserver agreement for gastroesophageal reflux detection in infants using multiple intraluminal impedance. Biol Neonate
4. Dalby K, Nielsen RG, Markoew S, et al. Reproducibility of 24-hour combined multiple intraluminal impedance (MII) and pH measurements in infants and children. Evaluation of a diagnostic procedure for gastroesophageal reflux disease. Dig Dis Sci
5. Fröhlich T, Otto S, Weber P, et al. Combined esophageal multichannel intraluminal impedance and pH monitoring after repair of esophageal atresia. J Pediatr Gastroenterol Nutr
6. van Wijk MP, Sifrim D, Rommel N, et al. Characterization of intraluminal impedance patterns associated with gas reflux in healthy volunteers. Neurogastroenterol Motil
7. van Wijk MP, Benninga MA, Omari TI. Role of the multichannel intraluminal impedance technique in infants and children. J Pediatr Gastroenterol Nutr
8. Pilic D, Fröhlich T, Nöh F, et al. Detection of gastroesophageal reflux in children using combined multichannel intraluminal impedance and pH measurement: data from the German Pediatric Impedance Group. J Pediatr
Keywords: children; gastroesophageal reflux; interobserver agreement; multiple intraluminal impedance; pH
Copyright 2011 by ESPGHAN and NASPGHAN