Optimisation of the Reflux-symptom Association Statistics for Use in Infants Being Investigated by 24-hour pH impedance

Omari, Taher I; Schwarzer, Andrea; van Wijk, Michiel P§; Benninga, Marc A§; McCall, Lisa*; Kritas, Stamatiki*; Koletzko, Sibylle; Davidson, Geoffrey P

Journal of Pediatric Gastroenterology & Nutrition:
doi: 10.1097/MPG.0b013e3181f474c7
Original Articles: Gastroenterology

Background and Aim: pH-impedance monitoring is used to diagnose symptomatic gastroesophageal reflux (GER) based on symptom association probability (SAP). Current criteria for calculation of SAP are optimised for heartburn in adults. Infants, however, demonstrate a different symptom profile. The aim of the present study was to optimise criteria for calculation of SAP in infants with GER disease.

Patients and Methods: Ten infants referred for investigation of symptomatic reflux were enrolled. GER episodes were recorded using a pH-impedance probe, which remained in place for 48 hours. During the test, cough, crying, and regurgitation were marked. Impedance recordings were analysed for the occurrence of bolus reflux episodes. SAP for behaviors following reflux episodes was separately calculated for day 1 and day 2 using automated reporting software, which enabled the time window used for SAP calculations to be modified from 15 to 600 seconds. Day-to-day agreement of SAP was assessed by calculating the 95% limits of agreement (mean difference ± 1.96 standard deviations of differences) and their confidence intervals.

Results: The number of bolus GER episodes and symptom episodes reported did not differ from day to day. The best agreement in SAP between the 2 days was found using time intervals of 2 minutes for cough, 5 minutes for crying, and 15 seconds and/or 2 to 5 minutes for regurgitation.

Conclusions: We conclude that the standard 2-minute time interval is appropriate for the investigation of cough and regurgitation symptoms. The day-to-day agreement of SAP for crying was poor using standard criteria, and our results suggest increasing the reflux-symptom association time interval to 5 minutes.

Author Information

*Gastroenterology Unit, Children, Youth, and Women's Health Service, North Adelaide, Australia

School of Paediatrics and Reproductive Health, University of Adelaide, Adelaide, Australia

Dr von Haunersches Kinderspital, Ludwig Maximilians University Munich, Munich, Germany

§Department of Paediatric Gastroenterology and Nutrition, Emma Children's Hospital AMC, Amsterdam, The Netherlands.

Received 12 April, 2010

Accepted 21 July, 2010

Address correspondence and reprint requests to Taher Omari, PhD, NH&MRC Senior Research Fellow, Gastroenterology Unit, Child, Youth & Women's Health Service, North Adelaide, Australia (e-mail: taher.omari@adelaide.edu.au).

Dr Taher Omari received research funding from Sandhill Scientific. Sandhill Scientific had no role in the conception and preparation of this article. This research project was funded by the Financial Markets Foundation for Children.

Dr Taher Omari is a member of the advisory board for Sandhill Scientific; Prof Sibylle Koletzko is a member of the advisory board of AstraZeneca and received a research grant and honorarium for lectures from AstraZeneca; the other authors report no conflicts of interest.

Twenty-four-hour multichannel intraluminal pH-impedance (pH-MII) allows detection of all bolus gastroesophageal reflux (GER) episodes, including gas, mixed liquid-gas, liquid, and acidic, weakly acidic, or nonacidic GER. The enhanced detection of bolus GER episodes provided by pH-MII increases the potential for identifying nonacidic GER as a cause of infant symptoms such as excessive irritability and crying, food refusal, cough, apnea, choking, and gagging. Many of these symptoms are not specific to GER disease (GERD) (1) and can be due to other causes, such as food allergies/intolerances, infections, or functional gastrointestinal disorders such as infantile colic or constipation (2). With evidence emerging that empirical prescription of acid suppression therapy to infants is largely ineffective and potentially harmful (3), more precise diagnostic testing offers the potential for antireflux therapy to be better targeted at patients in whom symptoms can be demonstrated to be due to acid GER and/or bolus GER.

Current guidelines from the North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition and the European Society for Pediatric Gastroenterology, Hepatology, and Nutrition (4) advocate the use of pH-MII for the investigation of symptoms such as unexplained crying and/or distressed behaviour, apnea, and apparent life-threatening events. Studies in infants, children, and adults (5,6) have characterised the effect of GER episode detection by pH-MII monitoring on diagnosis of GER symptom associations. When compared with pH metry alone, pH-MII has been consistently shown to increase the yield of patients in whom a positive GER symptom association can be demonstrated (5). The degree of GER symptom association is best defined using symptom association probability (SAP), which is derived from the statistical probability (P) that GER episodes and symptoms are temporally related using a Fisher exact test (SAP = [1 − P] × 100) (5,6).

The standard GER symptom association interval is 2 minutes and was originally based on investigations of heartburn symptoms in adults (6). The applicability of these criteria to infants who typically demonstrate a different symptom profile and in whom symptoms are reported secondhand through the observation of a parent or guardian is unknown. Furthermore, the day-to-day reproducibility of SAP calculations has not been examined in infants. These are important issues that need to be addressed for this diagnostic approach to gain wider acceptance. We therefore evaluated the effect of modifying the time window for SAP calculations and examined the day-to-day agreement of SAP in infants with typical GERD symptoms.

Patients and Methods

Ten infants (6 boys), 1.6 to 7.7 months old (median 3.5) and 4.6 to 6.7 kg (median 5.5) in weight were enrolled between August 1, 2009 and October 1, 2009. All of the infants were referred (to the Gastroenterology Unit, Children, Youth, and Women's Health Service, North Adelaide, Australia) for ambulatory 24-hour pH-MII monitoring to investigate the association of symptoms suggestive of GERD (crying, coughing, and/or regurgitation) with GER. Patients were removed from all antireflux medication for 48 hours before investigation. The study was carried out at the Children, Youth, and Women's Health Service, and ethical approval of the study protocol was obtained from the Human Research Ethics Committee of the Women's and Children's Hospital, North Adelaide.

A multichannel intraluminal impedance ambulatory data logger (Sleuth, Sandhill Scientific, Highlands Ranch, CO) was used to perform oesophageal pH-MII monitoring studies. A ComforTech Infant catheter, with 6 impedance segments (1.5 cm spacing) and pH sensor located at the distal segment, was used. After calibration and intubation, correct position (between T6 and T8) of the pH sensor was confirmed by a lateral chest x-ray. Parents/guardians were instructed to maintain normal daily routines.

Mealtimes and symptoms of “cough,” “crying,” and “regurgitation” (the latter defined as both observed vomiting and behaviour considered by the parent/guardian to be consistent with reflux entering the mouth) were recorded by the parent/guardian(s). The data logger marker buttons were preset to allow general symptoms of “cough,” “crying,” and “regurgitation” to be recorded as and when they occurred. A diary was also provided to allow recording of other information/other symptoms. For example, if a parent was unsure whether a symptom should be counted, then the detail of the event would be written in the diary rather than by using the event-marker button. When making diary entries the time on the data logger screen was recorded. At the completion of the study the information recorded on the data logger was downloaded by the study nurse and diary information was added to the study record electronically if considered relevant to the 3 main symptom categories.

pH-impedance Analysis

The tracings were divided into the first and second 24-hour period and manually analysed using semiautomated impedance analysis software (Sandhill Scientific, BioView [Billerica, MA]) to determine the occurrence of liquid and/or gas reflux. Liquid GER episodes were identified by a decrease in impedance of at least 50% from baseline and >2 seconds in duration travelling orad in the oesophageal body. The bolus clearance time was defined as the time from onset to recovery of the impedance signal recorded on the most distal impedance channel. The proximal extent of reflux was defined by the most proximal impedance channel demonstrating an impedance drop of >50% from baseline. Gas GER episodes were scored when there was a ≥50% increase in impedance from baseline observed in any 2 impedance segments, either simultaneously or sequentially orally, with at least 1 channel showing a change in impedance of >5000 Ω. Impedance-detected GER episodes that exhibited both liquid and gas characteristics were characterised as mixed GER. For each impedance-detected GER episode, the pH of the refluxate was determined by the oesophageal pH sensor. Reflux episodes were defined according to oesophageal pH as acidic (pH <4), weakly acidic (4≤ pH <7), or nonacidic (pH ≥7). Acid reflux index (% time pH <4) was determined using automated analysis of the pH tracing (GERD Check, Sandhill Scientific).
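The scoring criteria described above can be summarised in a short Python sketch. It is illustrative only: the function names are hypothetical, a single pH value per episode is assumed, the ">5000 Ω" criterion is read as a rise of that size, and orad propagation across channels is not modelled. It is not the BioView implementation.

```python
def classify_by_ph(ph):
    """Label a reflux episode by oesophageal pH using the thresholds in the text."""
    if ph < 4:
        return "acidic"
    if ph < 7:
        return "weakly acidic"
    return "nonacidic"


def is_liquid_drop(baseline_ohm, nadir_ohm, duration_s):
    """Liquid criterion for one channel: impedance falls by at least 50% for >2 s."""
    drop_fraction = (baseline_ohm - nadir_ohm) / baseline_ohm
    return drop_fraction >= 0.5 and duration_s > 2


def is_gas_episode(baselines_ohm, peaks_ohm):
    """Gas criterion: >=50% rise in at least 2 segments, one of them by >5000 ohm.

    Assumption: the >5000 ohm criterion is treated as a rise above baseline.
    """
    rises = [peak - base for base, peak in zip(baselines_ohm, peaks_ohm)]
    rising = [r for r, base in zip(rises, baselines_ohm) if r >= 0.5 * base]
    return len(rising) >= 2 and any(r > 5000 for r in rising)


def classify_bolus(liquid, gas):
    """Combine the two impedance signatures into liquid, gas, or mixed GER."""
    if liquid and gas:
        return "mixed"
    if liquid:
        return "liquid"
    return "gas" if gas else "none"
```

For example, `classify_by_ph(5.2)` returns `"weakly acidic"`, and an episode meeting both impedance criteria is labelled `"mixed"` by `classify_bolus(True, True)`.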

Symptom Association Analysis

For the purposes of the present study, all pH-MII–detected bolus GER events (gas, mixed, and liquid GER combined) were used in the SAP calculation. “pH only” reflux episodes, which do not exhibit the typical liquid and/or gas reflux impedance signature and therefore do not appear to be associated with refluxate of sufficient volume to fill the oesophagus, were not included in the symptom association analysis because inclusion of these events has been shown to reduce diagnostic yield (5). SAP calculations were performed using analysis software (Sandhill Scientific, BioView), which was modified by the manufacturer to allow the duration of the association time window to be manipulated. Association time windows of 15, 30, 60, 120 (standard), 300, and 600 seconds were evaluated. The following algorithm was used for SAP calculation (personal communication, J. Mabary, Sandhill Scientific): The entire procedure was divided into 2-minute segments. Segment n was tested for reflux and compared with segment n+1, which was tested for the existence of a symptom. Any segment intersecting with a reflux measurement was considered to exhibit reflux. Any segment intersecting with a symptom was considered to exhibit that symptom. Segments with multiple occurrences of symptoms of the same type were considered as one. Segments with multiple occurrences of reflux were considered as one. An association matrix was generated for each symptom type based on the totals representing all 4 Boolean combinations derived (R+/S+; R+/S−; R−/S+; R−/S−). The SAP for each symptom type was computed using the 2-tailed Fisher exact test probability [SAP = (1−P) × 100%]. The higher the SAP, the less likely that an association is coincidental.
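The algorithm described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the manufacturer's code: the function names are hypothetical, events are given as (start, end) times in seconds, and the 2-tailed Fisher exact probability is computed by summing the probabilities of all tables that are no more likely than the observed one (a common convention that the source does not specify).

```python
from math import comb


def segment_flags(intervals, total_seconds, window):
    """One boolean per segment: True if any (start, end) interval intersects it."""
    n_segments = -(-total_seconds // window)  # ceiling division
    flags = [False] * n_segments
    for start, end in intervals:
        first = max(0, int(start // window))
        last = min(n_segments - 1, int(end // window))
        for i in range(first, last + 1):
            flags[i] = True
    return flags


def association_table(reflux_flags, symptom_flags):
    """Counts (R+/S+, R+/S-, R-/S+, R-/S-), pairing reflux in segment n with symptom in n+1."""
    a = b = c = d = 0
    for r, s in zip(reflux_flags[:-1], symptom_flags[1:]):
        if r and s:
            a += 1
        elif r:
            b += 1
        elif s:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_two_tailed(a, b, c, d):
    """2-tailed Fisher exact P for a 2x2 table (sum of tables no likelier than observed)."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def sap(reflux, symptoms, total_seconds, window=120):
    """SAP = (1 - P) x 100 for one symptom type and one association window."""
    r = segment_flags(reflux, total_seconds, window)
    s = segment_flags(symptoms, total_seconds, window)
    return (1 - fisher_two_tailed(*association_table(r, s))) * 100


# Hypothetical 20-minute recording: each reflux episode is followed by a symptom
# in the next 2-minute segment.
reflux = [(10, 40), (250, 300), (490, 520), (730, 760)]
symptoms = [(130, 130), (370, 370), (610, 610), (850, 850)]
example_sap = sap(reflux, symptoms, total_seconds=1200, window=120)
```

In this toy recording the association matrix is (4, 0, 0, 5), giving P = 1/126 and a SAP of about 99.2%. Changing `window` reproduces the kind of comparison made across the 15- to 600-second windows.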

Statistical Analysis

The degree of day-to-day agreement of SAP calculations was assessed by calculating the 95% limits of agreement (LOA) using the method of Bland and Altman (7). For each patient, the day-to-day difference in SAP was determined. The upper and lower LOA were calculated as the mean differences ± 1.96 standard deviations and their respective confidence intervals (Fig. 1). This approach provided an interval within which 95% of the differences between the 2 measurement periods are expected to lie. By calculating the LOA of SAP for the 6 different association time intervals examined, the “optimal” time window for SAP calculation was defined by the time interval that produced the least variation in the 95% limits of agreement (ie, the smallest difference between the upper and lower LOA [LOAD]). This approach was used to define optimal SAP criteria based on both the duration of the association time interval and the minimum number of symptom episodes used for calculating the SAP.
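As a rough illustration of the calculation described above, the following Python sketch computes the 95% limits of agreement and the LOAD for each association window and picks the window with the smallest LOAD. The function names and the SAP values are hypothetical, and the confidence intervals of the limits are omitted for brevity.

```python
from statistics import mean, stdev


def limits_of_agreement(day1, day2):
    """Return (lower LOA, upper LOA, LOAD) for paired day-1 and day-2 SAP values."""
    diffs = [a - b for a, b in zip(day1, day2)]
    centre = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    lower, upper = centre - spread, centre + spread
    return lower, upper, upper - lower


def optimal_window(sap_by_window):
    """Pick the association window whose LOAD (upper minus lower LOA) is smallest."""
    loads = {w: limits_of_agreement(d1, d2)[2] for w, (d1, d2) in sap_by_window.items()}
    return min(loads, key=loads.get), loads


# Hypothetical SAP values (%) for four patients and two windows; not study data.
example = {
    120: ([90.0, 40.0, 75.0, 99.0], [85.0, 45.0, 70.0, 97.0]),
    600: ([95.0, 20.0, 60.0, 99.0], [50.0, 80.0, 90.0, 60.0]),
}
best_window, loads = optimal_window(example)
```

With these made-up numbers the 120-second window has the narrower limits, so `best_window` is 120.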

Results

The number of bolus GER episodes and the number of symptom episodes did not differ significantly from day to day (Table 1). All of the patients demonstrated symptoms of cough and crying on both days; however, symptoms of regurgitation were recorded on both days in only 6 patients.

Figure 1 shows the upper and lower LOA based on SAP calculated for the different reflux-symptom association time intervals. The LOA for cough and crying symptoms clearly converge around association intervals of 120 and 300 seconds, respectively (Fig. 1A and B). The LOA for regurgitation demonstrated a biphasic pattern, converging at 15 seconds and from 120 to 300 seconds (Fig. 1C). Figure 1D plots the LOAD for the 3 symptom types. In this case, the least LOAD indicates the interval of optimal convergence and therefore optimal agreement. Although the same pattern of convergence is demonstrated, Figure 1D also shows that the degree of agreement for crying symptoms was less overall (minimum LOAD 83.1) when compared with coughing symptoms (minimum LOAD 29.8) and regurgitation symptoms (minimum LOAD 19.2).

Figure 2 shows the further effect of symptom frequency on the LOAD by contour plotting the LOAD based on both the association time interval and the minimum number of symptoms required for inclusion of a patient in the calculation. Data are shown for crying and coughing symptoms (regurgitation not shown due to insufficient data). The level of day-to-day agreement was improved by increasing the number of symptoms. As shown in Figure 2, the time interval criteria can be further optimised by applying additional criteria of a minimum of 5 cough symptoms and 20 crying symptoms to support a reliable SAP result.
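The two-factor analysis described above can be sketched as a grid of LOAD values indexed by association window and minimum symptom count. This is only an illustration: the data layout and function name are hypothetical, the values are invented, and a patient is assumed to qualify only if the threshold is met on both days (the text does not state which rule was applied).

```python
from statistics import stdev


def load_grid(patients, windows, thresholds):
    """LOAD for each (window, minimum symptom count); None if fewer than 2 patients qualify."""
    grid = {}
    for w in windows:
        for t in thresholds:
            diffs = [
                p["sap"][w][0] - p["sap"][w][1]
                for p in patients
                if min(p["counts"]) >= t  # assumption: threshold must be met on both days
            ]
            grid[(w, t)] = 2 * 1.96 * stdev(diffs) if len(diffs) >= 2 else None
    return grid


# Hypothetical patients: symptom counts (day 1, day 2) and SAP (%) per window.
patients = [
    {"counts": (3, 2), "sap": {120: (10.0, 90.0)}},
    {"counts": (8, 9), "sap": {120: (80.0, 78.0)}},
    {"counts": (12, 10), "sap": {120: (60.0, 64.0)}},
    {"counts": (25, 30), "sap": {120: (95.0, 96.0)}},
]
grid = load_grid(patients, windows=[120], thresholds=[1, 5, 20])
```

In this invented example, excluding the patient with very few symptoms narrows the limits considerably, and the strictest threshold leaves too few patients to compute a LOAD at all, which mirrors the trade-off between reliability and sample size.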

Discussion

The additional diagnostic yield of pH-MII monitoring for determining GER symptom associations has been reported in infants and children (4); however, despite what appears to be an improvement in diagnostic methodology, there are no outcome studies showing that a high SAP is indeed predictive of symptomatic improvement with antireflux therapy. In the absence of these data, it is important that a SAP-based finding can be reproduced from day to day, thus ensuring confidence that the result is real. The present study investigated this in patients undergoing 48-hour pH-impedance studies. The day-to-day agreement of SAP calculations was assessed by calculating the LOA. This statistical approach allowed us to determine optimal criteria for the calculation of SAP based on symptom type, symptom frequency, and association time interval.

In the present study, we used SAP exclusively to assess the probability of a causal association between bolus GER episodes and symptoms. Symptom index (SI) and symptom sensitivity index (SSI) were not used because the effect of the time interval on these measures is entirely predictable from the way they are calculated: any increase in the association time window will result in an increase in the number of observed associations between reflux and symptoms and therefore an increase in both the SI and SSI. As such, it is probably unremarkable that a recent study comparing SAP with SI and SSI measures demonstrated no or poor correlation between them (9).
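The dependence of SI and SSI on the time window can be illustrated with a small Python sketch. The definitions used here are the commonly cited ones (SI as the percentage of symptom episodes preceded by reflux within the window, SSI as the percentage of reflux episodes followed by a symptom within the window); they are an assumption, because the text does not define either index, and the event times are invented.

```python
def symptom_indices(reflux_times, symptom_times, window):
    """Return (SI, SSI) in percent for event onset times given in seconds."""

    def linked(r, s):
        # A symptom is linked to a reflux episode if it starts within `window` s after it.
        return 0 <= s - r <= window

    si_hits = sum(any(linked(r, s) for r in reflux_times) for s in symptom_times)
    ssi_hits = sum(any(linked(r, s) for s in symptom_times) for r in reflux_times)
    si = 100 * si_hits / len(symptom_times)
    ssi = 100 * ssi_hits / len(reflux_times)
    return si, ssi


# Hypothetical onset times (seconds); not study data.
reflux_times = [0, 100, 500]
symptom_times = [50, 400, 900]
by_window = {w: symptom_indices(reflux_times, symptom_times, w) for w in (60, 300, 600)}
```

With these times both indices rise from about 33% at 60 seconds to about 67% at 300 seconds and 100% at 600 seconds. Under these definitions neither index can fall as the window lengthens, because every association counted at a shorter window is still counted at a longer one.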

Infant patients, of whom those enrolled in the present study are typical, are usually investigated by pH-impedance monitoring because of symptoms, in circumstances in which GERD has not otherwise been diagnosed based on, for example, endoscopic findings. The fact that some of these infants may not have had GERD is not relevant to the testing of the day-to-day reliability of the SAP, which was the aim of the present study.

What has been investigated in the present study is the reproducibility of a statistical relation between reflux episodes and behaviours/symptoms. Although the test assumes that symptoms occurring following reflux are caused by reflux, this relation, particularly in circumstances of several minutes of delay, appears to have no basis in physiology because physiological responses would intuitively be expected to occur much sooner. Hence, although the reflux-behaviour intervals we studied were shown to have day-to-day reproducibility, the long latencies found suggest a need for further examination of the actual pathophysiological relations. The simplest way to prove this relation is to investigate the effect of antireflux therapies on patients who are deemed to be SAP positive. If there is a real pathophysiological relation, then a reduction of reflux episodes should in turn improve/reduce the number of adverse behaviours/symptoms in SAP-positive patients. This is perhaps the only evidence for a diagnosis of GERD based on pH-impedance monitoring alone.

Our study findings also need to be interpreted in the context of the vagaries of 24-hour reflux monitoring as currently applied. This test is far from perfect and is heavily influenced by the diligence of individuals charged with the task of marking symptom episodes when they occur. The optimal time window may be influenced by both the time from reflux to the onset of a symptom and the time required to press an event button after noticing the symptom. Some symptoms may be missed entirely or marking significantly delayed due to preoccupying factors. Although synchronous video monitoring would enable a more accurate assessment of behaviours associated with the onset of GER episodes in infants (8), this approach is unavailable in most centres and is time-consuming in analysis. Furthermore, in an environment in which there is still a heavy reliance on 24-hour–based acid reflux index as a diagnostic parameter, any new application of methodology for symptom association needs to be performed during the same period.

It is important to recognise that the method of SAP calculation used differs from that of Weusten et al (6), who defined positive association based on the occurrence of both reflux and symptom events within the same time window (2 minutes). In contrast, the method we have used defined association by the presence of reflux and then symptom events in consecutive time windows, thus ensuring that the order of association was always reflux then symptom; however, a potential drawback of this method is that closely following behaviours, occurring within the same window, are not scored as being “associated.” We chose the latter method because this was consistent with that used by the manufacturer, and because the analysis was automated, there was no scope to alter the method of SAP calculation other than with respect to the association time window.
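The difference between the two pairing rules can be shown with a brief Python sketch. The function name, the `lag` parameter, and the flags are hypothetical: `lag=0` pairs reflux and symptom in the same window, as the text attributes to Weusten et al, and `lag=1` pairs reflux in one window with a symptom in the next, as in the present method.

```python
def association_table(reflux_flags, symptom_flags, lag):
    """Counts (R+/S+, R+/S-, R-/S+, R-/S-) pairing reflux in segment n with symptom in n+lag."""
    a = b = c = d = 0
    for r, s in zip(reflux_flags, symptom_flags[lag:]):
        if r and s:
            a += 1
        elif r:
            b += 1
        elif s:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical per-segment flags: a symptom shares segment 0 with reflux, and a
# second symptom (segment 4) follows reflux in segment 3.
reflux_flags = [True, False, False, True, False]
symptom_flags = [True, False, False, False, True]

same_window = association_table(reflux_flags, symptom_flags, lag=0)
consecutive = association_table(reflux_flags, symptom_flags, lag=1)
```

Here the same-window rule scores only the segment 0 pair as associated, whereas the consecutive-window rule scores only the segment 3 to segment 4 pair. The symptom that closely follows reflux within segment 0 is not counted by the consecutive rule, which is the drawback noted above.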

Our results show that the time interval used in the SAP calculation determines the level of SAP agreement from day to day. Importantly, the optimal time interval varies based on symptom type. A 2-minute time interval for SAP calculation is the current standard and our results support this interval for regurgitation and for cough symptoms. Although for regurgitation in particular a range of different intervals were optimal, our findings do not support any other interval being better than the 2-minute standard. In contrast, our results clearly support a change in criteria for a symptomatic association of GER episodes with crying. Our analysis suggests that the time interval for SAP in relation to crying symptoms should be increased from 2 to 5 minutes. Crying and its range of related behaviours (eg, irritability, fussing) are common symptoms in infants that are often considered due to GERD when all other possible causes have been excluded. Crying infants are often prescribed proton pump inhibitor therapy empirically, which has been shown to be ineffective compared with placebo (3).

The reported strength of the SAP is that it takes into account the number of time intervals with associations, the number of nonassociated intervals (reflux or symptom), and the number of “empty” intervals (neither reflux nor symptoms). The SAP is therefore less likely to be influenced by the overall number of symptoms/reflux episodes. There are, however, no published criteria with respect to the minimum number of symptoms that should be recorded to consider a SAP calculation reliable. In our own centre, we have applied criteria of a minimum of 5 symptom episodes to support a positive SAP. We analysed the effect of symptom frequency thresholds on the day-to-day agreement of the SAP. A positive SAP to regurgitation was found to be highly reproducible even with inclusion of a minimum of 2 symptom episodes. This is perhaps not surprising because true regurgitation must always follow GER. Our results suggest that parents/guardians are able to recognise this symptom well, even when it does not manifest in overt vomiting. However, because regurgitation is a relatively common occurrence in normal infants, the overall frequency of regurgitation may be more clinically relevant than findings of a positive SAP. For coughing, the day-to-day agreement of SAP markedly improved with the inclusion of patients with 5 or more symptom episodes. In contrast, the day-to-day agreement of SAP for crying was poor with 5 symptom episodes, but improved markedly with the inclusion of patients with 20 or more symptom episodes.

As with low numbers of symptoms, the contrasting situation of high numbers of symptoms needs to be considered. Although we are unable to assess this with our available data, it is true that the SAP is more likely to be positive in circumstances of extreme symptom frequency (eg, extreme crying for 90% of the recording period). Although none of the patients enrolled in the present study experienced such extreme levels of symptoms, this is an important consideration when interpreting SAP findings.

From this examination of the effect of association time interval and symptom frequency on the day-to-day agreement of SAP calculations, we conclude that the standard 2-minute interval is appropriate for the investigation of regurgitation and cough symptoms. We would further recommend that positive SAP findings be based on the observation of at least 5 symptomatic GER episodes during the period of reflux monitoring. The day-to-day agreement of SAP for crying symptoms was poor using the standard 2-minute criteria. Our results suggest an increase in the reflux-symptom association time interval from 2 to 5 minutes and, in addition, that positive SAP findings be based on the observation of at least 20 symptomatic episodes during the recording period. These findings are consistent with the fact that crying in infants is common and can be a normal behaviour. It is known that, even when considered a symptom, crying is unlikely to respond to antireflux therapy when prescribed empirically (3).

We conclude that the standard 2-minute time interval is appropriate for the investigation of cough and regurgitation symptoms. The day-to-day agreement of SAP for crying was poor using standard criteria, and our results suggest increasing the reflux-symptom association time interval to 5 minutes and recording a minimum of 20 episodes to ensure more reliable findings.

References

1. Hyman PE, Milla PJ, Benninga MA, et al. Childhood functional gastrointestinal disorders: neonate/toddler. Gastroenterology 2006; 130:1519–1526.
2. Sherman PM, Hassall E, Fagundes-Neto U, et al. A global, evidence-based consensus on the definition of gastroesophageal reflux disease in the pediatric population. Am J Gastroenterol 2009; 104:1278–1295.
3. Orenstein SR, Hassall E, Furmaga-Jablonska W, et al. Placebo-controlled trial assessing the efficacy and safety of proton pump inhibitor lansoprazole in infants with symptoms of GERD. J Pediatr 2009; 154:514–520.
4. Vandenplas Y, Rudolph CD, Di Lorenzo C, et al. Pediatric Gastroesophageal Reflux Clinical Practice Guidelines: Joint Recommendations of the North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition (NASPGHAN) and the European Society for Pediatric Gastroenterology, Hepatology, and Nutrition (ESPGHAN). J Pediatr Gastroenterol Nutr 2009; 49:498–547.
5. Loots C, Benninga M, Davidson G, et al. Addition of pH-impedance monitoring to standard pH monitoring increases the yield of symptom association analysis in infants and children with gastroesophageal reflux. J Pediatr 2009; 154:248–252.
6. Weusten BL, Roelofs JM, Akkermans LM, et al. The symptom-association probability: an improved method for symptom analysis of 24-hour esophageal pH data. Gastroenterology 1994; 107:1741–1745.
7. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999; 8:135–160.
8. Feranchak AP, Orenstein SR, Cohn JF. Behaviors associated with onset of gastroesophageal reflux episodes in infants. Prospective study using split-screen video and pH probe. Clin Pediatr (Phila) 1994; 33:654–662.
9. Lüthold SC, Rochat MK, Bähler P. Disagreement between symptom-reflux association analysis parameters in pediatric gastroesophageal reflux disease investigation. World J Gastroenterol 2010; 16:2401–2406.

Key Words: diagnosis; gastroesophageal reflux disease; infant

Copyright 2011 by ESPGHAN and NASPGHAN