Glomerular disorders are the leading acquired causes of CKD and ESKD in children and young adults.1,2 Even with preserved kidney function, these conditions are responsible for a heavy burden of cardiovascular, metabolic, infectious, and psychologic complications. Frequently without known cause, cure, or therapies approved by the US Food and Drug Administration,3 patients with glomerular disease face uncertainty and high rates of health spending, morbidity, and mortality. Clinical advances have been stymied by the rarity of these health conditions, making identification of sufficient numbers of patients for clinical trials challenging, particularly in the pediatric setting. The resultant paucity of adequately powered clinical trials restricts our ability to produce generalizable data that can improve outcomes and inform clinical decisions.4,5 Manually curated patient registries and prospective cohort studies yield invaluable insights, but are resource intensive and may not be representative of the entire source population.6
The increasingly widespread adoption of electronic health records (EHRs) and emphasis on their “meaningful use” provide new opportunities to accelerate clinical research through data captured routinely at the point of care. Within pediatrics, a network called PEDSnet (pedsnet.org) was formed in 2014 to leverage EHR data for observational and prospective clinical research.7 PEDSnet integrates EHR data from multiple institutions by mapping the source data to a standard set of data structures and definitions (i.e., a common data model) to provide a faster, easier, and less costly infrastructure for clinical research.8 One of the essential tools for cohort identification in EHR databases is a valid computable phenotype, a specification method for identifying the clinical condition(s) of interest. A computable phenotype relies on an algorithm comprising EHR data elements and logic statements that can be executed by a computerized query of the EHR database.9
The objective of this study was to develop and evaluate the classification accuracy of an EHR-based computable phenotype algorithm to identify children and adolescents with glomerular disease. Data elements for the algorithm were selected on the basis of nephrology clinical expertise and a review of EHR data at The Children’s Hospital of Philadelphia (CHOP) from a systematic sample of pediatric nephrology patients with nephrologist-confirmed glomerular diagnoses. The final algorithm was then evaluated using data from the eight children’s hospital members of PEDSnet.
Methods
PEDSnet
PEDSnet includes eight of the nation’s largest pediatric academic health systems: CHOP, Cincinnati Children’s Hospital Medical Center, Children’s Hospital Colorado, Nemours Children’s Health System, Nationwide Children’s Hospital, St. Louis Children’s Hospital, Seattle Children’s Hospital, and Boston Children’s Hospital.10 PEDSnet has harmonized these institutions’ EHR systems (Epic, Cerner, and Allscripts) to assemble a longitudinal observational research resource comprising >6.5 million children with at least one clinical encounter and at least one coded diagnosis during or after 2009. EHR records for all children seen at a member institution since 2009 are extracted, transformed to a common data format, and merged into the data resource quarterly. Current structured data domains in the PEDSnet database include demographics; encounter data for primary care, specialty care, and emergency department visits and hospitalizations; procedures; medications prescribed; laboratory results; visit diagnoses assigned; problem lists; anthropometrics; and vital signs. Visit diagnoses and problem list entries assigned by clinicians using Intelligent Medical Objects (IMO) clinical interface terminology are standardized to Systematized Nomenclature of Medicine, Clinical Terms (SNOMED-CT). IMO is limited by commercial licensing and cannot be used for data capture across multiple institutions. SNOMED-CT is a clinical terminology and an ontology with >100,000 unique clinical concepts. The clinical coverage of the International Classification of Diseases, Ninth Revision (ICD-9) coding system, and even of the expanded tenth revision (ICD-10), both primarily used in medical billing, is more limited in scope and granularity. All EHR systems contain source-to-SNOMED-CT mappings to comply with Meaningful Use Stage 2 standards, which specify SNOMED-CT as the preferred EHR clinical terminology.11
PEDSnet data management includes an extensive data quality assessment comprising more than 850 tests performed on data from each PEDSnet site in each quarterly data cycle; the results are reviewed by data scientists at the Data Coordinating Center, housed at CHOP.12,13 The core data resource is implemented as a Health Insurance Portability and Accountability Act (HIPAA)-limited dataset, containing structured primary data from clinical care that includes dates and geographic areas, but excludes all direct patient identifiers. The PEDSnet data network provides a path back to the full set of institutional records for each patient, as well as to direct contact with patients and families for prospective studies.
Development of Computable Phenotype
We sought to develop a rule-based algorithm based predominantly on diagnosis and procedure codes to identify children who were diagnosed as having glomerular disease. To determine which specific data elements should be included in the computable phenotype algorithm, we conducted a systematic review of EHR data from all 569 outpatient clinic encounters for 231 patients with glomerular disease seen at the primary CHOP nephrology practice location between April and December 2013. Data abstracted included demographics, diagnoses assigned using IMO clinical interface terminology and ICD-9 and ICD-10 Clinical Modification codes, urine protein testing, prescriptions for renin-angiotensin-aldosterone system blockade and immunomodulatory medications, and kidney biopsy reports. This pilot EHR review activity was determined by the CHOP Institutional Review Board (IRB protocol number 14-011476) to meet criteria for exemption from IRB review per 45 Code of Federal Regulations (CFR) 46.101(b)(4).
Evaluation of Computable Phenotype
To assess the sensitivity of the computable phenotype, nonglomerular disease cases (hereafter referred to as noncases) were defined as patients with three or more pediatric nephrologist encounters who did not meet the computable phenotype definition of a case. This definition was intended to enumerate the source population of children being followed by nephrology and to mitigate the inflation of the negative predictive value (NPV) that could result from sampling noncases from the full PEDSnet population, given the rarity of the conditions of interest in the general pediatric population.
The computable phenotype algorithm was first tested at CHOP where a random sample of 100 case records and 100 noncase records were reviewed using a standardized Research Electronic Data Capture (REDCap) chart review tool. To facilitate ease of review, create a parsimonious categorization of underlying glomerular conditions, and minimize free-text data entry, the REDCap tool included branching logic so that if a patient had undergone a kidney biopsy, the reviewer was asked to select from 14 categories of biopsy-based glomerular diagnoses or to input an “other” biopsy-based diagnosis. Similarly, if a patient did not have a kidney biopsy, the reviewer was asked to select from seven categories of glomerular diagnoses or to input an “other” diagnosis. The reviewers in this phase, two research coordinators with prior experience screening EHR data for patients with glomerular disease, were not blinded to case/noncase assignment to allow for refinement of the chart review tool and process.
The computable phenotype was then evaluated across the eight institutions in PEDSnet including CHOP. A random sample of 100 case records and 100 noncase records from each institution were reviewed by a local study investigator and/or research coordinator using the standardized REDCap chart review tool. Reviewers received training from the CHOP study team in how to perform the chart reviews using the REDCap form. The 100 case and 100 noncase records from CHOP specifically excluded the 200 records reviewed in the earlier phase testing of the phenotype at CHOP. The chart review tool was designed so that the reviewer was masked to the case/noncase assignment when reviewing the EHR.
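The sampling and masking steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the study's procedure: the record layout (`id`, `phenotype_case`, `nephrology_encounters`), the function name, and the seed are hypothetical, and the sketch assumes that cases are drawn from all phenotype-positive patients while noncases are drawn only from phenotype-negative patients with three or more nephrology encounters.

```python
import random


def draw_review_sample(patients, n_per_group=100, min_encounters=3, seed=2024):
    """Draw a masked chart-review worklist of cases and noncases.

    `patients` is a list of dicts with hypothetical keys:
    'id', 'phenotype_case' (bool), 'nephrology_encounters' (int).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    cases = [p["id"] for p in patients if p["phenotype_case"]]
    # Noncases: followed by nephrology but not flagged by the phenotype.
    noncases = [
        p["id"]
        for p in patients
        if not p["phenotype_case"] and p["nephrology_encounters"] >= min_encounters
    ]

    picked_cases = rng.sample(cases, min(n_per_group, len(cases)))
    picked_noncases = rng.sample(noncases, min(n_per_group, len(noncases)))

    # The key is kept by the study team; reviewers see only the worklist.
    key = {pid: True for pid in picked_cases}
    key.update({pid: False for pid in picked_noncases})

    worklist = list(key)
    rng.shuffle(worklist)  # mix cases and noncases so order reveals nothing
    return worklist, key
```

Keeping the case/noncase key separate from the shuffled worklist is one simple way to mask reviewers; the REDCap form used in the study may achieve masking differently.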
Performance statistics (predictive values, sensitivity, specificity, and accuracy) and exact binomial confidence intervals for the eight institutions were generated using the denominator population of those with three or more nephrologist encounters. In a sensitivity analysis, we also assessed the positive predictive value (PPV) for cases identified from the full PEDSnet population without restricting by number of nephrologist encounters. False positive, false negative, and “other” diagnoses were rereviewed by the local study investigator. Statistical analyses were performed using Stata version 15.0 (StataCorp., College Station, TX). To promote wider availability and use, the code to implement the final algorithm and clinical code sets were deposited in a public repository (https://github.com/PEDSnet/GLEAN_CP_Description), and clinical code sets were also deposited in the Value Set Authority Center repository (https://vsac.nlm.nih.gov).
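For readers who want to reproduce this kind of summary, the sketch below computes the listed statistics from a 2×2 table together with exact (Clopper-Pearson) binomial intervals. It is a standard-library stand-in, not the Stata code used in the study; the function names are hypothetical, and any counts passed to it in the usage example are illustrative rather than study data.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) interval for x successes in n trials."""

    def bisect(f):
        # Root of an increasing function f on [0, 1] by bisection.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: smallest p with P(X >= x) = alpha/2.
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: largest p with P(X <= x) = alpha/2.
    upper = 1.0 if x == n else bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def performance(tp, fp, fn, tn):
    """Return {statistic: (estimate, ci_lower, ci_upper)} from chart-review counts."""
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {name: (x / n, *exact_ci(x, n)) for name, (x, n) in counts.items()}
```

For example, `performance(tp=90, fp=10, fn=5, tn=95)["ppv"]` returns the point estimate 0.90 with its exact 95% interval. These are raw proportions from the reviewed sample; any weighting back to the source population would need to be added separately.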
The study was approved by the IRB at CHOP (IRB protocol numbers 14–011242 and 16–012878), determined to be exempt from consent and HIPAA authorization, and utilized the PEDSnet Master Reliance Agreement process whereby the other seven participating institutions ceded IRB review to CHOP.
Results
Development of Computable Phenotype
Descriptive statistics for the single-center development cohort are presented in Supplemental Table 1. Of the 206 IMO terms assigned at 569 visits, 95 represented glomerular conditions, and 219 out of 231 (95%) individuals had at least one of these assigned to a visit encounter during the 9-month study period (71% of whom had two diagnoses or two encounters with the same diagnosis). More than half of the cohort (57%) had undergone at least one kidney biopsy, and 80% had two diagnoses or two encounters with the same diagnosis or at least one kidney biopsy. Therefore, diagnosis codes and kidney biopsy were determined to be the key data elements to include in the phenotype, and a list of 40 SNOMED-CT codes was generated on the basis of the IMO terms and associated ICD-9 Clinical Modification codes abstracted in this pilot EHR review. With 50% or less of the cohort prescribed the most commonly used antiproteinuric and immunomodulatory therapies (renin-angiotensin-aldosterone system blockade and glucocorticoids, respectively), there was concern that medications would lack sufficient sensitivity and specificity for glomerular disease. The majority of urine protein measurements were dipsticks (531 out of 719 measures), so this data element was not included in the phenotype because it was unlikely to discriminate glomerular from other disorders. Therefore, the computable phenotype defined a glomerular disease case as a patient who, at <30 years of age, had two or more encounters (any provider type) on different days during which a glomerular diagnosis (from the list of 40 SNOMED-CT codes) was recorded, or who had one encounter with a diagnosis from this list assigned and a pretransplant kidney biopsy procedure. The criterion of two or more encounters on different days (in the absence of biopsy) was intended to avoid inclusion of individuals who were evaluated and “ruled out” for glomerular conditions.
Single-Center Evaluation of Computable Phenotype
When first implemented at CHOP, the computable phenotype identified glomerular disease with a PPV of 92% and NPV of 100%. Review of the eight false positive records demonstrated that SNOMED concept ID 435308 (concept name “Acute GN”) was overrepresented. The algorithm was then modified to require that this code had to be entered into the EHR by a nephrologist.
Evaluation of Computable Phenotype across PEDSnet
Evaluation at six centers revealed the performance of one of the SNOMED codes (concept name “Glomerulosclerosis”) to be an outlier, yielding a greater number of false positives than true positives. Indeed, 65 (49%) of all false positive records and only one true positive (0.2%) were identified as cases on the basis of only having this code assigned twice. All cases identified on the basis of only having this “Glomerulosclerosis” code were therefore excluded when calculating performance characteristics; for evaluation at the remaining two centers, the qualifying code list for the computable phenotype algorithm was refined to exclude this SNOMED code alone as a sufficient qualifying diagnosis (i.e., this code was only sufficient for case identification in the presence of another qualifying SNOMED code or pretransplant biopsy).
The final computable phenotype algorithm to identify glomerular disease consisted of two or more diagnosis codes from this qualifying list on different dates or one diagnosis code and a pretransplant biopsy procedure before 30 years of age (Figure 1). From the >6.5 million patients in PEDSnet, the computable phenotype algorithm identified 6657 cases, 4746 of whom had three or more nephrology encounters. Cases were representative of the population with three or more nephrologist encounters in terms of sex, race, and ethnicity (Table 1). Cases were older at their first face-to-face visit, likely reflecting the predominantly acquired nature of glomerular disorders. As expected, individuals receiving nephrology care had considerably longer follow-up time (relationship with the health systems) relative to the full PEDSnet population.
Figure 1. Final computable phenotype algorithm.
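The final rule, including the two refinements described above (the “Acute GN” code counting only when entered by a nephrologist, and the “Glomerulosclerosis” code being insufficient on its own), can be sketched in code. This is a minimal illustration and not the code deposited in the study repository. The `Diagnosis` record, the function names, and every code label other than the concept ID 435308 quoted in the text are hypothetical placeholders, and the sketch assumes the age cutoff applies to the date of each qualifying diagnosis and biopsy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

ACUTE_GN = "435308"  # concept ID quoted in the text ("Acute GN")
GLOMERULOSCLEROSIS = "GLOMERULOSCLEROSIS"  # placeholder; real ID not given here
# Placeholder stand-ins for the rest of the qualifying SNOMED-CT code list.
QUALIFYING = {ACUTE_GN, GLOMERULOSCLEROSIS, "NEPHROTIC_SYNDROME", "IGA_NEPHROPATHY"}


@dataclass(frozen=True)
class Diagnosis:
    code: str
    on: date
    by_nephrologist: bool = False


def age_years(birth: date, on: date) -> float:
    return (on - birth).days / 365.25


def is_case(
    birth: date,
    diagnoses: Iterable[Diagnosis],
    pretransplant_biopsy: Optional[date],
    max_age: float = 30.0,
) -> bool:
    """Apply the two-diagnoses-or-diagnosis-plus-biopsy rule to one patient."""
    eligible = [
        d
        for d in diagnoses
        if d.code in QUALIFYING
        and age_years(birth, d.on) < max_age
        # "Acute GN" counts only when entered by a nephrologist.
        and (d.code != ACUTE_GN or d.by_nephrologist)
    ]
    if not eligible:
        return False

    has_biopsy = (
        pretransplant_biopsy is not None
        and age_years(birth, pretransplant_biopsy) < max_age
    )
    if has_biopsy:
        return True  # one qualifying diagnosis plus a pretransplant biopsy

    # Without a biopsy, "Glomerulosclerosis" alone is not sufficient.
    if all(d.code == GLOMERULOSCLEROSIS for d in eligible):
        return False
    # Two or more qualifying diagnoses recorded on different dates.
    return len({d.on for d in eligible}) >= 2
```

Counting distinct dates rather than distinct encounters is one way to express the “different days” requirement, which is meant to exclude patients who were evaluated once and ruled out.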
Table 1. Characteristics of source populations

| Characteristic | Full PEDSnet Population, n=6,547,570 | Population with Three or More Nephrology Encounters, n=60,306 | Cases, n=6657 |
| --- | --- | --- | --- |
| Age (yr) at first face-to-face visit (all visits) | 4.38 (0.74–11.0) | 6.46 (1.16–12.04) | 7.14 (3.12–12.2) |
| Male | 3,342,960 (51%) | 32,743 (54%) | 3707 (56%) |
| Race: Asian | 207,333 (3%) | 1671 (3%) | 336 (5%) |
| Race: Black | 989,107 (15%) | 9796 (16%) | 1011 (15%) |
| Race: Multiple race | 166,028 (3%) | 1326 (2%) | 143 (2%) |
| Race: Other | 652,984 (10%) | 7208 (12%) | 958 (14%) |
| Race: Unknown | 684,131 (10%) | 3648 (6%) | 466 (7%) |
| Race: White | 3,847,987 (59%) | 36,657 (61%) | 3743 (56%) |
| Hispanic/Latino | 719,539 (11%) | 8092 (13%) | 953 (14%) |
| Follow-up time, yr | 2.22 (0.13–6.18) | 6.33 (2.40–10.2) | 6.45 (2.55–9.58) |

Continuous data presented as median (interquartile range) and categorical data as N (%).
The performance characteristics for the computable phenotype in the PEDSnet population with three or more nephrology encounters are shown in Table 2. The classification accuracy was excellent at 94% with an area under the receiver operating characteristics curve (AUC) of 94%, sensitivity and NPV of ≥96%, specificity of 93%, and PPV of 89%. The PPV for the computable phenotype algorithm when applied to the full PEDSnet population was 85% (95% confidence interval [95% CI], 83% to 88%). When stratified by center, the accuracy and AUC remained ≥90% for all centers. The sensitivity and NPV were ≥96%, and the PPV was ≥80% for seven of the eight institutions. For one center, the computable phenotype was less sensitive (83%) but highly specific (98%), with a PPV of 96%. For another center, the computable phenotype was less specific (85%), with a PPV of 77%, but highly sensitive (100% sensitivity and 100% NPV).
Table 2. Performance characteristics of computable phenotype algorithm across PEDSnet population with three or more nephrology encounters

| Center | Sample Cases/Noncases | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Accuracy, % (95% CI) | AUC, % (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 87/100 | 96 (89 to 99) | 98 (93 to 100) | 98 (92 to 100) | 96 (90 to 99) | 97 (93 to 99) | 97 (94 to 99) |
| 2 | 49/100 | 98 (88 to 100) | 96 (90 to 99) | 92 (80 to 98) | 99 (95 to 100) | 97 (92 to 99) | 97 (94 to 100) |
| 3 | 74/100 | 100 (94 to 100) | 85 (78 to 91) | 77 (66 to 86) | 100 (96 to 100) | 90 (85 to 94) | 93 (90 to 96) |
| 4 | 60/100 | 96 (86 to 100) | 89 (82 to 94) | 80 (68 to 89) | 98 (93 to 100) | 91 (86 to 95) | 93 (89 to 97) |
| 5 | 74/100 | 97 (90 to 100) | 93 (87 to 97) | 91 (81 to 96) | 98 (93 to 100) | 95 (90 to 98) | 95 (92 to 98) |
| 6 | 83/99 | 97 (90 to 100) | 89 (82 to 94) | 86 (76 to 92) | 98 (93 to 100) | 92 (87 to 96) | 93 (90 to 97) |
| 7 | 46/99 | 83 (70 to 92) | 98 (92 to 100) | 96 (85 to 99) | 91 (83 to 96) | 92 (87 to 96) | 90 (85 to 96) |
| 8 | 75/100 | 97 (90 to 100) | 95 (89 to 98) | 93 (85 to 98) | 98 (93 to 100) | 96 (92 to 98) | 96 (93 to 99) |
| All | 548/798 | 96 (94 to 97) | 93 (91 to 94) | 89 (86 to 91) | 97 (96 to 98) | 94 (92 to 95) | 94 (93 to 95) |

AUC, area under the receiver operating characteristics curve.
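The text does not state how the AUC was computed. For a rule with a single yes/no output, however, the receiver operating characteristics curve has only one operating point, and the area under the two line segments through that point equals the mean of sensitivity and specificity. The short check below, offered as an illustration rather than as the study's method, applies that identity to the rounded values in Table 2; the function and variable names are hypothetical.

```python
# (center, sensitivity %, specificity %, reported AUC %) copied from Table 2
TABLE_2 = [
    ("1", 96, 98, 97),
    ("2", 98, 96, 97),
    ("3", 100, 85, 93),
    ("4", 96, 89, 93),
    ("5", 97, 93, 95),
    ("6", 97, 89, 93),
    ("7", 83, 98, 90),
    ("8", 97, 95, 96),
    ("All", 96, 93, 94),
]


def single_point_auc(sensitivity, specificity):
    """Area under the ROC curve through (0,0), (1-spec, sens), and (1,1)."""
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    for center, sens, spec, reported in TABLE_2:
        implied = single_point_auc(sens, spec)
        print(f"center {center}: implied {implied:.1f}, reported {reported}")
```

Every row agrees with the reported AUC to within one percentage point, which is the most that rounding of the inputs and the output can account for.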
Table 3 shows the distribution of glomerular conditions among the case records reviewed and adjudicated to be true positives. A total of 55% had biopsy-based conditions, nearly identical to the proportion of biopsy-based disease in the development cohort. Among true positive cases with biopsy-based conditions (n=357), the most frequent histopathologic diagnoses were FSGS (21%), minimal change nephropathy (19%), Henoch–Schönlein purpura nephritis/IgA nephropathy (18%), and lupus nephritis (15%). Among those true positive cases without a biopsy (n=292), the most common diagnosis was nephrotic syndrome, accounting for 44%, followed by post-infectious GN (24%) and Henoch–Schönlein purpura nephritis/IgA nephropathy (17%).
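Table 3. Glomerular conditions among confirmed cases (true positives) on the basis of records reviewed

| Group | Condition | N | % |
| --- | --- | --- | --- |
| Biopsy-based | FSGS | 76 | 21.3 |
| Biopsy-based | Minimal change | 69 | 19.3 |
| Biopsy-based | of which minimal change | 60 | |
| Biopsy-based | of which minimal change with mesangial proliferation/hypercellularity | 9 | |
| Biopsy-based | Lupus nephritis | 53 | 14.9 |
| Biopsy-based | IgA nephropathy | 48 | 13.5 |
| Biopsy-based | Membranoproliferative GN/C3 glomerulopathy | 32 | 9.0 |
| Biopsy-based | Henoch–Schönlein purpura nephritis | 17 | 4.8 |
| Biopsy-based | Membranous nephropathy | 12 | 3.4 |
| Biopsy-based | Post-infectious GN | 11 | 3.1 |
| Biopsy-based | Pauci-immune GN | 7 | 2.0 |
| Biopsy-based | Alport syndrome | 6 | 1.7 |
| Biopsy-based | Thin basement membrane disease | 4 | 1.1 |
| Biopsy-based | Antiglomerular basement membrane disease | 4 | 1.1 |
| Biopsy-based | Diffuse mesangial sclerosis | 3 | 0.8 |
| Biopsy-based | Other | 15 | 4.2 |
| Biopsy-based | Total | 357 | |
| No biopsy | Nephrotic syndrome | 129 | 44.2 |
| No biopsy | Post-infectious GN | 71 | 24.3 |
| No biopsy | Henoch–Schönlein purpura nephritis | 34 | 11.6 |
| No biopsy | IgA nephropathy | 17 | 5.8 |
| No biopsy | Lupus nephritis | 17 | 5.8 |
| No biopsy | Alport syndrome | 10 | 3.4 |
| No biopsy | Thin basement membrane disease | 2 | 0.7 |
| No biopsy | Other | 12 | 4.1 |
| No biopsy | Total | 292 | |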
In a post hoc analysis, we used the adjudicated case records to enhance the computable phenotype algorithm to identify the subcohort of children and adolescents with idiopathic nephrotic syndrome (nephrotic syndrome without a biopsy) or biopsy-based findings of minimal change nephropathy, FSGS, or membranous nephropathy. We applied an additional criterion that for a given individual, the sum of all diagnoses from a subset of 13 SNOMED codes consistent with these disorders be greater than the sum of all diagnoses from a subset of 23 SNOMED codes indicative of GN (Supplemental Table 2). The classification accuracy of this algorithm for the nephrotic subcohort was 98% (95% CI, 97% to 98%). With this additional criterion we were able to identify 220 out of 233 individuals with these disorders (94% sensitivity; 95% CI, 91% to 97%). The PPV was 92% (95% CI, 88% to 95%). This approach was also highly specific for this subset of glomerular disorders (98% specificity; 95% CI, 97% to 99%).
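The additional criterion amounts to a comparison of two diagnosis counts per patient, which the following sketch illustrates. The code labels are hypothetical placeholders standing in for the 13 nephrotic and 23 GN codes listed in the supplement, and the function name is likewise an assumption; the sketch is meant to be applied only to patients already identified as glomerular disease cases.

```python
from collections import Counter

# Hypothetical placeholders for the 13 codes consistent with nephrotic
# syndrome, minimal change, FSGS, or membranous nephropathy.
NEPHROTIC_CODES = {"NEPHROTIC_SYNDROME", "MINIMAL_CHANGE", "FSGS", "MEMBRANOUS"}
# Hypothetical placeholders for the 23 codes indicative of GN.
GN_CODES = {"IGA_NEPHROPATHY", "LUPUS_NEPHRITIS", "POSTINFECTIOUS_GN"}


def in_nephrotic_subcohort(diagnosis_codes):
    """True if nephrotic-type diagnoses outnumber GN-type diagnoses.

    `diagnosis_codes` holds one entry per recorded diagnosis, so a code
    assigned at five visits appears five times.
    """
    counts = Counter(diagnosis_codes)
    nephrotic = sum(counts[code] for code in NEPHROTIC_CODES)
    nephritic = sum(counts[code] for code in GN_CODES)
    return nephrotic > nephritic  # strict: ties are not assigned to the subcohort
```

Because the comparison is strict, a patient with equal counts, or with no nephrotic-type codes at all, stays outside the subcohort.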
Table 4 shows a description of the clinical characteristics of the glomerular disease cohort identified by the computable phenotype in version 2.9 of the PEDSnet data resource (as of May 2018). There were 6657 individuals with glomerular disease and longitudinal data with a median follow-up of 3.3 years that can be evaluated for retrospective observational research. Of these, 3774 individuals had a health care encounter at a member institution within the prior 18 months, and would be considered potentially recruitable into prospective studies. Table 4 also provides a description of the subcohort of the glomerular population with idiopathic nephrotic syndrome or biopsy-based diagnoses of minimal change nephropathy, FSGS, and membranous nephropathy that can be identified by the expanded criteria version of the algorithm described above. The table highlights the breadth of clinical data available, including measures of proteinuria and kidney function, dialysis and transplant outcomes, health care utilization, and therapeutic exposures.
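Table 4. Clinical and health care utilization characteristics of the cohort identified by the computable phenotype algorithm

| Characteristic | Full Population, N (%) or Median (IQR) | Subcohort: Nephrotic Syndrome, Minimal Change, FSGS, or Membranous Nephropathy, N (%) or Median (IQR) |
| --- | --- | --- |
| N | 6657 | 3315 |
| Cohort by site: 1 | 1241 | 685 |
| Cohort by site: 2 | 565 | 259 |
| Cohort by site: 3 | 844 | 346 |
| Cohort by site: 4 | 974 | 540 |
| Cohort by site: 5 | 513 | 247 |
| Cohort by site: 6 | 696 | 376 |
| Cohort by site: 7 | 583 | 347 |
| Cohort by site: 8 | 1241 | 515 |
| Age at diagnosis, yr: <2 | 273 (4%) | 217 (7%) |
| Age at diagnosis, yr: 2–4 | 1233 (19%) | 912 (28%) |
| Age at diagnosis, yr: 5–9 | 2028 (30%) | 921 (28%) |
| Age at diagnosis, yr: 10–14 | 1616 (24%) | 679 (20%) |
| Age at diagnosis, yr: 15–19 | 1360 (20%) | 524 (16%) |
| Age at diagnosis, yr: ≥20 | 147 (2%) | 62 (2%) |
| Year of first diagnosis: <2009 | 1308 (20%) | 712 (21%) |
| Year of first diagnosis: 2009–2011 | 1843 (28%) | 937 (28%) |
| Year of first diagnosis: 2012–2014 | 1529 (23%) | 738 (22%) |
| Year of first diagnosis: 2015–2017 | 1801 (27%) | 842 (25%) |
| Follow-up time since diagnosis, yr | 3.3 (1.10–6.42) | 3.6 (1.33–6.91) |
| No. of nephrology visits (per person-yr) | 2.3 (0.5–7.3) | 2.6 (0.5–6.9) |
| No. of hospitalizations (per person-yr) | 0.5 (0.1–1.8) | 0.4 (0–1.6) |
| CKD stage within ±30 d of diagnosis^a: 1 | 2077 (31%) | 1186 (36%) |
| CKD stage within ±30 d of diagnosis^a: 2 | 968 (15%) | 380 (11%) |
| CKD stage within ±30 d of diagnosis^a: 3 | 681 (10%) | 248 (7%) |
| CKD stage within ±30 d of diagnosis^a: 4 | 304 (5%) | 92 (3%) |
| CKD stage within ±30 d of diagnosis^a: 5 | 390 (6%) | 128 (4%) |
| CKD stage at last follow-up: 1 | 1212 (18%) | 674 (20%) |
| CKD stage at last follow-up: 2 | 713 (11%) | 332 (10%) |
| CKD stage at last follow-up: 3 | 392 (6%) | 188 (6%) |
| CKD stage at last follow-up: 4 | 148 (2%) | 67 (2%) |
| CKD stage at last follow-up: 5 | 366 (5%) | 189 (6%) |
| No. of urine protein measures (per person-yr) | 2.4 (0.4–8.4) | 2.3 (0.3–7.5) |
| Dialysis | 848 (13%) | 409 (12%) |
| Kidney transplant | 421 (6%) | 185 (6%) |
| ACE inhibition | 2635 (40%) | 1262 (38%) |
| Angiotensin receptor blockade | 759 (11%) | 388 (12%) |
| Immunosuppression/immunomodulator^a: Corticosteroid | 4049 (61%) | 2535 (76%) |
| Immunosuppression/immunomodulator^a: Mycophenolate | 1613 (24%) | 858 (26%) |
| Immunosuppression/immunomodulator^a: Calcineurin inhibitor | 1403 (21%) | 964 (29%) |
| Immunosuppression/immunomodulator^a: Cyclophosphamide | 389 (6%) | 221 (7%) |
| Immunosuppression/immunomodulator^a: Azathioprine | 435 (7%) | 134 (4%) |
| Immunosuppression/immunomodulator^a: Rituximab | 254 (4%) | 160 (5%) |
| Immunosuppression/immunomodulator^a: Eculizumab | 24 (<1%) | <11 (<1%) |
| Immunosuppression/immunomodulator^a: Corticotropin | <11 (<1%) | <11 (<1%) |
| Immunosuppression/immunomodulator^a: Abatacept | 23 (<1%) | 15 (<1%) |

IQR, interquartile range; ACE, angiotensin-converting enzyme.
^a If more than one eGFR measurement was taken within 30 d of diagnosis, the maximum was used to determine stage of CKD.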
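The footnote rule for CKD stage at diagnosis can be sketched as below. The stage cutoffs are an assumption: the text does not list them, so the sketch uses the conventional GFR categories (≥90, 60–89, 30–59, 15–29, and <15 ml/min per 1.73 m²), and the function names and the input format are hypothetical.

```python
from datetime import date


def ckd_stage(egfr):
    """Map an eGFR (ml/min per 1.73 m2) to CKD stage 1-5.

    Cutoffs follow the conventional GFR categories; the study's exact
    definition is not given in the text, so treat these as an assumption.
    """
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5


def stage_at_diagnosis(diagnosis_date, egfr_results, window_days=30):
    """Stage from the maximum eGFR within +/- window_days of diagnosis.

    `egfr_results` is an iterable of (date, value) pairs. Returns None
    when no measurement falls inside the window.
    """
    in_window = [
        value
        for measured_on, value in egfr_results
        if abs((measured_on - diagnosis_date).days) <= window_days
    ]
    return ckd_stage(max(in_window)) if in_window else None
```

Returning `None` when no measurement falls in the window mirrors the fact that the stage rows in Table 4 sum to fewer patients than the cohort total, which is consistent with some patients having no eGFR near diagnosis, although the text does not say so explicitly.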
Discussion
In this study, we developed and evaluated an EHR-based computerized algorithm that identified virtually all patients with glomerular disease within the multi-institutional PEDSnet network. The revised computable phenotype algorithm identified 6657 patients with glomerular disease across the eight PEDSnet institutions, 4746 of whom had three or more nephrology encounters, with an area under the receiver operating characteristics curve of 94% (95% CI, 93% to 95%). This novel approach to rapid cohort identification could greatly enhance and accelerate comparative effectiveness, clinical trials, and health outcomes research in glomerular disease.
Glomerular disease clinical consortia and prospective observational patient registries provide invaluable contributions to our understanding of pathophysiology and the lived experience of these disorders.3 There are, however, some important limitations to prospective cohort studies and patient registries, particularly for health outcomes research. Selection factors, related both to prespecified eligibility criteria and to unmeasured variables such as socioeconomic, computer literacy, and language barriers to participation, often render cohort and registry participants unrepresentative of the source population. The data obtained in conventional observational research and registry approaches, while highly curated, are constrained in scope and timing by the study protocol or selected questionnaires. Furthermore, enrollment and long-term retention of sufficient numbers of participants, especially for rare diseases, are both time- and resource-intensive. The largest cohort study to date, Cure Glomerulopathy, began recruitment in 2014 and is projected to enroll 2400 patients (30% pediatric) with biopsy-based glomerular conditions across >65 centers.14 Such cohort studies could gain efficiencies through application of EHR-based cohort identification tools.
Certain limitations of conventional research methods can be complemented by the application of EHR-based cohort identification to large, real-world data resources. The cohort of children and adolescents with glomerular disease identified through this highly sensitive computable phenotype comprises nearly the entire source population with these disorders across the eight health systems in PEDSnet. The EHR-based computable phenotype method for rapid cohort identification allows for analysis of existing data from unprecedented numbers of patients. The >6000 children and adolescents with glomerular disease identified in PEDSnet and their median follow-up time of 3 years provide unique opportunities, as highlighted in Table 4, to longitudinally capture measures of health status at the point of care and to study health service use, health outcomes, disease progression, and comparative effectiveness of alternative interventions.
The paucity of adequately powered clinical trials for glomerular disorders and resultant evidence gaps and practice pattern variation that persist15–17 call for new approaches to inform best practices and treatment decisions. The widespread adoption of certified EHRs, growing efforts to capture and aggregate real-world clinical data, and emergence of big data science provide opportunities to accelerate clinical research and outcomes improvement, particularly in rare diseases.18 , 19 In addition to opportunities for conducting large-scale observational research, the implementation of the computable phenotype in PEDSnet greatly enhances our ability to perform comparative effectiveness research to directly compare benefits and toxicities of existing therapies used for glomerular disease in children, as well as population management for quality improvement and pragmatic trials of new interventions, targeting the population seen within the past 18 months for recruitment.
EHR-based computable phenotyping is a nascent field, and our approach to EHR-based glomerular disease identification does have several limitations. The algorithm is intended to identify a large cohort of children with diagnosed glomerular disease and is therefore designed to be applied to large pediatric health care systems with nephrology subspecialty care. Although the validation results are reassuring about the performance within PEDSnet health systems, we cannot guarantee that this particular algorithm will perform as well in other databases. Given differences in referral patterns and comorbidities, the algorithm may also not perform as well in adult health care systems. EHR-based cohort identification approaches need to balance sensitivity and specificity. We prioritized sensitivity and allowed inclusion of less-specific nephritis codes, which contributed to many of the false positive records, as highlighted by the improved specificity of the enhanced computable phenotype for the nephrotic subcohort. The maximally sensitive phenotype is intended to lay the foundation for future refinements to identify specific subpopulations. Although the evaluation across eight institutions with three different EHR systems is a strength, there was response variation across institutions, which reflects real-world variation in coding practices. For example, the “Glomerulosclerosis” code was initially included in the preliminary phenotype because it was rarely used in the development center; its poor diagnostic performance was not recognized until the phenotype was evaluated across multiple other centers. Coding practices may also change over time, and the computable phenotype may need to be updated as the EHR evolves and diagnostic coding improves. There may be some referral bias because the children in PEDSnet access care affiliated with tertiary care children’s hospitals.
However, some of the member health systems do include large primary care networks in addition to specialty care, and the eight institutions comprise >6.5 million children across 23 states. This limitation is also mitigated by the relative rarity of glomerular disease being managed by the general pediatric community as well as the pediatric nephrology workforce being largely tied to academic medical centers.20 Data capture on this cohort did not include encounters outside the eight member health care systems, such as visits to local emergency departments. Finally, as with any EHR database, data were limited to those elements that were recorded in medical records, although both data quantity and quality are expected to continue to improve as more elements are extracted and transformed into a common data format.
In summary, we used EHR data from the PEDSnet pediatric health system population of >6.5 million children to develop and evaluate a highly sensitive and specific computable phenotype to identify the largest cohort of children with glomerular disease to date. This tool for rapid cohort ascertainment applied to a robust resource of multi-institutional longitudinal EHR data holds great promise to enhance and accelerate comparative effectiveness and health outcomes research in glomerular disease and other rare diseases.
Disclosures
Dr. Denburg reports grants from Mallinckrodt Pharmaceuticals, during the conduct of the study. Dr. Dixon reports personal fees from Alexion Pharmaceuticals, personal fees from Apellis Pharmaceuticals, personal fees from Horizon Pharmaceuticals, outside the submitted work. Dr. Flynn reports personal fees from Silvergate Pharmaceuticals, other from Springer, personal fees from Ultragenyx, other from Up to Date, personal fees from Vertex Pharmaceuticals, outside the submitted work. Dr. Mariani reports grants from PCORI, during the conduct of the study; grants from Boehringer Ingelheim, other from Reata Pharmaceuticals, outside the submitted work.
Funding
This study was supported by an investigator-initiated research proposal from Mallinckrodt Pharmaceuticals (Principal Investigators: Denburg and Furth) and a grant from the Patient-Centered Outcomes Research Institute (CDRN1306-01556; Principal Investigator: Forrest). Denburg was also supported by grants from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health (K23DK093556) and the Patient-Centered Outcomes Research Institute (PPRN1306-04903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
All authors were responsible for study design. Dr. Denburg, Ms. Razzaghi, Dr. Bailey, Dr. Soranno, Dr. Pollack, Dr. Dharnidharka, Dr. Mitsnefes, Dr. Smoyer, Dr. Somers, Dr. Zaritsky, Dr. Flynn, Dr. Claes, Ms. Benton, and Dr. Forrest were responsible for data collection. Dr. Denburg, Ms. Razzaghi, Dr. Bailey, and Dr. Forrest performed data analysis. Dr. Denburg drafted the manuscript and all authors took part in manuscript revision. Dr. Denburg and Ms. Razzaghi created all figures.
We thank Dr. Debbie Gipson (Pediatric Nephrology, Mott Children’s Hospital, University of Michigan) for contributions to study design.
Supplemental Material
This article contains the following supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2019040365/-/DCSupplemental.
Supplemental Table 1. Characteristics of the single center cohort for the pilot EHR review to determine data elements for inclusion in the computable phenotype.
Supplemental Table 2. SNOMED-CT codes indicative of primarily nephrotic versus nephritic conditions.
References
1. US Renal Data System: 2016 Annual Data Report: Epidemiology of Kidney Disease in the United States, Bethesda, MD, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 2016
2. Wong CJ, Moxey-Mims M, Jerry-Fluker J, Warady BA, Furth SL: CKiD (CKD in children) prospective cohort study: A review of current findings. Am J Kidney Dis 60: 1002–1011, 2012. PMID: 23022429
3. Moxey-Mims MM, Flessner MF, Holzman L, Kaskel F, Sedor JR, Smoyer WE, et al.: Glomerular diseases: Registries and clinical trials. Clin J Am Soc Nephrol 11: 2234–2243, 2016. PMID: 27672219
4. Inrig JK, Califf RM, Tasneem A, Vegunta RK, Molina C, Stanifer JW, et al.: The landscape of clinical trials in nephrology: A systematic review of Clinicaltrials.gov. Am J Kidney Dis 63: 771–780, 2014. PMID: 24315119
5. Strippoli GF, Craig JC, Schena FP: The number, quality, and coverage of randomized controlled trials in nephrology. J Am Soc Nephrol 15: 411–419, 2004. PMID: 14747388
6. Geva A, Gronsbell JL, Cai T, Cai T, Murphy SN, Lyons JC, et al.; Pediatric Pulmonary Hypertension Network and National Heart, Lung, and Blood Institute Pediatric Pulmonary Vascular Disease Outcomes Bioinformatics Clinical Coordinating Center Investigators: A computable phenotype improves cohort ascertainment in a pediatric pulmonary hypertension registry. J Pediatr 188: 224–231.e5, 2017
7. Forrest CB, Margolis P, Seid M, Colletti RB: PEDSnet: How a prototype pediatric learning health system is being expanded into a national network. Health Aff (Millwood) 33: 1171–1177, 2014. PMID: 25006143
8. Collins FS, Hudson KL, Briggs JP, Lauer MS: PCORnet: Turning a dream into reality. J Am Med Inform Assoc 21: 576–577, 2014. PMID: 24821744
9. Hripcsak G, Albers DJ: Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 20: 117–121, 2013. PMID: 22955496
10. Forrest CB, Margolis PA, Bailey LC, Marsolo K, Del Beccaro MA, Finkelstein JA, et al.: PEDSnet: A national pediatric learning health system. J Am Med Inform Assoc 21: 602–606, 2014. PMID: 24821737
11. Office of the National Coordinator for Health Information Technology (ONC), Department of Health and Human Services: Health information technology: Standards, implementation specifications, and certification criteria for electronic health record technology, 2014 edition; Revisions to the permanent certification program for health information technology. Final rule. Fed Regist 77: 54163–54292, 2012
12. Khare R, Ruth BJ, Miller M, Tucker J, Utidjian LH, Razzaghi H, et al.: Predicting causes of data quality issues in a clinical data research network. AMIA Jt Summits Transl Sci Proc 2017: 113–121, 2018. PMID: 29888053
13. Khare R, Utidjian L, Ruth BJ, Kahn MG, Burrows E, Marsolo K, et al.: A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc 24: 1072–1079, 2017. PMID: 28398525
14. Mariani LH, Bomback AS, Canetta PA, Flessner MF, Helmuth M, Hladunewich MA, et al.; CureGN Consortium: CureGN study rationale, design, and methods: Establishing a large prospective observational study of glomerular disease. Am J Kidney Dis 73: 218–229, 2019. PMID: 30420158
15. Beck L, Bomback AS, Choi MJ, Holzman LB, Langford C, Mariani LH, et al.: KDOQI US commentary on the 2012 KDIGO clinical practice guideline for glomerulonephritis. Am J Kidney Dis 62: 403–441, 2013. PMID: 23871408
16. Kidney Disease: Improving Global Outcomes (KDIGO) Glomerulonephritis Work Group: KDIGO clinical practice guideline for glomerulonephritis. Kidney Int Suppl 2: 139–274, 2012
17. Barbour S, Beaulieu M, Gill J, Espino-Hernandez G, Reich HN, Levin A: The need for improved uptake of the KDIGO glomerulonephritis guidelines into clinical practice in Canada: A survey of nephrologists. Clin Kidney J 7: 538–545, 2014. PMID: 25859369
18. Berger ML, Lipset C, Gutteridge A, Axelsen K, Subedi P, Madigan D: Optimizing the leveraging of real-world data to improve the development and use of medicines. Value Health 18: 127–130, 2015. PMID: 25595243
19. Currie J: “Big data” versus “big brother”: On the appropriate use of large-scale data collections in pediatrics. Pediatrics 131[Suppl 2]: S127–S132, 2013. PMID: 23547056
20. Primack WA, Meyers KE, Kirkwood SJ, Ruch-Ross HS, Radabaugh CL, Greenbaum LA: The US pediatric nephrology workforce: A report commissioned by the American Academy of Pediatrics. Am J Kidney Dis 66: 33–39, 2015. PMID: 25911315