Using Electronic Health Record Data to Rapidly Identify Children with Glomerular Disease for Clinical Research : Journal of the American Society of Nephrology

Clinical Epidemiology

Denburg, Michelle R.; Razzaghi, Hanieh; Bailey, L. Charles; Soranno, Danielle E.; Pollack, Ari H.; Dharnidharka, Vikas R.; Mitsnefes, Mark M.; Smoyer, William E.; Somers, Michael J. G.; Zaritsky, Joshua J.; Flynn, Joseph T.; Claes, Donna J.; Dixon, Bradley P.; Benton, Maryjane; Mariani, Laura H.; Forrest, Christopher B.; Furth, Susan L.

JASN 30(12):p 2427-2435, December 2019. | DOI: 10.1681/ASN.2019040365


Significance Statement 

Clinical advances in glomerular disease have been stymied by the rarity of these health conditions, making identification of sufficient numbers of patients with glomerular disease for enrollment in research studies challenging, particularly in the pediatric setting. We leveraged the PEDSnet pediatric health system population of >6.5 million children to develop and evaluate a highly sensitive and specific electronic health record (EHR)–based computable phenotype algorithm to identify the largest cohort of children with glomerular disease to date. This tool for rapid cohort identification applied to a robust resource of multi-institutional longitudinal EHR data offers great potential to enhance and accelerate comparative effectiveness and health outcomes research in glomerular disease.


The rarity of pediatric glomerular disease makes it difficult to identify sufficient numbers of participants for clinical trials. This leaves limited data to guide improvements in care for these patients.


The authors developed and tested an electronic health record (EHR) algorithm to identify children with glomerular disease. They used EHR data from 231 patients with glomerular disorders at a single center to develop a computerized algorithm comprising diagnosis, kidney biopsy, and transplant procedure codes. The algorithm was tested using PEDSnet, a national network of eight children’s hospitals with data on >6.5 million children. Patients with three or more nephrologist encounters (n=55,560) not meeting the computable phenotype definition of glomerular disease were defined as nonglomerular cases. A reviewer blinded to case status used a standardized form to review random samples of cases (n=800) and nonglomerular cases (n=798).


The final algorithm consisted of two or more diagnosis codes from a qualifying list or one diagnosis code and a pretransplant biopsy. Performance characteristics among the population with three or more nephrology encounters were sensitivity, 96% (95% CI, 94% to 97%); specificity, 93% (95% CI, 91% to 94%); positive predictive value (PPV), 89% (95% CI, 86% to 91%); negative predictive value, 97% (95% CI, 96% to 98%); and area under the receiver operating characteristics curve, 94% (95% CI, 93% to 95%). Requiring that the sum of nephrotic syndrome diagnosis codes exceed that of glomerulonephritis codes identified children with nephrotic syndrome or biopsy-based minimal change nephropathy, FSGS, or membranous nephropathy, with 94% sensitivity and 92% PPV. The algorithm identified 6657 children with glomerular disease across PEDSnet, ≥50% of whom were seen within 18 months.


The authors developed an EHR-based algorithm and demonstrated that it had excellent classification accuracy across PEDSnet. This tool may enable faster identification of cohorts of pediatric patients with glomerular disease for observational or prospective studies.


Glomerular disorders are the leading acquired causes of CKD and ESKD in children and young adults.1,2 Even with preserved kidney function, these conditions are responsible for a heavy burden of cardiovascular, metabolic, infectious, and psychologic complications. Frequently without known cause, cure, or therapies approved by the US Food and Drug Administration,3 patients with glomerular disease face uncertainty and high rates of health spending, morbidity, and mortality. Clinical advances have been stymied by the rarity of these health conditions, making identification of sufficient numbers of patients for clinical trials challenging, particularly in the pediatric setting. The resultant paucity of adequately powered clinical trials restricts our ability to produce generalizable data that can improve outcomes and inform clinical decisions.4,5 Manually curated patient registries and prospective cohort studies yield invaluable insights, but are resource intensive and may not be representative of the entire source population.6

The increasingly widespread adoption of electronic health records (EHRs) and the emphasis on their “meaningful use” provide new opportunities to accelerate clinical research through data captured routinely at the point of care. Within pediatrics, a network called PEDSnet was formed in 2014 to leverage EHR data for observational and prospective clinical research.7 PEDSnet integrates EHR data from multiple institutions by mapping the source data to a standard set of data structures and definitions (i.e., a common data model) to provide a faster, easier, and less costly infrastructure for clinical research.8 One of the essential tools for cohort identification in EHR databases is a valid computable phenotype, a specification method for identifying the clinical condition(s) of interest. A computable phenotype relies on an algorithm comprising EHR data elements and logic statements that can be executed by a computerized query of the EHR database.9

The objective of this study was to develop and evaluate the classification accuracy of an EHR-based computable phenotype algorithm to identify children and adolescents with glomerular disease. Data elements for the algorithm were selected on the basis of nephrology clinical expertise and a review of EHR data at The Children’s Hospital of Philadelphia (CHOP) from a systematic sample of pediatric nephrology patients with nephrologist-confirmed glomerular diagnoses. The final algorithm was then evaluated using data from the eight children’s hospital members of PEDSnet.

Methods



PEDSnet includes eight of the nation’s largest pediatric academic health systems: CHOP, Cincinnati Children’s Hospital Medical Center, Children’s Hospital Colorado, Nemours Children’s Health System, Nationwide Children’s Hospital, St. Louis Children’s Hospital, Seattle Children’s Hospital, and Boston Children’s Hospital.10 PEDSnet has harmonized these institutions’ EHR systems (Epic, Cerner, and Allscripts) to assemble a longitudinal observational research resource comprising >6.5 million children with at least one clinical encounter and at least one coded diagnosis during or after 2009. EHR records for all children seen at a member institution since 2009 are extracted, transformed to a common data format, and merged into the data resource quarterly. Current structured data domains in the PEDSnet database include demographics; encounter data for primary care, specialty care, and emergency department visits and hospitalizations; procedures; medications prescribed; laboratory results; visit diagnoses assigned; problem lists; anthropometrics; and vital signs. Visit diagnoses and problem list entries assigned by clinicians using Intelligent Medical Objects (IMO) clinical interface terminology are standardized to Systematized Nomenclature of Medicine, Clinical Terms (SNOMED-CT). IMO is limited by commercial licensing and cannot be used for data capture across multiple institutions. SNOMED-CT is a clinical terminology and an ontology with >100,000 unique clinical concepts. The clinical coverage of the International Classification of Diseases, Ninth Revision (ICD-9) (and even the expanded tenth revision [ICD-10]) coding systems, primarily used in medical billing, is more limited in scope and granularity. All EHR systems contain source-to-SNOMED-CT mappings to comply with Meaningful Use Stage 2 standards, which specify SNOMED-CT as the preferred EHR clinical terminology.11

PEDSnet data management includes an extensive data quality assessment comprising over 850 tests performed on data from each PEDSnet site in each quarterly data cycle; the results are reviewed by data scientists at the Data Coordinating Center, housed at CHOP.12,13 The core data resource is implemented as a Health Insurance Portability and Accountability Act (HIPAA)-limited dataset, containing structured primary data from clinical care that includes dates and geographic areas, but excludes all direct patient identifiers. The PEDSnet data network provides a path back to the full set of institutional records for each patient, as well as to direct contact with patients and families for prospective studies.

Development of Computable Phenotype

We sought to develop a rule-based algorithm based predominantly on diagnosis and procedure codes to identify children who were diagnosed as having glomerular disease. To determine which specific data elements should be included in the computable phenotype algorithm, we conducted a systematic review of EHR data from all 569 outpatient clinic encounters for 231 patients with glomerular disease seen at the primary CHOP nephrology practice location between April and December 2013. Data abstracted included demographics, diagnoses assigned using IMO clinical interface terminology and ICD-9 and ICD-10 Clinical Modification codes, urine protein testing, prescriptions for renin-angiotensin-aldosterone system blockade and immunomodulatory medications, and kidney biopsy reports. This pilot EHR review activity was determined by the CHOP Institutional Review Board (IRB protocol number 14-011476) to meet criteria for exemption from IRB review per 45 Code of Federal Regulations (CFR) 46.101(b)(4).

Evaluation of Computable Phenotype

To assess the sensitivity of the computable phenotype, nonglomerular disease cases (hereafter referred to as noncases) were defined as patients with three or more pediatric nephrologist encounters (who did not meet the computable phenotype definition of a case). This definition was intended to enumerate the source population of children being followed by nephrology and to mitigate possible inflation of the negative predictive value (NPV) with sampling noncases from the full PEDSnet population, given the rarity of the conditions of interest in the general pediatric population.
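
The concern about NPV inflation follows directly from Bayes' rule. As a minimal sketch, assuming a hypothetical classifier with 90% sensitivity and 90% specificity and two illustrative prevalence values (none of these figures come from this study), the following shows how NPV approaches 100% as the condition becomes rarer in the sampled population, largely independent of how well the classifier performs:

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)    # P(test negative and no disease)
    false_neg = (1.0 - sensitivity) * prevalence   # P(test negative and disease)
    return true_neg / (true_neg + false_neg)


# Hypothetical operating point and prevalences, chosen only for illustration.
SENS, SPEC = 0.90, 0.90
scenarios = [
    ("nephrology-followed population (assumed 8% prevalence)", 0.08),
    ("general pediatric population (assumed 0.1% prevalence)", 0.001),
]
for label, prevalence in scenarios:
    print(f"{label}: NPV = {npv(SENS, SPEC, prevalence):.4f}")
```

Because almost every test-negative child in a low-prevalence population is truly disease free, a high NPV there says little about the algorithm itself, which is the rationale for restricting noncases to children followed by nephrology.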

The computable phenotype algorithm was first tested at CHOP where a random sample of 100 case records and 100 noncase records were reviewed using a standardized Research Electronic Data Capture (REDCap) chart review tool. To facilitate ease of review, create a parsimonious categorization of underlying glomerular conditions, and minimize free-text data entry, the REDCap tool included branching logic so that if a patient had undergone a kidney biopsy, the reviewer was asked to select from 14 categories of biopsy-based glomerular diagnoses or to input an “other” biopsy-based diagnosis. Similarly, if a patient did not have a kidney biopsy, the reviewer was asked to select from seven categories of glomerular diagnoses or to input an “other” diagnosis. The reviewers in this phase, two research coordinators with prior experience screening EHR data for patients with glomerular disease, were not blinded to case/noncase assignment to allow for refinement of the chart review tool and process.

The computable phenotype was then evaluated across the eight institutions in PEDSnet including CHOP. A random sample of 100 case records and 100 noncase records from each institution were reviewed by a local study investigator and/or research coordinator using the standardized REDCap chart review tool. Reviewers received training from the CHOP study team in how to perform the chart reviews using the REDCap form. The 100 case and 100 noncase records from CHOP specifically excluded the 200 records reviewed in the earlier phase testing of the phenotype at CHOP. The chart review tool was designed so that the reviewer was masked to the case/noncase assignment when reviewing the EHR.

Performance statistics (predictive values, sensitivity, specificity, and accuracy) and exact binomial confidence intervals for the eight institutions were generated using the denominator population of those with three or more nephrologist encounters. In a sensitivity analysis, we also assessed the positive predictive value (PPV) for cases identified from the full PEDSnet population without restricting by number of nephrologist encounters. False positive, false negative, and “other” diagnoses were rereviewed by the local study investigator. Statistical analyses were performed using Stata version 15.0 (StataCorp., College Station, TX). To promote wider availability and use, the code to implement the final algorithm and the clinical code sets were deposited in a public repository, and the clinical code sets were also deposited in the Value Set Authority Center repository.
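
For readers who want to reproduce this kind of summary outside Stata, the following is a minimal sketch of the performance statistics with exact binomial (Clopper-Pearson) intervals, using only the Python standard library. It is not the study's analysis code; the function names and the 2×2 counts are hypothetical and chosen only for illustration.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p). Intended for moderate n (up to about 1000)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for x successes in n trials, found by bisection."""

    def solve(f, target: float) -> float:
        # f is monotone decreasing in p on [0, 1]; find p with f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha/2, i.e. cdf(x - 1; p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def performance(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[int, int]]:
    """Return (numerator, denominator) for each statistic from a 2x2 table."""
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }


# Hypothetical chart-review counts, not taken from the study.
for name, (x, n) in performance(tp=90, fp=10, fn=5, tn=95).items():
    low, high = exact_ci(x, n)
    print(f"{name}: {x / n:.1%} (95% CI, {low:.1%} to {high:.1%})")
```

The exact interval is preferred over a normal approximation here because several proportions sit close to 100%, where symmetric approximations can exceed the [0, 1] range.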

The study was approved by the IRB at CHOP (IRB protocol numbers 14–011242 and 16–012878), determined to be exempt from consent and HIPAA authorization, and utilized the PEDSnet Master Reliance Agreement process whereby the other seven participating institutions ceded IRB review to CHOP.

Results


Development of Computable Phenotype

Descriptive statistics for the single center development cohort are presented in Supplemental Table 1. Of the 206 IMO terms assigned at 569 visits, 95 represented glomerular conditions, and 219 out of 231 (95%) individuals had at least one of these assigned to a visit encounter during the 9-month study period (71% of whom had two diagnoses or two encounters with the same diagnosis). More than half of the cohort (57%) had undergone at least one kidney biopsy, and 80% had two diagnoses or two encounters with the same diagnosis or at least one kidney biopsy. Therefore, diagnosis codes and kidney biopsy were determined to be the key data elements to include in the phenotype, and a list of 40 SNOMED-CT codes was generated on the basis of the IMO terms and associated ICD-9 Clinical Modification codes abstracted in this pilot EHR review. With 50% or less of the cohort prescribed the most commonly used antiproteinuric and immunomodulatory therapies (renin-angiotensin-aldosterone system blockade and glucocorticoids, respectively), there was concern that medications would lack sufficient sensitivity and specificity for glomerular disease. The majority of urine protein labs were dipsticks (531 out of 719 measures), and so this data element was not included in the phenotype because it was not likely to discriminate glomerular from other disorders. Therefore, the computable phenotype defined a glomerular disease case as a patient who had two or more encounters (any provider type) on different days, during which a glomerular diagnosis (from the list of 40 SNOMED-CT codes) was recorded, or who had one encounter with a diagnosis from this list assigned and a pretransplant kidney biopsy procedure, at <30 years of age. The criterion of two or more encounters on different days (in the absence of biopsy) was intended to avoid inclusion of individuals who were evaluated and “ruled out” for glomerular conditions.

Single-Center Evaluation of Computable Phenotype

When first implemented at CHOP, the computable phenotype identified glomerular disease with a PPV of 92% and NPV of 100%. Review of the eight false positive records demonstrated that SNOMED concept ID 435308 (concept name “Acute GN”) was overrepresented. The algorithm was then modified to require that this code had to be entered into the EHR by a nephrologist.

Evaluation of Computable Phenotype across PEDSnet

Evaluation at six centers revealed the performance of one of the SNOMED codes (concept name “Glomerulosclerosis”) to be an outlier, yielding a greater number of false positives than true positives. Indeed, 65 (49%) of all false positive records and only one true positive (0.2%) were identified as cases on the basis of only having this code assigned twice. All cases identified on the basis of only having this “Glomerulosclerosis” code were therefore excluded when calculating performance characteristics; for evaluation at the remaining two centers, the qualifying code list for the computable phenotype algorithm was refined to exclude this SNOMED code alone as a sufficient qualifying diagnosis (i.e., this code was only sufficient for case identification in the presence of another qualifying SNOMED code or pretransplant biopsy).
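
A per-code tally of chart-review outcomes is one simple way to surface an outlier code of this kind. The sketch below is illustrative only: it assumes each reviewed case record has been reduced to the single code that qualified it plus the adjudicated result, and the code labels are hypothetical placeholders rather than entries from the study's code list.

```python
from collections import Counter


def flag_outlier_codes(reviewed):
    """Return codes that produced more false positives than true positives.

    `reviewed` is an iterable of (sole_qualifying_code, is_true_positive) pairs.
    """
    tp, fp = Counter(), Counter()
    for code, is_true_positive in reviewed:
        (tp if is_true_positive else fp)[code] += 1
    # Counter returns 0 for codes never seen as true positives.
    return sorted(code for code in fp if fp[code] > tp[code])


# Hypothetical review results: CODE_B yields mostly false positives.
reviewed = [
    ("CODE_A", True), ("CODE_A", True), ("CODE_A", False),
    ("CODE_B", False), ("CODE_B", False), ("CODE_B", True),
]
print(flag_outlier_codes(reviewed))
```

A flagged code need not be dropped entirely; as described above, it can instead be demoted so that it counts only alongside another qualifying code or a biopsy.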

The final computable phenotype algorithm to identify glomerular disease consisted of two or more diagnosis codes from this qualifying list on different dates or one diagnosis code and a pretransplant biopsy procedure before 30 years of age (Figure 1). From the >6.5 million patients in PEDSnet, the computable phenotype algorithm identified 6657 cases, 4746 of whom had three or more nephrology encounters. Cases were representative of the population with three or more nephrologist encounters in terms of sex, race, and ethnicity (Table 1). Cases were older at their first face-to-face visit, likely reflecting the predominantly acquired nature of glomerular disorders. As expected, individuals receiving nephrology care had considerably longer follow-up time (relationship with the health systems) relative to the full PEDSnet population.
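
The case definition can be expressed as a short rule. The sketch below is one possible reading of the text, not the deposited implementation: the data classes, the two extra codes (1001 and 1002), and the placeholder identifier for the “Glomerulosclerosis” concept are hypothetical, and it assumes that the age limit applies to each qualifying diagnosis and biopsy and that “pretransplant” means the biopsy precedes any recorded transplant.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

ACUTE_GN = 435308          # SNOMED concept ID quoted in the text
GLOMERULOSCLEROSIS = -1    # placeholder: the concept ID is not given in the text
QUALIFYING = {ACUTE_GN, GLOMERULOSCLEROSIS, 1001, 1002}  # 1001/1002 are hypothetical


@dataclass
class Diagnosis:
    on: date
    code: int
    by_nephrologist: bool = False


@dataclass
class Patient:
    birth: date
    diagnoses: list[Diagnosis] = field(default_factory=list)
    biopsies: list[date] = field(default_factory=list)
    transplant: Optional[date] = None


def age_years(birth: date, on: date) -> float:
    return (on - birth).days / 365.25


def is_case(p: Patient, max_age: float = 30.0) -> bool:
    # Qualifying diagnoses before max_age; "Acute GN" counts only if
    # entered by a nephrologist.
    dx = [
        d for d in p.diagnoses
        if d.code in QUALIFYING
        and age_years(p.birth, d.on) < max_age
        and (d.code != ACUTE_GN or d.by_nephrologist)
    ]
    if not dx:
        return False
    pretransplant_biopsy = any(
        (p.transplant is None or b < p.transplant) and age_years(p.birth, b) < max_age
        for b in p.biopsies
    )
    if pretransplant_biopsy:
        return True  # one diagnosis plus a pretransplant biopsy
    # Without a biopsy, "Glomerulosclerosis" alone is not sufficient.
    if all(d.code == GLOMERULOSCLEROSIS for d in dx):
        return False
    # Two or more qualifying diagnoses on different days.
    return len({d.on for d in dx}) >= 2


example = Patient(
    birth=date(2005, 1, 1),
    diagnoses=[Diagnosis(date(2012, 1, 1), 1001), Diagnosis(date(2012, 2, 1), 1002)],
)
print(is_case(example))
```

Counting distinct dates rather than distinct codes mirrors the stated intent of excluding children who were evaluated once and “ruled out.”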

Figure 1. Final computable phenotype algorithm.

Table 1. Characteristics of source populations

| Characteristic | Full PEDSnet Population, n=6,547,570 | Population with Three or More Nephrology Encounters, n=60,306 | Cases, n=6657 |
|---|---|---|---|
| Age (yr) at first face-to-face visit (all visits) | 4.38 (0.74–11.0) | 6.46 (1.16–12.04) | 7.14 (3.12–12.2) |
| Male | 3,342,960 (51%) | 32,743 (54%) | 3707 (56%) |
| Race: Asian | 207,333 (3%) | 1671 (3%) | 336 (5%) |
| Race: Black | 989,107 (15%) | 9796 (16%) | 1011 (15%) |
| Race: Multiple race | 166,028 (3%) | 1326 (2%) | 143 (2%) |
| Race: Other | 652,984 (10%) | 7208 (12%) | 958 (14%) |
| Race: Unknown | 684,131 (10%) | 3648 (6%) | 466 (7%) |
| Race: White | 3,847,987 (59%) | 36,657 (61%) | 3743 (56%) |
| Ethnicity: Hispanic/Latino | 719,539 (11%) | 8092 (13%) | 953 (14%) |
| Follow-up time, yr | 2.22 (0.13–6.18) | 6.33 (2.40–10.2) | 6.45 (2.55–9.58) |

Continuous data presented as median (interquartile range) and categorical data as N (%).

The performance characteristics for the computable phenotype in the PEDSnet population with three or more nephrology encounters are shown in Table 2. The classification accuracy was excellent at 94% with an area under the receiver operating characteristics curve (AUC) of 94%, sensitivity and NPV of ≥96%, specificity of 93%, and PPV of 89%. The PPV for the computable phenotype algorithm when applied to the full PEDSnet population was 85% (95% confidence interval [95% CI], 83% to 88%). When stratified by center, the accuracy and AUC remained ≥90% for all centers. The sensitivity and NPV were ≥96%, and the PPV was ≥80% for seven of the eight institutions. For one center, the computable phenotype was less sensitive (83%) but highly specific (98%), with a PPV of 96%. For another center, the computable phenotype was less specific (85%), with a PPV of 77%, but highly sensitive (100% sensitivity and 100% NPV).

Table 2. Performance characteristics of computable phenotype algorithm across PEDSnet population with three or more nephrology encounters

| Center | Sample Cases/Noncases | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Accuracy, % (95% CI) | AUC, % (95% CI) |
|---|---|---|---|---|---|---|---|
| 1 | 87/100 | 96 (89 to 99) | 98 (93 to 100) | 98 (92 to 100) | 96 (90 to 99) | 97 (93 to 99) | 97 (94 to 99) |
| 2 | 49/100 | 98 (88 to 100) | 96 (90 to 99) | 92 (80 to 98) | 99 (95 to 100) | 97 (92 to 99) | 97 (94 to 100) |
| 3 | 74/100 | 100 (94 to 100) | 85 (78 to 91) | 77 (66 to 86) | 100 (96 to 100) | 90 (85 to 94) | 93 (90 to 96) |
| 4 | 60/100 | 96 (86 to 100) | 89 (82 to 94) | 80 (68 to 89) | 98 (93 to 100) | 91 (86 to 95) | 93 (89 to 97) |
| 5 | 74/100 | 97 (90 to 100) | 93 (87 to 97) | 91 (81 to 96) | 98 (93 to 100) | 95 (90 to 98) | 95 (92 to 98) |
| 6 | 83/99 | 97 (90 to 100) | 89 (82 to 94) | 86 (76 to 92) | 98 (93 to 100) | 92 (87 to 96) | 93 (90 to 97) |
| 7 | 46/99 | 83 (70 to 92) | 98 (92 to 100) | 96 (85 to 99) | 91 (83 to 96) | 92 (87 to 96) | 90 (85 to 96) |
| 8 | 75/100 | 97 (90 to 100) | 95 (89 to 98) | 93 (85 to 98) | 98 (93 to 100) | 96 (92 to 98) | 96 (93 to 99) |
| All | 548/798 | 96 (94 to 97) | 93 (91 to 94) | 89 (86 to 91) | 97 (96 to 98) | 94 (92 to 95) | 94 (93 to 95) |

AUC, area under the receiver operating characteristics curve.

Table 3 shows the distribution of glomerular conditions among the case records reviewed and adjudicated to be true positives. A total of 55% had biopsy-based conditions, nearly identical to the proportion of biopsy-based disease in the development cohort. Among true positive cases with biopsy-based conditions (n=357), the most frequent histopathologic diagnoses were FSGS (21%), minimal change nephropathy (19%), Henoch–Schönlein purpura nephritis/IgA nephropathy (18%), and lupus nephritis (15%). Among those true positive cases without a biopsy (n=292), the most common diagnosis was nephrotic syndrome accounting for 44%, followed by post-infectious GN (24%) and Henoch–Schönlein purpura nephritis/IgA nephropathy (17%).

Table 3. Glomerular conditions among confirmed cases (true positives) on the basis of records reviewed

| Biopsy-based | N | % |
|---|---|---|
| FSGS | 76 | 21.3 |
| Minimal change | 69 | 19.3 |
| (of which) Minimal change | 60 | |
| (of which) Minimal change with mesangial proliferation/hypercellularity | 9 | |
| Lupus nephritis | 53 | 14.9 |
| IgA nephropathy | 48 | 13.5 |
| Membranoproliferative GN/C3 glomerulopathy | 32 | 9.0 |
| Henoch–Schönlein purpura nephritis | 17 | 4.8 |
| Membranous nephropathy | 12 | 3.4 |
| Post-infectious GN | 11 | 3.1 |
| Pauci-immune GN | 7 | 2.0 |
| Alport syndrome | 6 | 1.7 |
| Thin basement membrane disease | 4 | 1.1 |
| Antiglomerular basement membrane disease | 4 | 1.1 |
| Diffuse mesangial sclerosis | 3 | 0.8 |
| Other | 15 | 4.2 |
| Total | 357 | |

| No biopsy | N | % |
|---|---|---|
| Nephrotic syndrome | 129 | 44.2 |
| Post-infectious GN | 71 | 24.3 |
| Henoch–Schönlein purpura nephritis | 34 | 11.6 |
| IgA nephropathy | 17 | 5.8 |
| Lupus nephritis | 17 | 5.8 |
| Alport syndrome | 10 | 3.4 |
| Thin basement membrane disease | 2 | 0.7 |
| Other | 12 | 4.1 |
| Total | 292 | |

In a post hoc analysis, we used the adjudicated case records to enhance the computable phenotype algorithm to identify the subcohort of children and adolescents with idiopathic nephrotic syndrome (nephrotic syndrome without a biopsy) or biopsy-based findings of minimal change nephropathy, FSGS, or membranous nephropathy. We applied an additional criterion that for a given individual, the sum of all diagnoses from a subset of 13 SNOMED codes consistent with these disorders be greater than the sum of all diagnoses from a subset of 23 SNOMED codes indicative of GN (Supplemental Table 2). The classification accuracy of this algorithm for the nephrotic subcohort was 98% (95% CI, 97% to 98%). With this additional criterion we were able to identify 220 out of 233 individuals with these disorders (94% sensitivity; 95% CI, 91% to 97%). The PPV was 92% (95% CI, 88% to 95%). This approach was also highly specific for this subset of glomerular disorders (98% specificity; 95% CI, 97% to 99%).
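
The additional criterion amounts to comparing two counts per patient. The following is a minimal sketch, assuming that “sum of all diagnoses” means the number of recorded diagnosis entries from each code subset and that ties are not assigned to the nephrotic subcohort; the code labels are hypothetical placeholders for the 13 and 23 SNOMED codes listed in Supplemental Table 2, and the rule is meant to be applied only to patients who already meet the case definition.

```python
def is_nephrotic_subcohort(diagnosis_codes, nephrotic_codes, gn_codes) -> bool:
    """True if nephrotic-type diagnosis entries outnumber GN-type entries."""
    n_nephrotic = sum(1 for c in diagnosis_codes if c in nephrotic_codes)
    n_gn = sum(1 for c in diagnosis_codes if c in gn_codes)
    return n_nephrotic > n_gn


# Hypothetical code subsets and one patient's diagnosis history.
NEPHROTIC = {"NS_1", "NS_2"}
GN = {"GN_1", "GN_2"}
history = ["NS_1", "NS_1", "GN_1", "NS_2"]
print(is_nephrotic_subcohort(history, NEPHROTIC, GN))

# Sensitivity from the counts reported in the text: 220 of 233 identified.
print(f"{220 / 233:.0%}")  # 94%
```

Using a majority of entries rather than the mere presence of a nephrotic code tolerates the occasional nonspecific nephritis code in a child whose record is otherwise dominated by nephrotic syndrome diagnoses.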

Table 4 shows a description of the clinical characteristics of the glomerular disease cohort identified by the computable phenotype in version 2.9 of the PEDSnet data resource (as of May 2018). There were 6657 individuals with glomerular disease and longitudinal data with a median follow-up of 3.3 years that can be evaluated for retrospective observational research. Of these, 3774 individuals had a health care encounter at a member institution within the prior 18 months, and would be considered potentially recruitable into prospective studies. Table 4 also provides a description of the subcohort of the glomerular population with idiopathic nephrotic syndrome or biopsy-based diagnoses of minimal change nephropathy, FSGS, and membranous nephropathy that can be identified by the expanded criteria version of the algorithm described above. The table highlights the breadth of clinical data available, including measures of proteinuria and kidney function, dialysis and transplant outcomes, health care utilization, and therapeutic exposures.
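
The “potentially recruitable” flag is a recency filter on the last encounter date. A minimal sketch follows, assuming whole calendar months, a strict 18-month cutoff, and a hypothetical as-of date (the text gives only May 2018 for the data version); the function name is illustrative.

```python
from datetime import date


def is_recruitable(last_encounter: date, as_of: date, window_months: int = 18) -> bool:
    """True if the last encounter falls within `window_months` before `as_of`."""
    months = (as_of.year - last_encounter.year) * 12 + (as_of.month - last_encounter.month)
    if as_of.day < last_encounter.day:
        months -= 1  # the final month is not yet complete
    return 0 <= months < window_months


AS_OF = date(2018, 5, 31)  # hypothetical reference date
print(is_recruitable(date(2017, 9, 12), AS_OF))
print(is_recruitable(date(2015, 3, 2), AS_OF))

# Share of the cohort seen within the window, from the counts in the text.
print(f"{3774 / 6657:.1%}")
```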

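Table 4. Clinical and health care utilization characteristics of the cohort identified by the computable phenotype algorithm

| Characteristic | Full Population, N (%) or Median (IQR) | Subcohort: Nephrotic Syndrome, Minimal Change, FSGS, or Membranous Nephropathy, N (%) or Median (IQR) |
|---|---|---|
| N | 6657 | 3315 |
| Cohort by site: 1 | 1241 | 685 |
| Cohort by site: 2 | 565 | 259 |
| Cohort by site: 3 | 844 | 346 |
| Cohort by site: 4 | 974 | 540 |
| Cohort by site: 5 | 513 | 247 |
| Cohort by site: 6 | 696 | 376 |
| Cohort by site: 7 | 583 | 347 |
| Cohort by site: 8 | 1241 | 515 |
| Age at diagnosis, yr: <2 | 273 (4%) | 217 (7%) |
| Age at diagnosis, yr: 2–4 | 1233 (19%) | 912 (28%) |
| Age at diagnosis, yr: 5–9 | 2028 (30%) | 921 (28%) |
| Age at diagnosis, yr: 10–14 | 1616 (24%) | 679 (20%) |
| Age at diagnosis, yr: 15–19 | 1360 (20%) | 524 (16%) |
| Age at diagnosis, yr: ≥20 | 147 (2%) | 62 (2%) |
| Year of first diagnosis: <2009 | 1308 (20%) | 712 (21%) |
| Year of first diagnosis: 2009–2011 | 1843 (28%) | 937 (28%) |
| Year of first diagnosis: 2012–2014 | 1529 (23%) | 738 (22%) |
| Year of first diagnosis: 2015–2017 | 1801 (27%) | 842 (25%) |
| Follow-up time since diagnosis, yr | 3.3 (1.10–6.42) | 3.6 (1.33–6.91) |
| No. of nephrology visits (per person-yr) | 2.3 (0.5–7.3) | 2.6 (0.5–6.9) |
| No. of hospitalizations (per person-yr) | 0.5 (0.1–1.8) | 0.4 (0–1.6) |
| CKD stage within ±30 d of diagnosis [a]: 1 | 2077 (31%) | 1186 (36%) |
| CKD stage within ±30 d of diagnosis [a]: 2 | 968 (15%) | 380 (11%) |
| CKD stage within ±30 d of diagnosis [a]: 3 | 681 (10%) | 248 (7%) |
| CKD stage within ±30 d of diagnosis [a]: 4 | 304 (5%) | 92 (3%) |
| CKD stage within ±30 d of diagnosis [a]: 5 | 390 (6%) | 128 (4%) |
| CKD stage at last follow-up: 1 | 1212 (18%) | 674 (20%) |
| CKD stage at last follow-up: 2 | 713 (11%) | 332 (10%) |
| CKD stage at last follow-up: 3 | 392 (6%) | 188 (6%) |
| CKD stage at last follow-up: 4 | 148 (2%) | 67 (2%) |
| CKD stage at last follow-up: 5 | 366 (5%) | 189 (6%) |
| No. of urine protein measures (per person-yr) | 2.4 (0.4–8.4) | 2.3 (0.3–7.5) |
| Dialysis | 848 (13%) | 409 (12%) |
| Kidney transplant | 421 (6%) | 185 (6%) |
| ACE inhibition | 2635 (40%) | 1262 (38%) |
| Angiotensin receptor blockade | 759 (11%) | 388 (12%) |
| Corticosteroid | 4049 (61%) | 2535 (76%) |
| Mycophenolate | 1613 (24%) | 858 (26%) |
| Calcineurin inhibitor | 1403 (21%) | 964 (29%) |
| Cyclophosphamide | 389 (6%) | 221 (7%) |
| Azathioprine | 435 (7%) | 134 (4%) |
| Rituximab | 254 (4%) | 160 (5%) |
| Eculizumab | 24 (<1%) | <11 (<1%) |
| Corticotropin | <11 (<1%) | <11 (<1%) |
| Abatacept | 23 (<1%) | 15 (<1%) |

IQR, interquartile range; ACE, angiotensin-converting enzyme.
[a] If more than one eGFR measurement was taken within 30 d of diagnosis, the maximum was used to determine stage of CKD.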

Discussion


In this study, we developed and evaluated an EHR-based computerized algorithm that identified virtually all patients with glomerular disease within the multi-institutional PEDSnet network. The revised computable phenotype algorithm identified 6657 patients with glomerular disease across the eight PEDSnet institutions, 4746 of whom had three or more nephrology encounters, with an area under the receiver operating characteristics curve of 94% (95% CI, 93% to 95%). This novel approach to rapid cohort identification could greatly enhance and accelerate comparative effectiveness, clinical trials, and health outcomes research in glomerular disease.

Glomerular disease clinical consortia and prospective observational patient registries provide invaluable contributions to our understanding of pathophysiology and the lived experience of these disorders.3 There are, however, some important limitations to prospective cohort studies and patient registries, particularly for health outcomes research. There are selection factors, both related to prespecified eligibility criteria and unmeasured variables, such as socioeconomic, computer literacy, and language barriers to participation, that often render cohort and registry participants unrepresentative of the source population. The data obtained in conventional observational research and registry approaches, while highly curated, is constrained in scope and timing by the study protocol or selected questionnaires. Furthermore, enrollment and long-term retention of sufficient numbers of participants, especially for rare diseases, are both time- and resource-intensive. The largest cohort study to date, Cure Glomerulopathy, began recruitment in 2014 and is projected to enroll 2400 patients (30% pediatric) with biopsy-based glomerular conditions across >65 centers.14 Such cohort studies could gain efficiencies through application of EHR-based cohort identification tools.

Certain limitations of conventional research methods can be complemented by the application of EHR-based cohort identification to large, real-world data resources. The cohort of children and adolescents with glomerular disease identified through this highly sensitive computable phenotype comprises nearly the entire source population with these disorders across the eight health systems in PEDSnet. The EHR-based computable phenotype method for rapid cohort identification allows for analysis of existing data from unprecedented numbers of patients. The >6000 children and adolescents with glomerular disease identified in PEDSnet and their median follow-up time of 3 years provide unique opportunities, as highlighted in Table 4, to longitudinally capture measures of health status at the point of care and to study health service use, health outcomes, disease progression, and comparative effectiveness of alternative interventions.

The paucity of adequately powered clinical trials for glomerular disorders and resultant evidence gaps and practice pattern variation that persist15–17 call for new approaches to inform best practices and treatment decisions. The widespread adoption of certified EHRs, growing efforts to capture and aggregate real-world clinical data, and emergence of big data science provide opportunities to accelerate clinical research and outcomes improvement, particularly in rare diseases.18,19 In addition to opportunities for conducting large-scale observational research, the implementation of the computable phenotype in PEDSnet greatly enhances our ability to perform comparative effectiveness research to directly compare benefits and toxicities of existing therapies used for glomerular disease in children, as well as population management for quality improvement and pragmatic trials of new interventions, targeting the population seen within the past 18 months for recruitment.

EHR-based computable phenotyping is a nascent field, and our approach to EHR-based glomerular disease identification does have several limitations. The intent of the algorithm is to identify a large cohort of children with diagnosed glomerular disease and, therefore, is designed to be applied to large pediatric health care systems with nephrology subspecialty care. Although the validation results are reassuring about the performance within PEDSnet health systems, we cannot guarantee that this particular algorithm will perform as well in other databases. Given differences in referral patterns and comorbidities, the algorithm may also not perform as well in adult health care systems. EHR-based cohort identification approaches need to balance sensitivity and specificity. We prioritized sensitivity and allowed inclusion of less-specific nephritis codes, which contributed to many of the false positive records, as highlighted by the improved specificity of the enhanced computable phenotype for the nephrotic subcohort. The maximally sensitive phenotype is intended to lay the foundation for future refinements to identify specific subpopulations. Although the evaluation across eight institutions with three different EHR systems is a strength, there was response variation across institutions, which reflects real-world variation in coding practices. For example, the “Glomerulosclerosis” code was initially included in the preliminary phenotype because it was rarely used in the development center; its poor diagnostic performance was not recognized until the phenotype was evaluated across multiple other centers. Coding practices may also change over time, and the computable phenotype may need to be updated as the EHR evolves and diagnostic coding improves. There may be some referral bias as the population of children in PEDSnet access care affiliated with a tertiary care children’s hospital. 
However, some of the member health systems do include large primary care networks in addition to specialty care, and the eight institutions comprise >6.5 million children across 23 states. This limitation is also mitigated by the relative rarity of glomerular disease being managed by the general pediatric community as well as the pediatric nephrology workforce being largely tied to academic medical centers.20 Data capture on this cohort did not include encounters outside the eight member health care systems, such as visits to local emergency departments. Finally, as with any EHR database, data were limited to those elements that were recorded in medical records, although both data quantity and quality are expected to continue to improve as more elements are extracted and transformed into a common data format.

In summary, we used EHR data from the PEDSnet pediatric health system population of >6.5 million children to develop and evaluate a highly sensitive and specific computable phenotype to identify the largest cohort of children with glomerular disease to date. This tool for rapid cohort ascertainment applied to a robust resource of multi-institutional longitudinal EHR data holds great promise to enhance and accelerate comparative effectiveness and health outcomes research in glomerular disease and other rare diseases.


Dr. Denburg reports grants from Mallinckrodt Pharmaceuticals, during the conduct of the study. Dr. Dixon reports personal fees from Alexion Pharmaceuticals, personal fees from Apellis Pharmaceuticals, personal fees from Horizon Pharmaceuticals, outside the submitted work. Dr. Flynn reports personal fees from Silvergate Pharmaceuticals, other from Springer, personal fees from Ultragenyx, other from Up to Date, personal fees from Vertex Pharmaceuticals, outside the submitted work. Dr. Mariani reports grants from PCORI, during the conduct of the study; grants from Boehringer Ingelheim, other from Reata Pharmaceuticals, outside the submitted work.


This study was supported by an investigator-initiated research proposal from Mallinckrodt Pharmaceuticals (Principal Investigators: Denburg and Furth) and a grant from the Patient-Centered Outcomes Research Institute (CDRN1306-01556; Principal Investigator: Forrest). Dr. Denburg was also supported by grants from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health (K23DK093556) and the Patient-Centered Outcomes Research Institute (PPRN1306-04903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Published online ahead of print. Publication date available at

See related editorial, “Finding That Needle in the Haystack: Computable Phenotypes,” on pages .

All authors were responsible for study design. Dr. Denburg, Ms. Razzaghi, Dr. Bailey, Dr. Soranno, Dr. Pollack, Dr. Dharnidharka, Dr. Mitsnefes, Dr. Smoyer, Dr. Somers, Dr. Zaritsky, Dr. Flynn, Dr. Claes, Ms. Benton, and Dr. Forrest were responsible for data collection. Dr. Denburg, Ms. Razzaghi, Dr. Bailey, and Dr. Forrest performed data analysis. Dr. Denburg drafted the manuscript and all authors took part in manuscript revision. Dr. Denburg and Ms. Razzaghi created all figures.

We thank Dr. Debbie Gipson (Pediatric Nephrology, Mott Children’s Hospital, University of Michigan) for contributions to study design.

Supplemental Material

This article contains the following supplemental material online at

Supplemental Table 1. Characteristics of the single center cohort for the pilot EHR review to determine data elements for inclusion in the computable phenotype.

Supplemental Table 2. SNOMED-CT codes indicative of primarily nephrotic versus nephritic conditions.


1. US Renal Data System: 2016 Annual Data Report: Epidemiology of Kidney Disease in the United States, Bethesda, MD, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 2016
2. Wong CJ, Moxey-Mims M, Jerry-Fluker J, Warady BA, Furth SL: CKiD (CKD in children) prospective cohort study: A review of current findings. Am J Kidney Dis 60: 1002–1011, 2012 23022429
3. Moxey-Mims MM, Flessner MF, Holzman L, Kaskel F, Sedor JR, Smoyer WE, et al.: Glomerular diseases: Registries and clinical trials. Clin J Am Soc Nephrol 11: 2234–2243, 2016 27672219
4. Inrig JK, Califf RM, Tasneem A, Vegunta RK, Molina C, Stanifer JW, et al.: The landscape of clinical trials in nephrology: A systematic review of Clinicaltrials.gov. Am J Kidney Dis 63: 771–780, 2014 24315119
5. Strippoli GF, Craig JC, Schena FP: The number, quality, and coverage of randomized controlled trials in nephrology. J Am Soc Nephrol 15: 411–419, 2004 14747388
6. Geva A, Gronsbell JL, Cai T, Cai T, Murphy SN, Lyons JC, et al.; Pediatric Pulmonary Hypertension Network and National Heart, Lung, and Blood Institute Pediatric Pulmonary Vascular Disease Outcomes Bioinformatics Clinical Coordinating Center Investigators: A computable phenotype improves cohort ascertainment in a pediatric pulmonary hypertension registry. J Pediatr 188: 224–231.e5, 2017
7. Forrest CB, Margolis P, Seid M, Colletti RB: PEDSnet: How a prototype pediatric learning health system is being expanded into a national network. Health Aff (Millwood) 33: 1171–1177, 2014 25006143
8. Collins FS, Hudson KL, Briggs JP, Lauer MS: PCORnet: Turning a dream into reality. J Am Med Inform Assoc 21: 576–577, 2014 24821744
9. Hripcsak G, Albers DJ: Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 20: 117–121, 2013 22955496
10. Forrest CB, Margolis PA, Bailey LC, Marsolo K, Del Beccaro MA, Finkelstein JA, et al.: PEDSnet: A national pediatric learning health system. J Am Med Inform Assoc 21: 602–606, 2014 24821737
11. Office of the National Coordinator for Health Information Technology (ONC), Department of Health and Human Services: Health information technology: Standards, implementation specifications, and certification criteria for electronic health record technology, 2014 edition; Revisions to the permanent certification program for health information technology. Final rule. Fed Regist 77: 54163–54292, 2012
12. Khare R, Ruth BJ, Miller M, Tucker J, Utidjian LH, Razzaghi H, et al.: Predicting causes of data quality issues in a clinical data research network. AMIA Jt Summits Transl Sci Proc 2017: 113–121, 2018 29888053
13. Khare R, Utidjian L, Ruth BJ, Kahn MG, Burrows E, Marsolo K, et al.: A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc 24: 1072–1079, 2017 28398525
14. Mariani LH, Bomback AS, Canetta PA, Flessner MF, Helmuth M, Hladunewich MA, et al.; CureGN Consortium: CureGN study rationale, design, and methods: Establishing a large prospective observational study of glomerular disease. Am J Kidney Dis 73: 218–229, 2019 30420158
15. Beck L, Bomback AS, Choi MJ, Holzman LB, Langford C, Mariani LH, et al.: KDOQI US commentary on the 2012 KDIGO clinical practice guideline for glomerulonephritis. Am J Kidney Dis 62: 403–441, 2013 23871408
16. Kidney Disease: Improving Global Outcomes (KDIGO) Glomerulonephritis Work Group: KDIGO clinical practice guideline for glomerulonephritis. Kidney Int Suppl 2: 139–274, 2012
17. Barbour S, Beaulieu M, Gill J, Espino-Hernandez G, Reich HN, Levin A: The need for improved uptake of the KDIGO glomerulonephritis guidelines into clinical practice in Canada: A survey of nephrologists. Clin Kidney J 7: 538–545, 2014 25859369
18. Berger ML, Lipset C, Gutteridge A, Axelsen K, Subedi P, Madigan D: Optimizing the leveraging of real-world data to improve the development and use of medicines. Value Health 18: 127–130, 2015 25595243
19. Currie J: “Big data” versus “big brother”: On the appropriate use of large-scale data collections in pediatrics. Pediatrics 131[Suppl 2]: S127–S132, 2013 23547056
20. Primack WA, Meyers KE, Kirkwood SJ, Ruch-Ross HS, Radabaugh CL, Greenbaum LA: The US pediatric nephrology workforce: A report commissioned by the American Academy of Pediatrics. Am J Kidney Dis 66: 33–39, 2015 25911315

glomerular disease; pediatric nephrology; Epidemiology and outcomes

Copyright © 2019 by the American Society of Nephrology