India is a country of wide diversity in health status and its determinants, and previous studies have shown that spirometry standards are not uniformly applicable. The Global Lung Function Initiative (GLI) spirometry reference equations were not representative of Indians,[1,2] and the “other” ethnicity category was also shown to be inappropriate. A recent study from Western India (RESPIRE) showed large differences between the GLI equations and Western Indian equations. Reference equations from northern and eastern India also revealed wide variations in predicted spirometry values.[3,4] For south India, a 1990 study from Chennai included smokers, while equations from Bangalore city, Karnataka (Chhabra et al., part of a multicentric spirometry study), have not yet been published. Few studies have derived regression equations for rural populations, as centers performing such studies are usually in urban areas. This study addresses the gap in spirometry equations for rural south Indians, including the development and internal validation of new equations.
A cross-sectional survey was carried out in 2018 on a random sample of participants aged 30 years and above from nine villages in rural Vellore, Tamil Nadu, south India, to determine the prevalence of airflow obstruction using spirometry. The detailed procedures and prevalence of airflow obstruction have been previously published. Cluster sampling was done, with 20 clusters of 55 households each, to identify adults aged 30 years and above, with one person selected randomly per household. Spirometry was done using a portable spirometer (EasyOne Air, ndd Medizintechnik AG, Switzerland), according to American Thoracic Society/European Respiratory Society criteria. Height (cm) was measured in light clothing without footwear using the SECA 213 stadiometer, and weight (kg) with a digital scale (Phoenix, Nitiraj Engineer Ltd., India). The dataset from this survey was used to create reference standards for rural south India, excluding ever smokers, unsatisfactory spirometry efforts, and subjects with known respiratory disease or symptoms (breathlessness/wheeze/cough/phlegm), using recommended criteria for selecting reference populations for spirometry.
For developing reference equations, the dataset was randomly split as 70% (development dataset) and 30% (validation dataset), with the latter used to test the equations derived from the former.
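A random 70/30 split of this kind can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name and seed are hypothetical, and the split is a simple unstratified shuffle because the text does not say whether it was stratified by sex.

```python
import random


def split_development_validation(records, dev_fraction=0.7, seed=2018):
    """Shuffle a copy of `records` and cut it into development and validation sets.

    The seed value is arbitrary (hypothetical); fixing it makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(dev_fraction * len(shuffled))  # size of the development set
    return shuffled[:cut], shuffled[cut:]
```

With the 583 participants described in the Results, round(0.7 × 583) gives 408 development and 175 validation records; 175 matches the 64 males and 111 females analyzed in the validation dataset.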
Relationships of age, weight, and height with FEV1 (Forced Expiratory Volume in one second), FVC (Forced Vital Capacity), and FEV1/FVC percent were examined separately for each sex, using correlation and linear regression. Parameters with no significant association with the outcomes were removed. Quadratic terms and log transformation of outcomes were also added to the model and tested, with the final model chosen based on goodness of statistical fit, assessed using the coefficient of determination (R2). Residual analysis was performed to test the assumptions of linearity, normality, homoscedasticity, and independence of errors. Gross outliers (standardized residual >3.29) and influential cases (Cook’s distance, Mahalanobis distance, covariance ratios) were identified. The analysis was repeated after excluding such cases to see whether R2 and beta coefficients changed significantly, and the final models were those with the best fit.
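The regression and residual diagnostics described above can be illustrated with a small, dependency-free Python sketch: an ordinary least squares fit returning R2, the residual standard deviation (RSD), residuals divided by the RSD (flagged when their absolute value exceeds 3.29), and Cook's distance. It is not the authors' code; the function names are hypothetical, Mahalanobis distance and covariance ratios are omitted for brevity, and "standardized residual" is taken here as residual/RSD, which is an assumption (some software instead reports leverage-adjusted, studentized residuals).

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(rows, y):
    """OLS fit with diagnostics. Each row of `rows` is [1, predictor1, predictor2, ...]."""
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sse = sum(e * e for e in resid)
    y_mean = sum(y) / n
    r2 = 1 - sse / sum((yi - y_mean) ** 2 for yi in y)
    rsd = math.sqrt(sse / (n - p))  # residual standard deviation, reused for the LLN
    # Leverage h_ii = x_i' (X'X)^-1 x_i
    lev = [sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p))
           for r in rows]
    std_resid = [e / rsd for e in resid]  # residual divided by RSD (assumed definition)
    # Cook's distance: squared internally studentized residual * h / (p * (1 - h))
    cooks = [(e / (rsd * math.sqrt(1 - h))) ** 2 * h / (p * (1 - h))
             for e, h in zip(resid, lev)]
    outliers = [i for i, s in enumerate(std_resid) if abs(s) > 3.29]
    return {"beta": beta, "r2": r2, "rsd": rsd, "std_resid": std_resid,
            "cooks": cooks, "outliers": outliers}
```

For a model with age, height, and a quadratic age term, each hypothetical design row would be [1, age, height, age ** 2]; refitting after dropping the indices in `outliers` and comparing `r2` and `beta` mirrors the sensitivity check described above.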
The validation dataset was used to assess differences between observed and predicted values, using both the newly developed equations and previous equations from south India (Bangalore, Karnataka; unpublished, part of a multicentric study, equations obtained from the lead author),[3,6] Western India (rural Pune, Maharashtra), north India (Delhi), and eastern India (Kolkata, West Bengal), as well as the GLI reference equation for ethnicity code 5 (Other/Mixed). Differences between observed and predicted values were assessed using the paired t test. Bland Altman (BA) analysis was used to test agreement between values predicted by the current and other equations, using a one sample t test and BA plots, with 95% Limits of Agreement (LOA) calculated from the Standard Deviation (SD) of the differences between the predicted values. The Lower Limit of Normal (LLN) was calculated as the predicted value minus 1.645 times the Residual Standard Deviation (RSD).
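The Bland Altman quantities named above (bias, SD of differences, 95% LOA, and the one sample t statistic on the differences) can be sketched as follows. This assumes the conventional 1.96 × SD limits, and the function and argument names are hypothetical. A paired t test of observed vs. predicted values reduces to the same one sample t statistic on the paired differences; a P value would additionally need the t distribution with n - 1 degrees of freedom (for example via scipy.stats.ttest_1samp), which is not computed here.

```python
import math
import statistics


def bland_altman(pred_a, pred_b, z=1.96):
    """Bland Altman agreement between two sets of predictions for the same people.

    Differences are a - b, so a negative bias means `pred_a` is lower on average.
    """
    diffs = [a - b for a, b in zip(pred_a, pred_b)]
    averages = [(a + b) / 2 for a, b in zip(pred_a, pred_b)]  # x-axis of the BA plot
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    loa = (bias - z * sd, bias + z * sd)  # 95% limits of agreement
    t_stat = bias / (sd / math.sqrt(len(diffs)))  # one sample t statistic against zero
    return {"bias": bias, "sd": sd, "loa": loa, "t": t_stat,
            "averages": averages, "diffs": diffs}
```

Passing Vellore predictions as `pred_a` and Bangalore predictions as `pred_b` reproduces the sign convention used in the Results, where a negative bias means the Vellore equations predict lower values.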
The study was approved by the Institutional Review Board and Ethics Committee of a medical college and conducted in accordance with principles of the amended Declaration of Helsinki, with written informed consent from study participants.
Of 814 participants who provided satisfactory spirometry data, 115 smokers, 14 with previously diagnosed lung disease, 97 with respiratory symptoms in the last 12 months, and four with treated tuberculosis/cancer/rheumatic heart disease were excluded. Further, five subjects with absolute z score >3.29 for FEV1/FVC were excluded, yielding 583 healthy never-smoking subjects (males 214, females 369), for creating reference equations.
Descriptive characteristics of the development (70%) and validation (30%) datasets are shown in Tables 1 and 2. There were no significant differences between these datasets in age, height, weight, and Body Mass Index (BMI) (p values for t tests >0.05, Table 2). The average age was 49.57 years for females (SD 11.27) and 51.43 years for males (SD 12.74). The two datasets were also matched for spirometry values, except for a slight difference in FEV1 for females [Table 2].
Age, height, and weight were the three significant predictors. The final regression models, chosen for best overall fit and simplicity of use, are shown in Table 3. Log transformations did not improve the fit and hence were not included in the final models. Assumptions of linearity, independence of predictors, and normality of residuals were verified, and goodness-of-fit tests for the final models showed satisfactory results.
Validation of new equations
The new equations were first validated using the 30% validation dataset from the same study. Overall, the mean predicted and observed values in the validation dataset were not significantly different, except for FEV1 in females (difference of 0.113 L, which was not considered clinically significant) (Table 4).
The mean observed values in the validation dataset were also compared with values predicted by previous equations from other parts of India (south, north, east, and west; Tables 5 and 6). Applying the previous south India equations (urban Bangalore) to the current rural Vellore dataset gave good prediction of FEV1 in males but overestimated FVC [Table 5]. The Western (rural), northern, and eastern Indian equations significantly overestimated both FEV1 and FVC for this rural Vellore dataset. While the eastern India equation overestimated FEV1/FVC percent for males, the other three significantly underestimated it.
For females, all four previous equations overestimated FEV1 and FVC in the rural Vellore validation dataset [Table 6]. The Western and north Indian equations underestimated FEV1/FVC, while predictions from the southern and eastern equations did not differ significantly from observed values. The GLI equations using “Other” for ethnicity predicted FEV1 and FVC values that were higher than those from all Indian equations (data not shown).
Since the previous south Indian (Bangalore) equations gave the closest predictions, Bland Altman (BA) analysis of agreement was done between values predicted using the current rural Vellore equations and those predicted using the Bangalore equations. Figures 1a and 1b show the BA plots (average vs. difference between the two predicted values) for FEV1 in males and females, respectively. There was a small negative bias of -0.021 L for FEV1 in males (95% LOA: -0.202 L to 0.181 L); values predicted using the Vellore equations were on average lower than those predicted by the previous south Indian (Bangalore) equations, with a one sample t test P value of 0.078 (Figure 1a). A negative bias was seen for large values of FEV1 and a positive bias for smaller values, with a significant negative correlation between the average and the difference (Pearson’s r = -0.814, P value for linear regression <0.001), indicating proportional bias. In females, there was a large negative fixed bias of -0.212 L for FEV1 (95% LOA: -0.291 L to -0.133 L; one sample t test P value <0.001), indicating that the mean difference between the two sets of predicted values was significantly different from zero.
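A proportional-bias check of the kind described above (correlating the paired differences with their averages and regressing one on the other) might look like this minimal sketch. The names are hypothetical and no P value is computed; it only shows how Pearson's r and the regression slope are obtained.

```python
import math


def proportional_bias(pred_a, pred_b):
    """Correlate BA differences (a - b) with averages; fit diff = intercept + slope * avg."""
    diffs = [a - b for a, b in zip(pred_a, pred_b)]
    avgs = [(a + b) / 2 for a, b in zip(pred_a, pred_b)]
    n = len(diffs)
    mean_avg, mean_diff = sum(avgs) / n, sum(diffs) / n
    sxy = sum((x - mean_avg) * (y - mean_diff) for x, y in zip(avgs, diffs))
    sxx = sum((x - mean_avg) ** 2 for x in avgs)
    syy = sum((y - mean_diff) ** 2 for y in diffs)
    r = sxy / math.sqrt(sxx * syy)  # Pearson's r between average and difference
    slope = sxy / sxx  # least-squares slope of difference on average
    intercept = mean_diff - slope * mean_avg
    return r, slope, intercept
```

A slope clearly different from zero, as with the negative correlation reported for male FEV1, signals that the bias depends on the size of the measurement rather than being fixed.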
For FVC, there was a negative bias of -0.217 L in males (Vellore predicted values were lower than previous south Indian predictions), 95% LOA: -0.589 L to 0.372 L, one sample t test P value <0.0001, Figure 2a. In females, there was a negative bias of -0.084 L for FVC (Vellore predicted values were lower than previous south Indian predictions), 95% LOA: -0.169 L to 0.001 L, one sample t test P value <0.0001, Figure 2b.
Since the differences between the predicted FEV1/FVC ratios were not normally distributed, the predicted values from both equations were first log transformed, and the BA plots were drawn using the log transformed predicted values (Figures 3a and 3b). Spearman’s rank correlation coefficient for differences vs. averages was -0.908 for males, indicating a strong relationship between the average predicted ratio and the difference, and -0.237 for females. However, log transformation did not make the differences between the two sets of predicted values normally distributed. The BA plot for males showed poorer agreement at lower values on the log scale (lower FEV1/FVC ratios) (Figure 3a), while for females, agreement was worse at higher values on the log scale (Figure 3b).
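Spearman’s rank correlation is Pearson’s correlation computed on ranks. The minimal sketch below assumes tied values receive their average rank, which is the usual convention; the function names are hypothetical and no P value is computed.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho, e.g. between BA averages and differences."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because it uses only ranks, this coefficient captures monotonic but non-linear relationships, which suits differences that are not normally distributed.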
For males [Figure 3a], the mean difference on the log scale was 0.0234 (95% LOA: 0.0042 and 0.0425), with antilog values of 1.009 and 1.102. This indicates that the Vellore predicted values for FEV1/FVC percent were 1.009 to 1.102 times the Bangalore predicted values for males. For females, the mean difference on the log scale was -0.0094 (95% LOA: -0.0004 and -0.0185), with antilog values of 0.978 and 0.958, indicating that the Vellore predicted values for FEV1/FVC percent were 0.958 to 0.978 times the Bangalore predicted values.
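The log-scale Bland Altman analysis and its back-transformation can be sketched as below. Base-10 logarithms are an assumption, chosen because the male antilogs reported above (1.009 and 1.102 for 0.0042 and 0.0425) are consistent with base 10; the function and key names are hypothetical. Because each difference is log(Vellore) minus log(Bangalore), each antilog reads as a ratio of Vellore to Bangalore predictions.

```python
import math
import statistics


def log_bland_altman(vellore_pred, bangalore_pred, z=1.96):
    """Bland Altman on log10-transformed predictions, back-transformed to ratios."""
    log_diffs = [math.log10(v) - math.log10(b)
                 for v, b in zip(vellore_pred, bangalore_pred)]
    bias = statistics.fmean(log_diffs)
    sd = statistics.stdev(log_diffs)
    lower, upper = bias - z * sd, bias + z * sd
    return {
        "log_bias": bias,
        "log_loa": (lower, upper),
        "ratio_bias": 10 ** bias,  # antilog of the mean log difference
        "ratio_loa": (10 ** lower, 10 ** upper),  # antilogs of the limits of agreement
    }
```

An antilog above 1 means the Vellore equation predicts a higher FEV1/FVC than the Bangalore equation for the same person, and one below 1 means it predicts a lower value.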
In the validation dataset, the proportion of males with airflow obstruction, defined as FEV1/FVC percent below the Lower Limit of Normal (predicted value minus 1.645 times the RSD), was 1.56% (1/64) using the Bangalore equations and 9.38% (6/64) using the new Vellore equations. The one male with FEV1/FVC below the LLN using the Bangalore equation was also below the LLN using the Vellore equation. In females, the Bangalore equations classified 6.3% (7/111) as having airflow obstruction, compared with 4.5% (5/111) using the Vellore equations; the Bangalore equation identified all five women flagged by the Vellore equation plus two more.
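The LLN-based classification above can be sketched as follows, assuming a single RSD per sex-specific equation and a strict inequality (observed < LLN); the function names are hypothetical. The 1.645 multiplier corresponds to the one-sided 5th percentile of normally distributed residuals.

```python
def lower_limit_of_normal(predicted, rsd, z=1.645):
    """LLN = predicted value minus 1.645 x RSD (one-sided 5th percentile)."""
    return predicted - z * rsd


def classify_obstruction(observed, predicted, rsd):
    """Flag observed FEV1/FVC values below the LLN; return flags, count and percent."""
    flags = [obs < lower_limit_of_normal(pred, rsd)
             for obs, pred in zip(observed, predicted)]
    count = sum(flags)
    return flags, count, 100 * count / len(flags)


def overlap(flags_a, flags_b):
    """Number of people flagged below the LLN by both equations."""
    return sum(a and b for a, b in zip(flags_a, flags_b))
```

For instance, one flagged male out of 64 gives 100 * 1 / 64 = 1.5625, reported above as 1.56%.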
Spirometry reference equations are used to diagnose chronic respiratory diseases by comparing observed spirometry values with those expected for individuals of the same age, sex, ethnicity, and body size. FEV1, FVC, and the FEV1/FVC ratio are commonly used to diagnose obstructive and restrictive lung disease. Reference equations are created by measuring lung function in healthy individuals and then applying techniques such as regression to predict expected values for a given set of inputs. Given India's wide differences in disease indicators, as well as in environmental and behavioral risk factors, it is unsurprising that spirometry prediction equations from different states have shown wide differences in predictions, indicating the need for separate regional equations. Age, height, and weight were the most important predictors of lung function, with age alone influencing FEV1/FVC.[1–5] Validation is an important step in creating new spirometry reference equations, usually carried out in a held-out subset of the study dataset, as well as by comparing observed values with predicted values calculated using other reference equations. If observed values in a dataset are similar to the values predicted by the regression equation, the equations fit the data well.
In this study from rural south India, there were wide differences between the observed values and the values predicted by the Western, eastern, and north Indian equations,[1,3,4] with the closest predictions made by the previous south Indian equations from Bangalore, a city 200 km from Vellore. However, the current rural Vellore equations were not in perfect agreement with the Bangalore equations, with significant bias on the BA plots. Applying the Bangalore equations to the rural Vellore dataset gave satisfactory predictions of FEV1 for males but overestimated FVC, and overestimated both FEV1 and FVC in females. Predicted FEV1/FVC, on the other hand, was higher using the Vellore equations for males, so a higher percentage of males was classified as having airflow obstruction than with the Bangalore equations. Thus, using the Bangalore equations would underestimate the prevalence of airflow obstruction in this population of rural males from Tamil Nadu, south India.
One reason for the poor fit of the Bangalore equations could be the urban-rural difference between the two reference populations and the associated environmental differences, as major differences in biological characteristics between populations from these adjacent states are unlikely. Given the slightly higher mean height of males in Karnataka (164.6 cm) than in Tamil Nadu (163.8 cm) in national surveys, our rural male participants from Tamil Nadu, with a mean height of 162.55 cm, are likely to be shorter than urban males from Bangalore, Karnataka. This could have contributed to the 0.243 L lower observed FVC in males, compared with the values predicted using the Bangalore equations.
However, for females, our observed FVC values were only 0.081 L lower than those predicted using the Bangalore equations, which could be partly explained by similar heights (mean height Karnataka: 152.1 cm, Tamil Nadu: 151.7 cm, rural Vellore: 151.46 cm).
These findings could also imply that differences in adverse environmental or nutritional exposures may be greater between rural and urban males than between rural and urban females. We speculate that domestic pollution exposure from cooking may be high in women irrespective of rural or urban residence, whereas in urban homes using clean fuel, other household members may have lower exposures. Another possibility is that nutritional advantages may accrue first to males during socio-economic transition and urbanization. This warrants further explanatory research with social science elements.
A previous study found significant differences in lung function (FEV1 and FVC, but not FEV1/FVC percent) between urban children from Bangalore and rural and semi-urban children from the same state, emphasizing the role of environmental factors and suggesting that smaller lung size is a marker of adverse conditions for growth.[13,14] While this can serve as a marker of future health risk, it should not lead to discrimination. On the one hand, calling such reduced lung function normal means losing the ability to identify adverse health associations; on the other hand, labeling it abnormal may lead to discrimination in occupations that measure fitness. Thus, it is useful to know both what is usual in a given population and what is truly normal in the absence of adversity. This will enable more accurate and contextual interpretation.
This study found that the equations from rural Vellore, Tamil Nadu were not interchangeable with previous Indian equations, emphasizing the wide variation across India in the lung capacity of “healthy” individuals. There is a need to develop region-specific standards with adequate urban and rural representation. As recommended by the Indian Chest Society (ICS)/National College of Chest Physicians of India (NCCP), diagnosis of obstructive airway disease should ideally be based on FEV1/FVC below the LLN derived using appropriate prediction equations from a cohort “sampled from the particular population in a geographical area.” However, in the absence of reliable, representative reference equations, the fixed GOLD criteria are recommended, with the understanding that they risk over-diagnosis in the elderly and missed diagnosis in younger groups.[16,17]
As our study was limited to adults aged 30 years and above, unlike others that also included those aged 18-29 years,[1,3,4] the prediction equations are appropriate only for adults aged 30 years and above from similar rural south Indian populations. Further research is needed to develop equations for rural south Indian young adults (18 to 29 years), as the previous study in this age group was published 30 years ago.
Previous spirometry equations from south India are now decades old and included smokers in the study population, while a recent study from Bangalore, Karnataka is yet to be published. Applying the Bangalore equations to our data from rural Tamil Nadu significantly overestimated FEV1 and FVC in females and FVC in males, and underestimated airflow obstruction in males, compared with the rural Vellore equations, for adults aged 30 years and older. We recommend large representative studies to develop regional prediction equations for south India, including younger and older adults. Representative studies are required across India, given the wide variation in spirometry values among “normal” individuals due to the heterogeneity of its population.
The study was approved by the Institutional Review Board and Ethics Committee of Christian Medical College Vellore (OBSERVE 10858, dated 27.09.2017). The study was conducted in accordance with principles of the amended Declaration of Helsinki, with written informed consent from study participants.
Financial support and sponsorship
The study was partially funded by Pulmonary Medicine departmental research funds, Christian Medical College Vellore. Dr Anurag Agrawal acknowledges Wellcome Trust DBT India Alliance Senior Fellowship and funding by CSIR MLP5502 grant.
Conflicts of interest
There are no conflicts of interest.
1. Agarwal D, Parker RA, Pinnock H, Roy S, Ghorpade D, Salvi S, et al. Normal spirometry predictive values for the Western Indian adult population. Eur Respir J 2020;56:1902129.
2. Quanjer PH, Stanojevic S, Cole TJ, Baur X, Hall GL, Culver BH, et al. Multi-ethnic reference values for spirometry for the 3-95-yr age range: The global lung function 2012 equations. Eur Respir J 2012;40:1324–43.
3. Chhabra SK, Kumar R, Gupta U, Rahman M, Dash DJ. Prediction equations for spirometry in adults from northern India. Indian J Chest Dis Allied Sci 2014;56:221–9.
4. Dasgupta A, Ghoshal AG, Mukhopadhyay A, Kundu S, Mukherjee S, Roychowdhury S, et al. Reference equation for spirometry interpretation for Eastern India. Lung India 2015;32:34–9.
5. Vijayan VK, Kuppurao KV, Venkatesan P, Sankaran K, Prabhakar R. Pulmonary function in healthy young adult Indians in Madras. Thorax 1990;45:611–5.
6. Christopher DJ, Oommen AM, George K, Shankar D, Agrawal A, Thangakunam B. Prevalence of airflow obstruction as measured by spirometry, in rural southern Indian adults. COPD 2020;17:128–35.
7. Miller MR, Hankinson J, Brusasco V, Burgos F, Casaburi R, Coates A, et al. Standardisation of spirometry. Eur Respir J 2005;26:319–38.
8. Johannessen A, Omenaas ER, Eide GE, Bakke PS, Gulsvik A. Feasible and simple exclusion criteria for pulmonary reference populations. Thorax 2007;62:792–8.
9. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60.
10. India State-level Disease Burden Initiative Collaborators. Nations within a nation: Variations in epidemiological transition across the states of India 1990-2016 in the Global Burden of Disease study. Lancet 2017;390:2437–60.
11. Aggarwal AN, Gupta D, Jindal SK. Comparison of Indian reference equations for spirometry interpretation. Respirology 2007;12:763–8.
12. Mamidi RS, Kulkarni B, Singh A. Secular trends in height in different states of India in relation to socioeconomic characteristics and dietary intakes. Food Nutr Bull 2011;32:23–34.
13. Sonnappa S, Lum S, Kirkby J, Bonner R, Wade A, Subramanya V, et al. Disparities in pulmonary function in healthy children across the Indian urban-rural continuum. Am J Respir Crit Care Med 2015;191:79–86.
14. Agrawal A, Aggarwal M, Sonnappa S, Bush A. Ethnicity and spirometric indices: Hostage to tunnel vision? Lancet Respir Med 2019;7:743–4.
15. Gupta D, Agarwal R, Aggarwal AN, Maturu VN, Dhooria S, Prasad KT, et al. Guidelines for diagnosis and management of chronic obstructive pulmonary disease: Joint ICS/NCCP (I) recommendations. Lung India 2013;30:228–67.
16. Miller MR, Quanjer PH, Swanney MP, Ruppel G, Enright PL. Interpreting lung function data using 80% predicted and fixed thresholds misclassifies more than 20% of patients. Chest 2011;139:52–9.
17. Cerveri I, Corsico AG, Accordini S, Niniano R, Ansaldo E, Anto JM, et al. Underestimation of airflow obstruction among young adults using FEV1/FVC <70% as a fixed cut-off: A longitudinal evaluation of clinical and functional outcomes. Thorax 2008;63:1040–5.