Deprivation refers to lack of access to resources, be they material (housing, employment, education, food, etc) or social (rights associated with employment, family activities, community integration, formal participation in social institutions, leisure, and education).^{1}

Deprivation is a complex construct that is difficult to conceptualize and to measure at both the individual and community levels. Usually, simple and compound indicators are used to measure deprivation. Compound indicators (“deprivation indices”) are constructed by combining various simple indicators.^{2,3} Deprivation indices based on census data were first constructed at the beginning of the 1970s in England.^{4} The best-known deprivation indices are those of Jarman,^{5} Carstairs and Morris,^{6,7} and Townsend et al.^{8} Deprivation measures are widely used in health-related studies, particularly in the study of health inequalities, and some deprivation indices have been constructed exclusively for this purpose.^{2,9-13}

There are several methods for calculating deprivation indices. The most commonly used fall into 5 groups: simple additive indices, weighted indices, signed χ^{2}, multivariate techniques, and those that use feedback among health experts.^{2-4,14-16} Multivariate construction of deprivation indices has typically been carried out using factor analysis or simpler descriptive multivariate procedures such as principal component analysis. Few Bayesian approaches have been proposed.

Principal component analysis is a widely used multivariate technique for calculating indices of deprivation.^{3,16,17} The objective is to account for the variability of a set of variables with the fewest components possible. These components are constructed as linear combinations of the original variables. Principal component analysis determines the weight of each variable in each component.^{18}
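As an illustration of the idea above, the following is a minimal sketch of a first-component index, assuming standardized indicators and simulated data. The function name `pca_index`, the sign convention, and the toy data are assumptions made for illustration; the weights here are the unit-length eigenvector of the correlation matrix, which may be scaled differently from the factor loadings used in any particular published index.

```python
import numpy as np

def pca_index(X):
    """First principal component of standardized indicators.

    X: array of shape (n_areas, n_indicators).
    Returns (scores, weights, share of variance explained).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each indicator
    corr = np.corrcoef(Z, rowvar=False)               # indicator correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    w = eigvecs[:, -1]                                # weights of the first component
    if w.sum() < 0:                                   # the sign is arbitrary; fix it
        w = -w
    explained = eigvals[-1] / eigvals.sum()
    return Z @ w, w, explained

# Simulated data (hypothetical): 5 indicators driven by one latent variable
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([latent + rng.normal(scale=0.5, size=200) for _ in range(5)])
scores, weights, explained = pca_index(X)
```

Each score is a linear combination of the standardized indicators, and `explained` is the share of total variance captured by that single component.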

Another widely used multivariate technique for the construction of deprivation indices is factor analysis. Just as principal component analysis can be placed in the domain of descriptive statistics, factor analysis is a statistical technique that includes the development of a model and therefore involves inference.^{18} The objective of this technique is to explain a collection of observed variables on the basis of a small number of latent variables (usually called factors) that are not directly observable or measurable. In the context of deprivation, the simple indicators are the collection of observed variables and the index is the latent variable (or common factor).

The application of factor analysis assumes that values of the observed variables are independent between every unit of study (for example, census tracts).^{19} These variables are usually assumed to follow a normal distribution. Moreover, it is assumed that the values of the factors between every unit of study are also independent. These assumptions may not always be valid when dealing with spatial data. When calculating an index using factor analysis, it is likely that there is a spatial dependence between the values of indicators in neighboring units. Consequently, the assumption of independence between areas is unlikely to be satisfied. Following the same argument, it is quite likely that the index obtained is also spatially correlated. In terms of the assumption of normality, the indicators used for the construction of indices are usually percentages, which do not follow a normal distribution.

Recently, Bayesian approaches have been developed for factor analysis using hierarchical models. The advantage of these models is that they can take into account spatial dependence within the data and permit estimation of the factor analysis model even if the data do not follow a normal distribution. Another advantage of this approach for obtaining a deprivation index is that the index is summarized not as a single value but as a posterior distribution.^{20,21} In this way, it is possible to obtain estimates of uncertainty in the deprivation index assigned to each spatial unit.

The spatial factor Bayesian model has been applied in various studies, although not necessarily in the context of deprivation calculation. For example, Wang and Wall^{21} applied this approach to data on mortality following various types of cancer at the county level in Minnesota to search for a spatial factor underlying cancer mortality in the state. Hogan and Tchernis^{20} proposed this model to summarize material deprivation at the level of census tract in the state of Rhode Island on the basis of several variables, taking spatial correlation into account. Abellan et al^{22} used this model with a spatial component to extract information regarding income, employment, and education at the district level in England. Congdon^{23} used this type of model to calculate indices of deprivation and fragmentation (social isolation). Tzala and Best^{24} used a similar model for the joint analysis of multiple related diseases in space and time with a view to estimating common and disease-specific trends in cancer risk. All of these studies reported problems with identifiability for some of their parameters, which we attempt to avoid by means of our model formulation.

Uncertainty in a deprivation index is often dealt with by calculating measures of central tendency of the posterior distributions. However, ignoring uncertainty in this way could lead to bias in the results. For example, ignoring uncertainty by including the posterior mean of the deprivation index as a covariable in an ecological regression could produce a bias in the estimation of the regression parameters.

The first objective of this study is to propose a Bayesian factor analysis model to estimate a deprivation index (which takes the spatial dependence of the study units into account) and to compare this proposal with the widely used non-Bayesian principal component analysis method. The second goal is to examine the problems associated with ignoring uncertainty in the estimation of the index.

## METHODS

### Study Scope and Information Sources

This study was performed within the framework of the “Mortality in small Spanish areas and Socioeconomic and Environmental Inequalities” (MEDEA) project.^{3,25,26}

A study of cross-sectional ecological design was carried out in 3 cities on the east coast of Spain that participated in the MEDEA project: Alicante, Barcelona, and Valencia. In 2001, these cities had populations of 284,580; 1,503,884; and 738,441, respectively. The geographical units used for analysis were census tracts according to the 2001 census (222 in Alicante, 1491 in Barcelona, and 598 in Valencia). The median number of inhabitants in the census tracts was 1129 in Alicante, 923 in Barcelona, and 1135 in Valencia.

The information source for the socioeconomic indicators was the population and housing Census of 2001 (National Statistics Institute).

### Indicators for Calculating the Index

Simple socioeconomic indicators were developed for the census tracts of each city in the MEDEA project, on the basis of 2001 census information. Then, using principal component analysis, 5 socioeconomic indicators were summarized in a deprivation index, using factor loadings corresponding to the first axis as weights.^{3} This analysis was performed for each city.

In this study, we used the following 5 socioeconomic indicators that comprise the deprivation index used in the MEDEA project: (1) Unemployment: the percentage of the economically active population at least 16 years of age who are unemployed; (2) Manual workers: the percentage of the total working population at least 16 years of age who are employed as manual workers; (3) Temporary workers: the percentage of the total employed population at least 16 years of age who are employed in temporary jobs; (4) Insufficient education: the percentage of persons in the total population at least 16 years of age who are illiterate or have not completed their primary education; and (5) Insufficient education among young people: the proportion of young people (16-29 years of age) with insufficient education.

### Spatial Factor Bayesian Model

We performed a Bayesian factor analysis using spatial hierarchical models.^{20-24} An index was obtained that takes into account both the interrelations among the 5 socioeconomic indicators, and the spatial dependence between the census tracts. Previously the Moran and Geary tests had been carried out for each indicator to check for spatial dependence. There was strong statistical evidence (*P* < 0.001) for all indicators, regardless of whether normality was assumed, suggesting that spatial correlation exists for all of them.
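A spatial dependence check of this kind can be sketched as follows. This is a minimal illustration of Moran's *I* with a permutation test, which is a substitute for, and not a reproduction of, the tests reported above (those were run under normality and nonnormality assumptions). The function names, the toy adjacency matrix, and the number of permutations are assumptions.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x and a symmetric 0/1 adjacency matrix W."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

def permutation_p_value(x, W, n_perm=999, seed=0):
    """One-sided p-value: share of random relabelings with I at least as large."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    perms = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    return (1 + np.sum(perms >= observed)) / (n_perm + 1)

# Toy example (hypothetical): 10 areas in a row, adjacent areas are neighbors
n = 10
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1
x = np.arange(n, dtype=float)   # smooth gradient, so neighbors are alike
I = morans_i(x, W)
p = permutation_p_value(x, W)
```

Values of *I* well above its expectation under no spatial structure, together with a small permutation p-value, point to positive spatial autocorrelation.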

Let *Y*_{ij} be the value of indicator *j* (*j* = 1, ..., *m*) in area *i* (*i* = 1, ..., *n*). In our case, we have 5 socioeconomic indicators (*m* = 5) and 222 census tracts in Alicante, 1491 in Barcelona, and 598 in Valencia. Because all of the indicators are percentages, we applied the logit transformation to the indicators (expressed as proportions), ie,

$$\operatorname{logit}(Y_{ij}) = \log\left(\frac{Y_{ij}}{1 - Y_{ij}}\right).$$

With the aim of taking the spatial dependence between areas into account, the following spatial factor model was fitted separately for each city:

$$\operatorname{logit}(Y_{ij}) \sim \mathrm{N}(\mu_{ij}, \sigma_{j}^{2}), \qquad \mu_{ij} = \alpha_{j} + \delta_{j}\theta_{i} + \phi_{ij},$$

$$\theta \sim \mathrm{CAR.normal}(W, \sigma_{\theta}^{2}), \qquad \phi_{j} \sim \mathrm{CAR.normal}(W, \sigma_{\phi j}^{2}).$$

For each *j*, α_{j} is the model intercept. θ_{i} is the common factor (or shared component) of the 5 indicators in area *i*, ie, the value of the deprivation index for each census tract. δ_{j} are the factor loadings, which quantify the contribution of the common factor to each original socioeconomic indicator. φ_{ij} is the specific residual variation of indicator *j* in area *i*. **W** is the spatial neighborhood (adjacency) matrix for the census tracts of each city, with dimensions *n* × *n*. This matrix is defined by considering 2 census tracts to be neighbors if they share a common boundary. A vague uniform prior distribution is assigned to the parameters α_{j} and δ_{j}. For the *n* × 1 vectors of the common factor θ = (θ_{1}, ..., θ_{n})′ and the specific factors φ_{j} = (φ_{1j}, ..., φ_{nj})′, we have assigned a conditional autoregressive normal distribution, more specifically a Gaussian intrinsic autoregression distribution,^{27} to take into account the spatial dependence within the data. The variance σ_{j}^{2} quantifies the variability of the heterogeneity term (or nonspatially structured variability) in the data. As Gelman^{28} suggests, a uniform prior distribution is assigned to the parameters σ_{j} and σ_{φj}.
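For reference, the Gaussian intrinsic autoregression mentioned above is usually written through its full conditional distributions. The form below is the standard one for a binary adjacency matrix; the symbols *n*_{i} (number of neighbors of area *i*) and σ_{θ}^{2} (conditional variance parameter) are notation assumed here for illustration:

$$\theta_{i} \mid \theta_{-i} \sim \mathrm{N}\left(\frac{1}{n_{i}} \sum_{k=1}^{n} W_{ik}\,\theta_{k},\; \frac{\sigma_{\theta}^{2}}{n_{i}}\right), \qquad n_{i} = \sum_{k=1}^{n} W_{ik}.$$

Under this form, the value in each census tract is centered on the average of its neighbors, and tracts with more neighbors have a smaller conditional variance.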

To avoid identifiability problems between the scales of δ_{j} and θ in the model, the commonly used restriction of fixing the variance parameter of the common factor θ to 1 (σ_{θ} = 1) was applied.^{20,22,24} Identifiability (“flip-flop”) problems may also arise with the values obtained for the factor loadings (δ_{j}) and the common factor (θ), because equivalent solutions are obtained simply by changing the sign of both δ_{j} and θ. Thus, it would be impossible to distinguish between these 2 solutions unless one of them is excluded by a restriction. Because previous analyses showed that the manual workers indicator was the one most highly correlated with the other socioeconomic indicators, and would thus be one of the indicators most strongly correlated with the common factor, we restricted its factor loading (δ_{4}) to be positive.^{20,22,24,29} In this way, we avoided a possible flip-flop effect by redefining the factor loadings and the common factor as δ′ and θ′ using expressions 1 and 2. Hereafter, the factor loadings and the common factor refer to these new parameters (δ′ and θ′).
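A minimal sketch of one way such a sign restriction can be imposed on posterior draws, assuming the redefinition multiplies each draw of δ and θ by the sign of the reference loading in that draw. This is a common device and is not claimed to reproduce expressions 1 and 2 exactly. The function name `fix_sign`, the array layout, and the 0-based reference index 3 (standing for δ_{4}) are illustrative assumptions.

```python
import numpy as np

def fix_sign(delta_draws, theta_draws, ref=3):
    """Flip each draw so that the reference loading is positive.

    delta_draws: (n_draws, n_indicators) factor loadings.
    theta_draws: (n_draws, n_areas) common factor values.
    ref: 0-based column of the reference loading (assumed: 3, ie, delta_4).
    """
    s = np.sign(delta_draws[:, ref])[:, None]   # +1 or -1 for every draw
    return delta_draws * s, theta_draws * s

# Simulated draws (hypothetical) with arbitrary signs
rng = np.random.default_rng(1)
delta = rng.normal(size=(6, 5))
theta = rng.normal(size=(6, 4))
delta_fixed, theta_fixed = fix_sign(delta, theta)
```

Because δ and θ are flipped together, every product δ_{j}θ_{i} is unchanged, so the fitted values of the model are not affected; only the labeling of the two mirror-image solutions is made consistent across draws.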

The posterior distributions were obtained by Markov chain Monte Carlo methods using the WinBUGS program (WinBUGS version 1.4.3, WinBUGS, 1989),^{30} executed from R (R version 2.7.1, R Development Core Team, 2006).^{31} Three Markov chains were run with 120,000 iterations each, of which the first 40,000 were discarded as burn-in. Finally, only 1 of every 80 iterations was retained, with the aim of reducing the computational requirements of the simulation process. To analyze the convergence of the fitted model, the Brooks-Gelman-Rubin statistic (R-hat) and the effective sample size of the chains (n.eff) were used. The convergence criteria were R-hat <1.1 and n.eff >100 for all the simulated parameters.
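The simulation settings above imply 3000 retained draws in total, which the short sketch below checks. It also includes a basic version of the potential scale reduction factor, assuming the classic between-chain and within-chain variance formula; software implementations of the Brooks-Gelman-Rubin statistic may add corrections, so the function `r_hat` is an illustration and not the exact statistic reported by any given package.

```python
import numpy as np

# Settings stated in the text
n_chains, n_iter, burn_in, thin = 3, 120_000, 40_000, 80
retained_per_chain = (n_iter - burn_in) // thin   # 1000 draws per chain
total_retained = n_chains * retained_per_chain    # 3000 draws in total

def r_hat(chains):
    """Basic potential scale reduction factor.

    chains: array of shape (n_chains, n_draws) for one parameter.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled variance estimate
    return np.sqrt(pooled / within)
```

Values close to 1 indicate that the chains agree with one another; chains stuck in different regions give values well above the 1.1 threshold.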

The deprivation index for each census tract was obtained on the basis of the measures of central tendency of the posterior distribution of θ (specifically, the mean).

### Description of the Deprivation Index Obtained and Assessment of Its Uncertainty

With the aim of comparing the results of our proposal with those of a classic (non-Bayesian) method, we calculated the matrix of Spearman correlation coefficients among the index obtained using principal component analysis,^{3} the 5 socioeconomic indicators used, and the index obtained using the spatial factor Bayesian model. For the spatial factor Bayesian index, we obtained the posterior mean and the 95% credible interval (95% CI) of the Spearman correlation coefficient. The relationship between the principal-component-analysis index and the spatial factor Bayesian model was also represented using a scatterplot (Fig. 1).
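The posterior summary of the Spearman coefficient described above can be sketched as follows: the coefficient is computed once per posterior draw and then summarized. The helper names, the simulated data, and the percentile-based interval are assumptions for illustration, and the rank helper does not handle ties.

```python
import numpy as np

def ranks(v):
    """Ranks 1..n of a vector (no tie handling; assumes continuous values)."""
    order = np.argsort(v)
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

def posterior_spearman(theta_draws, reference):
    """Posterior mean and 95% interval of the correlation with a fixed index."""
    rho = np.array([spearman(draw, reference) for draw in theta_draws])
    return rho.mean(), np.percentile(rho, [2.5, 97.5])

# Simulated stand-ins (hypothetical): a fixed index and noisy posterior draws
rng = np.random.default_rng(2)
pca_like = rng.normal(size=50)
theta_draws = pca_like + rng.normal(scale=0.3, size=(200, 50))
mean_rho, (lo, hi) = posterior_spearman(theta_draws, pca_like)
```

Summarizing the coefficient over draws, instead of computing it once from the posterior mean of the index, is what yields a credible interval for the correlation.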

The geographical distribution of the deprivation index (posterior mean) is represented graphically using quintile maps (Fig. 2). In addition, for each census tract the posterior probability of being assigned to the highest quintile was calculated, where a higher quintile indicates greater deprivation. This probability is represented using maps with the following cut points: [0, 0.05), [0.05, 0.5), [0.5, 0.95), and [0.95, 1]. All maps were plotted using the R statistical package (Fig. 3).^{31}
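The probability map described above can be sketched from posterior draws as follows. The rank-based definition of the highest quintile, the function name, and the simulated data are assumptions; the cut points are those given in the text.

```python
import numpy as np

def prob_top_quintile(theta_draws):
    """Posterior probability that each area falls in the highest quintile.

    theta_draws: (n_draws, n_areas); larger values mean greater deprivation.
    """
    n_areas = theta_draws.shape[1]
    ranks = theta_draws.argsort(axis=1).argsort(axis=1) + 1   # 1 = least deprived
    in_top = 5 * ranks > 4 * n_areas                          # top fifth by rank
    return in_top.mean(axis=0)                                # share of draws

# Simulated draws (hypothetical): 10 well-separated areas
rng = np.random.default_rng(4)
centers = 10.0 * np.arange(10)
theta_draws = centers + rng.normal(size=(200, 10))
p_top = prob_top_quintile(theta_draws)

# Map classes: 0 = [0, 0.05), 1 = [0.05, 0.5), 2 = [0.5, 0.95), 3 = [0.95, 1]
groups = np.digitize(p_top, [0.05, 0.5, 0.95])
```

With real posterior draws, areas whose ranking is uncertain land in the two middle classes, which is the information a map of the posterior mean alone cannot show.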

Various analyses were carried out to assess the uncertainty of the index obtained. We calculated the posterior distributions of the deprivation index for the spatial factor Bayesian model for each census tract, and from these we obtained the posterior ranks. Census tracts can be classified into a given number of quantile groups, but rank uncertainty means that each census tract has some probability of belonging to more than one quantile group. Uncertainty in the assignment of quantiles for every census tract was assessed by means of the posterior probability that the quantile for every census tract is equal to the quantile corresponding to the index posterior mean. This was done for various numbers of quantiles (3, 4, 5, and 10) to assess the sensitivity of results to this parameter.

For each census tract, we also studied the probability that the quintile group obtained in the posterior distribution of the index was equal to the quintile group obtained by means of the index posterior mean. On the x-axis of Figure 4A-C, census tracts are represented by the rank of the index posterior mean. The probability that each census tract belongs to the quintile obtained by the index posterior mean is shown on the y-axis. These analyses were also performed by comparisons with the quintile group corresponding to the principal component analysis index instead of the index posterior mean (Fig. 4D-F).
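The agreement probabilities described above can be sketched as follows, assuming rank-based quantile groups and simulated draws. The function names and the toy data are illustrative assumptions.

```python
import numpy as np

def quantile_group(values, k):
    """Assign areas to k rank-based groups (1 = lowest values).

    Works on a vector (n_areas,) or on draws (n_draws, n_areas).
    """
    n = values.shape[-1]
    ranks = values.argsort(axis=-1).argsort(axis=-1) + 1
    return (k * ranks + n - 1) // n          # integer ceiling of k * rank / n

def agreement_probability(theta_draws, k):
    """Per-area probability that a draw lands in the posterior-mean group."""
    by_mean = quantile_group(theta_draws.mean(axis=0), k)   # (n_areas,)
    by_draw = quantile_group(theta_draws, k)                # (n_draws, n_areas)
    return (by_draw == by_mean).mean(axis=0)

# Simulated draws (hypothetical): 100 areas, 500 posterior draws
rng = np.random.default_rng(3)
true_index = np.sort(rng.normal(size=100))
theta_draws = true_index + rng.normal(scale=0.3, size=(500, 100))
summary = {k: agreement_probability(theta_draws, k).mean() for k in (3, 4, 5, 10)}
```

Plotting `agreement_probability(theta_draws, 5)` against the rank of the posterior mean gives a display of the same kind as the per-tract probabilities described above, and `summary` shows how the average agreement changes with the number of quantiles.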

Finally, we explored the relationship between the posterior standard deviation and the posterior mean of the deprivation index (Fig. 5).

## RESULTS

Table 1 shows the posterior mean and 95% CI of the factor loadings, indicating whether the factor loadings are different from 0. In all 3 cities, the deprivation index was markedly associated with indicators of insufficient education and insufficient education among young people, and least associated with the indicator of unemployment. For example, in Valencia the factor loading for the indicator of insufficient education among young people was 0.96 (95% CI = 0.88-1.04), whereas for unemployment it was 0.37 (0.33-0.42).

Figure 2 shows quintile maps of the deprivation index by census tract in each city, obtained using the spatial factor model. The areas shown in dark gray represent the census tracts with greatest deprivation. The maps suggest a high degree of spatial correlation in the deprivation index in all the cities. However, there are census tracts in the lowest and highest levels of deprivation quintiles that are neighboring, suggesting that while the model takes the spatial dependence into account, it is flexible enough to allow for extremes of deprivation in adjacent census tracts. Moreover, the spatial distribution of these maps was similar to the distribution of the index calculated using principal component analysis,^{3} with a Spearman correlation coefficient (posterior mean) between the 2 indices of 0.96 (95% CI = 0.95-0.97) for Alicante; 0.97 (0.96-0.97) for Barcelona; and 0.97 (0.96-0.97) for Valencia (eTable, https://links.lww.com/EDE/A458). This correlation seems evident also in Figure 1, but this figure shows that the relationship between the 2 indices is not linear for any of these cities.

The posterior distribution of the deprivation index allows us to make representations such as Figure 3. This figure shows the posterior probability that each census tract is assigned to the highest quintile, representing the greatest deprivation. According to this posterior probability, the census tracts are separated into 4 groups, which may be interpreted as “high probability of deprivation,” “probably suffering from deprivation,” “probably not suffering from deprivation,” and “low probability of deprivation.” (Census tracts with high probability of deprivation are highlighted in dark gray on the map.) Moreover, we observe a spatial correlation between the census tracts with a high probability of deprivation, as these are not independently distributed throughout the map.

As expected, the probability that census tracts are assigned to the same quantiles using the posterior mean of the deprivation index decreases as the number of quantiles increases (Table 2). For example, in Barcelona the probability that the quintile of the deprivation index for any census tract corresponds to the quintile of its posterior mean is 0.79, whereas this value is 0.60 if we take deciles. Conversely, in Barcelona, the probability of incorrect assignment (based on the posterior mean) is 0.21 for quintiles and 0.40 for deciles. These results were very similar for all 3 cities.

Focusing on the assignment of census tracts to quintiles (Fig. 4), the probability of making a correct assignment according to the posterior mean fell to approximately 0.40 in some census tracts (Fig. 4A-C). Thus, in these census tracts the probability of incorrect classification, as a function of the posterior mean, is even greater than the probability of correct classification. Comparing the assignment of each census tract to quintiles based on the principal component analysis (Fig. 4D-F), we observe greater discordance, with negligible probability of agreement for some census tracts.

For example, in Alicante, we performed 3000 simulations of the deprivation index for each census tract using the spatial factor Bayesian method (posterior distribution). If we take the census tract ranked at position 100 (of 222) according to the posterior mean of the index, we note that it would be assigned to quintile 3. This census tract is ranked at position 89 according to the principal component analysis index, so by this method, this census tract also belongs to quintile 3. However, in the 3000 simulations (posterior distribution) obtained by the spatial factor Bayesian method, this census tract was assigned to quintile 2 in 518 simulations (17%), to quintile 3 in 2231 (74%), and to quintile 4 in 251 (8%). Thus, if we focus on the posterior mean and ignore assignment uncertainty, we will assign this census tract to quintile 3, ignoring the fact that this is not the quintile assigned to the census tract in 26% of the simulations.
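The arithmetic of this example can be checked with a few lines. The counts and ranks are those given in the text; the rule used to turn a rank into a quintile (integer ceiling of 5 × rank / *n*) is an assumption that happens to reproduce the quintiles reported above.

```python
n_tracts = 222

def quintile_from_rank(rank, n=n_tracts, k=5):
    """Rank-based group: integer ceiling of k * rank / n (assumed rule)."""
    return (k * rank + n - 1) // n

q_posterior_mean = quintile_from_rank(100)   # rank by posterior mean
q_pca = quintile_from_rank(89)               # rank by principal component index

# Number of simulations assigning the tract to each quintile
counts = {2: 518, 3: 2231, 4: 251}
total = sum(counts.values())
shares = {q: c / total for q, c in counts.items()}
misassigned = 1 - shares[q_posterior_mean]   # share of draws outside quintile 3
```

Here `shares` holds the proportion of simulations per quintile and `misassigned` is the proportion of simulations in which the tract falls outside the quintile given by its posterior mean.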

Finally, we have observed a U-shaped relationship between the posterior mean and the posterior standard deviation of the deprivation index obtained by the spatial factor Bayesian model (Fig. 5), such that larger posterior standard deviations are associated with relatively low and relatively high posterior means. This relationship is similar for all 3 cities, and could result from the presence of outliers in the original variables. Moreover, subsequent ecological regression studies should take into account that the most influential values of the covariate are those with the highest uncertainty.

## DISCUSSION

### Main Results

Bayesian factor analysis using spatial hierarchical models allows the calculation of deprivation indices that take into account the spatial dependence of the indicators of deprivation. Calculation of the deprivation index using summary statistics (such as the mean or median) on the basis of the posterior distributions ignores uncertainty in the index in each census tract. Failure to take this uncertainty into account could result in misclassification bias when these census tracts are grouped according to quantiles of the deprivation index.

The problem of incorrect assignment of census tracts to quantiles is exacerbated when the number of quantiles in which the areas are grouped increases. In our study, misclassification probabilities of approximately 40% were obtained when census tracts were assigned to deciles. Consequently, subsequent ecological studies may face the dilemma of choosing between an index with broad (but not very specific) groupings with reasonable probability of misclassification, or increasing the number of groupings (ie, more specific groups), accepting that this will result in greater misclassification bias. Neither of these options is very satisfactory, as our study shows.

A strong correlation was observed between the index obtained by the spatial factor Bayesian model and that created by Domínguez-Berjón et al^{3} using principal component analysis. Nevertheless, we have shown that this relationship is not linear. Such nonlinearity could modify the relationship between deprivation and some health outcomes in subsequent analyses, depending on the index considered. Moreover, we have observed considerable disagreement between the methods when the areas were grouped according to quantiles, due to the uncertainty underlying the assignment of groups. In either case, the methodology presented here is based on more realistic hypotheses, as independence between the values of the original variables would be the exception rather than the rule. Moreover, in the case in which the original variables do not precisely determine the common factor, the inclusion of spatial dependence in the model could help to estimate the common factor for all of them.

Several studies have used spatial factor Bayesian models to calculate deprivation indices, but there are no studies that compare the indices obtained using these models to those obtained using principal component analysis. In order for indices of deprivation to be as comparable as possible, we used the same indicators of deprivation with both approaches. Another contribution of our study is the reparametrization that has been used to solve the identifiability problems of the factor loadings and the common factor. In our experience, the proposed reparametrization outperforms most previously reported strategies for avoiding these identifiability problems, yielding chains with improved convergence from the simulation process. Moreover, in this paper we have paid considerable attention to the quantification of uncertainty in estimating the deprivation index. Ignoring this uncertainty in subsequent ecological regression analyses that include deprivation indices could bias the estimated effect of this variable. Finally, the results in this paper have been obtained for 3 settings (different cities of different sizes). The results are consistent across all 3 cities, and can thus be considered to be general (at least in urban settings) regardless of the size of the region of study.

### Strengths of the Spatial Factor Bayesian Model

There are 2 cases in which taking the spatial structure of the data into account is particularly interesting: first, for analyses involving small areas, where the deprivation indicators calculated in each area may not be reliable, mainly because of their small populations,^{14} and second, when the indicators of deprivation used to calculate the index are not strongly correlated. In these 2 scenarios, it is likely that the most commonly used classic techniques estimate the deprivation index in a more diffuse manner. Conversely, the factor model we have used takes greater advantage of the spatial structure of the indicators to obtain more reliable estimates of the deprivation index.

In the majority of methods used to calculate deprivation indices, it is assumed that the indicators of deprivation are normally distributed. Factor analysis using spatial hierarchical models can be applied to variables that do not necessarily follow a simple normal distribution, by including another level in the Bayesian hierarchical model.^{21}

The model used allows us to obtain an objective criterion for deciding which indicators should make up the deprivation index. The posterior distribution of the factor loadings allows us to calculate the 95% credible interval for each factor loading, and to determine the probability that they are different from 0 and, consequently, which variables contribute the most to the common factor.

Empirical indices like that of Townsend lack a measure of uncertainty.^{8} This may be problematic when, for example, it is necessary to decide where to allocate resources on the basis of a cut-off point or the percentiles of the index.^{20} The posterior distributions of the deprivation index for each census tract allow us to make estimations that can inform policy decisions, for example, estimating the probability that a census tract belongs to a higher deprivation group. Thus, it is possible to generate maps that identify population groups with a high probability of deprivation.

### Limitations of the Spatial Factor Bayesian Model

Some authors argue that there are various dimensions or domains of deprivation,^{14} and they have endeavored to measure the multiple aspects of deprivation. To take this conceptual framework into account, it is necessary to extend the proposed model to incorporate more than one factor, although this leads to more extensive problems of identifiability than those described in this article.^{21}

## CONCLUSIONS

This study has described a new fully Bayesian methodology for calculating a deprivation index in small geographic areas. The main advantages of this approach are that it allows us to take into account the spatial dependence among simple indicators of deprivation, and that it can also be applied when these indicators do not follow a normal distribution.

As shown in this study, grouping census tracts according to quantiles of the posterior mean or median of the deprivation index ignores uncertainty in the index obtained from simulations, which results in misclassification error. Therefore, these groupings may give biased results in subsequent analyses. We suggest that the uncertainty in the estimation of the deprivation index be taken into account when this index is incorporated into subsequent analyses, such as ecological regression models.

The posterior distributions of the deprivation index also allow us to compare hypotheses and show results that would not be possible with a non-Bayesian approach. Bayesian factor analysis is thus another tool for identifying groups with greater deprivation and for improving decision-making in the public policy planning process.

## REFERENCES

*J Epidemiol Community Health*. 2001;55:693–700.

*Gac Sanit*. 2001;15(suppl 4):23–33.

*Gac Sanit*. 2008;22:179–187.

*Int J Health Geogr*. 2007;6:17.

*Br Med J (Clin Res Ed)*. 1983;286:1705–1709.

*Health Bull (Edinb)*. 1990;48:162–175.

*J Public Health Med*. 1991;13:318–326.

*Health and Deprivation: Inequality and the North.* London: Croom Helm; 1988.

*J Epidemiol Community Health*. 2006;60:95–101.

*Annu Rev Public Health*. 1997;18:341–378.

*Med Clin (Barc)*. 1993;100:281–287.

*Desigualdades sociales en salud en España.* Madrid: Ministerio de Sanidad y Consumo; 1996.

*Environ Plann A*. 2006;38:169–185.

*J Epidemiol Community Health*. 1995;49(suppl 2):S51–S56.

*Soc Sci Med*. 2008;67:2007–2016.

*J Urban Health*. 2006;83:1041–1062.

*Applied Multivariate Statistical Analysis.* 6th ed. Upper Saddle River, NJ: Prentice Hall; 2007.

*Stat Methods Appl*. 2005;14:223–241.

*J Am Stat Assoc*. 2004;99:314–324.

Generalized common spatial factor model. *Biostatistics*. 2003;4:569–582.

*Environmetrics*. 2007;18:745–758.

*J Geogr Syst*. 2008;10:217–239.

*Stat Methods Med Res*. 2008;17:97–118.

*Gac Sanit*. 2008;22:596–608.

*Health Place*. 2010;16:703–711.

*Ann Inst Stat Math*. 1991;43:1–59.

*Bayesian Data Analysis.* 2nd ed. Boca Raton, Fla: Chapman & Hall/CRC; 2004.

*Rev Financ Stud*. 1996;9:557–587.

*Stat Comput*. 2000;10:325–337.