
A selection-bias free method to estimate the prevalence of hypertension from an administrative primary health care database in the Girona health region, Spain

Saez M, Barceló MA, Coll-de-Tuero G. Computer Methods and Programs in Biomedicine 2009; 93(3):228-240. doi: 10.1016/j.cmpb.2008.10.010 (Impact Factor: 3.424; COMPUTER SCIENCE, THEORY & METHODS, 15/104, Q1)

The purpose of this study was to propose a statistical method to estimate prevalence using an administrative primary health care database. Specifically, using a two-part model (Hurdle model), we calculated the prevalence of hypertension among the population covered by public primary health care providers in the Girona health region, Spain, throughout 2005.

 

The main limitation of administrative databases is potential selection bias. Some individuals are more likely than others to attend primary health care centers and, therefore, to be included in study samples. As a result, the people who contacted such centers are overrepresented. If the selection had been exogenous, that is, if the probability of being observed had been identical for all individuals of the same age and sex, it would have sufficed to weight the sample so that the individuals actually observed carried less weight. However, it is very likely that these people contacted the primary health care services (from which the data were obtained) not only because they experienced some of the symptoms that characterize disease onset, but also because unobserved factors influenced their use of the service, and those factors may be correlated with the unobserved factors affecting the outcome variable. In either case, the probability of being observed is not the same for all these patients. Thus, weighting the data by age and sex (standardization) would not correct the selection bias.
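
The exogenous case can be made concrete with a minimal sketch. The strata, counts, and function names below are hypothetical and are not taken from the study. The code only illustrates how weighting each observed person by the inverse of the stratum's contact rate gives less weight to frequent attenders. It assumes that, within each stratum, contact is unrelated to hypertension status, which is precisely the assumption that fails under endogenous selection.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int      # people covered in the region (hypothetical)
    observed: int        # people who contacted a center (hypothetical)
    observed_cases: int  # hypertensive people among the observed (hypothetical)


def naive_prevalence(strata):
    """Prevalence among observed people only, ignoring who was never seen."""
    cases = sum(s.observed_cases for s in strata)
    seen = sum(s.observed for s in strata)
    return cases / seen


def weighted_prevalence(strata):
    """Weight each observed person by population / observed for the stratum."""
    weighted_cases = sum(
        s.observed_cases * s.population / s.observed for s in strata
    )
    total_population = sum(s.population for s in strata)
    return weighted_cases / total_population


# Made-up example: older people attend more often and are more often hypertensive.
strata = [
    Stratum("under 45", population=6000, observed=1200, observed_cases=60),
    Stratum("45 and over", population=4000, observed=2800, observed_cases=840),
]

naive = naive_prevalence(strata)
weighted = weighted_prevalence(strata)
print(f"naive: {naive:.3f}, weighted: {weighted:.3f}")
```

With these made-up counts the unweighted figure is 22.5% and the weighted figure is 15.0%, because the older, more frequently observed stratum no longer dominates the estimate.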

 

In this case, known as endogenous selection, a two-part model should be used, in which the first part estimates the probability of an individual being observed. These probabilities are used as weights in the second part of the model in order to correct for non-randomness (i.e., selection bias). In this study we estimated the two parts together, applying a two-part model known as a hurdle model.
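
The two-part idea can be sketched as follows, under strong simplifying assumptions. This is not the study's hurdle model: the two parts here are fitted one after the other rather than together, the register and all names are hypothetical, and the first part uses age as its only covariate, so the sketch corrects only for selection that age explains. Part 1 fits a logistic model for the probability of being observed; part 2 uses the inverse of those fitted probabilities as weights when computing prevalence.

```python
import math

# Hypothetical register rows:
# (age, people covered, people observed, hypertensive among the observed)
REGISTER = [
    (30, 1000, 200, 10),
    (50, 1000, 400, 80),
    (70, 1000, 700, 350),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale(age):
    """Center and scale age so the Newton iterations stay well conditioned."""
    return (age - 50) / 10.0


def fit_contact_model(register, iterations=25):
    """Part 1: fit P(observed | age) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for age, n, observed, _cases in register:
            x = scale(age)
            p = sigmoid(b0 + b1 * x)
            residual = observed - n * p      # score contribution
            w = n * p * (1.0 - p)            # information contribution
            g0 += residual
            g1 += residual * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def weighted_prevalence(register, b0, b1):
    """Part 2: prevalence with inverse-probability weights from part 1."""
    numerator = denominator = 0.0
    for age, _n, observed, cases in register:
        p = sigmoid(b0 + b1 * scale(age))
        numerator += cases / p
        denominator += observed / p
    return numerator / denominator


b0, b1 = fit_contact_model(REGISTER)
naive = sum(r[3] for r in REGISTER) / sum(r[2] for r in REGISTER)
corrected = weighted_prevalence(REGISTER, b0, b1)
print(f"naive: {naive:.3f}, corrected: {corrected:.3f}")
```

In this toy register older people are both more likely to be observed and more likely to be hypertensive, so the weighted estimate falls below the unweighted one. A sequential sketch like this cannot capture correlation between unobserved factors in the two parts.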

 

Thus, using this selection bias-free method, we were able to estimate the prevalence of hypertension: 15.5% of people over the age of 15 (14.1% among men and 16.9% among women) have hypertension. Similarly, the prevalence was estimated at 31.1% (30.3% among men and 32.0% among women) in people over the age of 45, at 48.3% (44.1% among men and 51.9% among women) in those over the age of 65, and at 13.1% (11.8% among men and 13.9% among women) in the general population.