BACKGROUND: The need to estimate the distance from an individual to a service provider is common in public health research. However, estimated distances are often imprecise and, we suspect, biased because specific residential location data are lacking. In many cases, to protect subject confidentiality, data sets contain only a ZIP Code or a county.

RESULTS: This paper describes an algorithm, the "probabilistic sampling method" (PSM), which creates a distribution of estimated distances to a health facility for a person whose region of residence is known, when demographic details and centroids are available for smaller areas within that region. We take the median of this distribution as the most likely distance to the facility. Using Monte Carlo sampling methods, the algorithm drew a probabilistic sample of the smaller areas (Census blocks) within each participant's reported region (ZIP Code), weighting each area by its number of residents in the same age group as the participant. To test the PSM, we used data from a large cross-sectional study that screened women at a clinic for intimate partner violence (IPV). We had each woman's age and ZIP Code, but no precise residential address. We used the PSM to select a sample of census blocks, then calculated the network distance from each sampled block's centroid to the closest geocoded IPV facility, which yielded a distribution of distances for each participant. We selected the median distance as the most likely distance traveled and computed confidence intervals giving the shortest and longest distances between which a given percentage of the distance estimates lie. We compared our results with those obtained using two other geocoding approaches. One of these methods overestimated the most likely distance and the other underestimated it. Neither alternative method produced confidence intervals for the distance estimates. The algorithm was implemented in R code.

CONCLUSIONS: The PSM has a number of benefits over traditional geocoding approaches. It improves the precision of estimates of geographic access to services when complete residential address information is unavailable. Because it computes the expected distribution of possible distances for each respondent, together with the associated confidence limits, it also makes sensitivity analyses of distance-based access measures possible. Faulty or imprecise distance measures may compromise decisions about service location and misdirect scarce resources.
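The sampling step described in the RESULTS can be illustrated with a minimal sketch. This is not the authors' R implementation. It is a Python approximation written under stated assumptions: the network distance from each census block centroid to the closest facility has already been computed, the interval is a simple nearest-rank percentile interval, and all names (`psm_distance_distribution`, `summarize`, `pop_by_age`, `distance_km`) and the example block values are hypothetical.

```python
import math
import random
import statistics


def psm_distance_distribution(blocks, age_group, n_draws=1000, seed=0):
    """Draw census blocks with probability proportional to the number of
    residents in the participant's age group, and return the sorted
    distances of the sampled blocks.

    `blocks` is a list of dicts with hypothetical keys:
      'pop_by_age'  -> {age group label: resident count}
      'distance_km' -> precomputed network distance to the closest facility
    """
    rng = random.Random(seed)
    weights = [b["pop_by_age"].get(age_group, 0) for b in blocks]
    if sum(weights) == 0:
        raise ValueError("no residents in this age group in any block")
    draws = rng.choices(blocks, weights=weights, k=n_draws)
    return sorted(b["distance_km"] for b in draws)


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, 0 < q <= 1."""
    n = len(sorted_values)
    index = min(n - 1, max(0, math.ceil(q * n) - 1))
    return sorted_values[index]


def summarize(distances, coverage=0.90):
    """Return (median, lower, upper), where lower and upper bound the
    central `coverage` share of the sampled distances."""
    tail = (1.0 - coverage) / 2.0
    return (
        statistics.median(distances),
        percentile(distances, tail),
        percentile(distances, 1.0 - tail),
    )


# Hypothetical blocks within one ZIP Code (illustrative values only).
example_blocks = [
    {"pop_by_age": {"30-39": 120, "40-49": 80}, "distance_km": 2.0},
    {"pop_by_age": {"30-39": 40, "40-49": 10}, "distance_km": 5.0},
    {"pop_by_age": {"30-39": 0, "40-49": 60}, "distance_km": 9.0},
]

example_distances = psm_distance_distribution(example_blocks, "30-39")
example_median, example_lower, example_upper = summarize(example_distances)
print(example_median, example_lower, example_upper)
```

Weighting by age-specific population means that a block with no residents in the participant's age group is never drawn, so its distance cannot influence the median or the interval for that participant.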
