J Biomed Inform - Size matters: how population size influences genotype-phenotype association studies in anonymized data.


{ data(2317) use(1299) case(1017) }
{ studi(1119) effect(1106) posit(819) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ sampl(1606) size(1419) use(1276) }
{ drug(1928) target(777) effect(648) }
{ learn(2355) train(1041) set(1003) }
{ system(1050) medic(1026) inform(1018) }
{ record(1888) medic(1808) patient(1693) }
{ data(3008) multipl(1320) sourc(1022) }
{ use(1733) differ(960) four(931) }
{ take(945) account(800) differ(722) }
{ risk(3053) factor(974) diseas(938) }
{ high(1669) rate(1365) level(1280) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ chang(1828) time(1643) increas(1301) }
{ case(1353) use(1143) diagnosi(1136) }
{ model(3480) simul(1196) paramet(876) }
{ model(3404) distribut(989) bayesian(671) }
{ framework(1458) process(801) describ(734) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ perform(1367) use(1326) method(1137) }
{ use(2086) technolog(871) perceiv(783) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ bind(1733) structur(1185) ligand(1036) }
{ featur(3375) classif(2383) classifi(1994) }
{ studi(2440) review(1878) systemat(933) }
{ concept(1167) ontolog(924) domain(897) }
{ model(2341) predict(2261) use(1141) }
{ patient(2837) hospit(1953) medic(668) }
{ group(2977) signific(1463) compar(1072) }
{ estim(2440) model(1874) function(577) }
{ activ(1452) weight(1219) physic(1104) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ model(2656) set(1616) predict(1553) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }


OBJECTIVE: Electronic medical record (EMR) data is increasingly incorporated into genome-phenome association studies. Investigators hope to share data, but there are concerns that it may be "re-identified" through the exploitation of various features, such as combinations of standardized clinical codes. Formal anonymization algorithms (e.g., k-anonymization) can prevent such violations, but prior studies suggest that the size of the population available for anonymization may influence the utility of the resulting data. We systematically investigate this issue using a large-scale biorepository and EMR system, through which we evaluate the ability of researchers to learn from anonymized data for genome-phenome association studies under various conditions.

METHODS: We use a k-anonymization strategy to simulate a data protection process (on data sets containing clinical codes) for resources of similar size to those found at nine academic medical institutions within the United States. Following the protection process, we replicate an existing genome-phenome association study and compare the discoveries made using the protected data and the original data through the correlation (r²) of the p-values of association significance.

RESULTS: Our investigation shows that anonymizing an entire dataset with respect to the population from which it is derived yields significantly more utility than small study-specific datasets anonymized unto themselves. When evaluated using the correlation of genome-phenome association strengths on anonymized data versus original data across all nine simulated sites, results from the largest-scale anonymizations (population ~100,000) retained better utility than those from smaller populations (~6,000-75,000). We observed a general trend of increasing r² for larger dataset sizes: r² = 0.9481 for small datasets, r² = 0.9493 for moderately sized datasets, and r² = 0.9934 for large datasets.

CONCLUSIONS: This research implies that, regardless of the overall size of an institution's data, there may be significant benefits to anonymizing the entire EMR, even if the institution plans to release data about only a specific cohort of patients.
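The two ideas in the abstract, protecting clinical-code combinations so that each is shared by at least k patients and then scoring utility as the r² between original and protected p-values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it enforces k-anonymity by simply dropping records with rare code combinations (the abstract does not specify the algorithm used), the function names `k_anonymize_by_suppression`, `is_k_anonymous`, and `r_squared` are hypothetical, and the diagnosis codes and p-values are made-up placeholders.

```python
from collections import Counter


def k_anonymize_by_suppression(records, k):
    """Keep only records whose exact code combination occurs at least k times.

    ASSUMPTION: record-level suppression stands in for the (unspecified)
    k-anonymization strategy of the study.
    """
    keys = [frozenset(r) for r in records]
    counts = Counter(keys)
    return [key for key in keys if counts[key] >= k]


def is_k_anonymous(records, k):
    """True if every distinct code combination is shared by >= k records."""
    counts = Counter(frozenset(r) for r in records)
    return all(c >= k for c in counts.values())


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Placeholder patients, each a set of diagnosis codes (illustrative only).
    patients = [
        {"250.00", "401.9"},
        {"250.00", "401.9"},
        {"714.0"},  # unique combination: a re-identification risk
    ]
    protected = k_anonymize_by_suppression(patients, k=2)
    print(is_k_anonymous(patients, 2), is_k_anonymous(protected, 2))

    # Placeholder p-values for the same associations before and after protection.
    p_original = [0.001, 0.020, 0.300, 0.750]
    p_protected = [0.002, 0.025, 0.280, 0.800]
    print(round(r_squared(p_original, p_protected), 4))
```

With a larger population, fewer code combinations fall below k, so fewer records need to be altered or dropped before release; this is one plausible reading of why the largest-scale anonymizations retained the highest r² in the reported results.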

Clean Abstract

jectiv electron medic record emr data increas incorpor genomephenom associ studi investig hope share data concern may reidentifi exploit various featur combin standard clinic code formal anonym algorithm eg kanonym can prevent violat prior studi suggest size popul avail anonym may influenc util result data systemat investig issu use largescal biorepositori emr system evalu abil research learn anonym data genomephenom associ studi various conditionsmethod use kanonym strategi simul data protect process data set contain clinic code resourc similar size found nine academ medic institut within unit state follow protect process replic exist genomephenom associ studi compar discoveri use protect data origin data correl r pvalu associ significanceresult investig show anonym entir dataset respect popul deriv yield signific util small studyspecif dataset anonym unto evalu use correl genomephenom associ strength anonym data versus origin data nine simul site result largestscal anonym popul retain better util smaller size popul observ general trend increas r larger data set size r smallsiz dataset r moderatelys dataset r larges datasetsconclus research impli regardless overal size institut data may signific benefit anonym entir emr even institut plan releas data specif cohort patient

Similar Abstracts

Comput Math Methods Med - Dynamics of high-risk nonvaccine human papillomavirus types after actual vaccination scheme. ( 0,695352813396058 )
BMC Med Inform Decis Mak - Managing protected health information in distributed research network environments: automated review to facilitate collaboration. ( 0,661941913637912 )
J. Med. Internet Res. - Guess who's not coming to dinner? Evaluating online restaurant reservations for disease surveillance. ( 0,622459501409543 )
J. Med. Internet Res. - Performance of eHealth data sources in local influenza surveillance: a 5-year open cohort study. ( 0,619046791062512 )
AMIA Annu Symp Proc - Oncoshare: lessons learned from building an integrated multi-institutional database for comparative effectiveness research. ( 0,605485130092384 )
J Am Med Inform Assoc - An electronic health record driven algorithm to identify incident antidepressant medication users. ( 0,591907958789819 )
Lifetime Data Anal - Event dependent sampling of recurrent events. ( 0,590099090408258 )
J Am Med Inform Assoc - Data use and effectiveness in electronic surveillance of healthcare associated infections in the 21st century: a systematic review. ( 0,589923628987827 )
Int J Health Geogr - Advancements in web-database applications for rabies surveillance. ( 0,586946042027131 )
J Am Med Inform Assoc - Using administrative medical claims data to supplement state disease registry systems for reporting zoonotic infections. ( 0,574691868464103 )
Comput Math Methods Med - Modeling the impact of climate change on the dynamics of Rift Valley Fever. ( 0,5733951187826 )
BMC Med Inform Decis Mak - Is it possible to identify cases of coronary artery bypass graft postoperative surgical site infection accurately from claims data? ( 0,573084319120769 )
AMIA Annu Symp Proc - A natural language processing algorithm to define a venous thromboembolism phenotype. ( 0,572442844026526 )
Wiley Interdiscip Rev Syst Biol Med - Emerging clinical applications in cardiovascular pharmacogenomics. ( 0,567360947743351 )
J. Med. Internet Res. - Scoping review on search queries and social media for disease surveillance: a chronology of innovation. ( 0,565901392746971 )
J Chem Inf Model - Drug-disease association and drug-repositioning predictions in complex diseases using causal inference-probabilistic matrix factorization. ( 0,560642479966948 )
J. Med. Internet Res. - An internet-based epidemiological investigation of the outbreak of H7N9 Avian influenza A in China since early 2013. ( 0,555615427228658 )
Spat Spatiotemporal Epidemiol - Review of methods for space-time disease surveillance. ( 0,553228461046938 )
J Am Med Inform Assoc - Assessment of administrative claims data for public health reporting of Salmonella in Tennessee. ( 0,55138369229485 )
AMIA Annu Symp Proc - Data quality and fitness for purpose of routinely collected data--a general practice case study from an electronic practice-based research network (ePBRN). ( 0,546403514606199 )
AMIA Annu Symp Proc - Root causes underlying challenges to secondary use of data. ( 0,544412456109361 )
Brief. Bioinformatics - Database identifies FDA-approved drugs with potential to be repurposed for treatment of orphan diseases. ( 0,541704634423515 )
Int J Health Geogr - Estimating the geographic distribution of human Tanapox and potential reservoirs using ecological niche modeling. ( 0,540397602963156 )
BMC Med Inform Decis Mak - Prediction of gastrointestinal disease with over-the-counter diarrheal remedy sales records in the San Francisco Bay Area. ( 0,540381018759836 )
Appl Clin Inform - Impact of implementing an EMR on physical exam documentation by ambulance personnel. ( 0,535768409153609 )
J Med Syst - A study on hepatitis disease diagnosis using probabilistic neural network. ( 0,534784671746548 )
J. Med. Internet Res. - The complex relationship of realspace events and messages in cyberspace: case study of influenza and pertussis using tweets. ( 0,534741575652981 )
J Am Med Inform Assoc - Mining local climate data to assess spatiotemporal dengue fever epidemic patterns in French Guiana. ( 0,530391345269308 )
J Biomed Inform - Using chief complaints for syndromic surveillance: a review of chief complaint based classifiers in North America. ( 0,528109226337873 )
Health Informatics J - Development of a pseudo/anonymised primary care research database: Proof-of-concept study. ( 0,52742793503204 )
Int J Health Geogr - Using Google Street View for systematic observation of the built environment: analysis of spatio-temporal instability of imagery dates. ( 0,526767478464964 )
BMC Med Inform Decis Mak - Measuring the impact of a health information exchange intervention on provider-based notifiable disease reporting using mixed methods: a study protocol. ( 0,523329226851271 )
Comput. Biol. Med. - SITDEM: a simulation tool for disease/endpoint models of association studies based on single nucleotide polymorphism genotypes. ( 0,518672659676171 )
Comput Methods Programs Biomed - Improving the work efficiency of healthcare-associated infection surveillance using electronic medical records. ( 0,518397710698751 )
J Med Syst - Stream processing health card application. ( 0,513685348888461 )
AMIA Annu Symp Proc - Coverage of rare disease names in standard terminologies and implications for patients, providers, and research. ( 0,510076366261372 )
Comput. Biol. Med. - Comparative structural modeling and docking studies of uricase: possible implication in enzyme supplementation therapy for hyperuricemic disorders. ( 0,508753793671778 )
AMIA Annu Symp Proc - Syndromic surveillance in an ICD-10 world. ( 0,50779511779147 )
BMC Med Inform Decis Mak - Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge. ( 0,507423464016027 )
Lifetime Data Anal - Estimation and assessment of markov multistate models with intermittent observations on individuals. ( 0,5059651336836 )
Int J Health Geogr - Identifying risk factors for healthcare-associated infections from electronic medical record home address data. ( 0,503209199403832 )
Med Decis Making - Identification of a multistate continuous-time nonhomogeneous Markov chain model for patients with decreased renal function. ( 0,501539769684937 )
AMIA Annu Symp Proc - Wireless data collection of self-administered surveys using tablet computers. ( 0,499284661626818 )
J Biomed Inform - Towards probabilistic decision support in public health practice: predicting recent transmission of tuberculosis from patient attributes. ( 0,498728911859803 )
AMIA Annu Symp Proc - Timeliness and data element completeness of immunization data in Washington State in 2010: a comparison of data exchange methods. ( 0,496602507848997 )
Artif Intell Med - A Markov decision process approach to multi-category patient scheduling in a diagnostic facility. ( 0,492954075139465 )
BMC Med Inform Decis Mak - Electronic immunization data collection systems: application of an evaluation framework. ( 0,489827983357556 )
J Am Med Inform Assoc - A methodology for a minimum data set for rare diseases to support national centers of excellence for healthcare and research. ( 0,488997734329744 )
J Chem Inf Model - Enumeration of virtual libraries of combinatorial modular macrocyclic (bracelet, necklace) architectures and their linear counterparts. ( 0,488986943789749 )
Health Info Libr J - Reflections on efforts to improve medical publishing in Africa. ( 0,488833014568889 )
BMC Med Inform Decis Mak - A Bayesian spatio-temporal approach for real-time detection of disease outbreaks: a case study. ( 0,487406396861529 )
Perspect Health Inf Manag - Assessing external cause of injury coding accuracy for transport injury hospitalizations. ( 0,487078398156244 )
BMC Med Inform Decis Mak - Development of ClickClinica: a novel smartphone application to generate real-time global disease surveillance and clinical practice data. ( 0,486949162236111 )
AMIA Annu Symp Proc - Federating clinical data from six pediatric hospitals: process and initial results from the PHIS+ Consortium. ( 0,48647881822265 )
BMC Med Inform Decis Mak - Establishing a web-based integrated surveillance system for early detection of infectious disease epidemic in rural China: a field experimental study. ( 0,486092857085916 )
J Am Med Inform Assoc - A novel, privacy-preserving cryptographic approach for sharing sequencing data. ( 0,48283453880772 )
AMIA Annu Symp Proc - Investigating the semantic interoperability of laboratory data exchanged using LOINC codes in three large institutions. ( 0,481953986565157 )
Comput. Biol. Med. - Cladograms with Path to Event (ClaPTE): a novel algorithm to detect associations between genotypes or phenotypes using phylogenies. ( 0,481908023527045 )
Int J Health Geogr - Modelling typhoid risk in Dhaka metropolitan area of Bangladesh: the role of socio-economic and environmental factors. ( 0,480222806488818 )
Int J Med Inform - The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects. ( 0,479928518194975 )
J Am Med Inform Assoc - Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text. ( 0,476929152869863 )
J Am Med Inform Assoc - The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. ( 0,475677239994288 )
J Biomed Inform - Learning Bayesian networks for clinical time series analysis. ( 0,475650694033557 )
BMC Med Inform Decis Mak - HERALD (health economics using routine anonymised linked data). ( 0,47524646817194 )
AMIA Annu Symp Proc - Temporal evolution of biomedical research grant collaborations across multiple scales--a CTSA baseline study. ( 0,473194865836723 )
Med Decis Making - A balance beam aid for instruction in clinical diagnostic reasoning. ( 0,472179087469676 )
AMIA Annu Symp Proc - Piloting a deceased subject integrated data repository and protecting privacy of relatives. ( 0,471443266898917 )
J Biomed Inform - Associating co-authorship patterns with publications in high-impact journals. ( 0,47123189159696 )
J Chem Inf Model - Bacterial carbohydrate structure database 3: principles and realization. ( 0,469947545806358 )
Res Synth Methods - Fixed effects and variance components estimation in three-level meta-analysis. ( 0,469320063904919 )
Med Decis Making - Survival analysis and extrapolation modeling of time-to-event clinical trial data for economic evaluation: an alternative approach. ( 0,469155466347934 )
Int J Med Inform - Reporting systems, reporting rates and completeness of data reported from primary healthcare to a Swedish quality register--the National Diabetes Register. ( 0,468318621066101 )
Spat Spatiotemporal Epidemiol - Companion animal disease surveillance: a new solution to an old problem? ( 0,468009382939558 )
Int J Health Geogr - Pathways of neighbourhood-level socio-economic determinants of adverse birth outcomes. ( 0,467656098603664 )
J. Med. Internet Res. - Stroke experiences in weblogs: a feasibility study of sex differences. ( 0,467424204022876 )
J Am Med Inform Assoc - Leveraging biodiversity knowledge for potential phyto-therapeutic applications. ( 0,467103940915334 )
J Clin Monit Comput - Partitioning standard base excess: a new approach. ( 0,466794121969403 )
J. Comput. Biol. - Increasing power of groupwise association test with likelihood ratio test. ( 0,465658606597169 )
Med Decis Making - It's all in the name, or is it? The impact of labeling on health state values. ( 0,465646996035104 )
Comput. Biol. Med. - Preclinical evaluation and molecular docking of 4-phenyl-1-Napthyl phenyl acetamide (4P1NPA) from Streptomyces sp. DPTB16 as a potent antifungal compound. ( 0,465405118587915 )
J Biomed Inform - Federated Aggregate Cohort Estimator (FACE): an easy to deploy, vendor neutral, multi-institutional cohort query architecture. ( 0,465007604359184 )
BMC Med Inform Decis Mak - Using electronic technology to improve clinical care - results from a before-after cluster trial to evaluate assessment and classification of sick children according to Integrated Management of Childhood Illness (IMCI) protocol in Tanzania. ( 0,464652586049857 )
J. Med. Internet Res. - Health-related effects reported by electronic cigarette users in online forums. ( 0,46416166562789 )
AMIA Annu Symp Proc - Analysis of medication and indication occurrences in clinical notes. ( 0,463894628042861 )
J Biomed Inform - Pharmaceutical drugs chatter on Online Social Networks. ( 0,463093197047497 )
Med Biol Eng Comput - Fast set-up asynchronous brain-switch based on detection of foot motor imagery in 1-channel EEG. ( 0,462980970963249 )
Telemed J E Health - A new paradigm for disease surveillance in Vietnam. ( 0,461123155250653 )
J Am Med Inform Assoc - Influenza surveillance using electronic health records in the American Indian and Alaska Native population. ( 0,46041925881612 )
Int J Med Inform - Implementation and expansion of an electronic medical record for HIV care and treatment in Haiti: an assessment of system use and the impact of large-scale disruptions. ( 0,460286714415958 )
AMIA Annu Symp Proc - A new model for collaboration: building CDA documents in MDHT. ( 0,459462490816935 )
Spat Spatiotemporal Epidemiol - Spatio-temporal modeling of sparse geostatistical malaria sporozoite rate data using a zero inflated binomial model. ( 0,459182366123495 )
Int J Health Geogr - Climate change effects on Chikungunya transmission in Europe: geospatial analysis of vector's climatic suitability and virus' temperature requirements. ( 0,458807206688295 )
J Am Med Inform Assoc - Handling anticipated exceptions in clinical care: investigating clinician use of 'exit strategies' in an electronic health records system. ( 0,458194587846412 )
AMIA Annu Symp Proc - Using RxNorm and NDF-RT to classify medication data extracted from electronic health records: experiences from the Rochester Epidemiology Project. ( 0,457954275221221 )
J Am Med Inform Assoc - Examining clinical decision support integrity: is clinician self-reported data entry accurate? ( 0,45723909859289 )
J. Comput. Biol. - Elucidating influenza inhibition pathways via network reconstruction. ( 0,45658574599655 )
J Am Med Inform Assoc - Patient characteristics associated with venous thromboembolic events: a cohort study using pooled electronic health record data. ( 0,456393129350645 )
Spat Spatiotemporal Epidemiol - Accuracy of prospective space-time surveillance in detecting tuberculosis transmission. ( 0,45633270155865 )
J Biomed Inform - Leveraging concept-based approaches to identify potential phyto-therapies. ( 0,455706677526173 )
AMIA Annu Symp Proc - Mining echocardiography workflows for disease discriminative patterns. ( 0,455569096186559 )