BMC Med Inform Decis Mak - Improved de-identification of physician notes through integrative modeling of both public and private medical text.


{ record(1888) medic(1808) patient(1693) }
{ extract(1171) text(1153) clinic(932) }
{ data(2317) use(1299) case(1017) }
{ patient(2837) hospit(1953) medic(668) }
{ data(3963) clinic(1234) research(1004) }
{ care(1570) inform(1187) nurs(1089) }
{ studi(2440) review(1878) systemat(933) }
{ detect(2391) sensit(1101) algorithm(908) }
{ patient(2315) diseas(1263) diabet(1191) }
{ model(2656) set(1616) predict(1553) }
{ inform(2794) health(2639) internet(1427) }
{ motion(1329) object(1292) video(1091) }
{ case(1353) use(1143) diagnosi(1136) }
{ monitor(1329) mobil(1314) devic(1160) }
{ structur(1116) can(940) graph(676) }
{ use(1733) differ(960) four(931) }
{ estim(2440) model(1874) function(577) }
{ featur(3375) classif(2383) classifi(1994) }
{ assess(1506) score(1403) qualiti(1306) }
{ learn(2355) train(1041) set(1003) }
{ search(2224) databas(1162) retriev(909) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ perform(1367) use(1326) method(1137) }
{ spatial(1525) area(1432) region(1030) }
{ state(1844) use(1261) util(961) }
{ drug(1928) target(777) effect(648) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ system(1976) rule(880) can(841) }
{ concept(1167) ontolog(924) domain(897) }
{ model(2220) cell(1177) simul(1124) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ ehr(2073) health(1662) electron(1139) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ data(3008) multipl(1320) sourc(1022) }
{ cancer(2502) breast(956) screen(824) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ studi(1410) differ(1259) use(1210) }
{ compound(1573) activ(1297) structur(1058) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ research(1218) medic(880) student(794) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ method(2212) result(1239) propos(1039) }


BACKGROUND: Physician notes routinely recorded during patient care represent a vast and underutilized resource for human disease studies on a population scale. Their use in research is primarily limited by the need to separate confidential patient information from clinical annotations, a process that is resource-intensive when performed manually. This study seeks to create an automated method for de-identifying physician notes that does not require large amounts of private information: in addition to training a model to recognize Protected Health Information (PHI) within private physician notes, we reverse the problem and train a model to recognize non-PHI words and phrases that appear in public medical texts.

METHODS: Public and private medical text sources were analyzed to distinguish common medical words and phrases from Protected Health Information. Patient identifiers are generally nouns and numbers that appear infrequently in medical literature. To quantify this relationship, term frequencies and part of speech tags were compared between journal publications and physician notes. Standard medical concepts and phrases were then examined across ten medical dictionaries. Lists and rules were included from the US census database and previously published studies. In total, 28 features were used to train decision tree classifiers.

RESULTS: The model successfully recalled 98% of PHI tokens from 220 discharge summaries. Cost sensitive classification was used to weight recall over precision (98% F10 score, 76% F1 score). More than half of the false negatives were the word "of" appearing in a hospital name. All patient names, phone numbers, and home addresses were at least partially redacted. Medical concepts such as "elevated white blood cell count" were informative for de-identification. The results exceed the previously approved criteria established by four Institutional Review Boards.

CONCLUSIONS: The results indicate that distributional differences between private and public medical text can be used to accurately classify PHI. The data and algorithms reported here are made freely available for evaluation and improvement.
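
The F10 and F1 scores quoted in the RESULTS are both instances of the general F-beta measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision, R is recall, and beta > 1 weights recall over precision. The following is a minimal sketch that assumes only this standard definition. The function names and the token counts are hypothetical illustrations and are not taken from the study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from token-level counts.

    tp: PHI tokens correctly flagged, fp: non-PHI tokens flagged,
    fn: PHI tokens missed. Returns 0.0 where a ratio is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 gives recall more weight."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical counts, chosen only to illustrate the effect of beta.
p, r = precision_recall(tp=90, fp=60, fn=10)  # p = 0.6, r = 0.9
f1 = f_beta(p, r, beta=1)    # balanced: pulled down by the lower precision
f10 = f_beta(p, r, beta=10)  # recall-weighted: lands close to recall
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f} F10={f10:.3f}")
```

With beta = 10 the score is dominated by recall, which is why a classifier tuned to miss very few PHI tokens can report a high F10 alongside a much lower F1.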

Clean Abstract

ckground physician note routin record patient care repres vast underutil resourc human diseas studi popul scale use research primarili limit need separ confidenti patient inform clinic annot process resourceintens perform manual studi seek creat autom method deidentifi physician note requir larg amount privat inform addit train model recogn protect health inform phi within privat physician note revers problem train model recogn nonphi word phrase appear public medic textsmethod public privat medic text sourc analyz distinguish common medic word phrase protect health inform patient identifi general noun number appear infrequ medic literatur quantifi relationship term frequenc part speech tag compar journal public physician note standard medic concept phrase examin across ten medic dictionari list rule includ us census databas previous publish studi total featur use train decis tree classifiersresult model success recal phi token discharg summari cost sensit classif use weight recal precis f score f score half fals negat word appear hospit name patient name phone number home address least partial redact medic concept elev white blood cell count inform deidentif result exceed previous approv criteria establish four institut review boardsconclus result indic distribut differ privat public medic text can use accur classifi phi data algorithm report made freeli avail evalu improv

Similar Abstracts

Appl Clin Inform - Rapid implementation of inpatient electronic physician documentation at an academic hospital. ( 0,809508533724338 )
J Am Med Inform Assoc - Meaningful measurement: developing a measurement system to improve blood pressure control in patients with chronic kidney disease. ( 0,789489286868389 )
Med Decis Making - Natural language processing improves identification of colorectal cancer testing in the electronic medical record. ( 0,781342860070825 )
J Biomed Inform - Predicting treatment process steps from events. ( 0,781242906651278 )
J Med Syst - Design and implementation of web-based discharge summary note based on service-oriented architecture. ( 0,756993006993007 )
J Am Med Inform Assoc - Development and evaluation of an ensemble resource linking medications to their indications. ( 0,754190615461809 )
Appl Clin Inform - Using a scripted data entry process to transfer legacy immunization data while transitioning between electronic medical record systems. ( 0,751214002100969 )
J Am Med Inform Assoc - PASTE: patient-centered SMS text tagging in a medication management system. ( 0,7491147353822 )
AMIA Annu Symp Proc - Continuity of Care Document (CCD) Enables Delivery of Medication Histories to the Primary Care Clinician. ( 0,741061922229753 )
AMIA Annu Symp Proc - Using natural language processing on the free text of clinical documents to screen for evidence of homelessness among US veterans. ( 0,740126211635857 )
Comput. Biol. Med. - Informatics can identify systemic sclerosis (SSc) patients at risk for scleroderma renal crisis. ( 0,733150624801072 )
Int J Med Inform - Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records. ( 0,723444677885406 )
BMC Med Inform Decis Mak - The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. ( 0,720109212848233 )
J Am Med Inform Assoc - A framework for assessing patient crossover and health information exchange value. ( 0,71997911738348 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,71943055382443 )
J Biomed Inform - Using an ensemble system to improve concept extraction from clinical records. ( 0,718903666090747 )
J Biomed Inform - Extracting important information from Chinese Operation Notes with natural language processing methods. ( 0,709796145579816 )
Artif Intell Med - Statistical parsing of varieties of clinical Finnish. ( 0,703756179733722 )
J Med Syst - An approach to medical knowledge sharing in a hospital information system using MCLink. ( 0,700760328952595 )
AMIA Annu Symp Proc - Validation and enhancement of a computable medication indication resource (MEDI) using a large practice-based dataset. ( 0,700746980889005 )
Appl Clin Inform - An analysis of free-text alcohol use documentation in the electronic health record: early findings and implications. ( 0,699202936819591 )
AMIA Annu Symp Proc - A study of transportability of an existing smoking status detection module across institutions. ( 0,695112200633808 )
AMIA Annu Symp Proc - Analysis of medication and indication occurrences in clinical notes. ( 0,694860375824063 )
J Am Med Inform Assoc - Automated extraction of clinical traits of multiple sclerosis in electronic medical records. ( 0,692162827754462 )
J Biomed Inform - A classification of errors in lay comprehension of medical documents. ( 0,691209354682255 )
AMIA Annu Symp Proc - You can lead a horse to water: physicians' responses to clinical reminders. ( 0,685252779940818 )
J Am Med Inform Assoc - Developing and evaluating an automated appendicitis risk stratification algorithm for pediatric patients in the emergency department. ( 0,684509126722126 )
BMC Med Inform Decis Mak - Evaluation of natural language processing from emergency department computerized medical records for intra-hospital syndromic surveillance. ( 0,680651127720487 )
J Am Med Inform Assoc - Data quality assessment in healthcare: a 365-day chart review of inpatients' health records at a Nigerian tertiary hospital. ( 0,680157512834183 )
J Am Med Inform Assoc - HARVEST, a longitudinal patient record summarizer. ( 0,673798942350582 )
BMC Med Inform Decis Mak - Determinants of frequency and longevity of hospital encounters' data use. ( 0,673538966402357 )
AMIA Annu Symp Proc - Modeling drug exposure data in electronic medical records: an application to warfarin. ( 0,673190163936878 )
AMIA Annu Symp Proc - Using language models to identify relevant new information in inpatient clinical notes. ( 0,672548342485637 )
J Am Med Inform Assoc - A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. ( 0,671316653423459 )
Int J Med Inform - Separate may not be equal: a preliminary investigation of clinical correlates of electronic psychiatric record accessibility in academic medical centers. ( 0,66932404428763 )
J Am Med Inform Assoc - Correlating electronic health record concepts with healthcare process events. ( 0,666074806458888 )
J Am Med Inform Assoc - Data from clinical notes: a perspective on the tension between structure and flexible documentation. ( 0,66435074830375 )
AMIA Annu Symp Proc - EpiDEA: extracting structured epilepsy and seizure information from patient discharge summaries for cohort identification. ( 0,660420850942977 )
AMIA Annu Symp Proc - De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. ( 0,659911986395398 )
J Am Med Inform Assoc - Clinical documentation: composition or synthesis? ( 0,658291947525119 )
BMC Med Inform Decis Mak - SciReader enables reading of medical content with instantaneous definitions. ( 0,65349455845228 )
J Am Med Inform Assoc - Evaluating the state of the art in coreference resolution for electronic medical records. ( 0,653467017986894 )
AMIA Annu Symp Proc - A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes. ( 0,648541542105113 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,646408154439056 )
J Am Med Inform Assoc - Self-reported fever and measured temperature in emergency department records used for syndromic surveillance. ( 0,644614921357556 )
Methods Inf Med - A generic method to monitor completeness and speed of medical documentation processes. ( 0,643918289018927 )
AMIA Annu Symp Proc - Potential value of health information exchange for people with epilepsy: crossover patterns and missing clinical data. ( 0,643496077439149 )
Int J Med Inform - The perception of medical professionals and medical students on the usefulness of an emergency medical card and a continuity of care report in enhancing continuity of care. ( 0,641392730574077 )
AMIA Annu Symp Proc - Lexical concept distribution reflects clinical practice. ( 0,64066896598395 )
AMIA Annu Symp Proc - A simple method to extract key maternal data from neonatal clinical notes. ( 0,639639629049425 )
Comput. Biol. Med. - Clinicians' evaluation of computer-assisted medication summarization of electronic medical records. ( 0,637204182290491 )
AMIA Annu Symp Proc - Capture of osteoporosis and fracture information in an electronic medical record database from primary care. ( 0,632543793248534 )
J Am Med Inform Assoc - Vaccine adverse event text mining system for extracting features from vaccine safety reports. ( 0,629872049222452 )
J Med Syst - A secure integrated medical information system. ( 0,629746189863154 )
AMIA Annu Symp Proc - Mining echocardiography workflows for disease discriminative patterns. ( 0,629719248273381 )
Appl Clin Inform - Development and validation of a computer-based algorithm to identify foreign-born patients with HIV infection from the electronic medical record. ( 0,627568790215173 )
AMIA Annu Symp Proc - Comparing content coverage in medical curriculum to trainee-authored clinical notes. ( 0,626652365263612 )
Comput Methods Programs Biomed - Improving the work efficiency of healthcare-associated infection surveillance using electronic medical records. ( 0,626436238564235 )
J Am Med Inform Assoc - Electronic medical record use in pediatric primary care. ( 0,625126340416273 )
Int J Med Inform - Implementation and expansion of an electronic medical record for HIV care and treatment in Haiti: an assessment of system use and the impact of large-scale disruptions. ( 0,624801587341643 )
Int J Med Inform - Structured electronic operative reporting: comparison with dictation in kidney cancer surgery. ( 0,623283186632424 )
AMIA Annu Symp Proc - The physical attractiveness of electronic physician notes. ( 0,622215021665621 )
AMIA Annu Symp Proc - Who said it? Establishing professional attribution among authors of Veterans' Electronic Health Records. ( 0,62136137895095 )
J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization. ( 0,62119486439654 )
Appl Clin Inform - Clinical communication in diagnostic imaging studies: mixed-method study of pre- and post-implementation of a hospital information system. ( 0,620790830581351 )
AMIA Annu Symp Proc - Use of simulated physician handoffs to study cross-cover chart biopsy in the electronic medical record. ( 0,618940975362747 )
Int J Med Inform - The effects of an electronic medical record on the completeness of documentation in the anesthesia record. ( 0,6187837752866 )
J Med Syst - Evaluation of the medical records system in an upcoming teaching hospital-a project for improvisation. ( 0,618649648710016 )
J Am Med Inform Assoc - Identifying phenotypic signatures of neuropsychiatric disorders from electronic medical records. ( 0,618537431300853 )
AMIA Annu Symp Proc - Learning to identify treatment relations in clinical text. ( 0,618446639664741 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,617605654987961 )
J Am Med Inform Assoc - Validating a strategy for psychosocial phenotyping using a large corpus of clinical text. ( 0,617030576452907 )
Telemed J E Health - Information extraction for tracking liver cancer patients' statuses: from mixture of clinical narrative report types. ( 0,61626841448824 )
Comput Methods Programs Biomed - Comparison of documentation time between an electronic and a paper-based record system by optometrists at an eye hospital in south India: a time-motion study. ( 0,615653845415177 )
J Am Med Inform Assoc - Automating the medication regimen complexity index. ( 0,615045299435428 )
BMC Med Inform Decis Mak - Clinical decision support improves quality of telephone triage documentation--an analysis of triage documentation before and after computerized clinical decision support. ( 0,614903267196814 )
Appl Clin Inform - Determining primary care physician information needs to inform ambulatory visit note display. ( 0,613474467117799 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,612470862470862 )
Int J Med Inform - Concept and implementation of a computer-based reminder system to increase completeness in clinical documentation. ( 0,612399263326514 )
Health Informatics J - Tuberculosis-Diagnostic Expert System: an architecture for translating patients information from the web for use in tuberculosis diagnosis. ( 0,612178464905846 )
Int J Med Inform - The MITRE Identification Scrubber Toolkit: design, training, and assessment. ( 0,612159695594985 )
Appl Clin Inform - Analysis of free text with omaha system targets in community-based care to inform practice and terminology development. ( 0,612073990111132 )
J Am Med Inform Assoc - Presence of key findings in the medical record prior to a documented high-risk diagnosis. ( 0,611927226451689 )
AMIA Annu Symp Proc - An evaluation of a natural language processing tool for identifying and encoding allergy information in emergency department clinical notes. ( 0,609581114051009 )
Int J Health Geogr - Identifying risk factors for healthcare-associated infections from electronic medical record home address data. ( 0,60910077083359 )
Int J Med Inform - The peace of paper: patient lists as work tools. ( 0,609066963184249 )
J Biomed Inform - Automatically extracting information needs from complex clinical questions. ( 0,607205789505333 )
Perspect Health Inf Manag - Lessons learned from implementation of voice recognition for documentation in the military electronic health record system. ( 0,605906911989957 )
J Am Med Inform Assoc - Point-of-care clinical documentation: assessment of a bladder cancer informatics tool (eCancerCareBladder): a randomized controlled study of efficacy, efficiency and user friendliness compared with standard electronic medical records. ( 0,605259636313084 )
Appl Clin Inform - What do physicians read (and ignore) in electronic progress notes? ( 0,604738870509045 )
Appl Clin Inform - Association of Medical Directors of Information Systems consensus on inpatient electronic health record documentation. ( 0,604019334498282 )
J Am Med Inform Assoc - Use of electronic medical records differs by specialty and office settings. ( 0,603814902646006 )
AMIA Annu Symp Proc - Location bias of identifiers in clinical narratives. ( 0,603040259490729 )
Appl Clin Inform - Perceived frequency and impact of missing information at pediatric emergency and general ambulatory encounters. ( 0,601101488695191 )
J Biomed Inform - Development of a clinician reputation metric to identify appropriate problem-medication pairs in a crowdsourced knowledge base. ( 0,600933076086409 )
J Am Med Inform Assoc - Assessing the similarity of surface linguistic features related to epilepsy across pediatric hospitals. ( 0,60072870256258 )
Health Informatics J - Clinical Document Architecture integration system to support patient referral and reply letters. ( 0,598815949196072 )
Artif Intell Med - Automated interviews on clinical case reports to elicit directed acyclic graphs. ( 0,59832213719742 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,597892437509866 )
AMIA Annu Symp Proc - Prevalence and Clinical Significance of Discrepancies within Three Computerized Pre-Admission Medication Lists. ( 0,597244558498633 )