Int J Med Inform - De-identification of clinical narratives through writing complexity measures.

Topics

{ extract(1171) text(1153) clinic(932) }
{ learn(2355) train(1041) set(1003) }
{ group(2977) signific(1463) compar(1072) }
{ method(1969) cluster(1462) data(1082) }
{ patient(2837) hospit(1953) medic(668) }
{ method(1219) similar(1157) match(930) }
{ sampl(1606) size(1419) use(1276) }
{ model(3480) simul(1196) paramet(876) }
{ featur(3375) classif(2383) classifi(1994) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ assess(1506) score(1403) qualiti(1306) }
{ system(1050) medic(1026) inform(1018) }
{ data(1737) use(1416) pattern(1282) }
{ record(1888) medic(1808) patient(1693) }
{ measur(2081) correl(1212) valu(896) }
{ studi(2440) review(1878) systemat(933) }
{ search(2224) databas(1162) retriev(909) }
{ visual(1396) interact(850) tool(830) }
{ health(3367) inform(1360) care(1135) }
{ data(2317) use(1299) case(1017) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ can(774) often(719) complex(702) }
{ inform(2794) health(2639) internet(1427) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ general(901) number(790) one(736) }
{ perform(999) metric(946) measur(919) }
{ import(1318) role(1303) understand(862) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ model(2656) set(1616) predict(1553) }
{ signal(2180) analysi(812) frequenc(800) }
{ can(981) present(881) function(850) }
{ high(1669) rate(1365) level(1280) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ model(2341) predict(2261) use(1141) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ monitor(1329) mobil(1314) devic(1160) }
{ state(1844) use(1261) util(961) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

PURPOSE: Electronic health records contain a substantial quantity of clinical narrative, which is increasingly reused for research purposes. To share data on a large scale and respect privacy, it is critical to remove patient identifiers. De-identification tools based on machine learning have been proposed; however, model training is usually based on either a random group of documents or a pre-existing document type designation (e.g., discharge summary). This work investigates whether inherent features, such as writing complexity, can identify document subsets that enhance de-identification performance.
METHODS: We applied an unsupervised clustering method to group two corpora based on writing complexity measures: a collection of over 4500 documents of varying document types (e.g., discharge summaries, history and physical reports, and radiology reports) from Vanderbilt University Medical Center (VUMC) and the publicly available i2b2 corpus of 889 discharge summaries. We compared the performance (via recall, precision, and F-measure) of de-identification models trained on such clusters with models trained on documents grouped randomly or by VUMC document type.
RESULTS: For the Vanderbilt dataset, training and testing de-identification models on the same stylometric cluster (average F-measure of 0.917) tended to outperform models based on clusters of random documents (average F-measure of 0.881). Increasing the size of a training subset sampled from a specific cluster could further improve results (e.g., for subsets from a certain stylometric cluster, the F-measure rose from 0.743 to 0.841 when the training size increased from 10 to 50 documents, and reached 0.901 when the training subset reached 200 documents). For the i2b2 dataset, training and testing on the same clusters based on complexity measures (average F-measure of 0.966) did not significantly surpass randomly selected clusters (average F-measure of 0.965).
CONCLUSIONS: Our findings illustrate that, in environments consisting of a variety of clinical documentation, de-identification models trained on clusters defined by writing complexity measures perform better than models trained on random groups and, in many instances, than models trained on document types.
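The abstract does not list which writing complexity measures were used, so the following is only a minimal sketch under stated assumptions. The three features in `complexity_features` (mean sentence length, mean word length, type-token ratio) are hypothetical stand-ins for the paper's measures, and both function names are illustrative. `f_measure` is the standard balanced F-measure, i.e. the harmonic mean of precision and recall, which is the metric the abstract reports.

```python
import re


def complexity_features(text):
    """Return three simple, hypothetical writing-complexity features:
    (mean sentence length in words, mean word length, type-token ratio).
    These are illustrative stand-ins, not the measures used in the paper."""
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return (0.0, 0.0, 0.0)
    mean_sentence_len = len(words) / len(sentences)
    mean_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return (mean_sentence_len, mean_word_len, type_token_ratio)


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

Feature vectors of this kind could then be passed to any unsupervised clustering routine to form document groups, with one de-identification model trained and scored per group; the specific clustering method used in the study is not given in the abstract.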

Cleaned Abstract

rpose electron health record contain substanti quantiti clinic narrat increas reus research purpos share data larg scale respect privaci critic remov patient identifi deidentif tool base machin learn propos howev model train usual base either random group document preexist document type design eg discharg summari work investig inher featur write complex can identifi document subset enhanc deidentif performancemethod appli unsupervis cluster method group two corpora base write complex measur collect document vari document type eg discharg summari histori physic report radiolog report vanderbilt univers medic center vumc public avail ib corpus discharg summari compar perform via recal precis fmeasur deidentif model train cluster model train document group random vumc document typeresult vanderbilt dataset observ train test deidentif model stylometr cluster averag fmeasur tend outperform model base cluster random document averag fmeasur observ increas size train subset sampl specif cluster yield improv result eg subset certain stylometr cluster fmeasur rais train size increas document fmeasur reach size train subset reach document ib dataset train test cluster base complex measur averag fscore signific surpass random select cluster averag fscore conclus find illustr environ consist varieti clinic document deidentif model train write complex measur better model train random group mani instanc document type

Similar Abstracts

J Am Med Inform Assoc - A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. ( 0,721127650895737 )
J Am Med Inform Assoc - A comprehensive study of named entity recognition in Chinese clinical text. ( 0,707960608737753 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,694018090220933 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,686483025946302 )
J Am Med Inform Assoc - Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. ( 0,683646801207676 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,679803222569022 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,677283061639675 )
J Am Med Inform Assoc - Induced lexico-syntactic patterns improve information extraction from online medical forums. ( 0,675839236170175 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,658476485818011 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,654938054055539 )
AMIA Annu Symp Proc - Risk stratification of ICU patients using topic models inferred from unstructured progress notes. ( 0,646549506719098 )
J Biomed Inform - Enhancing clinical concept extraction with distributional semantics. ( 0,64412191882387 )
J Biomed Inform - Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus. ( 0,640557361212346 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,639593835343787 )
J Am Med Inform Assoc - A supervised framework for resolving coreference in clinical records. ( 0,63557969716384 )
J Am Med Inform Assoc - Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. ( 0,63292967160666 )
J Biomed Inform - A method for determining the number of documents needed for a gold standard corpus. ( 0,632182531067012 )
J Am Med Inform Assoc - Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification. ( 0,631273035749182 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,629776695581999 )
Med Decis Making - Automatically annotating topics in transcripts of patient-provider interactions via machine learning. ( 0,626642400861634 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,624492021395224 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,624040260534595 )
J Am Med Inform Assoc - Automated concept-level information extraction to reduce the need for custom software and rules development. ( 0,622593688843452 )
J Am Med Inform Assoc - A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. ( 0,621634129321892 )
BMC Med Inform Decis Mak - Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. ( 0,619510320908442 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,618348537629114 )
Appl Clin Inform - Comparing the effectiveness of computerized adverse drug event monitoring systems to enhance clinical decision support for hospitalized patients. ( 0,615241669475158 )
J Am Med Inform Assoc - Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. ( 0,614310606216593 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,612487628774212 )
AMIA Annu Symp Proc - Hyperdimensional computing approach to word sense disambiguation. ( 0,611589065181883 )
J Am Med Inform Assoc - Pneumonia identification using statistical feature selection. ( 0,611384460525746 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,610534648922717 )
AMIA Annu Symp Proc - Extracting temporal information from electronic patient records. ( 0,609324785086646 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,609187501110767 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,608568481933762 )
AMIA Annu Symp Proc - Word Sense Disambiguation of clinical abbreviations with hyperdimensional computing. ( 0,605946894467696 )
AMIA Annu Symp Proc - Natural language processing for lines and devices in portable chest x-rays. ( 0,604096129477551 )
J Biomed Inform - Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. ( 0,602479609907056 )
AMIA Annu Symp Proc - Active Learning-based corpus annotation--the PathoJen experience. ( 0,60167683533338 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,601166354069131 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,600830204995437 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,59855527729773 )
J Am Med Inform Assoc - Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. ( 0,598481971002181 )
J Biomed Inform - Extracting important information from Chinese Operation Notes with natural language processing methods. ( 0,596500108161961 )
J Am Med Inform Assoc - Evaluating the utility of syndromic surveillance algorithms for screening to detect potentially clonal hospital infection outbreaks. ( 0,595321781786696 )
AMIA Annu Symp Proc - Part-of-speech tagging for clinical text: wall or bridge between institutions? ( 0,592890751176882 )
AMIA Annu Symp Proc - Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. ( 0,591665451467714 )
J Am Med Inform Assoc - Practical implementation of an existing smoking detection pipeline and reduced support vector machine training corpus requirements. ( 0,590947762012061 )
J Biomed Inform - Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. ( 0,589189676078003 )
J Biomed Inform - Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing. ( 0,589145177991155 )
Comput Methods Programs Biomed - Multistage approach for clustering and classification of ECG data. ( 0,588411403115613 )
J Biomed Inform - Temporal relation discovery between events and temporal expressions identified in clinical narrative. ( 0,588352406980533 )
AMIA Annu Symp Proc - Automatically Detecting Acute Myocardial Infarction Events from EHR Text: A Preliminary Study. ( 0,588018404728147 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,587453009542774 )
J Am Med Inform Assoc - Using machine learning for concept extraction on clinical documents from multiple data sources. ( 0,586873073772185 )
J Biomed Inform - Towards generating a patient's timeline: extracting temporal relationships from clinical notes. ( 0,586480433523686 )
AMIA Annu Symp Proc - Combining Structured and Free-text Data for Automatic Coding of Patient Outcomes. ( 0,586252673530259 )
BMC Med Inform Decis Mak - Predicting sample size required for classification performance. ( 0,585808283009648 )
IEEE J Biomed Health Inform - Identifying Similar Cases in Document Networks using Cross-reference Structures. ( 0,585376958956971 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,584869036375423 )
J Am Med Inform Assoc - Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules. ( 0,584814784304623 )
J Biomed Inform - A new clustering method for detecting rare senses of abbreviations in clinical notes. ( 0,584625756457986 )
J Am Med Inform Assoc - Evaluation of a pictograph enhancement system for patient instruction: a recall study. ( 0,582972536453936 )
J Biomed Inform - Knowledge based word-concept model estimation and refinement for biomedical text mining. ( 0,582637054725509 )
J Biomed Inform - Detecting hedge cues and their scope in biomedical text with conditional random fields. ( 0,58215476085217 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,582068828151425 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,581590357012424 )
AMIA Annu Symp Proc - Using ontology network structure in text mining. ( 0,581370252616298 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,58128814156295 )
AMIA Annu Symp Proc - Automated identification of medical concepts and assertions in medical text. ( 0,580382387771781 )
J Biomed Inform - Automatically extracting information needs from complex clinical questions. ( 0,579412838595348 )
J Biomed Inform - Degree centrality for semantic abstraction summarization of therapeutic studies. ( 0,579175071008562 )
AMIA Annu Symp Proc - Building gold standard corpora for medical natural language processing tasks. ( 0,576899285485014 )
J Am Med Inform Assoc - Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department. ( 0,576890373049011 )
Artif Intell Med - Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children. ( 0,576793125607163 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,576324850605777 )
J Am Med Inform Assoc - Named entity recognition of follow-up and time information in 20,000 radiology reports. ( 0,575893692283171 )
J Biomed Inform - Predicting treatment process steps from events. ( 0,575228871233269 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,574736901826122 )
J Chem Inf Model - Atom environment kernels on molecules. ( 0,573329155410915 )
AMIA Annu Symp Proc - On-time clinical phenotype prediction based on narrative reports. ( 0,572671933363372 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,571733044956371 )
J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. ( 0,571196618367358 )
J Am Med Inform Assoc - Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text. ( 0,571111298739118 )
AMIA Annu Symp Proc - Parenthetically speaking: classifying the contents of parentheses for text mining. ( 0,570887303482986 )
AMIA Annu Symp Proc - Using UMLS lexical resources to disambiguate abbreviations in clinical text. ( 0,570356213872953 )
BMC Med Inform Decis Mak - Text summarization as a decision support aid. ( 0,570313773365345 )
J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization. ( 0,569742057376867 )
J Am Med Inform Assoc - A classification approach to coreference in discharge summaries: 2011 i2b2 challenge. ( 0,569556719598582 )
J Am Med Inform Assoc - Applying active learning to supervised word sense disambiguation in MEDLINE. ( 0,569176722192662 )
Comput. Biol. Med. - Parsing citations in biomedical articles using conditional random fields. ( 0,567865862563194 )
AMIA Annu Symp Proc - Document clustering of clinical narratives: a systematic study of clinical sublanguages. ( 0,566936340530759 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,565375840315264 )
Methods Inf Med - A proof of concept for assessing emergency room use with primary care data and natural language processing. ( 0,564379500429559 )
J Biomed Inform - Ontology-guided feature engineering for clinical text classification. ( 0,564037251038314 )
J Am Med Inform Assoc - Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. ( 0,563032343967461 )
J Biomed Inform - Text summarization in the biomedical domain: a systematic review of recent research. ( 0,562853033736925 )
J Am Med Inform Assoc - Using rule-based natural language processing to improve disease normalization in biomedical text. ( 0,562066401338027 )
IEEE Trans Image Process - A Probabilistic Associative Model for Segmenting Weakly-Supervised Images. ( 0,561785034893293 )
Appl Clin Inform - Clinical communication in diagnostic imaging studies: mixed-method study of pre- and post-implementation of a hospital information system. ( 0,561282212840535 )