J Biomed Inform - De-identification of clinical notes in French: towards a protocol for reference corpus development.

Topics

{ extract(1171) text(1153) clinic(932) }
{ design(1359) user(1324) use(1319) }
{ research(1218) medic(880) student(794) }
{ clinic(1479) use(1117) guidelin(835) }
{ use(1733) differ(960) four(931) }
{ sequenc(1873) structur(1644) protein(1328) }
{ perform(1367) use(1326) method(1137) }
{ group(2977) signific(1463) compar(1072) }
{ method(2212) result(1239) propos(1039) }
{ search(2224) databas(1162) retriev(909) }
{ time(1939) patient(1703) rate(768) }
{ treatment(1704) effect(941) patient(846) }
{ learn(2355) train(1041) set(1003) }
{ record(1888) medic(1808) patient(1693) }
{ sampl(1606) size(1419) use(1276) }
{ first(2504) two(1366) second(1323) }
{ estim(2440) model(1874) function(577) }
{ measur(2081) correl(1212) valu(896) }
{ framework(1458) process(801) describ(734) }
{ general(901) number(790) one(736) }
{ use(976) code(926) identifi(902) }
{ can(774) often(719) complex(702) }
{ imag(2830) propos(1344) filter(1198) }
{ studi(2440) review(1878) systemat(933) }
{ chang(1828) time(1643) increas(1301) }
{ data(1714) softwar(1251) tool(1186) }
{ import(1318) role(1303) understand(862) }
{ studi(1119) effect(1106) posit(819) }
{ patient(2837) hospit(1953) medic(668) }
{ use(2086) technolog(871) perceiv(783) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

BACKGROUND: To facilitate research applying Natural Language Processing to clinical documents, tools and resources are needed for the automatic de-identification of Electronic Health Records.

OBJECTIVE: This study investigates methods for developing a high-quality reference corpus for the de-identification of clinical documents in French.

METHODS: A corpus comprising a variety of clinical document types covering several medical specialties was pre-processed with two automatic de-identification systems from the MEDINA suite of tools: a rule-based system and a system using Conditional Random Fields (CRF). The pre-annotated documents were revised by two human annotators trained to mark ten categories of Protected Health Information (PHI). The human annotators worked independently and were blind to the system that produced the pre-annotations they were revising. The best pre-annotation system was applied to another random selection of 100 documents. After revision by one annotator, this set was used to train a statistical de-identification system.

RESULTS: Two gold standard sets of 100 documents were created based on the consensus of two human revisions of the automatic pre-annotations. The annotation experiment showed that (i) automatic pre-annotation obtained with the rule-based system performed better (F=0.813) than the CRF system (F=0.519), (ii) the human annotators spent more time revising the pre-annotations obtained with the rule-based system (from 102 to 160 minutes for 50 documents), compared to the CRF system (from 93 to 142 minutes for 50 documents), (iii) the quality of human annotation is higher when pre-annotations are obtained with the rule-based system (F-measure ranging from 0.970 to 0.987), compared to the CRF system (F-measure ranging from 0.914 to 0.981). Finally, only 20 documents from the training set were needed for the statistical system to outperform the pre-annotation systems that were trained on corpora from a medical specialty and hospital different from those in the reference corpus developed herein.

CONCLUSION: We find that better pre-annotations increase the quality of the reference corpus but require more revision time. A statistical de-identification method outperforms our rule-based system when as few as 20 custom training documents are available.
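
The F-measure values quoted in the abstract can be read as the harmonic mean of precision and recall over annotated PHI spans. The abstract does not say how spans were matched, so the sketch below assumes exact matching on offsets and category. The `Span` layout, the function name, and the sample annotations are all hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple

# Hypothetical span layout: (start offset, end offset, PHI category).
Span = Tuple[int, int, str]


def precision_recall_f(gold: Iterable[Span], predicted: Iterable[Span]):
    """Return (precision, recall, F-measure) under exact span matching."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)  # spans found in both sets
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Invented example: three gold spans, four predicted spans, two in common.
gold = [(0, 11, "NAME"), (25, 35, "DATE"), (50, 55, "CITY")]
pred = [(0, 11, "NAME"), (25, 35, "DATE"), (60, 70, "PHONE"), (80, 84, "NAME")]
p, r, f = precision_recall_f(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Looser matching criteria (for example, overlap-based or token-level matching) would give different scores for the same annotations, so figures are only comparable when the matching rule is the same.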

Cleaned Abstract

ckground facilit research appli natur languag process clinic document tool resourc need automat deidentif electron health recordsobject studi investig method develop highqual refer corpus deidentif clinic document frenchmethod corpus compris varieti clinic document type cover sever medic specialti preprocess two automat deidentif system medina suit tool rulebas system system use condit random field crf preannot document revis two human annot train mark ten categori protect health inform phi human annot work independ blind system produc preannot revisingth best preannot system appli anoth random select documentsaft revis one annot set use train statist deidentif systemresult two gold standard set document creat base consensus two human revis automat preannotationsth annot experi show automat preannot obtain rulebas system perform better f crf system f ii human annot spent time revis preannot obtain rulebas system minut document compar crf system minut document iii qualiti human annot higher preannot obtain rulebas system fmeasur rang compar crf system fmeasur rang final document train set need statist system outperform preannot system train corpora medic special hospit differ refer corpus develop hereinconclus find better preannot increas qualiti refer corpus requir revis time statist deidentif method outperform rulebas system littl custom train document avail
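
The page does not document how the cleaned text above was produced. A minimal sketch of one plausible normalisation (lowercasing, deleting punctuation and digits, dropping stop words) is given below. The stop-word list and the function name are assumptions, and stemming is left out. Deleting punctuation without inserting a space would account for fused tokens such as "recordsobject" and "highqual".

```python
import re

# Assumed, deliberately tiny stop-word list; the real list is unknown.
STOPWORDS = {
    "a", "and", "are", "for", "in", "is", "of", "that",
    "the", "this", "to", "was", "were", "with",
}


def clean(text: str) -> list:
    """Lowercase, delete every non-letter character, drop stop words."""
    lowered = text.lower()
    # Characters are deleted rather than replaced by a space, so words
    # separated only by punctuation end up fused into one token.
    letters_only = re.sub(r"[^a-z\s]", "", lowered)
    return [tok for tok in letters_only.split() if tok not in STOPWORDS]


print(clean("Electronic Health Records.OBJECTIVE: This study"))
print(clean("high-quality (F=0.813)"))
```

A stemming step would still be needed to turn "recordsobjective" into the "recordsobject" seen above; which stemmer the page used is not stated.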

Similar Abstracts

BMC Med Inform Decis Mak - Quality of human-computer interaction--results of a national usability survey of hospital-IT in Germany. ( 0,725876047082116 )
Comput Methods Programs Biomed - Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects. ( 0,72529194564826 )
J Med Syst - A mobile Nursing Information System based on human-computer interaction design for improving quality of nursing. ( 0,722868440293768 )
J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. ( 0,713589774165641 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,693139569145763 )
AMIA Annu Symp Proc - Parenthetically speaking: classifying the contents of parentheses for text mining. ( 0,692834742214895 )
J Am Med Inform Assoc - A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. ( 0,687506410504487 )
AMIA Annu Symp Proc - Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. ( 0,67914700644456 )
AMIA Annu Symp Proc - The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies. ( 0,674888254064446 )
J Am Med Inform Assoc - A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. ( 0,669152909088827 )
J Am Med Inform Assoc - Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. ( 0,666463910220225 )
J Biomed Inform - Approaches to verb subcategorization for biomedicine. ( 0,662672381882653 )
J Biomed Inform - Towards generating a patient's timeline: extracting temporal relationships from clinical notes. ( 0,659813986439913 )
J Biomed Inform - Using an ensemble system to improve concept extraction from clinical records. ( 0,65857731920915 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,65588312023939 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,655092460043888 )
Brief. Bioinformatics - A survey on annotation tools for the biomedical literature. ( 0,654965701475354 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,652556448253062 )
BMC Med Inform Decis Mak - Text summarization as a decision support aid. ( 0,644565028469377 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,640646724116504 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,640249192377531 )
J Biomed Inform - UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. ( 0,640124778300005 )
AMIA Annu Symp Proc - Sophia: A Expedient UMLS Concept Extraction Annotator. ( 0,639641098600871 )
Methods Inf Med - Authentication systems for securing clinical documentation workflows. A systematic literature review. ( 0,639180542331056 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,63859599767054 )
J Am Med Inform Assoc - Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text. ( 0,638340699371468 )
J Am Med Inform Assoc - Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. ( 0,637963794552027 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,637091245285088 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,63598240695697 )
Comput. Biol. Med. - Parsing citations in biomedical articles using conditional random fields. ( 0,635955917035553 )
J Am Med Inform Assoc - Pneumonia identification using statistical feature selection. ( 0,632495871066917 )
AMIA Annu Symp Proc - Automated illustration of patients instructions. ( 0,631649896427405 )
J Biomed Inform - The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. ( 0,631117398075691 )
J Biomed Inform - Text de-identification for privacy protection: a study of its impact on clinical text information content. ( 0,63080322949059 )
J Biomed Inform - Detecting hedge cues and their scope in biomedical text with conditional random fields. ( 0,630226556391899 )
Comput. Biol. Med. - An efficient word typing P300-BCI system using a modified T9 interface and random forest classifier. ( 0,630099904488746 )
AMIA Annu Symp Proc - Automatically pairing measured findings across narrative abdomen CT reports. ( 0,629821294041641 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,629320274759598 )
J Biomed Inform - Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. ( 0,625980007336619 )
J Am Med Inform Assoc - A hybrid system for temporal information extraction from clinical text. ( 0,625347784338692 )
J Biomed Inform - A new clustering method for detecting rare senses of abbreviations in clinical notes. ( 0,625194297363044 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,622494726141162 )
Int J Med Inform - Detecting temporal expressions in medical narratives. ( 0,622200549173805 )
J Am Med Inform Assoc - A knowledge discovery and reuse pipeline for information extraction in clinical notes. ( 0,618800157528448 )
Telemed J E Health - User-friendly cognitive training for the elderly: a technical report. ( 0,618783480234472 )
J Biomed Inform - MedTime: a temporal information extraction system for clinical narratives. ( 0,617228835423744 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,616443504488122 )
J Am Med Inform Assoc - A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. ( 0,613340663862514 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,612676052110058 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,61133419561603 )
J Am Med Inform Assoc - MedXN: an open source medication extraction and normalization tool for clinical text. ( 0,610539077562105 )
J Med Syst - Redactable signatures for signed CDA Documents. ( 0,609598984439166 )
J Biomed Inform - Relation mining experiments in the pharmacogenomics domain. ( 0,609349722962868 )
AMIA Annu Symp Proc - Automated non-alphanumeric symbol resolution in clinical texts. ( 0,608600953330769 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,60849987421821 )
J Am Med Inform Assoc - A classification approach to coreference in discharge summaries: 2011 i2b2 challenge. ( 0,608216464408028 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,60788865398768 )
J Biomed Inform - Identifying non-elliptical entity mentions in a coordinated NP with ellipses. ( 0,607820997410738 )
AMIA Annu Symp Proc - Using UMLS lexical resources to disambiguate abbreviations in clinical text. ( 0,6072551012717 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,606348324395967 )
J Biomed Inform - Annotating temporal information in clinical narratives. ( 0,606007665233876 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,605603843663016 )
J Biomed Inform - Desiderata for ontologies to be used in semantic annotation of biomedical documents. ( 0,605269704335371 )
AMIA Annu Symp Proc - Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing. ( 0,605156385032194 )
AMIA Annu Symp Proc - Critical finding capture in the impression section of radiology reports. ( 0,605132213740316 )
J Am Med Inform Assoc - Towards comprehensive syntactic and semantic annotations of the clinical narrative. ( 0,603637518156072 )
AMIA Annu Symp Proc - EpiDEA: extracting structured epilepsy and seizure information from patient discharge summaries for cohort identification. ( 0,603340008087025 )
BMC Med Inform Decis Mak - The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. ( 0,602827029801233 )
J Am Med Inform Assoc - Vaccine adverse event text mining system for extracting features from vaccine safety reports. ( 0,601980462030281 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,601498030778697 )
J Am Med Inform Assoc - Induced lexico-syntactic patterns improve information extraction from online medical forums. ( 0,600599264939617 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,600062283911616 )
IEEE Trans Vis Comput Graph - GraphDiaries: Animated Transitions and Temporal Navigation for Dynamic Networks. ( 0,596857938026887 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,595791315581436 )
Health Info Libr J - The role of readability in effective health communication: an experiment using a Japanese health information text on chronic suppurative otitis media. ( 0,595353901764894 )
J Am Med Inform Assoc - A comprehensive study of named entity recognition in Chinese clinical text. ( 0,594837633274357 )
J Biomed Inform - Conceptualization and application of an approach for designing healthcare software interfaces. ( 0,593779486058818 )
J Am Med Inform Assoc - Assisted annotation of medical free text using RapTAT. ( 0,593530707488474 )
J Am Med Inform Assoc - Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements. ( 0,593276217354618 )
Appl Clin Inform - Representation of information about family relatives as structured data in electronic health records. ( 0,590604078545002 )
J Am Med Inform Assoc - Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. ( 0,59046055247432 )
AMIA Annu Symp Proc - Mapping annotations with textual evidence using an scLDA model. ( 0,589595900647201 )
AMIA Annu Symp Proc - A qualitative analysis of EHR clinical document synthesis by clinicians. ( 0,588392160176477 )
BMC Med Inform Decis Mak - Improving access to clinical practice guidelines with an interactive graphical interface using an iconic language. ( 0,587695450646786 )
J Integr Bioinform - Life sciences data and application integration with B-fabric. ( 0,587509704481518 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,586710279431698 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,586672952481975 )
J. Med. Internet Res. - Developing a disease outbreak event corpus. ( 0,586424094606617 )
Sci Data - Building the graph of medicine from millions of clinical narratives. ( 0,58630344637316 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,585411983970289 )
J Am Med Inform Assoc - Exploiting domain information for Word Sense Disambiguation of medical documents. ( 0,583885439946153 )
J. Med. Internet Res. - Mobile applications for diabetics: a systematic review and expert-based usability evaluation considering the special requirements of diabetes patients age 50 years or older. ( 0,583863463841127 )
J Biomed Inform - Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study. ( 0,582327714061922 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,581193459252272 )
AMIA Annu Symp Proc - Natural language processing for lines and devices in portable chest x-rays. ( 0,580869036366062 )
J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization. ( 0,577559109670439 )
AMIA Annu Symp Proc - It's about this and that: a description of anaphoric expressions in clinical text. ( 0,577522074064486 )
AMIA Annu Symp Proc - A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. ( 0,577358889381323 )
AMIA Annu Symp Proc - ASLForm: an adaptive self learning medical form generating system. ( 0,577332328684056 )
J Am Med Inform Assoc - Using rule-based natural language processing to improve disease normalization in biomedical text. ( 0,576674825942041 )
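
Each entry in the list above follows the pattern "journal - title ( score )", with the score written using a decimal comma. A minimal sketch for reading such entries into structured records is shown below. The `SimilarAbstract` type and the `parse_entry` function are hypothetical names, and the sketch assumes that the first " - " in a line separates the journal from the title.

```python
import re
from typing import NamedTuple, Optional


class SimilarAbstract(NamedTuple):
    journal: str
    title: str
    score: float


# Journal (non-greedy), " - ", title, then "( 0,123 )" at the end of the line.
ENTRY_RE = re.compile(r"^(.+?) - (.+) \( (\d+,\d+) \)$")


def parse_entry(line: str) -> Optional[SimilarAbstract]:
    """Parse one listing line; return None if it does not match the pattern."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    journal, title, raw_score = match.groups()
    # Convert the decimal comma to a decimal point before calling float().
    return SimilarAbstract(journal, title, float(raw_score.replace(",", ".")))


entry = parse_entry(
    "J Biomed Inform - Approaches to verb subcategorization for biomedicine. "
    "( 0,662672381882653 )"
)
print(entry)
```

The page does not say what the score measures, so the sketch treats it as an opaque number.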