J Biomed Inform - Enhancing clinical concept extraction with distributional semantics.


{ extract(1171) text(1153) clinic(932) }
{ featur(3375) classif(2383) classifi(1994) }
{ model(3404) distribut(989) bayesian(671) }
{ learn(2355) train(1041) set(1003) }
{ treatment(1704) effect(941) patient(846) }
{ method(1557) propos(1049) approach(1037) }
{ take(945) account(800) differ(722) }
{ method(1219) similar(1157) match(930) }
{ use(976) code(926) identifi(902) }
{ perform(999) metric(946) measur(919) }
{ high(1669) rate(1365) level(1280) }
{ control(1307) perform(991) simul(935) }
{ case(1353) use(1143) diagnosi(1136) }
{ motion(1329) object(1292) video(1091) }
{ group(2977) signific(1463) compar(1072) }
{ decis(3086) make(1611) patient(1517) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ problem(2511) optim(1539) algorithm(950) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ can(774) often(719) complex(702) }
{ network(2748) neural(1063) input(814) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ error(1145) method(1030) estim(1020) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ search(2224) databas(1162) retriev(909) }
{ research(1085) discuss(1038) issu(1018) }
{ can(981) present(881) function(850) }
{ structur(1116) can(940) graph(676) }
{ drug(1928) target(777) effect(648) }
{ imag(1947) propos(1133) code(1026) }
{ sequenc(1873) structur(1644) protein(1328) }
{ data(1714) softwar(1251) tool(1186) }
{ model(2220) cell(1177) simul(1124) }
{ system(1050) medic(1026) inform(1018) }
{ studi(1119) effect(1106) posit(819) }
{ ehr(2073) health(1662) electron(1139) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ method(2212) result(1239) propos(1039) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ design(1359) user(1324) use(1319) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ age(1611) year(1155) adult(843) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }


Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives is a basic enabling technology for unlocking the knowledge they contain and for supporting more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of treatment effectiveness. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, because the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns they cover, which impacts the performance of supervised machine learning applications trained on these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract mentions of medical problems, treatments, and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type "clinical trials" to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to traditional features such as dictionary matching, pattern matching, and part-of-speech tags, we also used as features the words that appear in contexts similar to those of the word in question (that is, words with a similar vector representation as measured by the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort to explore the use of distributional semantics, the semantics derived empirically from unannotated text, often using vector space models, for a sequence classification task such as concept extraction.
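The distributional feature described above can be illustrated with a minimal sketch. It assumes raw co-occurrence counts from a symmetric sliding window, a three-sentence toy corpus, and hypothetical helper names (`window_vectors`, `nearest_neighbors`, `token_features`); none of these details come from the paper, which derives its vectors from Medline clinical-trial abstracts and tunes the window model separately. The resulting per-token feature dictionaries are the kind of input a CRF toolkit could consume alongside dictionary, pattern, and part-of-speech features.

```python
from collections import Counter, defaultdict
from math import sqrt


def window_vectors(sentences, radius=2):
    """Build sparse co-occurrence vectors from a symmetric sliding window.

    `radius` is an illustrative parameter, not a value taken from the paper.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(word, vectors, k=3):
    """Return up to k words whose context vectors are closest to `word`."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Sort by descending similarity, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


def token_features(tokens, i, vectors, k=3):
    """Feature dictionary for one token: its form plus distributional neighbors."""
    word = tokens[i].lower()
    features = {"word": word}
    for rank, neighbor in enumerate(nearest_neighbors(word, vectors, k)):
        features[f"neighbor_{rank}"] = neighbor
    return features


# Toy stand-in for the large unannotated corpus (an assumption for illustration).
corpus = [
    "patient received aspirin for pain".split(),
    "patient received ibuprofen for pain".split(),
    "patient denied chest pain today".split(),
]
vectors = window_vectors(corpus, radius=2)
features = token_features(corpus[0], 2, vectors, k=1)
print(features)
```

In this toy corpus "aspirin" and "ibuprofen" occur in identical contexts, so each is the other's nearest neighbor. A drug name that is absent from the annotated training set can therefore still share a feature with one that is present, which is the intuition behind using unannotated text to widen coverage.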
We therefore first experimented with different sliding-window models and selected the parameter settings that gave the best performance in a preliminary sequence labeling task. The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared with a supervised-only baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3%, and the micro-averaged F-score for inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method, and also when considered against the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.
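The evaluation reported above rests on two computations: a micro-averaged F-score over extracted concept spans and a bootstrap significance test. The sketch below shows one common way to do both. The abstract does not specify the bootstrap variant, so the paired, document-level resampling used here is an assumption, as are the function names, the `(start, end, type)` span encoding, and the toy data.

```python
import random


def micro_f1(gold_docs, pred_docs):
    """Micro-averaged F-score for exact match.

    Each document is a set of (start, end, type) spans; a prediction counts
    as correct only if the identical tuple appears in the gold set.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def paired_bootstrap(gold_docs, base_docs, new_docs, samples=1000, seed=0):
    """Fraction of resampled test sets on which the new system fails to beat
    the baseline (an approximate one-sided p-value).

    Documents are resampled with replacement, and both systems are scored on
    the same resample each time.
    """
    rng = random.Random(seed)
    n = len(gold_docs)
    not_better = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        gold = [gold_docs[i] for i in idx]
        base_score = micro_f1(gold, [base_docs[i] for i in idx])
        new_score = micro_f1(gold, [new_docs[i] for i in idx])
        if new_score <= base_score:
            not_better += 1
    return not_better / samples


# Toy data (illustrative only, not results from the paper).
gold = [{(0, 2, "problem"), (5, 6, "test")}, {(1, 3, "treatment")}]
base = [{(0, 2, "problem")}, {(1, 2, "treatment")}]
new = [{(0, 2, "problem"), (5, 6, "test")}, {(1, 3, "treatment")}]

print(micro_f1(gold, base), micro_f1(gold, new))
print(paired_bootstrap(gold, base, new, samples=200))
```

An inexact-match score could be obtained from the same skeleton by replacing the set intersection with an overlap test between predicted and gold spans of the same type.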

Cleaned Abstract

extract concept drug symptom diagnos clinic narrat constitut basic enabl technolog unlock knowledg within support advanc reason applic diagnosi explan diseas progress model intellig analysi effect treatment recent releas annot train set deidentifi clinic narrat contribut develop refin concept extract method howev annot process laborintens train data necessarili limit concept concept pattern cover impact perform supervis machin learn applic train data paper propos approach minim limit combin supervis machin learn empir learn semant related distribut relev word addit unannot text approach use sequenti discrimin classifi condit random field extract mention medic problem treatment test clinic narrat take advantag medlin abstract index public type clinic trial estim related word ibva train test corpora addit tradit featur dictionari match pattern match partofspeech tag also use featur word appear similar context word question word similar vector represent measur common use cosin metric vector represent deriv use method distribut semant best knowledg first effort explor use distribut semant semant deriv empir unannot text often use vector space model sequenc classif task concept extract therefor first experi differ slide window model found model paramet led best perform preliminari sequenc label task evalu approach perform ibva concept extract corpus show incorpor featur base distribut word across larg unannot corpus signific aid concept extract compar supervisedon approach baselin microaverag fscore exact match increas microaverag fscore base inexact match increas improv high signific accord bootstrap resampl method also consid perform system thus distribut semant featur signific improv perform concept extract clinic narrat take advantag word distribut inform obtain unannot data

Similar Abstracts

AMIA Annu Symp Proc - Word Sense Disambiguation of clinical abbreviations with hyperdimensional computing. ( 0,843009330609644 )
J Am Med Inform Assoc - Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text. ( 0,836124706478402 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,836118019378954 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,824652778819238 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,820432204168426 )
J Am Med Inform Assoc - Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. ( 0,820265508614053 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,814815286295346 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,8108404256842 )
J Biomed Inform - Anaphoric reference in clinical reports: characteristics of an annotated corpus. ( 0,809973401572912 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,807237769967422 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,80595427050719 )
J Am Med Inform Assoc - MedXN: an open source medication extraction and normalization tool for clinical text. ( 0,80521487062151 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,803640025310919 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,797695293066266 )
J Biomed Inform - Towards generating a patient's timeline: extracting temporal relationships from clinical notes. ( 0,795770076983061 )
J Am Med Inform Assoc - A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. ( 0,794272042310716 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,794204692932115 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,793405793497478 )
J Am Med Inform Assoc - Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. ( 0,791903494600669 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,789062263855298 )
J Am Med Inform Assoc - Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries. ( 0,787891438971815 )
J Am Med Inform Assoc - Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification. ( 0,787316892543979 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,786425677692847 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,785909094568229 )
J Biomed Inform - MedTime: a temporal information extraction system for clinical narratives. ( 0,785889189949986 )
J Biomed Inform - Extraction of events and temporal expressions from clinical narratives. ( 0,784183567121798 )
J Am Med Inform Assoc - A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. ( 0,782690323400287 )
AMIA Annu Symp Proc - Mapping annotations with textual evidence using an scLDA model. ( 0,782406851342961 )
AMIA Annu Symp Proc - Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. ( 0,782392347940803 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,78149597998708 )
J Biomed Inform - Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text. ( 0,780827774564533 )
J Integr Bioinform - Automatic extraction of microorganisms and their habitats from free text using text mining workflows. ( 0,780182400839262 )
J Am Med Inform Assoc - Assisted annotation of medical free text using RapTAT. ( 0,780067525719112 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,775918785104782 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,775700078853832 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,773984810717268 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,773719246646976 )
Int J Med Inform - Detecting temporal expressions in medical narratives. ( 0,773007498900131 )
J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. ( 0,772020360303059 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,771020670600942 )
J Biomed Inform - Identifying non-elliptical entity mentions in a coordinated NP with ellipses. ( 0,770661901007698 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,770272975341096 )
Int J Med Inform - Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs. ( 0,76971350358454 )
J Am Med Inform Assoc - A comprehensive study of named entity recognition in Chinese clinical text. ( 0,769479883799033 )
J Biomed Inform - Lessons learnt from the DDIExtraction-2013 Shared Task. ( 0,767195854047945 )
J Am Med Inform Assoc - Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements. ( 0,765931507877851 )
AMIA Annu Symp Proc - Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. ( 0,764861927179641 )
J Am Med Inform Assoc - A hybrid system for temporal information extraction from clinical text. ( 0,760182758208726 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,759853480258007 )
AMIA Annu Symp Proc - Automatically pairing measured findings across narrative abdomen CT reports. ( 0,75924786696995 )
AMIA Annu Symp Proc - Discovering peripheral arterial disease cases from radiology notes using natural language processing. ( 0,758306810495048 )
J Am Med Inform Assoc - Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries. ( 0,756804444993732 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,756759227861935 )
J Am Med Inform Assoc - A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. ( 0,755056495938257 )
AMIA Annu Symp Proc - Identifying discourse connectives in biomedical text. ( 0,754950814928069 )
Appl Clin Inform - Representation of information about family relatives as structured data in electronic health records. ( 0,75414206102736 )
J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization. ( 0,753633639446727 )
AMIA Annu Symp Proc - Generalizability and comparison of automatic clinical text de-identification methods and resources. ( 0,749181693282125 )
J Am Med Inform Assoc - Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. ( 0,74807771562871 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,74761415973622 )
J Biomed Inform - Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. ( 0,747014429556632 )
AMIA Annu Symp Proc - Building gold standard corpora for medical natural language processing tasks. ( 0,746801065422307 )
Brief. Bioinformatics - A survey on annotation tools for the biomedical literature. ( 0,745677067028585 )
J Biomed Inform - UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. ( 0,744323387197507 )
AMIA Annu Symp Proc - Natural language processing for lines and devices in portable chest x-rays. ( 0,742276993537544 )
J Med Syst - Redactable signatures for signed CDA Documents. ( 0,740958715860296 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,740428198299425 )
J Am Med Inform Assoc - Using rule-based natural language processing to improve disease normalization in biomedical text. ( 0,740342790947172 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,740232726944401 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,737395414613721 )
AMIA Annu Symp Proc - A cloud-based approach to medical NLP. ( 0,73568957786216 )
J Am Med Inform Assoc - Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. ( 0,735470270179053 )
Sci Data - Building the graph of medicine from millions of clinical narratives. ( 0,735322099474963 )
J Biomed Inform - Text summarization in the biomedical domain: a systematic review of recent research. ( 0,734334829413881 )
J Biomed Inform - Ontology-guided feature engineering for clinical text classification. ( 0,733195668325767 )
J Am Med Inform Assoc - Towards comprehensive syntactic and semantic annotations of the clinical narrative. ( 0,729030413720549 )
AMIA Annu Symp Proc - A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. ( 0,728682933986495 )
J Am Med Inform Assoc - A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. ( 0,728515873931618 )
J Am Med Inform Assoc - Vaccine adverse event text mining system for extracting features from vaccine safety reports. ( 0,727757081470438 )
J Biomed Inform - Automatically extracting information needs from complex clinical questions. ( 0,72729146588559 )
J Am Med Inform Assoc - Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. ( 0,726194440017418 )
J Biomed Inform - A new clustering method for detecting rare senses of abbreviations in clinical notes. ( 0,725658073538195 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,725605085033048 )
J Biomed Inform - Degree centrality for semantic abstraction summarization of therapeutic studies. ( 0,725002193903322 )
AMIA Annu Symp Proc - Automated illustration of patients instructions. ( 0,724610422624591 )
AMIA Annu Symp Proc - Critical finding capture in the impression section of radiology reports. ( 0,723784722587957 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,723647469156993 )
J Biomed Inform - Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. ( 0,723136014125763 )
BMC Med Inform Decis Mak - Text summarization as a decision support aid. ( 0,722608714411883 )
J Biomed Inform - A natural language processing pipeline for pairing measurements uniquely across free-text CT reports. ( 0,72064194106513 )
J Am Med Inform Assoc - Automated concept-level information extraction to reduce the need for custom software and rules development. ( 0,71962443648565 )
AMIA Annu Symp Proc - It's about this and that: a description of anaphoric expressions in clinical text. ( 0,717692307692308 )
BMC Med Inform Decis Mak - Detecting causality from online psychiatric texts using inter-sentential language patterns. ( 0,715156378135285 )
J Biomed Inform - An enhanced CRFs-based system for information extraction from radiology reports. ( 0,713676780559446 )
J Biomed Inform - Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. ( 0,710908154163216 )
J Biomed Inform - Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus. ( 0,710675663799793 )
AMIA Annu Symp Proc - Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing. ( 0,710455042197001 )
J Am Med Inform Assoc - A classification approach to coreference in discharge summaries: 2011 i2b2 challenge. ( 0,709991675085103 )
J Am Med Inform Assoc - Pneumonia identification using statistical feature selection. ( 0,70676015939844 )
AMIA Annu Symp Proc - Towards a semantic lexicon for clinical natural language processing. ( 0,704946373265198 )