AMIA Annu Symp Proc - Part-of-speech tagging for clinical text: wall or bridge between institutions?

Topics

{ learn(2355) train(1041) set(1003) }
{ extract(1171) text(1153) clinic(932) }
{ concept(1167) ontolog(924) domain(897) }
{ detect(2391) sensit(1101) algorithm(908) }
{ studi(1410) differ(1259) use(1210) }
{ estim(2440) model(1874) function(577) }
{ algorithm(1844) comput(1787) effici(935) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ health(3367) inform(1360) care(1135) }
{ first(2504) two(1366) second(1323) }
{ analysi(2126) use(1163) compon(1037) }
{ chang(1828) time(1643) increas(1301) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ model(2656) set(1616) predict(1553) }
{ cost(1906) reduc(1198) effect(832) }
{ model(2341) predict(2261) use(1141) }
{ activ(1138) subject(705) human(624) }
{ data(1737) use(1416) pattern(1282) }
{ measur(2081) correl(1212) valu(896) }
{ sequenc(1873) structur(1644) protein(1328) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ clinic(1479) use(1117) guidelin(835) }
{ risk(3053) factor(974) diseas(938) }
{ perform(1367) use(1326) method(1137) }
{ model(3480) simul(1196) paramet(876) }
{ research(1218) medic(880) student(794) }
{ age(1611) year(1155) adult(843) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ drug(1928) target(777) effect(648) }
{ method(2212) result(1239) propos(1039) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ data(3963) clinic(1234) research(1004) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ monitor(1329) mobil(1314) devic(1160) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }

Abstract

Part-of-speech (POS) tagging is a fundamental step required by various NLP systems. Training a POS tagger relies on sufficient high-quality annotations. However, in the clinical domain the annotation process is both knowledge-intensive and time-consuming. A promising solution is for institutions to share their annotation efforts, yet there is little research on the associated issues. We performed experiments to understand how POS tagging performance is affected by sharing a pre-trained tagger versus sharing raw training data across institutions. We manually annotated a set of clinical notes from Kaiser Permanente Southern California (KPSC) and a set from the University of Pittsburgh Medical Center (UPMC), and trained and tested POS taggers in intra- and inter-institution settings. The cTAKES POS tagger was also included in the comparison to represent a tagger partially trained on the notes of a third institution, Mayo Clinic in Rochester. Intra-institution 5-fold cross-validation estimated an accuracy of 0.953 on the KPSC notes and 0.945 on the UPMC notes. A tagger trained purely on KPSC notes reached an accuracy of 0.897 when tested on UPMC notes. A tagger trained purely on UPMC notes reached an accuracy of 0.904 when tested on KPSC notes. The cTAKES tagger pre-trained on Mayo Clinic's notes reached an accuracy of 0.881 on KPSC notes and 0.883 on UPMC notes. After adding UPMC annotations to the KPSC training data, the average accuracy on the tested KPSC notes increased to 0.965. After adding KPSC annotations to the UPMC training data, the average accuracy on the tested UPMC notes increased to 0.953. The results indicated: first, the accuracy of pre-trained POS taggers dropped by about 5 percentage points when they were applied directly across institutions; second, mixing in annotations from another institution that followed the same guideline increased tagging accuracy by about 1 percentage point. Our findings suggest that, for the POS tagging task, institutions benefit more from sharing raw annotations than from sharing pre-trained models. We believe the study could also provide general insights on cross-institution data sharing for other types of NLP tasks.
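
The two headline figures (a drop of about 5% across institutions and a gain of about 1% from mixed annotations) can be checked against the accuracies reported in the abstract. The following is a minimal sketch of that arithmetic only; the dictionary names and the `delta_points` helper are illustrative assumptions, not part of the study, and the differences are expressed in percentage points.

```python
# Accuracies as reported in the abstract, keyed by the institution
# whose notes were used for testing. Variable names are illustrative.
intra = {"KPSC": 0.953, "UPMC": 0.945}   # 5-fold cross-validation within one institution
cross = {"KPSC": 0.904, "UPMC": 0.897}   # tagger trained purely on the other institution
ctakes = {"KPSC": 0.881, "UPMC": 0.883}  # cTAKES tagger pre-trained on Mayo Clinic notes
mixed = {"KPSC": 0.965, "UPMC": 0.953}   # own training data plus the other institution's annotations


def delta_points(new, base):
    """Difference new - base per test site, in percentage points (hypothetical helper)."""
    return {site: round((new[site] - base[site]) * 100, 1) for site in base}


cross_drop = delta_points(cross, intra)    # effect of sharing a pre-trained tagger
ctakes_drop = delta_points(ctakes, intra)  # effect of using the cTAKES tagger as-is
mixed_gain = delta_points(mixed, intra)    # effect of sharing raw annotations

for label, result in [("cross-institution", cross_drop),
                      ("cTAKES", ctakes_drop),
                      ("mixed annotations", mixed_gain)]:
    print(label, result)
```

Under these figures the cross-institution taggers lose a little under 5 points on each test set, the cTAKES tagger loses somewhat more, and mixing annotations gains roughly 1 point, which matches the abstract's summary.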

Cleaned Abstract

partofspeech pos tag fundament step requir various nlp system train pos tagger reli suffici qualiti annot howev annot process knowledgeintens timeconsum clinic domain promis solut appear institut share annot effort yet littl research associ issu perform experi understand pos tag perform affect use pretrain tagger versus raw train data across differ institut manual annot set clinic note kaiser permanent southern california kpsc set univers pittsburg medic center upmc trainedtest pos tagger intra interinstitut set ctake pos tagger also includ comparison repres tagger partial train note third institut mayo clinic rochest intrainstitut fold crossvalid estim accuraci kpsc upmc note respect train pure kpsc note accuraci test upmc note train pure upmc note accuraci test kpsc note appli ctake tagger pretrain mayo clinic note accuraci kpsc note upmc note ad upmc annot kpsc train data averag accuraci test kpsc note increas ad kpsc annot upmc train data averag accuraci test upmc note increas result indic first perform pretrain pos tagger drop appli direct across institut second mix annot anoth institut follow guidelin increas tag accuraci find suggest institut can benefit share raw annot less share pretrain model pos tag task believ studi also provid general insight crossinstitut data share type nlp task

Similar Abstracts

J Am Med Inform Assoc - Machine learning-based coreference resolution of concepts in clinical documents. ( 0,761576496686292 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,760459982319239 )
J Am Med Inform Assoc - Using machine learning for concept extraction on clinical documents from multiple data sources. ( 0,742329211275983 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,734242231277361 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,732787141151457 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,72786501560407 )
AMIA Annu Symp Proc - Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics. ( 0,718137983371015 )
AMIA Annu Symp Proc - Hyperdimensional computing approach to word sense disambiguation. ( 0,71249505298541 )
J Biomed Inform - Temporal relation discovery between events and temporal expressions identified in clinical narrative. ( 0,712212025614216 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,700924566076423 )
Artif Intell Med - A system for the extraction and representation of summary of product characteristics content. ( 0,700237949841706 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,697067987385547 )
AMIA Annu Symp Proc - Automatically Detecting Acute Myocardial Infarction Events from EHR Text: A Preliminary Study. ( 0,697064178520624 )
Artif Intell Med - Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective. ( 0,694446438050849 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,69017744797059 )
Lifetime Data Anal - Regression analysis of multivariate recurrent event data with a dependent terminal event. ( 0,689167385736029 )
AMIA Annu Symp Proc - Automated identification of medical concepts and assertions in medical text. ( 0,680103100659296 )
Comput Methods Programs Biomed - Multistage approach for clustering and classification of ECG data. ( 0,678980728119729 )
BMC Med Inform Decis Mak - Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. ( 0,675312442566322 )
J Am Med Inform Assoc - Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. ( 0,675050807191342 )
IEEE Trans Image Process - Fast semantic diffusion for large-scale context-based image and video annotation. ( 0,672939833623345 )
J Biomed Inform - Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research. ( 0,669911151224936 )
Artif Intell Med - Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children. ( 0,66968795919695 )
J Am Med Inform Assoc - Automated concept-level information extraction to reduce the need for custom software and rules development. ( 0,66891654418076 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,663933700691771 )
J Biomed Inform - Detecting hedge cues and their scope in biomedical text with conditional random fields. ( 0,660938242989914 )
J Biomed Inform - Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus. ( 0,660491628179919 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,660270515119059 )
AMIA Annu Symp Proc - Parenthetically speaking: classifying the contents of parentheses for text mining. ( 0,653329087536542 )
J Biomed Inform - Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. ( 0,651132498297669 )
AMIA Annu Symp Proc - Classification of medication status change in clinical narratives. ( 0,650077047755297 )
AMIA Annu Symp Proc - An evaluation of the UMLS in representing corpus derived clinical concepts. ( 0,649522155219221 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,648053111467466 )
J Biomed Inform - Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. ( 0,64350352836091 )
J Biomed Inform - UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. ( 0,641821475018033 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,640851693104615 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,638480852172108 )
IEEE Trans Image Process - Geodesic propagation for semantic labeling. ( 0,635384272295792 )
Neural Comput - Representing objects, relations, and sequences. ( 0,634036138009886 )
J Am Med Inform Assoc - Towards comprehensive syntactic and semantic annotations of the clinical narrative. ( 0,633637314763078 )
Appl Clin Inform - Representation of information about family relatives as structured data in electronic health records. ( 0,627764910444026 )
J Am Med Inform Assoc - A rule based solution to co-reference resolution in clinical text. ( 0,627092209656655 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,626443954326785 )
J Am Med Inform Assoc - Validating a strategy for psychosocial phenotyping using a large corpus of clinical text. ( 0,625295120894022 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,624982165959864 )
J Biomed Inform - Desiderata for ontologies to be used in semantic annotation of biomedical documents. ( 0,622672695798227 )
J Biomed Inform - Approaches to verb subcategorization for biomedicine. ( 0,622082432031436 )
J Am Med Inform Assoc - Evaluating the utility of syndromic surveillance algorithms for screening to detect potentially clonal hospital infection outbreaks. ( 0,621157029512293 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,619875188251048 )
AMIA Annu Symp Proc - Semantic characteristics of NLP-extracted concepts in clinical notes vs. biomedical literature. ( 0,616947981903327 )
J. Comput. Biol. - Imbalanced class learning in epigenetics. ( 0,615361980697137 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,614541568866459 )
J Am Med Inform Assoc - The BioIntelligence Framework: a new computational platform for biomedical knowledge computing. ( 0,614356290210961 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,613733813303862 )
IEEE Trans Image Process - A Probabilistic Associative Model for Segmenting Weakly-Supervised Images. ( 0,613418307239993 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,612782568113885 )
Artif Intell Med - Terminological resources for text mining over biomedical scientific literature. ( 0,612369768964987 )
Neural Comput - Adaptive metric learning vector quantization for ordinal classification. ( 0,611992452226228 )
BMC Med Inform Decis Mak - Mining biomarker information in biomedical literature. ( 0,611489081180522 )
AMIA Annu Symp Proc - Towards a semantic lexicon for clinical natural language processing. ( 0,610722650340788 )
J Am Med Inform Assoc - Automated identification of drug and food allergies entered using non-standard terminology. ( 0,608440650045443 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,608271010592484 )
J Med Syst - 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification. ( 0,604398536843022 )
AMIA Annu Symp Proc - A cloud-based approach to medical NLP. ( 0,603077186789855 )
J Biomed Inform - Lessons learnt from the DDIExtraction-2013 Shared Task. ( 0,60222510031132 )
J Am Med Inform Assoc - Learning classification models with soft-label information. ( 0,601823557622472 )
Comput Math Methods Med - On multilabel classification methods of incompletely labeled biomedical text data. ( 0,60161309885965 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,601059782987372 )
J Biomed Inform - Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: an empirical study. ( 0,600225418477881 )
J Am Med Inform Assoc - Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. ( 0,599804246517677 )
J Biomed Inform - Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing. ( 0,59864961457252 )
J Biomed Inform - Knowledge based word-concept model estimation and refinement for biomedical text mining. ( 0,597095633853734 )
J Am Med Inform Assoc - Active learning for clinical text classification: is it better than random sampling? ( 0,5960370110659 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,595001333658348 )
AMIA Annu Symp Proc - Building gold standard corpora for medical natural language processing tasks. ( 0,594344367253272 )
AMIA Annu Symp Proc - Automatic acquisition of sublanguage semantic schema: towards the word sense disambiguation of clinical narratives. ( 0,593603173604906 )
Int J Med Inform - De-identification of clinical narratives through writing complexity measures. ( 0,592890751176882 )
AMIA Annu Symp Proc - Using Medical Text Extraction, Reasoning and Mapping System (MTERMS) to process medication information in outpatient clinical notes. ( 0,592749563883821 )
J Biomed Inform - Anaphoric reference in clinical reports: characteristics of an annotated corpus. ( 0,591874909002614 )
AMIA Annu Symp Proc - Active Learning-based corpus annotation--the PathoJen experience. ( 0,590775113960785 )
IEEE J Biomed Health Inform - Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. ( 0,590255514549911 )
Neural Comput - Mismatched training and test distributions can outperform matched ones. ( 0,590214488662312 )
AMIA Annu Symp Proc - Using ontology network structure in text mining. ( 0,589902446830799 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,589229074567969 )
J Am Med Inform Assoc - Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. ( 0,589208845123952 )
Artif Intell Med - Figure classification in biomedical literature to elucidate disease mechanisms, based on pathways. ( 0,588852068417215 )
Neural Comput - Computing sparse representations of multidimensional signals using Kronecker bases. ( 0,588123433795483 )
IEEE Trans Image Process - Multiview Hessian regularization for image annotation. ( 0,587485597398449 )
J Biomed Inform - Ontology modularization to improve semantic medical image annotation. ( 0,587386814721685 )
J Biomed Inform - Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management. ( 0,587097497176972 )
J Am Med Inform Assoc - A comprehensive study of named entity recognition in Chinese clinical text. ( 0,586310052129473 )
J Am Med Inform Assoc - A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. ( 0,584887344813663 )
Neural Comput - Metacognitive learning in a fully complex-valued radial basis function neural network. ( 0,584307086450254 )
J Biomed Inform - Reducing systematic review workload through certainty-based screening. ( 0,584169321614635 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,583449459809028 )
J Am Med Inform Assoc - Applying active learning to supervised word sense disambiguation in MEDLINE. ( 0,583039325345156 )
Int J Neural Syst - Span: spike pattern association neuron for learning spatio-temporal spike patterns. ( 0,582113849266899 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,581948840048324 )
J Biomed Inform - An enhanced CRFs-based system for information extraction from radiology reports. ( 0,580555313191906 )
Int J Neural Syst - Structurally enhanced incremental neural learning for image classification with subgraph extraction. ( 0,579602567161056 )