AMIA Annu Symp Proc - Active Learning-based corpus annotation--the PathoJen experience.

Abstract

We report on basic design decisions and novel annotation procedures underlying the development of PathoJen, a corpus of Medline abstracts annotated for pathological phenomena, including diseases as a proper subclass. This named entity type is known to be hard to delineate and to capture in annotation guidelines. We propose a two-category encoding schema that distinguishes short from long mention spans, the former covering standardized terminology (e.g., diseases), the latter accounting for less structured descriptive statements about norm-deviant states, as well as criteria and observations that might signal pathologies. The second design decision concerns the way annotation instances are sampled. Here we adopt an Active Learning-based approach, which is known to reduce annotation costs, by means of a deliberate sampling bias, without sacrificing annotation quality. By design, Active Learning selects 'hard'-to-annotate instances for human annotators, whereas 'easier' ones are passed on to the automatic classifier, whose models already incorporate previous annotation experience and gradually improve with it.
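The sampling idea described above can be illustrated with a minimal sketch. It assumes least-confidence uncertainty sampling, a common Active Learning strategy; the abstract does not state which selection criterion PathoJen actually used. The function names, the toy probability table, and the batch size are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def split_by_uncertainty(
    pool: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],
    batch_size: int,
) -> Tuple[List[str], List[str]]:
    """Return (hard, easy).

    The batch_size most uncertain instances are routed to human annotators;
    the remainder is left to the automatic classifier.
    """
    ranked = sorted(
        pool,
        key=lambda item: least_confidence(predict_proba(item)),
        reverse=True,  # most uncertain first
    )
    return list(ranked[:batch_size]), list(ranked[batch_size:])


# Hypothetical stand-in for a trained classifier: each text span maps to
# [P(pathological), P(not pathological)]. These numbers are invented.
TOY_PROBS = {
    "myocardial infarction": [0.97, 0.03],
    "slightly elevated enzyme levels": [0.55, 0.45],
    "the patient was discharged": [0.05, 0.95],
    "abnormal tissue appearance": [0.60, 0.40],
}

hard, easy = split_by_uncertainty(
    list(TOY_PROBS), TOY_PROBS.__getitem__, batch_size=2
)
print("for human annotators:", hard)
print("left to the classifier:", easy)
```

In a full loop, the human-labeled batch would be added to the training set and the classifier retrained before the next selection round, which is how the model gradually improves with annotation experience.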

Cleaned Abstract

report basic design decis novel annot procedur under develop pathojen corpus medlin abstract annot patholog phenomena includ diseas proper subclass name entiti type known hard delin captur annot guidelin propos twocategori encod schema distinguish short long mention span first cover standard terminolog eg diseas latter account less structur descript statement normdevi state well criteria observ might signal patholog second design decis relat way annot instanc sampl subscrib activ learningbas approach known save annot cost without sacrif annot qualiti mean sampl bias design activ learn pick hard annot instanc human annot wherea easier one pass automat classifi whose model alreadi incorpor gradual improv previous annot experi

Similar Abstracts

AMIA Annu Symp Proc - Building gold standard corpora for medical natural language processing tasks. ( 0,745188794383795 )
J Am Med Inform Assoc - Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. ( 0,742452033817224 )
J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. ( 0,738168413560116 )
J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization. ( 0,734041208485739 )
J Biomed Inform - Towards generating a patient's timeline: extracting temporal relationships from clinical notes. ( 0,719131742754992 )
AMIA Annu Symp Proc - Automatically pairing measured findings across narrative abdomen CT reports. ( 0,716175380652101 )
J Biomed Inform - Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. ( 0,714024888059904 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,711199773406036 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,708347712226437 )
J Biomed Inform - Text de-identification for privacy protection: a study of its impact on clinical text information content. ( 0,701021784484892 )
Int J Med Inform - Detecting temporal expressions in medical narratives. ( 0,696994113954303 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,693858368608685 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,691895986369222 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,691388492394293 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,689686543288405 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,689586409592352 )
J Am Med Inform Assoc - Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements. ( 0,68885357690799 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,688049571398964 )
J Biomed Inform - A method for determining the number of documents needed for a gold standard corpus. ( 0,688029084770422 )
AMIA Annu Symp Proc - A Knowledge Intensive Approach to Mapping Clinical Narrative to LOINC. ( 0,687260977828546 )
Health Informatics J - University of California, Irvine-Pathology Extraction Pipeline: the pathology extraction pipeline for information extraction from pathology reports. ( 0,685676724112487 )
J Am Med Inform Assoc - Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. ( 0,685561901769572 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,684985222306148 )
J Biomed Inform - The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. ( 0,683984723498543 )
J Biomed Inform - Degree centrality for semantic abstraction summarization of therapeutic studies. ( 0,682985538511707 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,682778838923213 )
J Biomed Inform - Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus. ( 0,680949759744038 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,68040165499364 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,674384814846023 )
J Biomed Inform - Text summarization in the biomedical domain: a systematic review of recent research. ( 0,671721908620531 )
J Am Med Inform Assoc - Induced lexico-syntactic patterns improve information extraction from online medical forums. ( 0,670203991547609 )
J Am Med Inform Assoc - Towards comprehensive syntactic and semantic annotations of the clinical narrative. ( 0,669973692828314 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,669820036582579 )
J Am Med Inform Assoc - A hybrid system for temporal information extraction from clinical text. ( 0,666918305473298 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,666700215452296 )
J Biomed Inform - Anaphoric reference in clinical reports: characteristics of an annotated corpus. ( 0,665667275871195 )
J Am Med Inform Assoc - MedXN: an open source medication extraction and normalization tool for clinical text. ( 0,664759910983782 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,664194179640814 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,663793857883963 )
J Biomed Inform - Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. ( 0,661865524823354 )
J Biomed Inform - UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. ( 0,660317830026043 )
J Am Med Inform Assoc - Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. ( 0,659687920423778 )
AMIA Annu Symp Proc - Natural language processing for lines and devices in portable chest x-rays. ( 0,659466672122616 )
AMIA Annu Symp Proc - Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. ( 0,658469408981018 )
BMC Med Inform Decis Mak - A framework for enhancing spatial and temporal granularity in report-based health surveillance systems. ( 0,658106123346552 )
J Med Syst - Redactable signatures for signed CDA Documents. ( 0,657549644116325 )
AMIA Annu Symp Proc - Automated illustration of patients instructions. ( 0,657218387965581 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,656653339158636 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,654302551588492 )
J Am Med Inform Assoc - A rule based solution to co-reference resolution in clinical text. ( 0,653686069777946 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,6534046557841 )
J Biomed Inform - Ontology modularization to improve semantic medical image annotation. ( 0,653216663551112 )
Appl Clin Inform - Representation of information about family relatives as structured data in electronic health records. ( 0,651614379362623 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,651513128733764 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,651042049628183 )
J Am Med Inform Assoc - Capturing patient information at nursing shift changes: methodological evaluation of speech recognition and information extraction. ( 0,650039253704309 )
Brief. Bioinformatics - A survey on annotation tools for the biomedical literature. ( 0,649641266769074 )
J Biomed Inform - Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets. ( 0,649009809452882 )
J Am Med Inform Assoc - e-Measures: insight into the challenges and opportunities of automating publicly reported quality measures. ( 0,648023711097673 )
J Biomed Inform - Automatically extracting information needs from complex clinical questions. ( 0,647928047688174 )
AMIA Annu Symp Proc - A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. ( 0,64714313455574 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,645577594291898 )
J Biomed Inform - Enhancing clinical concept extraction with distributional semantics. ( 0,644059825663726 )
J Biomed Inform - A new clustering method for detecting rare senses of abbreviations in clinical notes. ( 0,642674991450267 )
J Biomed Inform - MedTime: a temporal information extraction system for clinical narratives. ( 0,641999103398615 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,641217713010147 )
J Am Med Inform Assoc - Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. ( 0,639843125153503 )
J Am Med Inform Assoc - A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. ( 0,638290493016424 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,638193553667048 )
J Biomed Inform - Annotating temporal information in clinical narratives. ( 0,637761380338451 )
AMIA Annu Symp Proc - Automated non-alphanumeric symbol resolution in clinical texts. ( 0,636879044436873 )
J Am Med Inform Assoc - Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text. ( 0,636187579533377 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,634954505889333 )
J Am Med Inform Assoc - Assisted annotation of medical free text using RapTAT. ( 0,633377741803923 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,633318117479026 )
J Biomed Inform - Lessons learnt from the DDIExtraction-2013 Shared Task. ( 0,632013396203483 )
J Am Med Inform Assoc - Using rule-based natural language processing to improve disease normalization in biomedical text. ( 0,631732838102897 )
AMIA Annu Symp Proc - Mapping annotations with textual evidence using an scLDA model. ( 0,631407169823031 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,630832291902943 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,630819370569562 )
J Am Med Inform Assoc - A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. ( 0,630201922857853 )
AMIA Annu Symp Proc - Hyperdimensional computing approach to word sense disambiguation. ( 0,630154529247502 )
AMIA Annu Symp Proc - Towards a semantic lexicon for clinical natural language processing. ( 0,628416632320421 )
J. Med. Internet Res. - Evaluating a web-based clinical decision support system for language disorders screening in a nursery school. ( 0,62840676733285 )
J Biomed Inform - Desiderata for ontologies to be used in semantic annotation of biomedical documents. ( 0,628088843149186 )
J Biomed Inform - A natural language processing pipeline for pairing measurements uniquely across free-text CT reports. ( 0,627752369093305 )
J. Med. Internet Res. - Developing a disease outbreak event corpus. ( 0,627622537118349 )
J Biomed Inform - Approaches to verb subcategorization for biomedicine. ( 0,625977760485123 )
J Chem Inf Model - Automated extraction of information on chemical-P-glycoprotein interactions from the literature. ( 0,625330401658864 )
J Med Syst - Experiences with a PDA-based documentation system in clinical research. ( 0,624629393084388 )
AMIA Annu Symp Proc - EpiDEA: extracting structured epilepsy and seizure information from patient discharge summaries for cohort identification. ( 0,623960930287239 )
J Am Med Inform Assoc - Using machine learning for concept extraction on clinical documents from multiple data sources. ( 0,623725719956593 )
J Biomed Inform - Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study. ( 0,62338499620922 )
Sci Data - Building the graph of medicine from millions of clinical narratives. ( 0,622464044355178 )
Comput. Biol. Med. - Parsing citations in biomedical articles using conditional random fields. ( 0,62023911470805 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,618156708637755 )
J Am Med Inform Assoc - Automatic abstraction of imaging observations with their characteristics from mammography reports. ( 0,617918620125997 )
BMC Med Inform Decis Mak - The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. ( 0,617450576658433 )
BMC Med Inform Decis Mak - Mining biomarker information in biomedical literature. ( 0,615524573089945 )
J Am Med Inform Assoc - BoB, a best-of-breed automated text de-identification system for VHA clinical documents. ( 0,614434249775262 )