J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization.

Topics

{ extract(1171) text(1153) clinic(932) }
{ data(2317) use(1299) case(1017) }
{ cancer(2502) breast(956) screen(824) }
{ data(3963) clinic(1234) research(1004) }
{ studi(2440) review(1878) systemat(933) }
{ concept(1167) ontolog(924) domain(897) }
{ case(1353) use(1143) diagnosi(1136) }
{ result(1111) use(1088) new(759) }
{ visual(1396) interact(850) tool(830) }
{ sampl(1606) size(1419) use(1276) }
{ method(1219) similar(1157) match(930) }
{ import(1318) role(1303) understand(862) }
{ signal(2180) analysi(812) frequenc(800) }
{ activ(1138) subject(705) human(624) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ method(2212) result(1239) propos(1039) }
{ inform(2794) health(2639) internet(1427) }
{ take(945) account(800) differ(722) }
{ learn(2355) train(1041) set(1003) }
{ search(2224) databas(1162) retriev(909) }
{ model(2341) predict(2261) use(1141) }
{ state(1844) use(1261) util(961) }
{ health(1844) social(1437) communiti(874) }
{ drug(1928) target(777) effect(648) }
{ can(774) often(719) complex(702) }
{ network(2748) neural(1063) input(814) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ problem(2511) optim(1539) algorithm(950) }
{ clinic(1479) use(1117) guidelin(835) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ research(1218) medic(880) student(794) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ method(984) reconstruct(947) comput(926) }
{ howev(809) still(633) remain(590) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ use(1733) differ(960) four(931) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information; however, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed pre-annotations to be used as a first step before manual annotation. Fourteen annotators were randomly paired, and differing annotations were discussed to reach a consensus in two annotation phases. In this setting, high inter-annotator agreement was observed. Finally, all results were checked against the annotations of the rest of the corpus to ensure corpus-wide consistency. The public release of the NCBI disease corpus contains 6892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest contain an OMIM identifier. We were able to link 91% of the mentions to a single disease concept, while the rest are described as a combination of concepts. To help researchers use the corpus to design and test disease identification methods, we have divided the corpus into training, testing and development sets.
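The corpus statistics reported above (mention count, unique concepts, MeSH vs. OMIM share, single-concept share) can in principle be recomputed from the released annotation files. The sketch below does this under stated assumptions that are not taken from this page: a hypothetical tab-separated layout with one mention per line (PMID, start offset, end offset, mention text, mention type, concept ID), OMIM identifiers prefixed with "OMIM:", and composite mentions listing several IDs joined by "|" or "+". Check the corpus documentation for the actual format before relying on it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    pmid: str
    start: int
    end: int
    text: str
    concept_ids: tuple[str, ...]


def parse_annotation(line: str) -> Mention:
    """Parse one tab-separated annotation line (hypothetical layout, see lead-in)."""
    pmid, start, end, text, _mention_type, concepts = line.rstrip("\n").split("\t")
    # Assumption: composite mentions join several concept IDs with '|' or '+'.
    ids = tuple(cid for cid in re.split(r"[|+]", concepts) if cid)
    return Mention(pmid, int(start), int(end), text, ids)


def corpus_stats(lines: list[str]) -> dict[str, float]:
    """Mention count, unique concepts, MeSH share and single-concept share."""
    mentions = [parse_annotation(line) for line in lines if line.strip()]
    if not mentions:
        raise ValueError("no annotation lines given")
    concepts = {cid for m in mentions for cid in m.concept_ids}
    # Assumption: any ID not prefixed with 'OMIM:' is treated as a MeSH ID.
    omim = {cid for cid in concepts if cid.startswith("OMIM:")}
    single = sum(1 for m in mentions if len(m.concept_ids) == 1)
    return {
        "mentions": len(mentions),
        "unique_concepts": len(concepts),
        "mesh_share": (len(concepts) - len(omim)) / len(concepts) if concepts else 0.0,
        "single_concept_share": single / len(mentions),
    }
```

Counting unique concepts over individual IDs (rather than over whole composite strings) is a design choice of this sketch; the paper may count differently.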
To demonstrate its utility, we conducted a benchmarking experiment comparing three different knowledge-based disease normalization methods, the best of which achieved an F-measure of 63.7%. These results show that the NCBI disease corpus has the potential to significantly improve the state of the art in disease name recognition and normalization research by providing a high-quality gold standard, thus enabling the development of machine-learning-based approaches for such tasks. The NCBI disease corpus, guidelines and other associated resources are available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/.
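The F-measure quoted in the benchmark is the harmonic mean of precision P and recall R, F = 2PR / (P + R). A minimal sketch of scoring a normalization system follows, assuming (as a simplification not stated on this page) that gold and predicted outputs are compared as sets of (PMID, concept ID) pairs and micro-averaged over the whole test set.

```python
Pair = tuple[str, str]  # (PMID, concept ID)


def precision_recall_f(gold: set[Pair], predicted: set[Pair]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure over (PMID, concept) pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        # Avoid division by zero when nothing was predicted correctly.
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Micro-averaging over pairs is only one convention; averaging per abstract (macro-averaging) can give noticeably different numbers for the same predictions.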

Cleaned Abstract

inform encod natur languag biomed literatur public use effici reliabl way access analyz inform avail natur languag process text mine tool therefor essenti extract valuabl inform howev develop power high effect tool automat detect central biomed concept diseas condit avail annot corpora paper present diseas name concept annot ncbi diseas corpus collect pubm abstract fulli annot mention concept level serv research resourc biomed natur languag process communiti pubm abstract manual annot two annot diseas mention correspond concept medic subject head mesh onlin mendelian inherit man omim manual curat perform use pubtat allow use preannot prestep manual annot fourteen annot random pair differ annot discuss reach consensus two annot phase set high interannot agreement observ final result check annot rest corpus assur corpuswid consist public releas ncbi diseas corpus contain diseas mention map uniqu diseas concept link mesh identifi rest contain omim identifi abl link mention singl diseas concept rest describ combin concept order help research use corpus design test diseas identif method prepar corpus train test develop set demonstr util conduct benchmark experi compar three differ knowledgebas diseas normal method best perform fmeasur result show ncbi diseas corpus potenti signific improv stateoftheart diseas name recognit normal research provid highqual gold standard thus enabl develop machinelearn base approach task ncbi diseas corpus guidelin associ resourc avail httpwwwncbinlmnihgovcbbresearchdogandiseas
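The cleaned abstract above appears to be the original text lowercased, with digits and punctuation deleted (hyphenated terms and the URL are glued together, as in "stateoftheart"), stopwords removed, and the remaining words stemmed (e.g. "inform", "diseas"). The page does not document this pipeline, so the following is only a minimal sketch under those assumptions; the function name and the small stopword list are hypothetical.

```python
import re
from typing import Callable, Iterable

# Illustrative stopword subset (assumption: the list actually used by this
# page is unknown; NLTK's English stopword list is a common, larger choice).
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "each", "for", "if",
    "in", "is", "of", "on", "only", "or", "other", "that", "the", "these",
    "this", "to", "was", "we", "were", "where", "which", "while", "with",
})


def clean_abstract(
    text: str,
    stem: Callable[[str], str] = lambda token: token,
    stopwords: Iterable[str] = STOPWORDS,
) -> str:
    """Lowercase, delete non-letters, drop stopwords, then stem each token."""
    stop = frozenset(stopwords)
    # Deleting (not spacing out) punctuation glues hyphenated terms and URLs
    # together, e.g. "state-of-the-art" -> "stateoftheart"; digits vanish too.
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    tokens = [tok for tok in letters_only.split() if tok not in stop]
    return " ".join(stem(tok) for tok in tokens)
```

The default stemmer leaves words unchanged so the sketch runs without extra dependencies. With NLTK installed, passing stem=PorterStemmer().stem (imported from nltk.stem) should yield Porter stems in the style shown above, such as "inform" and "diseas".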

Similar Abstracts

AMIA Annu Symp Proc - Building gold standard corpora for medical natural language processing tasks. ( 0,863224691369963 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,854327796203391 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,844525010757907 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,84391041401982 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,843350865690534 )
J Am Med Inform Assoc - Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text. ( 0,84030312462994 )
Appl Clin Inform - Representation of information about family relatives as structured data in electronic health records. ( 0,839123926050761 )
J Biomed Inform - Text summarization in the biomedical domain: a systematic review of recent research. ( 0,839024031091182 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,838358071702215 )
J Biomed Inform - MedTime: a temporal information extraction system for clinical narratives. ( 0,837810649234156 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,837806871414596 )
AMIA Annu Symp Proc - Automatically pairing measured findings across narrative abdomen CT reports. ( 0,83495659764475 )
J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. ( 0,834930979904047 )
AMIA Annu Symp Proc - Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. ( 0,834192220283118 )
J Biomed Inform - Anaphoric reference in clinical reports: characteristics of an annotated corpus. ( 0,833060326009457 )
J Am Med Inform Assoc - A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. ( 0,832694346835921 )
AMIA Annu Symp Proc - Natural language processing for lines and devices in portable chest x-rays. ( 0,832200765809663 )
J Am Med Inform Assoc - A hybrid system for temporal information extraction from clinical text. ( 0,829698306112991 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,825280654472557 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,825253314517825 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,825082041215702 )
Health Informatics J - University of California, Irvine-Pathology Extraction Pipeline: the pathology extraction pipeline for information extraction from pathology reports. ( 0,824278462173472 )
J Biomed Inform - Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets. ( 0,823532186105996 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,822305169375554 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,818407940974355 )
AMIA Annu Symp Proc - Towards a semantic lexicon for clinical natural language processing. ( 0,81832686802754 )
J Am Med Inform Assoc - Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements. ( 0,817135178298986 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,816900134858156 )
J Biomed Inform - Towards generating a patient's timeline: extracting temporal relationships from clinical notes. ( 0,811582291802665 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,80898384519342 )
Perspect Health Inf Manag - A comparison of two approaches to text processing: facilitating chart reviews of radiology reports in electronic medical records. ( 0,806050443291096 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,801836522705649 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,800515990102293 )
Int J Med Inform - Detecting temporal expressions in medical narratives. ( 0,798627567479256 )
Brief. Bioinformatics - A survey on annotation tools for the biomedical literature. ( 0,797866263736158 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,79775994072483 )
J. Med. Internet Res. - Developing a disease outbreak event corpus. ( 0,796706245455348 )
J Biomed Inform - UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. ( 0,796665665811443 )
J Am Med Inform Assoc - Using rule-based natural language processing to improve disease normalization in biomedical text. ( 0,795612444360102 )
J Am Med Inform Assoc - MedXN: an open source medication extraction and normalization tool for clinical text. ( 0,79308831323494 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,7912547174919 )
J Med Syst - Redactable signatures for signed CDA Documents. ( 0,791072673973925 )
J Biomed Inform - A new clustering method for detecting rare senses of abbreviations in clinical notes. ( 0,788018151400426 )
J Biomed Inform - Automatically extracting information needs from complex clinical questions. ( 0,787992848488464 )
J Biomed Inform - Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. ( 0,786366463403611 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,786324075312502 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,785280832307809 )
AMIA Annu Symp Proc - Mapping annotations with textual evidence using an scLDA model. ( 0,785226122641969 )
J Biomed Inform - Desiderata for ontologies to be used in semantic annotation of biomedical documents. ( 0,78486737179364 )
J Am Med Inform Assoc - Assisted annotation of medical free text using RapTAT. ( 0,784609511067268 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,784532097742366 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,78213264591294 )
BMC Med Inform Decis Mak - Text summarization as a decision support aid. ( 0,779308438277152 )
J Biomed Inform - Ontology modularization to improve semantic medical image annotation. ( 0,776958477302268 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,775862710637939 )
J Am Med Inform Assoc - A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. ( 0,775019378408276 )
J Am Med Inform Assoc - Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department. ( 0,773001952292275 )
AMIA Annu Symp Proc - A Knowledge Intensive Approach to Mapping Clinical Narrative to LOINC. ( 0,770960604642331 )
J Am Med Inform Assoc - Towards comprehensive syntactic and semantic annotations of the clinical narrative. ( 0,770400138741001 )
J Am Med Inform Assoc - Vaccine adverse event text mining system for extracting features from vaccine safety reports. ( 0,767523792862599 )
J Biomed Inform - Text de-identification for privacy protection: a study of its impact on clinical text information content. ( 0,765825672362393 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,761778112913072 )
J Biomed Inform - Extraction of events and temporal expressions from clinical narratives. ( 0,760320332487268 )
J Am Med Inform Assoc - Automatic abstraction of imaging observations with their characteristics from mammography reports. ( 0,759289006309857 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,759042985952411 )
J Am Med Inform Assoc - The effect of word familiarity on actual and perceived text difficulty. ( 0,758431197613249 )
AMIA Annu Symp Proc - Extracting temporal constraints from clinical research eligibility criteria using conditional random fields. ( 0,755482922603538 )
J Biomed Inform - Lessons learnt from the DDIExtraction-2013 Shared Task. ( 0,755439431150242 )
J Biomed Inform - Identifying non-elliptical entity mentions in a coordinated NP with ellipses. ( 0,755339632155495 )
J Biomed Inform - Enhancing clinical concept extraction with distributional semantics. ( 0,753633639446727 )
J Biomed Inform - Degree centrality for semantic abstraction summarization of therapeutic studies. ( 0,753034025427268 )
Sci Data - Building the graph of medicine from millions of clinical narratives. ( 0,752774052426308 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,752467089583619 )
J Biomed Inform - Annotating temporal information in clinical narratives. ( 0,750748488240406 )
J Biomed Inform - Semantator: semantic annotator for converting biomedical text to linked data. ( 0,750688418823924 )
J Am Med Inform Assoc - A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. ( 0,750174816317865 )
AMIA Annu Symp Proc - Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. ( 0,749587453129468 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,748630212253476 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,747653991458862 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,747379762873599 )
J Am Med Inform Assoc - Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries. ( 0,74641081124648 )
J Am Med Inform Assoc - Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. ( 0,746298625807711 )
Int J Med Inform - Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs. ( 0,745529224831935 )
J Am Med Inform Assoc - Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. ( 0,745473752071854 )
J Integr Bioinform - Automatic extraction of microorganisms and their habitats from free text using text mining workflows. ( 0,745002915197927 )
AMIA Annu Symp Proc - A cloud-based approach to medical NLP. ( 0,744336092294981 )
J Biomed Inform - Common data model for natural language processing based on two existing standard information models: CDA+GrAF. ( 0,742204506973765 )
BMC Med Inform Decis Mak - The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. ( 0,74072498194836 )
AMIA Annu Symp Proc - Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing. ( 0,739033186341028 )
AMIA Annu Symp Proc - A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. ( 0,737766323105917 )
AMIA Annu Symp Proc - Extracting patient demographics and personal medical information from online health forums. ( 0,736690656935326 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,736231070666695 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,735885152444237 )
AMIA Annu Symp Proc - Active Learning-based corpus annotation--the PathoJen experience. ( 0,734041208485739 )
J Biomed Inform - Approaches to verb subcategorization for biomedicine. ( 0,732838923815615 )
J Biomed Inform - Secondary use of electronic health records for building cohort studies through top-down information extraction. ( 0,731328441012128 )
AMIA Annu Symp Proc - Critical finding capture in the impression section of radiology reports. ( 0,728734117778876 )
AMIA Annu Symp Proc - Sophia: A Expedient UMLS Concept Extraction Annotator. ( 0,727679947936162 )
J Biomed Inform - Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text. ( 0,726302350170311 )
J Biomed Inform - Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. ( 0,724647039576451 )
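The page does not say how the similarity scores above were computed (they are printed with a decimal comma, so 0,863 means 0.863). One plausible approach, and purely an assumption here, is cosine similarity between bag-of-words vectors of the cleaned abstracts. The sketch below ranks candidate abstracts that way; the function names are hypothetical, and TF-IDF weighting would be a common refinement over raw counts.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(
    query: str, candidates: dict[str, str], top_k: int = 5
) -> list[tuple[str, float]]:
    """Rank candidate abstracts (title -> cleaned text) by similarity to query."""
    query_vec = Counter(query.split())
    scored = [
        (title, cosine(query_vec, Counter(text.split())))
        for title, text in candidates.items()
    ]
    # Highest similarity first, mirroring the descending order of the list above.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```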