J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing.


{ extract(1171) text(1153) clinic(932) }
{ clinic(1479) use(1117) guidelin(835) }
{ method(2212) result(1239) propos(1039) }
{ can(774) often(719) complex(702) }
{ patient(2315) diseas(1263) diabet(1191) }
{ assess(1506) score(1403) qualiti(1306) }
{ research(1218) medic(880) student(794) }
{ case(1353) use(1143) diagnosi(1136) }
{ first(2504) two(1366) second(1323) }
{ data(1737) use(1416) pattern(1282) }
{ general(901) number(790) one(736) }
{ perform(1367) use(1326) method(1137) }
{ monitor(1329) mobil(1314) devic(1160) }
{ sampl(1606) size(1419) use(1276) }
{ studi(2440) review(1878) systemat(933) }
{ result(1111) use(1088) new(759) }
{ inform(2794) health(2639) internet(1427) }
{ algorithm(1844) comput(1787) effici(935) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ search(2224) databas(1162) retriev(909) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ measur(2081) correl(1212) valu(896) }
{ network(2748) neural(1063) input(814) }
{ treatment(1704) effect(941) patient(846) }
{ import(1318) role(1303) understand(862) }
{ cost(1906) reduc(1198) effect(832) }
{ can(981) present(881) function(850) }
{ structur(1116) can(940) graph(676) }
{ imag(1947) propos(1133) code(1026) }
{ imag(1057) registr(996) error(939) }
{ featur(3375) classif(2383) classifi(1994) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ system(1050) medic(1026) inform(1018) }
{ spatial(1525) area(1432) region(1030) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ use(2086) technolog(871) perceiv(783) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ activ(1452) weight(1219) physic(1104) }
{ detect(2391) sensit(1101) algorithm(908) }
{ model(3404) distribut(989) bayesian(671) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ method(1557) propos(1049) approach(1037) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ process(1125) use(805) approach(778) }
{ method(1969) cluster(1462) data(1082) }


BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful on the biomedical domain, usually with very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora was never exceptional when compared with traditionally developed gold standards. Previously reported results on a medical named entity annotation task showed an F-measure-based agreement of 0.68 between crowdsourced and traditionally developed corpora.

OBJECTIVE: Building on previous work in general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain, with special emphasis on achieving high agreement between crowdsourced and traditionally developed corpora.

METHODS: To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double-annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work, and tested statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally developed annotations.

RESULTS: The agreement between the crowd's annotations and the traditionally generated corpora was high for (1) annotation (F-measure 0.87 for medication names; 0.73 for medication types) and (2) correction of previous annotations (0.90 for medication names; 0.76 for medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task.

CONCLUSIONS: This study offers three contributions. First, we demonstrated that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collecting high-quality annotations for clinical text (when protected health information is excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code needed to use CrowdFlower's quality control and crowdsourcing interfaces for named entity annotation. Finally, to spur future research, we will release the CTA annotations generated by the traditional and crowdsourced approaches.
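The evaluation in this abstract rests on three calculations: sensitivity, precision, and F-measure against the gold standard; a chi-square test; and the relative improvement over the previously reported 0.68 agreement. The Python below is a minimal sketch under stated assumptions, not the authors' code. Each annotation is reduced to a hypothetical hashable tuple such as `(cta_id, start, end, label)` and matched exactly, and the chi-square test runs on a hypothetical 2x2 table of counts, because the abstract specifies neither the matching criterion nor the table layout. The helper names `prf`, `chi_square_2x2`, and `relative_improvement` are illustrative.

```python
import math
from typing import Hashable, Iterable, Tuple


def prf(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F-measure) of `predicted` against `gold`.

    Annotations are compared by exact equality (an assumption of this sketch).
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # annotations present in both sets
    fn = len(gold_set - pred_set)  # gold annotations the crowd missed
    fp = len(pred_set - gold_set)  # crowd annotations absent from the gold standard
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + sensitivity == 0:
        return sensitivity, precision, 0.0
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, precision, f_measure


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def relative_improvement(new: float, old: float) -> float:
    """Fractional gain of `new` over `old`."""
    return (new - old) / old


if __name__ == "__main__":
    # Hypothetical medication-name spans: (cta_id, start, end, label).
    gold = {("cta1", 10, 19, "med"), ("cta1", 40, 48, "med"), ("cta2", 5, 12, "med")}
    crowd = {("cta1", 10, 19, "med"), ("cta2", 5, 12, "med"), ("cta2", 30, 36, "med")}
    print(prf(gold, crowd))  # two matches, one miss, one extra annotation

    # Hypothetical counts: rows = crowdsourced / traditional corpus,
    # columns = medication-name / medication-type annotations.
    print(chi_square_2x2(900, 400, 920, 410))

    # F-measure 0.87 (medication names) versus the previously reported 0.68.
    print(f"{relative_improvement(0.87, 0.68):.1%}")  # prints 27.9%
```

Because the F-measure is symmetric in its two arguments, the same `prf` call also works as an F-measure-based agreement score between two annotation sets, which is the sense in which the abstract reports agreement. The last line reproduces the 27.9% figure as (0.87 - 0.68) / 0.68.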

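The abstract reports that simple voting provided the best judgment aggregation approach. A minimal sketch in Python of plurality voting over worker judgments follows. It assumes that each judged item (for example, a candidate medication mention) collects a list of labels from several workers, and it returns no decision on a tie. The function names, the tie rule, and the labels in the example are hypothetical; this is not CrowdFlower's aggregation API or the study's configuration.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Optional, Sequence


def simple_vote(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the most frequent label, or None if there are no votes or a tie for first."""
    if not labels:
        return None
    ranked = Counter(labels).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: left undecided (a design choice of this sketch)
    return ranked[0][0]


def aggregate(judgments: Mapping[Hashable, Sequence[Hashable]]) -> Dict[Hashable, Optional[Hashable]]:
    """Apply simple voting independently to every judged item."""
    return {item: simple_vote(votes) for item, votes in judgments.items()}


if __name__ == "__main__":
    # Hypothetical judgments collected from workers for each candidate mention.
    judgments = {
        ("cta1", "metformin"): ["medication", "medication", "not_medication"],
        ("cta1", "saline"): ["not_medication", "medication", "not_medication"],
        ("cta2", "insulin"): ["medication", "not_medication"],
    }
    print(aggregate(judgments))
```

With these inputs, the first two mentions resolve to their majority label and the third, a one-to-one split, stays undecided. Every worker counts equally here; schemes that weight workers differently are possible, but the abstract reports that simple voting worked best among the approaches the authors compared.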
Clean Abstract

background highqual gold standard vital supervis machin learningbas clinic natur languag process nlp system clinic nlp project expert annot tradit creat gold standard howev tradit annot expens timeconsum reduc cost annot general nlp project turn crowdsourc base web technolog involv submit smaller subtask coordin marketplac worker internet mani studi conduct area crowdsourc focus task general nlp field hand biomed domain usual base upon small pilot sampl size addit qualiti crowdsourc biomed nlp corpora never except compar traditionallydevelop gold standard previous report result medic name entiti annot task show fmeasur base agreement crowdsourc traditionallydevelop corpora object build upon previous work general crowdsourc research studi investig usabl crowdsourc clinic nlp domain special emphasi achiev high agreement crowdsourc traditionallydevelop corpora method build gold standard evalu crowdsourc worker perform clinic trial announc ctas clinicaltrialsgov websit random select doubl annot medic name medic type link attribut experi use crowdflow amazon mechan turkbas crowdsourc platform calcul sensit precis fmeasur evalu qualiti crowd work test statist signific p chisquar test detect differ crowdsourc traditionallydevelop annot result agreement crowd annot traditionallygener corpora high annot fmeasur medic name medic type correct previous annot medic name medic type excel link medic attribut simpl vote provid best judgment aggreg approach statist signific differ crowd traditionallygener corpora result show improv previous report result medic name entiti annot task conclus studi offer three contribut first prove crowdsourc feasibl inexpens fast practic approach collect highqual annot clinic text protect health inform exclud believ welldesign user interfac rigor qualiti control strategi entiti annot link critic success work second contribut internetbas crowdsourc field will public releas javascript crowdflow markup languag infrastructur code necessari util crowdflow qualiti control crowdsourc interfac name entiti annot final spur futur research will releas cta annot generat tradit crowdsourc approach

Similar Abstracts

AMIA Annu Symp Proc - Automatically pairing measured findings across narrative abdomen CT reports. ( 0,910312325670732 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,909540033921111 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,90322439233213 )
J Biomed Inform - Towards generating a patient's timeline: extracting temporal relationships from clinical notes. ( 0,902310359630672 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,901973970768509 )
AMIA Annu Symp Proc - Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. ( 0,898895921826301 )
J Biomed Inform - MedTime: a temporal information extraction system for clinical narratives. ( 0,896551575526526 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,895004721999229 )
J Am Med Inform Assoc - Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text. ( 0,893330430753406 )
J Am Med Inform Assoc - Eventual situations for timeline extraction from clinical reports. ( 0,893141365519625 )
J Biomed Inform - Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. ( 0,888287350872173 )
J Biomed Inform - Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. ( 0,885750043805607 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,881038009570247 )
J Am Med Inform Assoc - A hybrid system for temporal information extraction from clinical text. ( 0,877187735089394 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,877148846226085 )
J Am Med Inform Assoc - Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements. ( 0,875959138727967 )
Appl Clin Inform - Representation of information about family relatives as structured data in electronic health records. ( 0,872039482085811 )
J Med Syst - Redactable signatures for signed CDA Documents. ( 0,870886275128646 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,869933529648463 )
J Am Med Inform Assoc - MedXN: an open source medication extraction and normalization tool for clinical text. ( 0,869504319758495 )
J Biomed Inform - Anaphoric reference in clinical reports: characteristics of an annotated corpus. ( 0,869182651970045 )
J Biomed Inform - Text de-identification for privacy protection: a study of its impact on clinical text information content. ( 0,868760427744565 )
J Am Med Inform Assoc - Automatic discourse connective detection in biomedical text. ( 0,865808907180498 )
AMIA Annu Symp Proc - Throw the bath water out, keep the baby: keeping medically-relevant terms for text mining. ( 0,865659106327911 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,864704053401097 )
Int J Med Inform - Detecting temporal expressions in medical narratives. ( 0,863161779354587 )
J Am Med Inform Assoc - A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. ( 0,860652097371059 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,859890375834342 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,855939434203936 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,854323585557411 )
J Biomed Inform - Identifying non-elliptical entity mentions in a coordinated NP with ellipses. ( 0,853968312171359 )
J Am Med Inform Assoc - Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification. ( 0,853766404540644 )
AMIA Annu Symp Proc - Natural language processing for lines and devices in portable chest x-rays. ( 0,851040321896686 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,84644681889038 )
AMIA Annu Symp Proc - Building gold standard corpora for medical natural language processing tasks. ( 0,844458533724931 )
AMIA Annu Symp Proc - Mapping annotations with textual evidence using an scLDA model. ( 0,842315507207118 )
J Biomed Inform - A new clustering method for detecting rare senses of abbreviations in clinical notes. ( 0,839189520549038 )
J Biomed Inform - UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. ( 0,836681818807846 )
J Am Med Inform Assoc - Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. ( 0,835875228858017 )
J Biomed Inform - NCBI disease corpus: a resource for disease name recognition and concept normalization. ( 0,834930979904047 )
J Am Med Inform Assoc - Assisted annotation of medical free text using RapTAT. ( 0,832932326674277 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,832591510897442 )
J Am Med Inform Assoc - The effect of word familiarity on actual and perceived text difficulty. ( 0,832352747990618 )
BMC Med Inform Decis Mak - Text summarization as a decision support aid. ( 0,832089900274061 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,829185213343559 )
J Biomed Inform - Text summarization in the biomedical domain: a systematic review of recent research. ( 0,828361243958803 )
J Biomed Inform - Extraction of events and temporal expressions from clinical narratives. ( 0,828120383857937 )
AMIA Annu Symp Proc - Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records. ( 0,828090573657904 )
J Am Med Inform Assoc - A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. ( 0,824464547097025 )
J Biomed Inform - Annotating temporal information in clinical narratives. ( 0,821165926808695 )
J Am Med Inform Assoc - 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. ( 0,819226325149269 )
J Biomed Inform - Lessons learnt from the DDIExtraction-2013 Shared Task. ( 0,818637807365853 )
AMIA Annu Symp Proc - Towards a semantic lexicon for clinical natural language processing. ( 0,817504268367217 )
J Am Med Inform Assoc - Towards comprehensive syntactic and semantic annotations of the clinical narrative. ( 0,81710961471846 )
AMIA Annu Symp Proc - Natural language processing to extract follow-up provider information from hospital discharge summaries. ( 0,816432620245081 )
AMIA Annu Symp Proc - Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing. ( 0,816378588361454 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,816305442323876 )
AMIA Annu Symp Proc - Automated illustration of patients instructions. ( 0,815322598591252 )
AMIA Annu Symp Proc - Sophia: A Expedient UMLS Concept Extraction Annotator. ( 0,814979230920825 )
AMIA Annu Symp Proc - A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. ( 0,811235247846441 )
J Am Med Inform Assoc - A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. ( 0,806401714446662 )
J Am Med Inform Assoc - Using rule-based natural language processing to improve disease normalization in biomedical text. ( 0,803793293864195 )
Perspect Health Inf Manag - A comparison of two approaches to text processing: facilitating chart reviews of radiology reports in electronic medical records. ( 0,80342654161113 )
J Biomed Inform - Degree centrality for semantic abstraction summarization of therapeutic studies. ( 0,803265402395797 )
Int J Med Inform - Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs. ( 0,802427603126896 )
J Biomed Inform - Desiderata for ontologies to be used in semantic annotation of biomedical documents. ( 0,802375230070964 )
J Am Med Inform Assoc - Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. ( 0,801899185248023 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,801032449339879 )
Brief. Bioinformatics - A survey on annotation tools for the biomedical literature. ( 0,799712544048472 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,799228687472796 )
AMIA Annu Symp Proc - Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. ( 0,79702731166799 )
BMC Med Inform Decis Mak - Detecting causality from online psychiatric texts using inter-sentential language patterns. ( 0,79553138798659 )
J Biomed Inform - Automatically extracting information needs from complex clinical questions. ( 0,795157668012816 )
AMIA Annu Symp Proc - It's about this and that: a description of anaphoric expressions in clinical text. ( 0,792437427682975 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,791769723021682 )
J Am Med Inform Assoc - Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. ( 0,791145643864917 )
J Integr Bioinform - Automatic extraction of microorganisms and their habitats from free text using text mining workflows. ( 0,790026336402314 )
J Biomed Inform - Approaches to verb subcategorization for biomedicine. ( 0,786585702615354 )
J Am Med Inform Assoc - A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. ( 0,785476963682707 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,782542728833664 )
J Am Med Inform Assoc - Automatic abstraction of imaging observations with their characteristics from mammography reports. ( 0,782507614867181 )
J Biomed Inform - Semantator: semantic annotator for converting biomedical text to linked data. ( 0,781421933610335 )
BMC Med Inform Decis Mak - The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. ( 0,77949065449614 )
AMIA Annu Symp Proc - Extracting patient demographics and personal medical information from online health forums. ( 0,776017784604056 )
J Am Med Inform Assoc - Extracting drug indication information from structured product labels using natural language processing. ( 0,775779028894577 )
J Biomed Inform - Ontology modularization to improve semantic medical image annotation. ( 0,775511215284069 )
AMIA Annu Symp Proc - Semantic processing to identify adverse drug event information from black box warnings. ( 0,775078372388089 )
AMIA Annu Symp Proc - Critical finding capture in the impression section of radiology reports. ( 0,773984712081356 )
J Biomed Inform - Ontology-guided feature engineering for clinical text classification. ( 0,773784833967199 )
Sci Data - Building the graph of medicine from millions of clinical narratives. ( 0,773541897501739 )
AMIA Annu Symp Proc - Developing a section labeler for clinical documents. ( 0,772428792717195 )
J Biomed Inform - Enhancing clinical concept extraction with distributional semantics. ( 0,772020360303059 )
J Biomed Inform - Using an ensemble system to improve concept extraction from clinical records. ( 0,771157022237402 )
AMIA Annu Symp Proc - Generalizability and comparison of automatic clinical text de-identification methods and resources. ( 0,770000339378495 )
J Am Med Inform Assoc - Induced lexico-syntactic patterns improve information extraction from online medical forums. ( 0,768500941612144 )
J Biomed Inform - The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. ( 0,764782227615179 )
J Am Med Inform Assoc - Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries. ( 0,764433396873884 )
J Biomed Inform - Relation mining experiments in the pharmacogenomics domain. ( 0,762113772344579 )
J Am Med Inform Assoc - Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification. ( 0,761309965672238 )
AMIA Annu Symp Proc - A cloud-based approach to medical NLP. ( 0,760767532688613 )