J Chem Inf Model - Improved chemical text mining of patents with infinite dictionaries and automatic spelling correction.

Topics

{ extract(1171) text(1153) clinic(932) }
{ compound(1573) activ(1297) structur(1058) }
{ can(774) often(719) complex(702) }
{ data(1737) use(1416) pattern(1282) }
{ can(981) present(881) function(850) }
{ imag(1057) registr(996) error(939) }
{ design(1359) user(1324) use(1319) }
{ search(2224) databas(1162) retriev(909) }
{ perform(999) metric(946) measur(919) }
{ time(1939) patient(1703) rate(768) }
{ process(1125) use(805) approach(778) }
{ network(2748) neural(1063) input(814) }
{ motion(1329) object(1292) video(1091) }
{ method(1557) propos(1049) approach(1037) }
{ control(1307) perform(991) simul(935) }
{ method(984) reconstruct(947) comput(926) }
{ gene(2352) biolog(1181) express(1162) }
{ measur(2081) correl(1212) valu(896) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ problem(2511) optim(1539) algorithm(950) }
{ learn(2355) train(1041) set(1003) }
{ data(1714) softwar(1251) tool(1186) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ state(1844) use(1261) util(961) }
{ patient(1821) servic(1111) care(1106) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

The text mining of patents of pharmaceutical interest poses a number of unique challenges not encountered in other fields of text mining. Unlike fields such as bioinformatics, where the number of terms of interest is enumerable and essentially static, systematic chemical nomenclature can describe an infinite number of molecules. Hence, the dictionary- and ontology-based techniques that are commonly used for gene names, diseases, species, etc., have limited utility when searching for novel therapeutic compounds in patents. Additionally, the length and composition of IUPAC-like names make them more susceptible to typographic problems: OCR failures, human spelling errors, and hyphenation and line-breaking issues. This work describes a novel technique, called CaffeineFix, designed to efficiently identify chemical names in free text, even in the presence of typographical errors. Corrected chemical names are generated as input for name-to-structure software. This forms a preprocessing pass, independent of the name-to-structure software used, and is shown to greatly improve the results of chemical text mining in our study.
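
The abstract does not say how CaffeineFix works internally, so the following is only a toy illustration of the general idea, not the authors' implementation. It assumes that an "infinite dictionary" can be modeled as every concatenation of a small set of name fragments, and that spelling correction can be modeled as a search for the closest such concatenation within a fixed edit budget. The fragment list, the function names, and the `max_errors` parameter are all hypothetical.

```python
import heapq
from itertools import count

# Hypothetical fragment vocabulary (an assumption for illustration only).
# Any concatenation of fragments is accepted, so the dictionary is unbounded.
FRAGMENTS = ["meth", "eth", "prop", "but", "yl", "ane", "ene",
             "ol", "amine", "chloro", "benz"]


def build_trie(words):
    """Build a nested-dict trie; the key "$" marks the end of a fragment."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


TRIE = build_trie(FRAGMENTS)


def correct(word, trie=TRIE, max_errors=1):
    """Return (corrected_name, edit_cost) for the cheapest fragment
    concatenation within max_errors edits of word, or None if none exists."""
    tie = count()  # tie-breaker so the heap never has to compare dict nodes
    heap = [(0, next(tie), 0, trie, "")]
    seen = set()
    while heap:
        cost, _, i, node, out = heapq.heappop(heap)
        if i == len(word) and "$" in node:
            return out, cost  # all input consumed at a fragment boundary
        key = (i, id(node))
        if key in seen:
            continue
        seen.add(key)
        if "$" in node:
            # A finished fragment may be followed by another one: restart at root.
            heapq.heappush(heap, (cost, next(tie), i, trie, out))
        for ch, child in node.items():
            if ch == "$":
                continue
            if i < len(word):
                step = 0 if word[i] == ch else 1  # exact match or substitution
                if cost + step <= max_errors:
                    heapq.heappush(heap, (cost + step, next(tie), i + 1, child, out + ch))
            if cost + 1 <= max_errors:
                # The input dropped a character: emit ch without consuming input.
                heapq.heappush(heap, (cost + 1, next(tie), i, child, out + ch))
        if i < len(word) and cost + 1 <= max_errors:
            # The input has a stray character: consume it without emitting anything.
            heapq.heappush(heap, (cost + 1, next(tie), i + 1, node, out))
    return None
```

With this toy vocabulary, `correct("metylamine")` returns `("methylamine", 1)` (a dropped letter is restored), `correct("ch1orobenzene")` returns `("chlorobenzene", 1)` (an OCR-style digit is replaced), and `correct("patent")` returns `None`, so ordinary words are not forced into chemical names. The corrected string is what would then be handed to whichever name-to-structure software is in use.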

Cleaned Abstract

text mine patent pharmaceut interest pose number uniqu challeng encount field text mine unlik field bioinformat number term interest enumer essenti static systemat chemic nomenclatur can describ infinit number molecul henc dictionari ontologybas techniqu common use gene name diseas speci etc limit util search novel therapeut compound patent addit length composit iupaclik name make suscept typograph problem ocr failur human spell error hyphen line break issu work describ novel techniqu call caffeinefix design effici identifi chemic name free text even presenc typograph error correct chemic name generat input nametostructur softwar form preprocess pass independ nametostructur softwar use shown great improv result chemic text mine studi

Similar Abstracts

BMC Med Inform Decis Mak - Dynamic summarization of bibliographic-based data. ( 0,643982337733613 )
IEEE Trans Pattern Anal Mach Intell - Toward Integrated Scene Text Reading. ( 0,641172513239744 )
J Chem Inf Model - HELM: a hierarchical notation language for complex biomolecule structure representation. ( 0,639164652366491 )
BMC Med Inform Decis Mak - Detecting causality from online psychiatric texts using inter-sentential language patterns. ( 0,617792107551947 )
J Biomed Inform - The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. ( 0,610955478116523 )
AMIA Annu Symp Proc - A machine learning approach for identifying anatomical locations of actionable findings in radiology reports. ( 0,601890032872838 )
J Chem Inf Model - Automating knowledge discovery for toxicity prediction using jumping emerging pattern mining. ( 0,595856479583184 )
J Chem Inf Model - Systematic assessment of compound series with SAR transfer potential. ( 0,590972169631177 )
J Am Med Inform Assoc - Anaphoric relations in the clinical narrative: corpus creation. ( 0,584782745805626 )
J Biomed Inform - Extraction of events and temporal expressions from clinical narratives. ( 0,582951164369121 )
J Biomed Inform - Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. ( 0,579291130549271 )
J Am Med Inform Assoc - A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. ( 0,575291075591992 )
Comput Math Methods Med - Ranking biomedical annotations with annotator's semantic relevancy. ( 0,571470997048981 )
J Chem Inf Model - Do not hesitate to use Tversky-and other hints for successful active analogue searches with feature count descriptors. ( 0,570581821852835 )
Comput Methods Programs Biomed - Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects. ( 0,570446828972836 )
J Chem Inf Model - Searching for substructures in fragment spaces. ( 0,569962374819484 )
J Chem Inf Model - From activity cliffs to activity ridges: informative data structures for SAR analysis. ( 0,569508409692365 )
AMIA Annu Symp Proc - Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. ( 0,56915091923089 )
AMIA Annu Symp Proc - The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies. ( 0,568848888445986 )
J Chem Inf Model - A system for encoding and searching Markush structures. ( 0,567967283864312 )
J Biomed Inform - Ontology modularization to improve semantic medical image annotation. ( 0,567669128692515 )
J Biomed Inform - Mining connections between chemicals, proteins, and diseases extracted from Medline annotations. ( 0,567522624221983 )
J. Med. Internet Res. - Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. ( 0,567257403609107 )
AMIA Annu Symp Proc - It's about this and that: a description of anaphoric expressions in clinical text. ( 0,566941688042513 )
J Biomed Inform - Identifying non-elliptical entity mentions in a coordinated NP with ellipses. ( 0,565275529136951 )
J Am Med Inform Assoc - Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. ( 0,565123819751927 )
J Chem Inf Model - SAR monitoring of evolving compound data sets using activity landscapes. ( 0,564747882054735 )
J Chem Inf Model - Automated information extraction and structure-activity relationship analysis of cytochrome P450 substrates. ( 0,564504068718797 )
Comput Methods Programs Biomed - Alternatives to relational database: comparison of NoSQL and XML approaches for clinical data storage. ( 0,563538625385895 )
AMIA Annu Symp Proc - Parenthetically speaking: classifying the contents of parentheses for text mining. ( 0,562705856913468 )
J Biomed Inform - A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. ( 0,561528104655478 )
AMIA Annu Symp Proc - BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. ( 0,560635105079474 )
J Chem Inf Model - Design of multitarget activity landscapes that capture hierarchical activity cliff distributions. ( 0,559983282692175 )
J Integr Bioinform - Automatic extraction of microorganisms and their habitats from free text using text mining workflows. ( 0,559796693498553 )
J Am Med Inform Assoc - A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. ( 0,559429309985959 )
AMIA Annu Symp Proc - Synonym, topic model and predicate-based query expansion for retrieving clinical documents. ( 0,558546351934937 )
Int J Med Inform - Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs. ( 0,558006874548635 )
J Biomed Inform - MedTime: a temporal information extraction system for clinical narratives. ( 0,55750887260531 )
J Biomed Inform - Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text. ( 0,556410749768656 )
J Biomed Inform - Relation mining experiments in the pharmacogenomics domain. ( 0,556187603884439 )
J Chem Inf Model - Novel method for pharmacophore analysis by examining the joint pharmacophore space. ( 0,554832141723418 )
Comput. Biol. Med. - A P300-based brain computer interface system for words typing. ( 0,551209385771219 )
J Biomed Inform - Lexical patterns, features and knowledge resources for coreference resolution in clinical notes. ( 0,551138274334512 )
Neural Comput - A neurocomputational approach to prepositional phrase attachment ambiguity resolution. ( 0,5493398027384 )
J Am Med Inform Assoc - Recommending MeSH terms for annotating biomedical articles. ( 0,548535691643811 )
J Am Med Inform Assoc - Assisted annotation of medical free text using RapTAT. ( 0,547808616591803 )
J Biomed Inform - Determining the difficulty of Word Sense Disambiguation. ( 0,547296870507994 )
AMIA Annu Symp Proc - Mapping annotations with textual evidence using an scLDA model. ( 0,543747638735207 )
J Chem Inf Model - Improving classical substructure-based virtual screening to handle extrapolation challenges. ( 0,543745884717862 )
J Chem Inf Model - Discovery of chemical compound groups with common structures by a network analysis approach (affinity prediction method). ( 0,543700446227952 )
J Chem Inf Model - INFERCNMR: a 13C NMR interpretive library search system. ( 0,543002609469425 )
J Am Med Inform Assoc - MITRE system for clinical assertion status classification. ( 0,542745176404001 )
J Chem Inf Model - DrugLogit: logistic discrimination between drugs and nondrugs including disease-specificity by assigning probabilities based on molecular properties. ( 0,54233511983854 )
J Chem Inf Model - Natural product-like virtual libraries: recursive atom-based enumeration. ( 0,541796846806601 )
Comput Methods Programs Biomed - BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. ( 0,541725090233643 )
J Chem Inf Model - Searching for recursively defined generic chemical patterns in nonenumerated fragment spaces. ( 0,541567327565778 )
J Chem Inf Model - Bioturbo similarity searching: combining chemical and biological similarity to discover structurally diverse bioactive molecules. ( 0,541435987019584 )
AMIA Annu Symp Proc - Pharmacovigilance on twitter? Mining tweets for adverse drug reactions. ( 0,539524715424493 )
AMIA Annu Symp Proc - Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. ( 0,539392879417039 )
J Chem Inf Model - Reading PDB: perception of molecules from 3D atomic coordinates. ( 0,537701826626865 )
J Biomed Inform - Enhancing clinical concept extraction with distributional semantics. ( 0,537478576695422 )
J Chem Inf Model - Discovery and design of tricyclic scaffolds as protein kinase CK2 (CK2) inhibitors through a combination of shape-based virtual screening and structure-based molecular modification. ( 0,537319907410829 )
AMIA Annu Symp Proc - Mining MEDLINE for problems associated with vitamin D. ( 0,53561304839785 )
J Am Med Inform Assoc - Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. ( 0,535116963747151 )
AMIA Annu Symp Proc - Sophia: A Expedient UMLS Concept Extraction Annotator. ( 0,535114997658848 )
J Am Med Inform Assoc - A hybrid system for temporal information extraction from clinical text. ( 0,534705457056779 )
Artif Intell Med - Biomedical events extraction using the hidden vector state model. ( 0,534449989428215 )
J Am Med Inform Assoc - An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. ( 0,534360683690836 )
J Biomed Inform - A knowledge-driven conditional approach to extract pharmacogenomics specific drug-gene relationships from free text. ( 0,534163385046293 )
AMIA Annu Symp Proc - Improving perceived and actual text difficulty for health information consumers using semi-automated methods. ( 0,533590362303103 )
J Chem Inf Model - Target-independent prediction of drug synergies using only drug lipophilicity. ( 0,533437517986811 )
Int J Med Inform - Detecting temporal expressions in medical narratives. ( 0,533235301532193 )
J Biomed Inform - The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. ( 0,533022648995331 )
Int J Med Inform - A methodology to enhance spatial understanding of disease outbreak events reported in news articles. ( 0,531333734782762 )
J Chem Inf Model - Automated extraction of information on chemical-P-glycoprotein interactions from the literature. ( 0,530409600825852 )
J Chem Inf Model - Visual characterization and diversity quantification of chemical libraries: 1. creation of delimited reference chemical subspaces. ( 0,530156222649427 )
J Chem Inf Model - Identifying compound-target associations by combining bioactivity profile similarity search and public databases mining. ( 0,529588477132091 )
J Chem Inf Model - Chemical and biological properties of frequent screening hits. ( 0,528362953955987 )
J Biomed Inform - Semantator: semantic annotator for converting biomedical text to linked data. ( 0,527270357027993 )
J Biomed Inform - Knowledge based word-concept model estimation and refinement for biomedical text mining. ( 0,527189004240884 )
AMIA Annu Symp Proc - Semantic characteristics of NLP-extracted concepts in clinical notes vs. biomedical literature. ( 0,526650562320253 )
J Chem Inf Model - Library enhancement through the wisdom of crowds. ( 0,526619038572471 )
J Chem Inf Model - Identification of multitarget activity ridges in high-dimensional bioactivity spaces. ( 0,526066755864692 )
J Chem Inf Model - How do 2D fingerprints detect structurally diverse active compounds? Revealing compound subset-specific fingerprint features through systematic selection. ( 0,52606409927602 )
Int J Med Inform - Detection of infectious symptoms from VA emergency department and primary care clinical documentation. ( 0,524514306460506 )
AMIA Annu Symp Proc - Detecting abbreviations in discharge summaries using machine learning methods. ( 0,524099892021133 )
J Chem Inf Model - admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties. ( 0,523075216866626 )
Brief. Bioinformatics - A survey on annotation tools for the biomedical literature. ( 0,522322064866701 )
Comput Methods Programs Biomed - Studying the properties of the updating coefficients in the OSEM algorithm for iterative image reconstruction in PET. ( 0,52231968727917 )
J Chem Inf Model - Discovery of novel histamine H4 and serotonin transporter ligands using the topological feature tree descriptor. ( 0,52231968727917 )
J Chem Inf Model - Identification of 1,2,5-oxadiazoles as a new class of SENP2 inhibitors using structure based virtual screening. ( 0,521646150199493 )
J Med Syst - Redactable signatures for signed CDA Documents. ( 0,520856062509834 )
Methods Inf Med - Adaptive semantic tag mining from heterogeneous clinical research texts. ( 0,520274349106119 )
AMIA Annu Symp Proc - Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. ( 0,519937161303089 )
J Am Med Inform Assoc - Temporal reasoning over clinical text: the state of the art. ( 0,519155264578704 )
AMIA Annu Symp Proc - A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. ( 0,51909098487068 )
J Chem Inf Model - Navigating high-dimensional activity landscapes: design and application of the ligand-target differentiation map. ( 0,519074269621969 )
AMIA Annu Symp Proc - TagLine: Information Extraction for Semi-Structured Text in Medical Progress Notes. ( 0,518718375844247 )
J Chem Inf Model - Identification of descriptors capturing compound class-specific features by mutual information analysis. ( 0,518702724562299 )
AMIA Annu Symp Proc - Evaluating the Importance of Image-related Text for Ad-hoc and Case-based Biomedical Article Retrieval. ( 0,518667374415014 )