J Chem Inf Model - Speeding up chemical searches using the inverted index: the convergence of chemoinformatics and text search methods.

Topics

{ search(2224) databas(1162) retriev(909) }
{ perform(999) metric(946) measur(919) }
{ compound(1573) activ(1297) structur(1058) }
{ problem(2511) optim(1539) algorithm(950) }
{ visual(1396) interact(850) tool(830) }
{ method(2212) result(1239) propos(1039) }
{ method(1219) similar(1157) match(930) }
{ learn(2355) train(1041) set(1003) }
{ model(2341) predict(2261) use(1141) }
{ structur(1116) can(940) graph(676) }
{ use(1733) differ(960) four(931) }
{ imag(2830) propos(1344) filter(1198) }
{ take(945) account(800) differ(722) }
{ assess(1506) score(1403) qualiti(1306) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ data(3963) clinic(1234) research(1004) }
{ model(3480) simul(1196) paramet(876) }
{ gene(2352) biolog(1181) express(1162) }
{ result(1111) use(1088) new(759) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ howev(809) still(633) remain(590) }
{ import(1318) role(1303) understand(862) }
{ perform(1367) use(1326) method(1137) }
{ spatial(1525) area(1432) region(1030) }
{ sampl(1606) size(1419) use(1276) }
{ activ(1138) subject(705) human(624) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ implement(1333) system(1263) develop(1122) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ network(2748) neural(1063) input(814) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

In ligand-based screening, retrosynthesis, and other chemoinformatics applications, one often seeks to search large databases of molecules in order to retrieve molecules that are similar to a given query. With the expanding size of molecular databases, the efficiency and scalability of data structures and algorithms for chemical searches are becoming increasingly important. Remarkably, both the chemoinformatics and information retrieval communities have converged on similar solutions whereby molecules or documents are represented by binary vectors, or fingerprints, indexing their substructures such as labeled paths for molecules and n-grams for text, with the same Jaccard-Tanimoto similarity measure. As a result, similarity search methods from one field can be adapted to the other. Here we adapt recent, state-of-the-art, inverted index methods from information retrieval to speed up similarity searches in chemoinformatics. Our results show a several-fold speed-up improvement over previous methods for both threshold searches and top-K searches. We also provide a mathematical analysis that allows one to predict the level of pruning achieved by the inverted index approach and validate the quality of these predictions through simulation experiments. All results can be replicated using data freely downloadable from http://cdb.ics.uci.edu/ .
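The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and omits the state-of-the-art pruning techniques the paper adapts. It only shows the basic idea: fingerprints are stored as sets of "on" bit positions, an inverted index maps each bit to the molecules that set it, and a query touches only molecules sharing at least one bit with it. The function names (`build_index`, `threshold_search`, `top_k_search`) and the toy fingerprints are hypothetical, chosen for illustration.

```python
from collections import defaultdict
import heapq


def build_index(fingerprints):
    """Map each bit position to the ids of the molecules that set it."""
    index = defaultdict(list)
    for mol_id, bits in enumerate(fingerprints):
        for bit in bits:
            index[bit].append(mol_id)
    return index


def candidate_similarities(query, fingerprints, index):
    """Yield (mol_id, Jaccard-Tanimoto similarity) for every molecule that
    shares at least one bit with the query. Molecules sharing no bit have
    similarity 0 and are never visited."""
    shared = defaultdict(int)
    for bit in query:
        for mol_id in index.get(bit, ()):
            shared[mol_id] += 1  # size of the intersection so far
    for mol_id, inter in shared.items():
        union = len(query) + len(fingerprints[mol_id]) - inter
        yield mol_id, inter / union


def threshold_search(query, fingerprints, index, threshold):
    """Return molecules with similarity >= threshold (threshold > 0 assumed),
    most similar first."""
    hits = [(m, s) for m, s in candidate_similarities(query, fingerprints, index)
            if s >= threshold]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def top_k_search(query, fingerprints, index, k):
    """Return the k most similar molecules among those sharing a bit."""
    return heapq.nlargest(
        k,
        candidate_similarities(query, fingerprints, index),
        key=lambda h: (h[1], -h[0]),
    )


# Toy data: each fingerprint is the set of bit positions that are on.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}, {1, 2, 3, 4}]
index = build_index(fingerprints)
query = {1, 2, 3}

print(threshold_search(query, fingerprints, index, 0.6))
print(top_k_search(query, fingerprints, index, 2))
```

In this sketch, molecule 2 (bits 7 and 8) is never scored because none of its bits appear in the query. That is the simplest form of the pruning an inverted index provides. The same code works unchanged for text if each set holds n-gram identifiers instead of substructure bits.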

Cleaned Abstract

ligandbas screen retrosynthesi chemoinformat applic one often seek search larg databas molecul order retriev molecul similar given queri expand size molecular databas effici scalabl data structur algorithm chemic search becom increas import remark chemoinformat inform retriev communiti converg similar solut wherebi molecul document repres binari vector fingerprint index substructur label path molecul ngram text jaccardtanimoto similar measur result similar search method one field can adapt adapt recent stateoftheart invert index method inform retriev speed similar search chemoinformat result show severalfold speedup improv previous method threshold search topk search also provid mathemat analysi allow one predict level prune achiev invert index approach valid qualiti predict simul experi result can replic use data freeli download httpcdbicsuciedu

Similar Abstracts

J Biomed Inform - A comparison of evaluation metrics for biomedical journals, articles, and websites in terms of sensitivity to topic. ( 0,760535664157491 )
J. Med. Internet Res. - Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical searches. ( 0,758880264615888 )
Methods Inf Med - Developing topic-specific search filters for PubMed with click-through data. ( 0,747873644927485 )
J Integr Bioinform - The LAILAPS search engine: relevance ranking in life science databases. ( 0,746557396737556 )
Telemed J E Health - MEDLINE versus EMBASE and CINAHL for telemedicine searches. ( 0,745824007830014 )
J Integr Bioinform - Classification methods for finding articles describing protein-protein interactions in PubMed. ( 0,741592860294454 )
J Biomed Inform - Supporting effective health and biomedical information retrieval and navigation: a novel facet view interface evaluation. ( 0,740882303293553 )
AMIA Annu Symp Proc - Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet. ( 0,738609231109411 )
J Integr Bioinform - The LAILAPS search engine: a feature model for relevance ranking in life science databases. ( 0,731395062001811 )
BMC Med Inform Decis Mak - Glomerular disease search filters for Pubmed, Ovid Medline, and Embase: a development and validation study. ( 0,729808491210916 )
Int J Health Geogr - HEALTH GeoJunction: place-time-concept browsing of health publications. ( 0,727953584902075 )
J. Med. Internet Res. - Development and validation of filters for the retrieval of studies of clinical examination from Medline. ( 0,725999616903631 )
Brief. Bioinformatics - Fast and efficient searching of biological data resources--using EB-eye. ( 0,719975656541976 )
AMIA Annu Symp Proc - Search filter precision can be improved by NOTing out irrelevant content. ( 0,719132276632694 )
AMIA Annu Symp Proc - Evaluation of automated term groupings for detecting anaphylactic shock signals for drugs. ( 0,717328185478896 )
J Am Med Inform Assoc - Search filters to identify geriatric medicine in Medline. ( 0,716077631339826 )
J Chem Inf Model - Efficient substructure searching of large chemical libraries: the ABCD chemical cartridge. ( 0,715115448344361 )
BMC Med Inform Decis Mak - BOSS: context-enhanced search for biomedical objects. ( 0,714298556521414 )
J Biomed Inform - On the query reformulation technique for effective MEDLINE document retrieval. ( 0,70937989766885 )
Methods Inf Med - Learning the preferences of physicians for the organization of result lists of medical evidence articles. ( 0,708309430200565 )
Health Info Libr J - Developing a geographic search filter to identify randomised controlled trials in Africa: finding the optimal balance between sensitivity and precision. ( 0,703392747758406 )
J Am Med Inform Assoc - A literature search tool for intelligent extraction of disease-associated genes. ( 0,701767770691816 )
J Biomed Inform - Development and evaluation of a biomedical search engine using a predicate-based vector space model. ( 0,701008355936365 )
J Biomed Inform - MeSHy: Mining unanticipated PubMed information using frequencies of occurrences and concurrences of MeSH terms. ( 0,700082997336517 )
J Am Med Inform Assoc - Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study. ( 0,699991162899326 )
J Am Med Inform Assoc - Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters. ( 0,697644169244656 )
J. Med. Internet Res. - Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews. ( 0,697103803232795 )
BMC Med Inform Decis Mak - CDAPubMed: a browser extension to retrieve EHR-based biomedical literature. ( 0,696918877424126 )
Health Info Libr J - Medical literature searches: a comparison of PubMed and Google Scholar. ( 0,69681487529889 )
AMIA Annu Symp Proc - BIOSPIDA: A Relational Database Translator for NCBI. ( 0,695515039101382 )
J Integr Bioinform - On comparison of SimTandem with state-of-the-art peptide identification tools, efficiency of precursor mass filter and dealing with variable modifications. ( 0,695154528918099 )
J Am Med Inform Assoc - Leveraging medical thesauri and physician feedback for improving medical literature retrieval for case queries. ( 0,691828093049083 )
Health Info Libr J - Facilitating access to evidence: Primary Health Care Search Filter. ( 0,68716491488692 )
Health Info Libr J - Utilisation of search filters in systematic reviews of prognosis questions. ( 0,686464944673334 )
J Telemed Telecare - How to improve your PubMed/MEDLINE searches: 1. background and basic searching. ( 0,68295793525876 )
J Biomed Inform - A semi-supervised approach to extract pharmacogenomics-specific drug-gene pairs from biomedical literature for personalized medicine. ( 0,681815942071487 )
J. Med. Internet Res. - A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French. ( 0,6808652623071 )
J Biomed Inform - A mutation-centric approach to identifying pharmacogenomic relations in text. ( 0,679951422675229 )
J. Med. Internet Res. - Definition of Health 2.0 and Medicine 2.0: a systematic review. ( 0,672866483469387 )
BMC Med Inform Decis Mak - Boolean versus ranked querying for biomedical systematic reviews. ( 0,671469115408043 )
J Integr Bioinform - A query suggestion workflow for life science IR-systems. ( 0,666521141208893 )
Int J Med Inform - An analysis of clinical queries in an electronic health record search utility. ( 0,6660127684099 )
BMC Med Inform Decis Mak - Performance evaluation of Unified Medical Language System's synonyms expansion to query PubMed. ( 0,664743748334968 )
Methods Inf Med - A survey on visual information search behavior and requirements of radiologists. ( 0,662503701834008 )
Health Info Libr J - Searching for randomised controlled trials and clinical controlled trials in Thai online bibliographical biomedical databases. ( 0,662193549935336 )
IEEE Trans Vis Comput Graph - WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search. ( 0,661341988045618 )
Brief. Bioinformatics - Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees. ( 0,660727829610752 )
J. Med. Internet Res. - Using Internet search engines to obtain medical information: a comparative study. ( 0,659045511044206 )
J Biomed Inform - Improving search over Electronic Health Records using UMLS-based query expansion through random walks. ( 0,656917361947077 )
J Am Med Inform Assoc - MEDLINE clinical queries are robust when searching in recent publishing years. ( 0,656500428540598 )
AMIA Annu Symp Proc - Using Co-Authoring and Cross-Referencing Information for MEDLINE Indexing. ( 0,654632510819193 )
BMC Med Inform Decis Mak - Publication trends of shared decision making in 15 high impact medical journals: a full-text review with bibliometric analysis. ( 0,651466617206726 )
Comput Methods Programs Biomed - METRADISC-XL: a program for meta-analysis of multidimensional ranked discovery oriented datasets including microarrays. ( 0,6512368032325 )
J Biomed Inform - Reflective random indexing for semi-automatic indexing of the biomedical literature. ( 0,651162547073628 )
Health Info Libr J - Assessment of indexing trends with specific and general terms for herbal medicine. ( 0,650975353723048 )
Health Info Libr J - Sensitivity and precision of adverse effects search filters in MEDLINE and EMBASE: a case study of fractures with thiazolidinediones. ( 0,650099680275081 )
Health Info Libr J - The performance of adverse effects search filters in MEDLINE and EMBASE. ( 0,647391525263217 )
J Med Syst - MIRASS: medical informatics research activity support system using information mashup network. ( 0,644457587347613 )
Inform Health Soc Care - Readability of online health information: implications for health literacy. ( 0,641155600668366 )
Int J Med Inform - MEDRank: using graph-based concept ranking to index biomedical texts. ( 0,641049872715317 )
AMIA Annu Symp Proc - A bottom-up approach to MEDLINE indexing recommendations. ( 0,640808246037421 )
BMC Med Inform Decis Mak - Dynamic summarization of bibliographic-based data. ( 0,639712839940923 )
J Chem Inf Model - ¹³C NMR-distance matrix descriptors: optimal abstract 3D space granularity for predicting estrogen binding. ( 0,639137040688388 )
Artif Intell Med - Adaptation of machine translation for multilingual information retrieval in the medical domain. ( 0,636113971196302 )
J Am Med Inform Assoc - Improving image retrieval effectiveness via query expansion using MeSH hierarchical structure. ( 0,635688257529858 )
J Chem Inf Model - Large-scale similarity search profiling of ChEMBL compound data sets. ( 0,635204084976778 )
J Chem Inf Model - Chemical and biological properties of frequent screening hits. ( 0,634469738502525 )
J. Med. Internet Res. - Cumulative query method for influenza surveillance using search engine data. ( 0,633251809466874 )
IEEE Trans Pattern Anal Mach Intell - On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. ( 0,631841661593391 )
J Biomed Inform - Using statistical text mining to supplement the development of an ontology. ( 0,628263670791358 )
AMIA Annu Symp Proc - MeSH term explosion and author rank improve expert recommendations. ( 0,623213506167604 )
Health Info Libr J - Where and how to search for information on the effectiveness of public health interventions - a case study for prevention of cardiovascular disease. ( 0,620819778691495 )
AMIA Annu Symp Proc - Optimizing the txt2MEDLINE search portal for low-resource clinical decision support. ( 0,617844695929236 )
J Am Med Inform Assoc - A practical approach to achieve private medical record linkage in light of public resources. ( 0,615592953155106 )
AMIA Annu Symp Proc - Finding and accessing diagrams in biomedical publications. ( 0,614496666438329 )
Perspect Health Inf Manag - Risk factors for bladder cancer: challenges of conducting a literature search using PubMed. ( 0,614053601750082 )
J Chem Inf Model - Scaffold hopping by fragment replacement. ( 0,611830685222853 )
J. Med. Internet Res. - Automatic evidence retrieval for systematic reviews. ( 0,611120147846424 )
Methods Inf Med - Technology-induced errors. The current use of frameworks and models from the biomedical and life sciences literatures. ( 0,6076039960954 )
J Am Med Inform Assoc - Directing the public to evidence-based online content. ( 0,605087694423471 )
J Chem Inf Model - Do not hesitate to use Tversky-and other hints for successful active analogue searches with feature count descriptors. ( 0,604033676850659 )
J Biomed Inform - Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics. ( 0,60065401703922 )
J Am Med Inform Assoc - DCMDSM: a DICOM decomposed storage model. ( 0,598534316175017 )
J Am Med Inform Assoc - Design and usability study of an iconic user interface to ease information retrieval of medical guidelines. ( 0,597688785287102 )
J Am Med Inform Assoc - Clinical research data warehouse governance for distributed research networks in the USA: a systematic review of the literature. ( 0,597469179998967 )
Res Synth Methods - Comprehensive computer searches and reporting in systematic reviews. ( 0,597326616170343 )
Int J Med Inform - An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. ( 0,594629854584232 )
Int J Med Inform - FindZebra: a search engine for rare diseases. ( 0,59443009341371 )
AMIA Annu Symp Proc - Mining MEDLINE for problems associated with vitamin D. ( 0,591691571062402 )
J Am Med Inform Assoc - Federated queries of clinical data repositories: the sum of the parts does not equal the whole. ( 0,590840068784419 )
AMIA Annu Symp Proc - An automated approach for ranking journals to help in clinician decision support. ( 0,589017005951563 )
J. Med. Internet Res. - The impact of search engine selection and sorting criteria on vaccination beliefs and attitudes: two experiments manipulating Google output. ( 0,587863694062307 )
Res Synth Methods - Inquisitio validus Index Medicus: A simple method of validating MEDLINE systematic review searches. ( 0,587594926576002 )
AMIA Annu Symp Proc - Query log analysis of an electronic health record search engine. ( 0,586716089961515 )
Health Info Libr J - Can we prioritise which databases to search? A case study using a systematic review of frozen shoulder management. ( 0,585666110018759 )
Methods Inf Med - Chi-square-based scoring function for categorization of MEDLINE citations. ( 0,585368554551108 )
J Chem Inf Model - Identification of toxifying and detoxifying moieties for mutagenicity prediction by priority assessment. ( 0,584307011428666 )
J Am Med Inform Assoc - PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. ( 0,584162300580503 )
Res Synth Methods - Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. ( 0,582336853996326 )
AMIA Annu Symp Proc - Author keywords in biomedical journal articles. ( 0,581464423077562 )