J Chem Inf Model - Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing.

Topics

{ data(3008) multipl(1320) sourc(1022) }
{ method(1219) similar(1157) match(930) }
{ compound(1573) activ(1297) structur(1058) }
{ use(1733) differ(960) four(931) }
{ model(2341) predict(2261) use(1141) }
{ system(1976) rule(880) can(841) }
{ search(2224) databas(1162) retriev(909) }
{ group(2977) signific(1463) compar(1072) }
{ learn(2355) train(1041) set(1003) }
{ general(901) number(790) one(736) }
{ structur(1116) can(940) graph(676) }
{ framework(1458) process(801) describ(734) }
{ imag(1947) propos(1133) code(1026) }
{ featur(3375) classif(2383) classifi(1994) }
{ take(945) account(800) differ(722) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ perform(1367) use(1326) method(1137) }
{ sampl(1606) size(1419) use(1276) }
{ first(2504) two(1366) second(1323) }
{ health(1844) social(1437) communiti(874) }
{ method(2212) result(1239) propos(1039) }
{ model(3404) distribut(989) bayesian(671) }
{ measur(2081) correl(1212) valu(896) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2830) propos(1344) filter(1198) }
{ studi(2440) review(1878) systemat(933) }
{ extract(1171) text(1153) clinic(932) }
{ howev(809) still(633) remain(590) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ studi(1119) effect(1106) posit(819) }
{ data(2317) use(1299) case(1017) }
{ cost(1906) reduc(1198) effect(832) }
{ activ(1138) subject(705) human(624) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ can(774) often(719) complex(702) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

The concept of data fusion (the combination of information from different sources describing the same object, with the expectation of generating a more accurate representation) has found application in a very broad range of disciplines. In the context of ligand-based virtual screening (VS), data fusion has been applied to combine knowledge from either different active molecules or different fingerprints to improve similarity search performance. Machine-learning (ML) methods based on the fusion of multiple homogeneous classifiers, in particular random forests (RF), have also been widely applied in the ML literature. The heterogeneous version of classifier fusion, in which the predictions of different model types are fused, has been less explored. Here, we investigate heterogeneous classifier fusion for ligand-based VS using three different ML methods, RF, naïve Bayes (NB), and logistic regression (LR), with four 2D fingerprints: atom pairs, topological torsions, the RDKit fingerprint, and a circular fingerprint. The methods are compared using a previously developed benchmarking platform for 2D fingerprints, which is extended to ML methods in this article. The original data sets are filtered for difficulty, and a new set of challenging data sets from ChEMBL is added. Data sets were also generated for a second use case: starting from a small set of related actives instead of diverse actives. The final fused model consistently outperforms the other approaches across the broad variety of targets studied, indicating that heterogeneous classifier fusion is a very promising approach for ligand-based VS. The new data sets, together with the adapted source code for the ML methods, are provided in the Supporting Information.
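To make the idea of heterogeneous classifier fusion concrete, here is a minimal sketch of one possible fusion rule: rank averaging. It assumes that each model (for example, an RF, an NB, and an LR model, each trained on some fingerprint) has already assigned a score to every database compound, where a higher score means "more likely active". Rank averaging is assumed here purely for illustration and is not necessarily the exact fusion rule used in the paper. The function names and toy scores are hypothetical.

```python
from typing import Dict, List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[float]:
    """1-based ranks, 1 = highest score; tied scores share their average rank."""
    # Stable sort of compound indices by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of compounds tied with order[pos].
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Mean of the 1-based ranks pos+1 .. end+1.
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def fuse_by_mean_rank(model_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Average each compound's rank over heterogeneous models (lower = better).

    Outputs of different model types (e.g. RF vote fractions, NB posteriors,
    LR probabilities) are not on a common scale, so each model's scores are
    converted to ranks before they are combined.
    """
    if not model_scores:
        raise ValueError("need at least one model")
    rank_lists = [ranks_from_scores(s) for s in model_scores.values()]
    n = len(rank_lists[0])
    if any(len(r) != n for r in rank_lists):
        raise ValueError("all models must score the same compounds")
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]


def screening_order(fused_ranks: Sequence[float]) -> List[int]:
    """Compound indices from most to least promising under the fused ranking."""
    return sorted(range(len(fused_ranks)), key=lambda i: fused_ranks[i])


if __name__ == "__main__":
    # Hypothetical scores for four database compounds from three model types.
    scores = {
        "RF": [0.9, 0.2, 0.6, 0.1],
        "NB": [0.5, 0.8, 0.6, 0.05],
        "LR": [0.85, 0.4, 0.5, 0.2],
    }
    fused = fuse_by_mean_rank(scores)
    print(screening_order(fused))  # → [0, 2, 1, 3]
```

In this toy example, NB alone would place compound 1 first, but the committee ranks compound 0 first because RF and LR agree on it. Converting scores to ranks before averaging prevents a model with extreme or poorly calibrated outputs, such as the often overconfident NB posteriors, from dominating a plain average of scores. Other rules, such as taking the best rank or averaging calibrated probabilities, would be equally valid choices.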

Cleaned Abstract

concept data fusion combin inform differ sourc describ object expect generat accur represent found applic broad rang disciplin context ligandbas virtual screen vs data fusion appli combin knowledg either differ activ molecul differ fingerprint improv similar search perform machinelearn ml method base fusion multipl homogen classifi particular random forest also wide appli ml literatur heterogen version classifi fusion fuse predict differ model type less explor investig heterogen classifi fusion ligandbas vs use three differ ml method rf nave bay nb logist regress lr four d fingerprint atom pair topolog torsion rdkit fingerprint circular fingerprint method compar use previous develop benchmark platform d fingerprint extend ml method articl origin data set filter difficulti new set challeng data set chembl ad data set also generat second use case start small set relat activ instead divers activ final fuse model consist outperform approach across broad varieti target studi indic heterogen classifi fusion promis approach ligandbas vs new data set togeth adapt sourc code ml method provid support inform

Similar Abstracts

J Chem Inf Model - Consensus models of activity landscapes with multiple chemical, conformer, and property representations. ( 0,732962717559192 )
J Chem Inf Model - Using information from historical high-throughput screens to predict active compounds. ( 0,667742267903732 )
J Chem Inf Model - Virtual drug screen schema based on multiview similarity integration and ranking aggregation. ( 0,650229574097356 )
IEEE Trans Image Process - Multimodal graph-based reranking for web image search. ( 0,648118799602702 )
J Biomed Inform - Complementary ensemble clustering of biomedical data. ( 0,603083189745529 )
J Chem Inf Model - Analysis of commercial and public bioactivity databases. ( 0,599635141238098 )
AMIA Annu Symp Proc - Managing Medical Vocabulary Updates in a Clinical Data Warehouse: An RxNorm Case Study. ( 0,588581038136212 )
J Integr Bioinform - Quality controls in integrative approaches to detect errors and inconsistencies in biological databases. ( 0,576550108096084 )
J Chem Inf Model - An integrated virtual screening approach for VEGFR-2 inhibitors. ( 0,572883427945468 )
J Biomed Inform - Practical approach to determine sample size for building logistic prediction models using high-throughput data. ( 0,57168814778038 )
Comput Biol Chem - Kernel-based data fusion improves the drug-protein interaction prediction. ( 0,56971126812922 )
J Biomed Inform - Harmonization and semantic annotation of data dictionaries from the Pharmacogenomics Research Network: a case study. ( 0,569453470717089 )
Spat Spatiotemporal Epidemiol - Comparing spatio-temporal clusters of arthropod-borne infections using administrative medical claims and state reported surveillance data. ( 0,568080408069896 )
J Chem Inf Model - Determination of toxicant mode of action by augmented top priority fragment class. ( 0,565633501370183 )
AMIA Annu Symp Proc - Graphical methods for reducing, visualizing and analyzing large data sets using hierarchical terminologies. ( 0,565423681748871 )
J Chem Inf Model - Rapid scanning structure-activity relationships in combinatorial data sets: identification of activity switches. ( 0,562440260138002 )
J Chem Inf Model - Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. ( 0,550695353618088 )
J Chem Inf Model - Histone deacetylase inhibitors: structure-based modeling and isoform-selectivity prediction. ( 0,549920385281872 )
J Chem Inf Model - Assessing the confidence level of public domain compound activity data and the impact of alternative potency measurements on SAR analysis. ( 0,548455610441636 )
J Biomed Inform - Transfer learning of classification rules for biomarker discovery and verification from molecular profiling studies. ( 0,546721051112841 )
Appl Clin Inform - The false security of blind dates: chrononymization's lack of impact on data privacy of laboratory data. ( 0,543082053798483 )
IEEE Trans Image Process - Image search reranking with query-dependent click-based relevance feedback. ( 0,542720896290266 )
J Chem Inf Model - Build-up algorithm for atomic correspondence between chemical structures. ( 0,542595257420419 )
J Med Syst - Federated querying architecture with clinical & translational health IT application. ( 0,540696385952077 )
J Chem Inf Model - Exploiting structural information in patent specifications for key compound prediction. ( 0,540659374575779 )
BMC Med Inform Decis Mak - Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach. ( 0,537650486068638 )
Med Decis Making - Multicohort models in cost-effectiveness analysis: why aggregating estimates over multiple cohorts can hide useful information. ( 0,536739011362875 )
J Chem Inf Model - CSAR benchmark exercise of 2010: selection of the protein-ligand complexes. ( 0,532492517103429 )
AMIA Annu Symp Proc - Using RxNorm for cross-institutional formulary data normalization within a distributed grid-computing environment. ( 0,530592402353125 )
J Chem Inf Model - Multiobjective particle swarm optimization: automated identification of structure-activity relationship-informative compounds with favorable physicochemical property distributions. ( 0,530036553128105 )
Artif Intell Med - Combining image, voice, and the patient's questionnaire data to categorize laryngeal disorders. ( 0,529418249136826 )
J Chem Inf Model - Single R-Group Polymorphisms (SRPs) and R-cliffs: an intuitive framework for analyzing and visualizing activity cliffs in a single analog series. ( 0,528411988549472 )
Sci Data - Global integrated drought monitoring and prediction system. ( 0,525258888582412 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,523904703222504 )
J Biomed Inform - Limestone: high-throughput candidate phenotype generation via tensor factorization. ( 0,52061772217406 )
J Integr Bioinform - A generic organ based ontology system, applied to vertebrate heart anatomy, development and physiology. ( 0,520429048943587 )
IEEE Trans Image Process - Saliency detection by multitask sparsity pursuit. ( 0,52012656749317 )
J Chem Inf Model - Development of dimethyl sulfoxide solubility models using 163,000 molecules: using a domain applicability metric to select more reliable predictions. ( 0,520124758891094 )
J Am Med Inform Assoc - Pharmacogenomics in the pocket of every patient? A prototype based on quick response codes. ( 0,519328309897212 )
J Chem Inf Model - Discovering new agents active against methicillin-resistant Staphylococcus aureus with ligand-based approaches. ( 0,518860347332408 )
J Am Med Inform Assoc - Using the CER Hub to ensure data quality in a multi-institution smoking cessation study. ( 0,5159985047319 )
Comput Methods Programs Biomed - Computer simulation of the activity of the elderly person living independently in a Health Smart Home. ( 0,515191342861587 )
AMIA Annu Symp Proc - Expressing observations from electronic medical record flowsheets in an i2b2 based clinical data repository to support research and quality improvement. ( 0,513406018271829 )
IEEE Trans Image Process - A marked point process for modeling lidar waveforms. ( 0,512147316609312 )
J Chem Inf Model - Dependence of QSAR models on the selection of trial descriptor sets: a demonstration using nanotoxicity endpoints of decorated nanotubes. ( 0,511213918072992 )
Comput. Biol. Med. - Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches. ( 0,511163281877947 )
J Chem Inf Model - Dual histamine H3R/serotonin 5-HT4R ligands with antiamnesic properties: pharmacophore-based virtual screening and polypharmacology. ( 0,511137601952603 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,509996654491193 )
J Chem Inf Model - Neighborhood-based prediction of novel active compounds from SAR matrices. ( 0,509715757576473 )
IEEE Trans Vis Comput Graph - SuperMatching: Feature Matching Using Supersymmetric Geometric Constraints. ( 0,509125194770248 )
AMIA Annu Symp Proc - Can prospective usability evaluation predict data errors? ( 0,508499567645783 )
J Chem Inf Model - Improving similarity-driven library design: customized matching and regioselective feature trees. ( 0,507692629368621 )
IEEE Trans Image Process - Circular reranking for visual search. ( 0,505947982584702 )
Methods Inf Med - Comparison of validity of mapping between drug indications and ICD-10. Direct and indirect terminology based approaches. ( 0,504782349456141 )
J Chem Inf Model - Large-scale similarity search profiling of ChEMBL compound data sets. ( 0,504455403851772 )
J Chem Inf Model - Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. ( 0,502824545532429 )
J Biomed Inform - The Equity in Prescription Medicines Use Study: using community pharmacy databases to study medicines utilisation. ( 0,501804166405114 )
Brief. Bioinformatics - Social pathway annotation: extensions of the systems biology metabolic modelling assistant. ( 0,501520793203442 )
IEEE Trans Neural Netw Learn Syst - Incremental Generalized Discriminative Common Vectors for Image Classification. ( 0,501276728747136 )
J Biomed Inform - DSGeo: software tools for cross-platform analysis of gene expression data in GEO. ( 0,499540866323551 )
J Chem Inf Model - In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass Naïve Bayes and Parzen-Rosenblatt window. ( 0,499340020731265 )
AMIA Annu Symp Proc - Learning medical diagnosis models from multiple experts. ( 0,498059406780214 )
J Chem Inf Model - Ligand-based target prediction with signature fingerprints. ( 0,496132172483718 )
IEEE Trans Image Process - Multivariate slow feature analysis and decorrelation filtering for blind source separation. ( 0,495933979251082 )
J Chem Inf Model - Hit expansion approaches using multiple similarity methods and virtualized query structures. ( 0,494798909081816 )
J Biomed Inform - The linked medical data access control framework. ( 0,493824712412603 )
J Chem Inf Model - Conformer generation with OMEGA: learning from the data set and the analysis of failures. ( 0,492592718871779 )
J Chem Inf Model - Noncontiguous atom matching structural similarity function. ( 0,491751664904778 )
J Am Med Inform Assoc - A study in transfer learning: leveraging data from multiple hospitals to enhance hospital-specific predictions. ( 0,491073509125233 )
BMC Med Inform Decis Mak - The effect of data cleaning on record linkage quality. ( 0,491038358674858 )
IEEE Trans Image Process - Robust weighted graph transformation matching for rigid and nonrigid image registration. ( 0,489578342356686 )
Artif Intell Med - Multiple kernel learning in protein-protein interaction extraction from biomedical literature. ( 0,48741059414627 )
J Biomed Inform - Source authenticity in the UMLS--a case study of the Minimal Standard Terminology. ( 0,4866183875375 )
J Am Med Inform Assoc - Federated queries of clinical data repositories: the sum of the parts does not equal the whole. ( 0,48412101028073 )
Brief. Bioinformatics - Batch effect removal methods for microarray gene expression data integration: a survey. ( 0,482416101168427 )
Med Biol Eng Comput - A mathematical method for constraint-based cluster analysis towards optimized constrictive diameter smoothing of saphenous vein grafts. ( 0,480778399062953 )
IEEE Trans Image Process - Hyperspectral BSS using GMCA with spatio-spectral sparsity constraints. ( 0,480597264758694 )
J Chem Inf Model - Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening. ( 0,480183246337527 )
Res Synth Methods - Exploration of heterogeneity in distributed research network drug safety analyses. ( 0,476821295148074 )
J Clin Monit Comput - Design and implementation of a hospital wide waveform capture system. ( 0,476099687405174 )
J Chem Inf Model - MMP-Cliffs: systematic identification of activity cliffs on the basis of matched molecular pairs. ( 0,475298629104195 )
J Chem Inf Model - COSMOsim3D: 3D-similarity and alignment based on COSMO polarization charge densities. ( 0,474665940840825 )
J Chem Inf Model - CLCA: maximum common molecular substructure queries within the MetRxn database. ( 0,474304046380172 )
J Chem Inf Model - Integrating ligand-based and protein-centric virtual screening of kinase inhibitors using ensembles of multiple protein kinase genes and conformations. ( 0,47301081257221 )
J Chem Inf Model - Visual characterization and diversity quantification of chemical libraries: 2. Analysis and selection of size-independent, subspace-specific diversity indices. ( 0,472818202528741 )
J Chem Inf Model - A Bayesian approach to in silico blood-brain barrier penetration modeling. ( 0,472457898190497 )
Methods Inf Med - Health data cooperatives - citizen empowerment. ( 0,472386349079083 )
IEEE Trans Image Process - Fast wavelet-based image characterization for highly adaptive image retrieval. ( 0,471912451424898 )
AMIA Annu Symp Proc - Improving Clinical Data Integrity by using Data Adjudication Techniques for Data Received through a Health Information Exchange (HIE). ( 0,466203681365008 )
J Integr Bioinform - BioNetLink - an architecture for working with network data. ( 0,465062041457515 )
IEEE Trans Pattern Anal Mach Intell - Iterative Discovery of Multiple Alternative Clustering Views. ( 0,464543157382168 )
J. Comput. Biol. - CORaL: comparison of ranked lists for analysis of gene expression data. ( 0,463866898412976 )
J. Comput. Biol. - Separating significant matches from spurious matches in DNA sequences. ( 0,46344436787257 )
J Chem Inf Model - SHAFTS: a hybrid approach for 3D molecular similarity calculation. 1. Method and assessment of virtual screening. ( 0,462429740530206 )
IEEE Trans Pattern Anal Mach Intell - Unsupervised Adaptation Across Domain Shifts By Generating Intermediate Data Representations. ( 0,462211705030505 )
J Biomed Inform - Clustering clinical models from local electronic health records based on semantic similarity. ( 0,461716173047815 )
J Chem Inf Model - Design of multitarget activity landscapes that capture hierarchical activity cliff distributions. ( 0,4613034367319 )
J Chem Inf Model - ReverseScreen3D: a structure-based ligand matching method to identify protein targets. ( 0,460898457858463 )
J Chem Inf Model - Note on naive Bayes based on binary descriptors in cheminformatics. ( 0,460430609347024 )
IEEE Trans Image Process - Retina verification system based on biometric graph matching. ( 0,460010517633436 )