J Chem Inf Model - Using information from historical high-throughput screens to predict active compounds.


{ compound(1573) activ(1297) structur(1058) }
{ model(2341) predict(2261) use(1141) }
{ data(3008) multipl(1320) sourc(1022) }
{ method(1219) similar(1157) match(930) }
{ perform(1367) use(1326) method(1137) }
{ can(774) often(719) complex(702) }
{ research(1218) medic(880) student(794) }
{ data(2317) use(1299) case(1017) }
{ system(1976) rule(880) can(841) }
{ model(2656) set(1616) predict(1553) }
{ featur(3375) classif(2383) classifi(1994) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ research(1085) discuss(1038) issu(1018) }
{ model(3404) distribut(989) bayesian(671) }
{ assess(1506) score(1403) qualiti(1306) }
{ algorithm(1844) comput(1787) effici(935) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ intervent(3218) particip(2042) group(1664) }
{ result(1111) use(1088) new(759) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ sequenc(1873) structur(1644) protein(1328) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ learn(2355) train(1041) set(1003) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ state(1844) use(1261) util(961) }
{ cost(1906) reduc(1198) effect(832) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ patient(2837) hospit(1953) medic(668) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(1969) cluster(1462) data(1082) }


Modern high-throughput screening (HTS) is a well-established approach for hit finding in drug discovery that is routinely employed in the pharmaceutical industry to screen more than a million compounds within a few weeks. However, as the industry shifts to more disease-relevant but more complex phenotypic screens, the focus has moved to piloting smaller but smarter chemically/biologically diverse subsets followed by an expansion around hit compounds. One standard method for doing this is to train a machine-learning (ML) model with the chemical fingerprints of the tested subset of molecules and then select the next compounds based on the predictions of this model. An alternative approach would be to take advantage of the wealth of bioactivity information contained in older (full-deck) screens using so-called HTS fingerprints, where each element of the fingerprint corresponds to the outcome of a particular assay, as input to machine-learning algorithms. We constructed HTS fingerprints using two collections of data: 93 in-house assays and 95 publicly available assays from PubChem. For each source, an additional set of 51 and 46 assays, respectively, was collected for testing. Three different ML methods, random forest (RF), logistic regression (LR), and naïve Bayes (NB), were investigated for both the HTS fingerprint and a chemical fingerprint, Morgan2. RF was found to be best suited for learning from HTS fingerprints yielding area under the receiver operating characteristic curve (AUC) values >0.8 for 78% of the internal assays and enrichment factors at 5% (EF(5%)) >10 for 55% of the assays. The RF(HTS-fp) generally outperformed the LR trained with Morgan2, which was the best ML method for the chemical fingerprint, for the majority of assays. In addition, HTS fingerprints were found to retrieve more diverse chemotypes. Combining the two models through heterogeneous classifier fusion led to a similar or better performance than the best individual model for all assays.
Further validation using a pair of in-house assays and data from a confirmatory screen, including a prospective set of around 2000 compounds selected based on our approach, confirmed the good performance. Thus, the combination of machine learning with HTS fingerprints and chemical fingerprints utilizes information from both domains and presents a very promising approach for hit expansion, leading to more hits. The source code used with the public data is provided.
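
The two evaluation ideas named in the abstract, the enrichment factor at 5% and the fusion of two model outputs, can be illustrated with a minimal sketch. This is not the authors' published code: the function names are hypothetical, and mean-rank fusion is assumed here only as one simple way to combine two rankings, which may differ from the heterogeneous classifier fusion scheme used in the paper.

```python
def enrichment_factor(scores, labels, fraction=0.05):
    """Hit rate in the top-scored fraction divided by the overall hit rate.

    scores: predicted activity scores (higher = more likely active)
    labels: 1 for active, 0 for inactive
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in order[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("enrichment factor is undefined without actives")
    return (hits_top / n_top) / (total_hits / n)


def ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def fuse_by_mean_rank(scores_a, scores_b):
    """Combine two score lists into one (assumed scheme: negative mean rank).

    The fused value is higher for compounds ranked well by both models.
    """
    return [-(a + b) / 2 for a, b in zip(ranks(scores_a), ranks(scores_b))]


# Toy data (invented for illustration): 20 compounds, the first two active.
labels = [1, 1] + [0] * 18
hts_scores = [0.9, 0.4] + [0.1] * 18    # stand-in for an HTS-fingerprint model
chem_scores = [0.3, 0.8] + [0.2] * 18   # stand-in for a Morgan2 model
fused = fuse_by_mean_rank(hts_scores, chem_scores)
print(enrichment_factor(hts_scores, labels), enrichment_factor(fused, labels))
```

With a 5% cutoff on 20 compounds only the single top-ranked compound is counted, so a real evaluation would need assay-sized sets for the metric to be informative.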

Clean Abstract

modern highthroughput screen hts wellestablish approach hit find drug discoveri routin employ pharmaceut industri screen million compound within week howev industri shift diseaserelev complex phenotyp screen focus move pilot smaller smarter chemicallybiolog divers subset follow expans around hit compound one standard method train machinelearn ml model chemic fingerprint test subset molecul select next compound base predict model altern approach take advantag wealth bioactiv inform contain older fulldeck screen use socal hts fingerprint element fingerprint correspond outcom particular assay input machinelearn algorithm construct hts fingerprint use two collect data inhous assay public avail assay pubchem sourc addit set assay respect collect test three differ ml method random forest rf logist regress lr nave bay nb investig hts fingerprint chemic fingerprint morgan rf found best suit learn hts fingerprint yield area receiv oper characterist curv auc valu intern assay enrich factor ef assay rfhtsfp general outperform lr train morgan best ml method chemic fingerprint major assay addit hts fingerprint found retriev divers chemotyp combin two model heterogen classifi fusion led similar better perform best individu model assay valid use pair inhous assay data confirmatori screeninclud prospect set around compound select base approachconfirm good perform thus combin machinelearn hts fingerprint chemic fingerprint util inform domain present promis approach hit expans lead hit sourc code use public data provid

Similar Abstracts

J Chem Inf Model - Ligand-based virtual screening approach using a new scoring function. ( 0,812557699288009 )
J Chem Inf Model - Exploring polypharmacology using a ROCS-based target fishing approach. ( 0,802344379457398 )
J Chem Inf Model - Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions. ( 0,787164990164189 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,761434873216053 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,75217628751658 )
J Chem Inf Model - Introduction of the conditional correlated Bernoulli model of similarity value distributions and its application to the prospective prediction of fingerprint search performance. ( 0,750908932948239 )
J Chem Inf Model - An integrated virtual screening approach for VEGFR-2 inhibitors. ( 0,748283295402678 )
J Chem Inf Model - Integrating medicinal chemistry, organic/combinatorial chemistry, and computational chemistry for the discovery of selective estrogen receptor modulators with Forecaster, a novel platform for drug discovery. ( 0,744136774192301 )
J Chem Inf Model - Capturing structure-activity relationships from chemogenomic spaces. ( 0,737113388421602 )
J Chem Inf Model - How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space. ( 0,733572684277059 )
J Chem Inf Model - Compound optimization through data set-dependent chemical transformations. ( 0,730137371911037 )
J Chem Inf Model - Small-molecule 3D structure prediction using open crystallography data. ( 0,729141082631915 )
J Chem Inf Model - Conditional probabilistic analysis for prediction of the activity landscape and relative compound activities. ( 0,727890653763554 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,727084666081223 )
J Chem Inf Model - Enrichment of chemical libraries docked to protein conformational ensembles and application to aldehyde dehydrogenase 2. ( 0,722486867543346 )
J Chem Inf Model - Systematic assessment of compound series with SAR transfer potential. ( 0,721418902532207 )
J Chem Inf Model - Identification of novel malarial cysteine protease inhibitors using structure-based virtual screening of a focused cysteine protease inhibitor library. ( 0,714246030282838 )
J Chem Inf Model - Neighborhood-based prediction of novel active compounds from SAR matrices. ( 0,713860409588031 )
J Chem Inf Model - Design of multitarget activity landscapes that capture hierarchical activity cliff distributions. ( 0,713703212756391 )
J Chem Inf Model - SAR monitoring of evolving compound data sets using activity landscapes. ( 0,71267820041901 )
J Chem Inf Model - Development of dimethyl sulfoxide solubility models using 163,000 molecules: using a domain applicability metric to select more reliable predictions. ( 0,711941915896065 )
J Chem Inf Model - DrugLogit: logistic discrimination between drugs and nondrugs including disease-specificity by assigning probabilities based on molecular properties. ( 0,709774528822836 )
J Chem Inf Model - Compound set enrichment: a novel approach to analysis of primary HTS data. ( 0,708726315408506 )
J Chem Inf Model - Selection of in silico drug screening results for G-protein-coupled receptors by using universal active probes. ( 0,707435534059648 )
J Chem Inf Model - Navigating high-dimensional activity landscapes: design and application of the ligand-target differentiation map. ( 0,706556507867198 )
J Chem Inf Model - Target-independent prediction of drug synergies using only drug lipophilicity. ( 0,706478212901638 )
J Chem Inf Model - Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure. ( 0,706361071053774 )
J Chem Inf Model - Combining horizontal and vertical substructure relationships in scaffold hierarchies for activity prediction. ( 0,70623831810768 )
J Chem Inf Model - Structural similarity based kriging for quantitative structure activity and property relationship modeling. ( 0,705640943166031 )
J Chem Inf Model - ColBioS-FlavRC: a collection of bioselective flavonoids and related compounds filtered from high-throughput screening outcomes. ( 0,705629193510993 )
J Chem Inf Model - Application of support vector machine to three-dimensional shape-based virtual screening using comprehensive three-dimensional molecular shape overlay with known inhibitors. ( 0,70151546735342 )
J Chem Inf Model - Identification of a novel inhibitor of dengue virus protease through use of a virtual screening drug discovery Web portal. ( 0,700472449753074 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,699652081650257 )
Comput. Biol. Med. - CoMFA QSAR models of camptothecin analogues based on the distinctive SAR features of combined ABC, CD and E ring substitutions. ( 0,69926461969903 )
J Chem Inf Model - Identification of multitarget activity ridges in high-dimensional bioactivity spaces. ( 0,698942659095518 )
J Chem Inf Model - Ligand- and structure-based virtual screening for clathrodin-derived human voltage-gated sodium channel modulators. ( 0,697324728352425 )
J Chem Inf Model - Prediction of new bioactive molecules using a Bayesian belief network. ( 0,695776304924789 )
J Chem Inf Model - QSAR modeling of imbalanced high-throughput screening data in PubChem. ( 0,695610859972288 )
J Chem Inf Model - Increasing the coverage of medicinal chemistry-relevant space in commercial fragments screening. ( 0,694639771113922 )
J Chem Inf Model - AlzPlatform: an Alzheimer's disease domain-specific chemogenomics knowledgebase for polypharmacology and target identification research. ( 0,69341892515521 )
J Chem Inf Model - Locating sweet spots for screening hits and evaluating pan-assay interference filters from the performance analysis of two lead-like libraries. ( 0,693047808118471 )
J Chem Inf Model - In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics. ( 0,692419001706069 )
J Chem Inf Model - Visualization and virtual screening of the chemical universe database GDB-17. ( 0,692218198842827 )
J Chem Inf Model - A multivariate chemical similarity approach to search for drugs of potential environmental concern. ( 0,691755545751785 )
J Chem Inf Model - From activity cliffs to activity ridges: informative data structures for SAR analysis. ( 0,691105798278463 )
J Chem Inf Model - Automated recycling of chemistry for virtual screening and library design. ( 0,689308943827294 )
J Chem Inf Model - Hsp90 inhibitors, part 2: combining ligand-based and structure-based approaches for virtual screening application. ( 0,688725108335607 )
J Chem Inf Model - Application of computer-aided drug repurposing in the search of new cruzipain inhibitors: discovery of amiodarone and bromocriptine inhibitory effects. ( 0,687476051665694 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,686439291150532 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,685687113858053 )
J Chem Inf Model - Similarity boosted quantitative structure-activity relationship--a systematic study of enhancing structural descriptors by molecular similarity. ( 0,683837164239635 )
J Chem Inf Model - Searching for recursively defined generic chemical patterns in nonenumerated fragment spaces. ( 0,682326182828891 )
J Chem Inf Model - Plane of best fit: a novel method to characterize the three-dimensionality of molecules. ( 0,681774163729636 )
J Chem Inf Model - Boosting virtual screening enrichments with data fusion: coalescing hits from two-dimensional fingerprints, shape, and docking. ( 0,681443903916748 )
J Chem Inf Model - Identification of descriptors capturing compound class-specific features by mutual information analysis. ( 0,680087627293327 )
J Chem Inf Model - Scanning structure-activity relationships with structure-activity similarity and related maps: from consensus activity cliffs to selectivity switches. ( 0,679829771272699 )
J Chem Inf Model - Bioturbo similarity searching: combining chemical and biological similarity to discover structurally diverse bioactive molecules. ( 0,679283824949876 )
J Chem Inf Model - Feasibility of using molecular docking-based virtual screening for searching dual target kinase inhibitors. ( 0,678725070375876 )
J Chem Inf Model - Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. ( 0,678214637685911 )
J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. ( 0,67763798300475 )
J Chem Inf Model - Target-specific support vector machine scoring in structure-based virtual screening: computational validation, in vitro testing in kinases, and effects on lung cancer cell proliferation. ( 0,677438521511061 )
J Chem Inf Model - Design of a three-dimensional multitarget activity landscape. ( 0,677390009400253 )
J Chem Inf Model - BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space. ( 0,677300586312339 )
J Chem Inf Model - Systematic identification of scaffolds representing compounds active against individual targets and single or multiple target families. ( 0,673543210070053 )
J Chem Inf Model - SABRE: ligand/structure-based virtual screening approach using consensus molecular-shape pattern recognition. ( 0,673528547987545 )
J Integr Bioinform - Database supported candidate search for metabolite identification. ( 0,673322462700619 )
J Chem Inf Model - Noncontiguous atom matching structural similarity function. ( 0,672861445902565 )
J Chem Inf Model - TIN-a combinatorial compound collection of synthetically feasible multicomponent synthesis products. ( 0,671378456839808 )
J Chem Inf Model - Identification of novel serotonin transporter compounds by virtual screening. ( 0,670720542156454 )
J Chem Inf Model - Polypharmacology directed compound data mining: identification of promiscuous chemotypes with different activity profiles and comparison to approved drugs. ( 0,670323490279695 )
J Chem Inf Model - Consensus ranking approach to understanding the underlying mechanism with QSAR. ( 0,669409909936902 )
J Chem Inf Model - Rapid scanning structure-activity relationships in combinatorial data sets: identification of activity switches. ( 0,669362160386867 )
J Chem Inf Model - Assessing molecular docking tools for relative biological activity prediction: a case study of triazole HIV-1 NNRTIs. ( 0,669261974529379 )
J Chem Inf Model - Visual characterization and diversity quantification of chemical libraries: 1. creation of delimited reference chemical subspaces. ( 0,669029907529218 )
J Chem Inf Model - Virtual drug screen schema based on multiview similarity integration and ranking aggregation. ( 0,668741859379437 )
J Chem Inf Model - Discovery of novel histamine H4 and serotonin transporter ligands using the topological feature tree descriptor. ( 0,668581574823131 )
J Chem Inf Model - Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. ( 0,668444026395999 )
J Chem Inf Model - Novel mycosin protease MycP1 inhibitors identified by virtual screening and 4D fingerprints. ( 0,668346029798628 )
J Chem Inf Model - Discovery and design of tricyclic scaffolds as protein kinase CK2 (CK2) inhibitors through a combination of shape-based virtual screening and structure-based molecular modification. ( 0,668090592859825 )
J Chem Inf Model - Optimization of molecular representativeness. ( 0,667942367162311 )
J Chem Inf Model - Dual histamine H3R/serotonin 5-HT4R ligands with antiamnesic properties: pharmacophore-based virtual screening and polypharmacology. ( 0,667885223378428 )
J Chem Inf Model - Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing. ( 0,667742267903733 )
J Chem Inf Model - Large-scale assessment of activity landscape feature probabilities of bioactive compounds. ( 0,667017964552827 )
J Chem Inf Model - Hit expansion approaches using multiple similarity methods and virtualized query structures. ( 0,666991050702372 )
J Chem Inf Model - Identifying compound-target associations by combining bioactivity profile similarity search and public databases mining. ( 0,666949585902505 )
J Chem Inf Model - Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress. A response to the letter by the Cambridge Crystallographic Data Centre. ( 0,666830380063112 )
J Chem Inf Model - Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17. ( 0,666523941327941 )
J Chem Inf Model - Ligand and decoy sets for docking to G protein-coupled receptors. ( 0,665695440592441 )
J Chem Inf Model - Natural product-like virtual libraries: recursive atom-based enumeration. ( 0,664854405575297 )
J Chem Inf Model - Construction and use of fragment-augmented molecular Hasse diagrams. ( 0,664553210729432 )
Comput Biol Chem - The optimization of running time for a maximum common substructure-based algorithm and its application in drug design. ( 0,664279162964161 )
J Chem Inf Model - G-protein coupled receptors virtual screening using genetic algorithm focused chemical space. ( 0,663783835897956 )
J Chem Inf Model - Mining the ChEMBL database: an efficient chemoinformatics workflow for assembling an ion channel-focused screening library. ( 0,663309446795292 )
J Chem Inf Model - De novo design of drug-like molecules by a fragment-based molecular evolutionary approach. ( 0,662826190600587 )
J Chem Inf Model - Capturing the crystal: prediction of enthalpy of sublimation, crystal lattice energy, and melting points of organic compounds. ( 0,662053288312746 )
J Chem Inf Model - Scaffold diversity of exemplified medicinal chemistry space. ( 0,661607137479045 )
J Chem Inf Model - Identification of 1,2,5-oxadiazoles as a new class of SENP2 inhibitors using structure based virtual screening. ( 0,66040523853828 )
J Chem Inf Model - Discovery of novel checkpoint kinase 1 inhibitors by virtual screening based on multiple crystal structures. ( 0,660217467789497 )
J Chem Inf Model - Characterizing the diversity and biological relevance of the MLPCN assay manifold and screening set. ( 0,659570583677436 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,65906966321935 )