J Chem Inf Model - How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space.




Chemical diversity selection is widely used to choose structurally diverse subsets of molecules, often with the objective of maximizing the number of hits in biological screening. While many methods exist in this area, few systematic comparisons of current descriptors have been published, in particular comparisons that assess diversity in bioactivity space; the current study aims to address this shortage. In this work, 13 widely used molecular descriptors were compared, including fingerprint-based descriptors (ECFP4, FCFP4, MACCS keys), pharmacophore-based descriptors (TAT, TAD, TGT, TGD, GpiDAPH3), shape-based descriptors (rapid overlay of chemical structures (ROCS) and principal moments of inertia (PMI)), a connectivity-matrix-based descriptor (BCUT), physicochemical-property-based descriptors (prop2D), and a more recently introduced molecular descriptor type, "Bayes Affinity Fingerprints". We assessed both how similarly the descriptors behave when assessing the diversity of chemical libraries and their ability to select, from libraries, compounds that are diverse in bioactivity space, a property of much practical relevance in screening library design. This is particularly relevant because many future screening targets are not known in advance, yet the library should still maximize the likelihood of containing bioactive matter for future screening campaigns. Overall, our results showed that descriptors based on atom topology (i.e., fingerprint-based and pharmacophore-based descriptors) correlate well in rank-ordering compounds, both within and between descriptor types. In contrast, the shape-based descriptors ROCS and PMI showed weak correlation with the other descriptors used in this study, demonstrating significantly different behavior.
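To make the idea of descriptors "correlating in rank-ordering compounds" concrete, the following is a minimal sketch, not the authors' protocol. It assumes fingerprints are represented as sets of on-bits, that compound pairs are compared by Tanimoto similarity, and that agreement between two descriptors is measured with a Spearman rank correlation; all three choices, along with the function names and the toy fingerprints, are illustrative assumptions and do not come from the abstract.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarities(fps):
    """Similarity of every compound pair, in a fixed (sorted-name) pair order."""
    names = sorted(fps)
    return [tanimoto(fps[p], fps[q]) for p, q in combinations(names, 2)]


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical toy data: the same four compounds under two made-up descriptors.
descriptor_a = {"m1": {1, 2, 3}, "m2": {1, 2, 4}, "m3": {5, 6}, "m4": {5, 6, 7}}
descriptor_b = {"m1": {10, 11}, "m2": {10, 12}, "m3": {20, 21, 22}, "m4": {20, 21}}

rho = spearman(pairwise_similarities(descriptor_a),
               pairwise_similarities(descriptor_b))
```

A value of `rho` near 1 would indicate that the two descriptors order compound pairs similarly, while a value near 0 would correspond to the weak agreement reported for the shape-based descriptors.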
We then applied eight of the molecular descriptors compared in this study to select a diverse subset (4%) from an initial population of 2587 compounds covering the 25 largest human activity classes from ChEMBL, and measured the coverage of activity classes by the subsets. "Bayes Affinity Fingerprints" achieved an average coverage of 92% of activity classes. With ECFP4, GpiDAPH3, TGT, and random sampling, 91%, 84%, 84%, and 84% of the activity classes, respectively, were represented in the selected compounds, followed by BCUT, prop2D, MACCS, and PMI (in order of decreasing performance). In addition, we were able to show that there is no visible correlation between compound diversity in PMI space and in bioactivity space, despite the frequent use of PMI plots to this end. To summarize, this work assessed which descriptors select compounds with high coverage of bioactivity space and can hence be used for diverse compound selection for biological screening. In cases where multiple descriptors are to be used for diversity selection, this work describes which descriptors behave complementarily and can hence be used jointly to focus on different aspects of diversity in chemical space.
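The selection-and-coverage experiment can be illustrated with a small sketch. The abstract does not name the selection algorithm, so the greedy MaxMin picker below is an assumption, as are the set-of-bits fingerprint representation, the Tanimoto distance, the function names, and the toy data. For scale, 4% of 2587 compounds corresponds to about 103 selected compounds.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def maxmin_select(fps, k, seed_index=0):
    """Greedy MaxMin: repeatedly add the compound farthest from those picked.

    fps is a list of fingerprints; returns the indices of the k picks.
    """
    n = len(fps)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")
    picked = [seed_index]
    # Distance from each compound to its nearest already-picked compound.
    min_dist = [tanimoto_distance(fps[seed_index], fp) for fp in fps]
    while len(picked) < k:
        candidates = (i for i in range(n) if i not in picked)
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i in range(n):
            d = tanimoto_distance(fps[nxt], fps[i])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


def class_coverage(picked, labels):
    """Fraction of all activity classes represented among the picked compounds."""
    return len({labels[i] for i in picked}) / len(set(labels))


# Hypothetical toy library: three activity classes, two similar compounds each.
fingerprints = [{1, 2, 3}, {1, 2, 4}, {10, 11, 12},
                {10, 11, 13}, {20, 21, 22}, {20, 21, 23}]
activity_classes = ["A", "A", "B", "B", "C", "C"]

subset = maxmin_select(fingerprints, k=3)
coverage = class_coverage(subset, activity_classes)
```

In this toy case the descriptor separates the classes cleanly, so a diverse pick of three compounds covers all three classes. A descriptor whose distances are unrelated to activity would give coverage closer to that of random sampling, which is the comparison the study reports.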


Similar Abstracts

J Chem Inf Model - Identification of novel malarial cysteine protease inhibitors using structure-based virtual screening of a focused cysteine protease inhibitor library. ( 0,94599108937041 )
J Chem Inf Model - Polypharmacology directed compound data mining: identification of promiscuous chemotypes with different activity profiles and comparison to approved drugs. ( 0,923979392518609 )
J Chem Inf Model - Locating sweet spots for screening hits and evaluating pan-assay interference filters from the performance analysis of two lead-like libraries. ( 0,922202494399705 )
J Chem Inf Model - Increasing the coverage of medicinal chemistry-relevant space in commercial fragments screening. ( 0,922198077403452 )
J Chem Inf Model - Combining horizontal and vertical substructure relationships in scaffold hierarchies for activity prediction. ( 0,913566724826811 )
J Chem Inf Model - Identification of 1,2,5-oxadiazoles as a new class of SENP2 inhibitors using structure based virtual screening. ( 0,911552444007213 )
J Chem Inf Model - Ligand- and structure-based virtual screening for clathrodin-derived human voltage-gated sodium channel modulators. ( 0,907033416221031 )
J Chem Inf Model - Natural product-like virtual libraries: recursive atom-based enumeration. ( 0,906108392116288 )
Curr Comput Aided Drug Des - Development of Chemical Compound Libraries for In Silico Drug Screening. ( 0,902141097994382 )
J Chem Inf Model - TIN-a combinatorial compound collection of synthetically feasible multicomponent synthesis products. ( 0,901796846477129 )
J Chem Inf Model - Automated recycling of chemistry for virtual screening and library design. ( 0,901066477484496 )
J Chem Inf Model - A multivariate chemical similarity approach to search for drugs of potential environmental concern. ( 0,900221824076887 )
J Chem Inf Model - G-protein coupled receptors virtual screening using genetic algorithm focused chemical space. ( 0,899221347056703 )
J Chem Inf Model - Target-independent prediction of drug synergies using only drug lipophilicity. ( 0,89895564851941 )
J Chem Inf Model - Compound optimization through data set-dependent chemical transformations. ( 0,898326527290192 )
J Chem Inf Model - In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics. ( 0,896952113247673 )
J Chem Inf Model - Identification of a novel inhibitor of dengue virus protease through use of a virtual screening drug discovery Web portal. ( 0,896399899430836 )
J Chem Inf Model - Capturing structure-activity relationships from chemogenomic spaces. ( 0,892916018880108 )
J Chem Inf Model - Scaffold diversity of exemplified medicinal chemistry space. ( 0,892392906446528 )
J Chem Inf Model - Conditional probabilistic analysis for prediction of the activity landscape and relative compound activities. ( 0,891492820540634 )
J Chem Inf Model - From activity cliffs to activity ridges: informative data structures for SAR analysis. ( 0,889096455979528 )
J Chem Inf Model - Identification of novel liver X receptor activators by structure-based modeling. ( 0,888890067968452 )
J Chem Inf Model - Similarity boosted quantitative structure-activity relationship--a systematic study of enhancing structural descriptors by molecular similarity. ( 0,888106906147775 )
J Chem Inf Model - Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17. ( 0,888034274574616 )
J Chem Inf Model - Application of computer-aided drug repurposing in the search of new cruzipain inhibitors: discovery of amiodarone and bromocriptine inhibitory effects. ( 0,8880097202991 )
J Chem Inf Model - Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. ( 0,887743019009779 )
J Chem Inf Model - Mining the ChEMBL database: an efficient chemoinformatics workflow for assembling an ion channel-focused screening library. ( 0,887547091136157 )
J Chem Inf Model - Identifying compound-target associations by combining bioactivity profile similarity search and public databases mining. ( 0,88420621200429 )
J Chem Inf Model - Searching for recursively defined generic chemical patterns in nonenumerated fragment spaces. ( 0,884087465691336 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,882885433231254 )
J Chem Inf Model - Prediction of individual compounds forming activity cliffs using emerging chemical patterns. ( 0,882821911430049 )
J Chem Inf Model - Identification of multitarget activity ridges in high-dimensional bioactivity spaces. ( 0,881746447957734 )
J Chem Inf Model - BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space. ( 0,881185405391022 )
J Chem Inf Model - Identification of novel serotonin transporter compounds by virtual screening. ( 0,880985221137195 )
J Chem Inf Model - Optimization of molecular representativeness. ( 0,878134552915369 )
J Chem Inf Model - Integrating medicinal chemistry, organic/combinatorial chemistry, and computational chemistry for the discovery of selective estrogen receptor modulators with Forecaster, a novel platform for drug discovery. ( 0,874967655309902 )
J Chem Inf Model - Novel method for pharmacophore analysis by examining the joint pharmacophore space. ( 0,874275378846538 )
J Chem Inf Model - Novel mycosin protease MycP1 inhibitors identified by virtual screening and 4D fingerprints. ( 0,873018369277663 )
J Chem Inf Model - Compound set enrichment: a novel approach to analysis of primary HTS data. ( 0,872262634597428 )
J Chem Inf Model - Introduction of target cliffs as a concept to identify and describe complex molecular selectivity patterns. ( 0,870059889468482 )
J Chem Inf Model - Discovery of novel checkpoint kinase 1 inhibitors by virtual screening based on multiple crystal structures. ( 0,868130608494863 )
J Chem Inf Model - Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure. ( 0,867503810298637 )
J Chem Inf Model - Molecular topology analysis of the differences between drugs, clinical candidate compounds, and bioactive molecules. ( 0,866584210202556 )
J Chem Inf Model - Extending the activity cliff concept: structural categorization of activity cliffs and systematic identification of different types of cliffs in the ChEMBL database. ( 0,866335209652 )
J Chem Inf Model - Selection of in silico drug screening results for G-protein-coupled receptors by using universal active probes. ( 0,865195028950421 )
J Chem Inf Model - Discovery of novel histamine H4 and serotonin transporter ligands using the topological feature tree descriptor. ( 0,865084655731514 )
J Chem Inf Model - De novo design of drug-like molecules by a fragment-based molecular evolutionary approach. ( 0,864396223061843 )
J Chem Inf Model - Characterizing the diversity and biological relevance of the MLPCN assay manifold and screening set. ( 0,86305416583209 )
J Chem Inf Model - Identification of sumoylation activating enzyme 1 inhibitors by structure-based virtual screening. ( 0,863034569515567 )
J Chem Inf Model - Visual characterization and diversity quantification of chemical libraries: 1. creation of delimited reference chemical subspaces. ( 0,86238695565551 )
J Chem Inf Model - Identification of a new class of FtsZ inhibitors by structure-based design and in vitro screening. ( 0,862329322149186 )
J Chem Inf Model - Scanning structure-activity relationships with structure-activity similarity and related maps: from consensus activity cliffs to selectivity switches. ( 0,862179812096074 )
J Chem Inf Model - Visualization and virtual screening of the chemical universe database GDB-17. ( 0,861933804830435 )
J Chem Inf Model - Automatic tailoring and transplanting: a practical method that makes virtual screening more useful. ( 0,860695968820834 )
J Chem Inf Model - Discovery of α7-nicotinic receptor ligands by virtual screening of the chemical universe database GDB-13. ( 0,860363265757779 )
J Chem Inf Model - Fighting high molecular weight in bioactive molecules with sub-pharmacophore-based virtual screening. ( 0,860351884853403 )
J Chem Inf Model - Feasibility of using molecular docking-based virtual screening for searching dual target kinase inhibitors. ( 0,860041595164596 )
J Chem Inf Model - Identification of compounds with potential antibacterial activity against Mycobacterium through structure-based drug screening. ( 0,858613564069077 )
J Chem Inf Model - Navigating high-dimensional activity landscapes: design and application of the ligand-target differentiation map. ( 0,85757245377211 )
J Chem Inf Model - Design of multitarget activity landscapes that capture hierarchical activity cliff distributions. ( 0,856193423157584 )
J Chem Inf Model - Design of a three-dimensional multitarget activity landscape. ( 0,853742742408899 )
J Chem Inf Model - Scaffold-focused virtual screening: prospective application to the discovery of TTK inhibitors. ( 0,852422466161237 )
J Chem Inf Model - Enrichment of chemical libraries docked to protein conformational ensembles and application to aldehyde dehydrogenase 2. ( 0,8490763225162 )
J Chem Inf Model - Structural similarity based kriging for quantitative structure activity and property relationship modeling. ( 0,848035515275896 )
J Chem Inf Model - Discovery of new selective human aldose reductase inhibitors through virtual screening multiple binding pocket conformations. ( 0,84652160381125 )
J Chem Inf Model - ColBioS-FlavRC: a collection of bioselective flavonoids and related compounds filtered from high-throughput screening outcomes. ( 0,846088629814505 )
J Chem Inf Model - Rationalizing the role of SAR tolerance for ligand-based virtual screening. ( 0,846003334710736 )
J Chem Inf Model - Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. ( 0,844875058748706 )
J Chem Inf Model - SABRE: ligand/structure-based virtual screening approach using consensus molecular-shape pattern recognition. ( 0,843396418959328 )
J Chem Inf Model - Hsp90 inhibitors, part 2: combining ligand-based and structure-based approaches for virtual screening application. ( 0,842538952331536 )
J Chem Inf Model - Fragment-based lead discovery and design. ( 0,841605043657991 )
J Chem Inf Model - Neighborhood-based prediction of novel active compounds from SAR matrices. ( 0,84069332472308 )
J Chem Inf Model - Discovery and design of tricyclic scaffolds as protein kinase CK2 (CK2) inhibitors through a combination of shape-based virtual screening and structure-based molecular modification. ( 0,839792733119198 )
J Chem Inf Model - Virtual fragment screening: discovery of histamine H3 receptor ligands using ligand-based and protein-based molecular fingerprints. ( 0,837917287909716 )
J Chem Inf Model - Virtual screening yields inhibitors of novel antifungal drug target, benzoate 4-monooxygenase. ( 0,83494526346798 )
J Chem Inf Model - SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. ( 0,833380272974614 )
J Chem Inf Model - Discovery of chemical compound groups with common structures by a network analysis approach (affinity prediction method). ( 0,833088504576973 )
J Chem Inf Model - Application of quantitative structure-activity relationship models of 5-HT1A receptor binding to virtual screening identifies novel and potent 5-HT1A ligands. ( 0,831181636447187 )
J Chem Inf Model - Multiple e-pharmacophore modeling, 3D-QSAR, and high-throughput virtual screening of hepatitis C virus NS5B polymerase inhibitors. ( 0,830385168155991 )
J Chem Inf Model - Harvesting classification trees for drug discovery. ( 0,830216156060361 )
J Chem Inf Model - Plane of best fit: a novel method to characterize the three-dimensionality of molecules. ( 0,829886492417183 )
J Chem Inf Model - Application of support vector machine to three-dimensional shape-based virtual screening using comprehensive three-dimensional molecular shape overlay with known inhibitors. ( 0,829601458497041 )
J Chem Inf Model - Identification of novel potential antibiotics against Staphylococcus using structure-based drug screening targeting dihydrofolate reductase. ( 0,828817067601218 )
J Chem Inf Model - Freely available conformer generation methods: how good are they? ( 0,826075592024102 )
J Chem Inf Model - Identification of novel S-adenosyl-L-homocysteine hydrolase inhibitors through homology-model-based virtual screening, synthesis, and biological evaluation. ( 0,82529811208439 )
J Chem Inf Model - Systematic assessment of compound series with SAR transfer potential. ( 0,824920589389766 )
J Chem Inf Model - SAR monitoring of evolving compound data sets using activity landscapes. ( 0,824861793403068 )
J Chem Inf Model - FINDSITE(comb): a threading/structure-based, proteomic-scale virtual ligand screening approach. ( 0,823542187137227 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,82303334344396 )
J Chem Inf Model - Prediction of new bioactive molecules using a Bayesian belief network. ( 0,819013330229092 )
J Chem Inf Model - Bioturbo similarity searching: combining chemical and biological similarity to discover structurally diverse bioactive molecules. ( 0,817597380157121 )
Comput Biol Chem - The optimization of running time for a maximum common substructure-based algorithm and its application in drug design. ( 0,817288768785154 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,817248663034542 )
J Chem Inf Model - Prediction of activity cliffs using support vector machines. ( 0,816329383145374 )
J Chem Inf Model - Large-scale assessment of activity landscape feature probabilities of bioactive compounds. ( 0,815646951894061 )
J Chem Inf Model - Detailed computational study of the active site of the hepatitis C viral RNA polymerase to aid novel drug design. ( 0,814861680226016 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,814809714200307 )
J Chem Inf Model - Docking ligands into flexible and solvated macromolecules. 7. Impact of protein flexibility and water molecules on docking-based virtual screening accuracy. ( 0,814746161171772 )
J Chem Inf Model - Similarity searching for potent compounds using feature selection. ( 0,813422706678339 )
J Chem Inf Model - Rapid scanning structure-activity relationships in combinatorial data sets: identification of activity switches. ( 0,813216638017271 )