J Chem Inf Model - Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors.

Topics

{ method(1969) cluster(1462) data(1082) }
{ compound(1573) activ(1297) structur(1058) }
{ use(1733) differ(960) four(931) }
{ learn(2355) train(1041) set(1003) }
{ model(3404) distribut(989) bayesian(671) }
{ method(1219) similar(1157) match(930) }
{ use(2086) technolog(871) perceiv(783) }
{ howev(809) still(633) remain(590) }
{ import(1318) role(1303) understand(862) }
{ perform(1367) use(1326) method(1137) }
{ sampl(1606) size(1419) use(1276) }
{ problem(2511) optim(1539) algorithm(950) }
{ method(1557) propos(1049) approach(1037) }
{ can(981) present(881) function(850) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ featur(1941) imag(1645) propos(1176) }
{ data(2317) use(1299) case(1017) }
{ intervent(3218) particip(2042) group(1664) }
{ use(976) code(926) identifi(902) }
{ activ(1452) weight(1219) physic(1104) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ algorithm(1844) comput(1787) effici(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ visual(1396) interact(850) tool(830) }
{ model(3480) simul(1196) paramet(876) }
{ research(1218) medic(880) student(794) }
{ medic(1828) order(1363) alert(1069) }
{ first(2504) two(1366) second(1323) }
{ activ(1138) subject(705) human(624) }
{ estim(2440) model(1874) function(577) }
{ process(1125) use(805) approach(778) }
{ detect(2391) sensit(1101) algorithm(908) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ method(2212) result(1239) propos(1039) }

Abstract

Cluster algorithms play an important role in diversity-related tasks of modern chemoinformatics, with their widest application in drug discovery programs in the pharmaceutical industry. The performance of these grouping strategies depends on various factors, such as molecular representation, mathematical method, algorithmic technique, and the statistical distribution of the data. For this reason, new methods must be introduced and compared in order to find the model that best fits the problem at hand. Earlier comparative studies report that Ward's algorithm, using fingerprints as molecular descriptors, is generally superior in this field. However, problems remain: other types of numerical descriptors have been little exploited, the current descriptor selection strategy is driven by trial and error, and no previous comparative study has considered a broader range of combinatorial methods for grouping chemoinformatic data sets. In this work, combinatorial methods are compared, five of which are novel in chemoinformatics. The experiments are carried out on eight data sets that are well established and validated in the medicinal chemistry literature. Each drug data set was represented by real-valued molecular descriptors selected by machine learning techniques, consistent with the neighborhood principle. Statistical analysis of the results demonstrates that the pharmacological activities of the eight data sets can be modeled with a few families of 2D and 3D molecular descriptors, avoiding the classification problems associated with the presence of nonrelevant features. Three of the five proposed cluster algorithms outperform most classical algorithms and perform similarly to (or, in the most optimistic reading, slightly better than) Ward's algorithm. The usefulness of these algorithms is also assessed in a comparative experiment against potent QSAR and machine learning classifiers, where they perform similarly in some cases.
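In the clustering literature, "combinatorial" usually denotes hierarchical agglomerative methods whose inter-cluster distances can be updated from the previous distance matrix alone, via the Lance-Williams recurrence d(k, i∪j) = α_i·d(k,i) + α_j·d(k,j) + β·d(i,j) + γ·|d(k,i) − d(k,j)|. Ward's algorithm is one such method. The sketch below assumes that reading; it is not the authors' implementation and covers only four classical strategies (single, complete, average, Ward), not the five novel methods mentioned above. The function names `lance_williams_coeffs` and `agglomerate` are hypothetical, and `X` stands for any matrix of real-valued molecular descriptors with one row per compound.

```python
import numpy as np


def lance_williams_coeffs(method, n_i, n_j, n_k):
    """Return (alpha_i, alpha_j, beta, gamma) for merging clusters i and j,
    as seen from a third cluster k of size n_k."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # UPGMA, weighted by cluster sizes
        return n_i / (n_i + n_j), n_j / (n_i + n_j), 0.0, 0.0
    if method == "ward":  # these coefficients assume squared Euclidean distances
        total = n_i + n_j + n_k
        return (n_i + n_k) / total, (n_j + n_k) / total, -n_k / total, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(X, n_clusters, method="ward"):
    """Naive O(n^3) agglomerative clustering driven by the Lance-Williams update.
    Returns one integer cluster label per row of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = (diff ** 2).sum(axis=-1)           # squared Euclidean distances
    if method != "ward":
        D = np.sqrt(D)                     # plain Euclidean for the other linkages
    n = len(X)
    members = {i: [i] for i in range(n)}   # active cluster id -> row indices
    while len(members) > n_clusters:
        active = sorted(members)
        # find the closest pair of active clusters (i < j)
        best = None
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                if best is None or D[i, j] < D[best[0], best[1]]:
                    best = (i, j)
        i, j = best
        n_i, n_j = len(members[i]), len(members[j])
        # distances from the merged cluster (kept under id i) to every other cluster k
        for k in active:
            if k in (i, j):
                continue
            a_i, a_j, beta, gamma = lance_williams_coeffs(method, n_i, n_j, len(members[k]))
            d_new = (a_i * D[k, i] + a_j * D[k, j] + beta * D[i, j]
                     + gamma * abs(D[k, i] - D[k, j]))
            D[i, k] = D[k, i] = d_new
        members[i].extend(members.pop(j))
    labels = np.empty(n, dtype=int)
    for label, rows in enumerate(members.values()):
        labels[rows] = label
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two well-separated synthetic "descriptor" groups of five compounds each
    X = np.vstack([rng.normal(0.0, 0.1, size=(5, 3)),
                   rng.normal(5.0, 0.1, size=(5, 3))])
    for method in ("single", "complete", "average", "ward"):
        print(method, agglomerate(X, n_clusters=2, method=method))
```

The design choice here is to keep only the distance matrix and cluster sizes between merges, which is exactly what makes a method "combinatorial": no centroids or raw descriptors are revisited after the first step. A production implementation would use a priority queue or the nearest-neighbor chain algorithm instead of the quadratic pair search shown here.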

Clean Abstract

cluster algorithm play import role divers relat task modern chemoinformat widest applic pharmaceut industri drug discoveri program perform group strategi depend various factor molecular represent mathemat method algorithm techniqu statist distribut data reason introduct comparison new method necessari order find model best fit problem hand earlier compar studi report ward algorithm use fingerprint molecular descript general superior field howev problem still remain ie type numer descript littl exploit current descriptor select strategi trial errordriven previous compar studi consid broader domain combinatori method group chemoinformat data set conduct work comparison combinatori method performedwith five novel cheminformat experi carri use eight data set well establish valid medic chemistri literatur drug data set repres real molecular descriptor select machin learn techniqu consist neighborhood principl statist analysi result demonstr pharmacolog activ eight data set can model famili d d molecular descriptor avoid classif problem associ presenc nonrelev featur three five propos cluster algorithm show superior perform classic algorithm similar slight superior optimist sens ward algorithm use algorithm also assess compar experi potent qsar machin learn classifi perform similar case

Similar Abstracts

J Chem Inf Model - Consensus methods for combining multiple clusterings of chemical structures. ( 0,83865337684351 )
J Chem Inf Model - Investigation of the use of spectral clustering for the analysis of molecular data. ( 0,828571009652418 )
J. Comput. Biol. - A geometric clustering algorithm with applications to structural data. ( 0,821698570042923 )
Int J Health Geogr - Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: a case study on murine typhus in south Texas. ( 0,78095288567868 )
Int J Health Geogr - A binary-based approach for detecting irregularly shaped clusters. ( 0,770297717490247 )
J Chem Inf Model - Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. ( 0,770165002681849 )
J Chem Inf Model - Benchmark data sets for structure-based computational target prediction. ( 0,763843528800916 )
IEEE Trans Pattern Anal Mach Intell - A Link-Based Approach to the Cluster Ensemble Problem. ( 0,759705080558157 )
Int J Health Geogr - Detecting activity locations from raw GPS data: a novel kernel-based algorithm. ( 0,749411324292551 )
Spat Spatiotemporal Epidemiol - Optimal selection of the spatial scan parameters for cluster detection: a simulation study. ( 0,748557306845922 )
Int J Health Geogr - Detection of clusters of a rare disease over a large territory: performance of cluster detection methods. ( 0,729069276273935 )
Comput. Biol. Med. - A methodology to identify consensus classes from clustering algorithms applied to immunohistochemical data from breast cancer patients. ( 0,728524428565697 )
J Chem Inf Model - Activity-aware clustering of high throughput screening data and elucidation of orthogonal structure-activity relationships. ( 0,727782471324175 )
Med Decis Making - Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials. ( 0,727779812883762 )
AMIA Annu Symp Proc - Using hierarchical mixture of experts model for fusion of outbreak detection methods. ( 0,719493222120325 )
IEEE Trans Neural Netw Learn Syst - Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms. ( 0,702423511586849 )
J Chem Inf Model - Toward a better pharmacophore description of P-glycoprotein modulators, based on macrocyclic diterpenes from Euphorbia species. ( 0,691553674488176 )
IEEE Trans Vis Comput Graph - GPU-based Multilevel Clustering. ( 0,688505281016082 )
J Chem Inf Model - Visualization of molecular fingerprints. ( 0,684898030920992 )
IEEE Trans Pattern Anal Mach Intell - Semi-Supervised Kernel Mean Shift Clustering. ( 0,679509753974605 )
J Med Syst - Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets. ( 0,678759796747651 )
Neural Comput - Spontaneous clustering via minimum -divergence. ( 0,671535103437757 )
Int J Health Geogr - Using statistical methods and genotyping to detect tuberculosis outbreaks. ( 0,670708643570874 )
J Chem Inf Model - Optimization of molecular representativeness. ( 0,669354649676879 )
Int J Neural Syst - A genetic graph-based approach for partitional clustering. ( 0,669249598134813 )
J Integr Bioinform - Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. ( 0,663559074806482 )
Med Decis Making - Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: an application to cluster randomized trials. ( 0,661370061319778 )
Med Biol Eng Comput - A mathematical method for constraint-based cluster analysis towards optimized constrictive diameter smoothing of saphenous vein grafts. ( 0,655884959140916 )
J Integr Bioinform - An evolutionary and visual framework for clustering of DNA microarray data. ( 0,653607394660716 )
Artif Intell Med - Vicinal support vector classifier using supervised kernel-based clustering. ( 0,652143348838425 )
Comput Methods Programs Biomed - Fuzzy and hard clustering analysis for thyroid disease. ( 0,639549124705793 )
J. Comput. Biol. - Biological cluster evaluation for gene function prediction. ( 0,639146092406136 )
J Biomed Inform - Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values. ( 0,639142755586089 )
Artif Intell Med - Weighted spherical 1-mean with phase shift and its application in electrocardiogram discord detection. ( 0,638462220600851 )
J Chem Inf Model - Novel method for pharmacophore analysis by examining the joint pharmacophore space. ( 0,636824002532587 )
J Biomed Inform - Learning Bayesian networks from survival data using weighting censored instances. ( 0,63589237440648 )
J Chem Inf Model - Algorithm for reaction classification. ( 0,633603180269451 )
J Chem Inf Model - In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass Naïve Bayes and Parzen-Rosenblatt window. ( 0,632824946473973 )
Artif Intell Med - Missing data imputation using statistical and machine learning methods in a real breast cancer problem. ( 0,631994562651185 )
Neural Comput - A nonparametric clustering algorithm with a quantile-based likelihood estimator. ( 0,631371720185005 )
Comput Math Methods Med - A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation. ( 0,63058135006993 )
Comput Math Methods Med - Decimative spectral estimation with unconstrained model order. ( 0,627487400409885 )
AMIA Annu Symp Proc - Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles. ( 0,623871624921351 )
J. Comput. Biol. - Fast geometric consensus approach for protein model quality assessment. ( 0,619817289563867 )
J. Med. Internet Res. - Security analysis and improvements to the PsychoPass method. ( 0,618797709002273 )
IEEE J Biomed Health Inform - Red blood cell cluster separation from digital images for use in sickle cell disease. ( 0,616304591597033 )
J Am Med Inform Assoc - Privacy-preserving heterogeneous health data sharing. ( 0,614033499319422 )
J Chem Inf Model - Enrichment analysis for discovering biological associations in phenotypic screens. ( 0,612676534152769 )
J Chem Inf Model - Library enhancement through the wisdom of crowds. ( 0,611984320008363 )
Health Info Libr J - A bibliometric approach demonstrates the impact of a social care data set on research and policy. ( 0,611046712565586 )
J Biomed Inform - Extension of the survival dimensionality reduction algorithm to detect epistasis in competing risks models (SDR-CR). ( 0,610875814579536 )
BMC Med Inform Decis Mak - Efficient algorithms for fast integration on large data sets from multiple sources. ( 0,606261056601463 )
Brief. Bioinformatics - A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. ( 0,60451024842124 )
J Biomed Inform - Quantifying the determinants of outbreak detection performance through simulation and machine learning. ( 0,604183664982817 )
J Chem Inf Model - 3D molecular descriptors important for clinical success. ( 0,603716791762544 )
Comput. Aided Surg. - The Equidistant Method - a novel hip joint simulation algorithm for detection of femoroacetabular impingement. ( 0,60327895809392 )
J Chem Inf Model - Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. ( 0,603026181738205 )
Brief. Bioinformatics - Data construction for phosphorylation site prediction. ( 0,602175698344662 )
J Chem Inf Model - Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure. ( 0,602089624498917 )
Comput. Biol. Med. - Evaluation of automatic feature detection algorithms in EEG: application to interburst intervals. ( 0,601400066927117 )
AMIA Annu Symp Proc - Survival prediction and treatment recommendation with Bayesian techniques in lung cancer. ( 0,601336696373891 )
J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data. ( 0,600917633204911 )
IEEE Trans Image Process - Linear discriminant analysis based on L1-norm maximization. ( 0,596007255521895 )
Comput Math Methods Med - Feature selection for better identification of subtypes of Guillain-Barré syndrome. ( 0,594880250118379 )
J Chem Inf Model - String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins. ( 0,59474868744644 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,591180725067621 )
J Chem Inf Model - Similarity boosted quantitative structure-activity relationship--a systematic study of enhancing structural descriptors by molecular similarity. ( 0,59013160456538 )
Comput Math Methods Med - White blood cell segmentation by circle detection using electromagnetism-like optimization. ( 0,586813173710581 )
J Chem Inf Model - Structural similarity based kriging for quantitative structure activity and property relationship modeling. ( 0,583778396857625 )
Comput Methods Programs Biomed - Development and application of efficient pathway enumeration algorithms for metabolic engineering applications. ( 0,583283068063834 )
J Chem Inf Model - Automated selection of compounds with physicochemical properties to maximize bioavailability and druglikeness. ( 0,583183726260982 )
Comput Math Methods Med - Novel harmonic regularization approach for variable selection in Cox's proportional hazards model. ( 0,58229217840026 )
J Chem Inf Model - Hit expansion approaches using multiple similarity methods and virtualized query structures. ( 0,582158750532637 )
Comput Biol Chem - piClust: a density based piRNA clustering algorithm. ( 0,578914117518035 )
Comput Math Methods Med - A new particle swarm optimization-based method for phase unwrapping of MRI data. ( 0,576634777256617 )
Spat Spatiotemporal Epidemiol - Performance of cancer cluster Q-statistics for case-control residential histories. ( 0,576564673139107 )
Comput. Biol. Med. - A straightforward approach to computer-aided polyp detection using a polyp-specific volumetric feature in CT colonography. ( 0,573354743251864 )
J Chem Inf Model - Prospects for tertiary structure prediction of RNA based on secondary structure information. ( 0,571952485510262 )
J Chem Inf Model - Discovery of chemical compound groups with common structures by a network analysis approach (affinity prediction method). ( 0,569407190879625 )
IEEE Trans Image Process - A Geometric Framework for Rectangular Shape Detection. ( 0,566825377941867 )
Comput. Biol. Med. - An evolutionary approach for searching metabolic pathways. ( 0,565938389785117 )
IEEE Trans Image Process - Enhancing Low-Rank Subspace Clustering by Manifold Regularization. ( 0,565463081417475 )
Brief. Bioinformatics - Comparative analysis of methods for identifying somatic copy number alterations from deep sequencing data. ( 0,563774930630574 )
J. Comput. Biol. - Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data. ( 0,563043451065185 )
IEEE Trans Pattern Anal Mach Intell - Multi-Exemplar Affinity Propagation. ( 0,559660981201906 )
Med Biol Eng Comput - Detection of swallows with silent aspiration using swallowing and breath sound analysis. ( 0,557787631482825 )
J Chem Inf Model - Stereo signature molecular descriptor. ( 0,557750505057411 )
J Chem Inf Model - How different are two chemical structures? ( 0,557153248774087 )
Int J Comput Assist Radiol Surg - Fast lung nodule detection in chest CT images using cylindrical nodule-enhancement filter. ( 0,555007186259475 )
Comput Biol Chem - Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure. ( 0,554470278266538 )
J Chem Inf Model - Similarity searching for potent compounds using feature selection. ( 0,554431461032914 )
Comput. Biol. Med. - Analysis of adductors angle measurement in Hammersmith infant neurological examinations using mean shift segmentation and feature point based object tracking. ( 0,55248797753008 )
J Chem Inf Model - G-protein coupled receptors virtual screening using genetic algorithm focused chemical space. ( 0,552335347273197 )
IEEE Trans Image Process - In-plane rotation and scale invariant clustering using dictionaries. ( 0,552069430614567 )
AMIA Annu Symp Proc - Patient clustering with uncoded text in electronic medical records. ( 0,549205736787921 )
BMC Med Inform Decis Mak - The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks. ( 0,546735993439904 )
J Chem Inf Model - Using novel descriptor accounting for ligand-receptor interactions to define and visually explore biologically relevant chemical space. ( 0,545623019349459 )
J Integr Bioinform - Parallel Niche Pareto AlineaGA--an evolutionary multiobjective approach on multiple sequence alignment. ( 0,544665489134777 )
Artif Intell Med - A classifier ensemble approach for the missing feature problem. ( 0,54422350988702 )
Comput. Biol. Med. - CAM: a web tool for combining array CGH and microarray gene expression data from multiple samples. ( 0,542803024368667 )