J Chem Inf Model - Large-scale learning of structure-activity relationships using a linear support vector machine and problem-specific metrics.

Topics

{ featur(3375) classif(2383) classifi(1994) }
{ perform(999) metric(946) measur(919) }
{ problem(2511) optim(1539) algorithm(950) }
{ detect(2391) sensit(1101) algorithm(908) }
{ assess(1506) score(1403) qualiti(1306) }
{ concept(1167) ontolog(924) domain(897) }
{ learn(2355) train(1041) set(1003) }
{ general(901) number(790) one(736) }
{ compound(1573) activ(1297) structur(1058) }
{ process(1125) use(805) approach(778) }
{ imag(1947) propos(1133) code(1026) }
{ sampl(1606) size(1419) use(1276) }
{ result(1111) use(1088) new(759) }
{ algorithm(1844) comput(1787) effici(935) }
{ perform(1367) use(1326) method(1137) }
{ signal(2180) analysi(812) frequenc(800) }
{ estim(2440) model(1874) function(577) }
{ can(774) often(719) complex(702) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ howev(809) still(633) remain(590) }
{ model(3480) simul(1196) paramet(876) }
{ data(3008) multipl(1320) sourc(1022) }
{ use(1733) differ(960) four(931) }
{ imag(2830) propos(1344) filter(1198) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ model(2341) predict(2261) use(1141) }
{ studi(1119) effect(1106) posit(819) }
{ spatial(1525) area(1432) region(1030) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ time(1939) patient(1703) rate(768) }
{ health(1844) social(1437) communiti(874) }
{ decis(3086) make(1611) patient(1517) }
{ model(3404) distribut(989) bayesian(671) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }

Abstract

The goal of this study was to adapt a recently proposed large-scale linear support vector machine to large-scale binary cheminformatics classification problems and to assess its performance on various benchmarks using virtual screening performance measures. We extended the large-scale linear support vector machine library LIBLINEAR with state-of-the-art virtual high-throughput screening metrics to train classifiers on whole large and unbalanced data sets. This linear support vector machine formulation performs excellently when applied to high-dimensional sparse feature vectors. An additional advantage is that a prediction has, on average, linear complexity in the number of non-zero features. Nevertheless, the approach assumes that a problem is linearly separable. Therefore, we conducted extensive benchmarking to evaluate the performance on large-scale problems with up to 175,000 samples. To examine the virtual screening performance, we determined the chemotype clusters using Feature Trees and integrated this information to compute weighted AUC-based performance measures and a leave-cluster-out cross-validation. We also considered the BEDROC score, a metric proposed to tackle the early enrichment problem. The performance on each problem was evaluated by nested cross-validation and nested leave-cluster-out cross-validation. We compared LIBLINEAR against a Naïve Bayes classifier, a random decision forest classifier, and a maximum similarity ranking approach. In a direct comparison, LIBLINEAR outperformed all three reference approaches. A comparison with literature results showed that LIBLINEAR is competitive, although it does not achieve results as good as those of the top-ranked nonlinear machines on these benchmarks. However, considering the overall convincing performance and the computation time of the large-scale support vector machine, the approach provides an excellent alternative to established large-scale classification approaches.
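The statement that a prediction costs time linear in the number of non-zero features follows from the form of a linear SVM decision function, f(x) = w·x + b: zero entries of x contribute nothing, so only the stored non-zero features need to be visited. The sketch below is a minimal illustration, assuming a sparse vector represented as a Python dict mapping feature index to value (LIBLINEAR itself stores instances in C as arrays of index/value pairs). The helpers `decision_value` and `predict` are hypothetical, written for this example, and do not mirror LIBLINEAR's own functions.

```python
from collections.abc import Mapping, Sequence


def decision_value(weights: Sequence[float], bias: float, x: Mapping[int, float]) -> float:
    """Linear SVM decision value w.x + b for a sparse feature vector.

    Only the non-zero features stored in `x` are visited, so the cost is
    O(nnz(x)) independent of the full dimensionality len(weights).
    """
    score = bias
    for index, value in x.items():
        score += weights[index] * value
    return score


def predict(weights: Sequence[float], bias: float, x: Mapping[int, float]) -> int:
    # +1 (e.g. "active") if the decision value is non-negative, else -1.
    return 1 if decision_value(weights, bias, x) >= 0 else -1
```

For example, with a one-million-dimensional weight vector, a fingerprint with three set bits costs three multiply-adds: setting `w[10] = 0.5`, `w[42] = -0.25`, `w[999_999] = 1.0` and `x = {10: 1.0, 42: 2.0, 999_999: 1.0}`, `decision_value(w, -0.25, x)` returns `0.75`.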
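BEDROC rewards rankings that place actives near the top of a screened list. A minimal Python sketch of the formula as defined by Truchon and Bayly (2007) is given below, assuming that a higher classifier score means "more likely active" and that ties are broken by input order. The function name `bedroc` and the default α = 20 are illustrative choices, not settings reported in this study.

```python
import math
from collections.abc import Sequence


def bedroc(scores: Sequence[float], labels: Sequence[int], alpha: float = 20.0) -> float:
    """BEDROC for a ranking of N compounds containing n actives.

    scores: classifier outputs, higher = more likely active.
    labels: 1 for active, 0 for inactive.
    alpha:  early-recognition parameter (20.0 is only an illustrative default).
    """
    N = len(scores)
    # Rank by decreasing score; rank 1 is the top of the list (stable sort keeps input order on ties).
    order = sorted(range(N), key=lambda i: scores[i], reverse=True)
    ranks = [pos + 1 for pos, i in enumerate(order) if labels[i] == 1]
    n = len(ranks)
    if n == 0 or n == N:
        raise ValueError("need at least one active and one inactive compound")
    ratio = n / N
    # RIE: observed exponentially weighted sum divided by its expectation under random ranking.
    observed = sum(math.exp(-alpha * r / N) for r in ranks)
    expected = ratio * (1.0 - math.exp(-alpha)) / (math.exp(alpha / N) - 1.0)
    rie = observed / expected
    # Linear rescaling of RIE so that the best ranking maps to 1 and the worst to 0.
    factor = ratio * math.sinh(alpha / 2) / (
        math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ratio)
    )
    constant = 1.0 / (1.0 - math.exp(alpha * (1.0 - ratio)))
    return rie * factor + constant
```

A perfect ranking (all actives first) yields 1 and the fully reversed ranking yields 0, up to floating-point error; intermediate rankings fall in between, with actives near the top weighted far more heavily than those further down.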
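A leave-cluster-out split keeps every chemotype cluster entirely on one side of each train/test split, so the model is always evaluated on chemotypes it did not see during training. The abstract does not state how clusters were distributed over folds, so the sketch below assumes precomputed cluster IDs (for example from a Feature Trees clustering, which is not reproduced here) and uses one simple greedy scheme: clusters are taken largest first and each is placed into the currently smallest fold. The function name `leave_cluster_out_folds` and this balancing heuristic are hypothetical choices for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def leave_cluster_out_folds(
    clusters: Sequence[Hashable], k: int
) -> list[tuple[list[int], list[int]]]:
    """Return k (train_indices, test_indices) pairs without splitting any cluster.

    clusters[i] is the precomputed cluster ID of sample i.
    """
    members: dict[Hashable, list[int]] = defaultdict(list)
    for i, c in enumerate(clusters):
        members[c].append(i)
    if k < 2 or k > len(members):
        raise ValueError("k must be between 2 and the number of clusters")
    folds: list[list[int]] = [[] for _ in range(k)]
    # Largest clusters first, each into the fold that currently holds the fewest samples.
    for c in sorted(members, key=lambda c: len(members[c]), reverse=True):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[c])
    all_indices = set(range(len(clusters)))
    return [(sorted(all_indices - set(test)), sorted(test)) for test in folds]
```

In a nested setup, the same function could be called again on the cluster IDs of an outer training set (the returned indices then refer to positions within that subset) to obtain inner folds for hyperparameter selection.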

Cleaned Abstract

goal studi adapt recent propos linear largescal support vector machin largescal binari cheminformat classif problem assess perform various benchmark use virtual screen perform measur extend largescal linear support vector machin librari liblinear stateoftheart virtual highthroughput screen metric train classifi whole larg unbalanc data set formul linear support machin excel perform appli highdimension spars featur vector addit advantag averag linear complex number nonzero featur predict nevertheless approach assum problem linear separ therefor conduct extens benchmark evalu perform largescal problem size sampl examin virtual screen perform determin chemotyp cluster use featur tree integr inform comput weight aucbas perform measur leaveclusterout crossvalid also consid bedroc score metric suggest tackl earli enrich problem perform problem evalu nest crossvalid nest leaveclusterout crossvalid compar liblinear naiv bay classifi random decis forest classifi maximum similar rank approach refer approach outperform direct comparison liblinear comparison literatur result show liblinear perform competit without achiev result good toprank nonlinear machin benchmark howev consid overal convinc perform comput time largescal support vector machin approach provid excel altern establish largescal classif approach

Similar Abstracts

Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,789286519680533 )
Comput Math Methods Med - Mixed-norm regularization for brain decoding. ( 0,760005509699072 )
Comput. Biol. Med. - Heartbeat classification using disease-specific feature selection. ( 0,755255151683765 )
J Am Med Inform Assoc - Learning regular expressions for clinical text classification. ( 0,750631638007543 )
Comput Math Methods Med - Comparison of different EHG feature selection methods for the detection of preterm labor. ( 0,748814847098675 )
J Med Syst - Automated screening of arrhythmia using wavelet based machine learning techniques. ( 0,743050794588647 )
J Med Syst - An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method. ( 0,742091551212493 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,741810233736924 )
J Med Syst - SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. ( 0,738350466659247 )
Comput Math Methods Med - SVM versus MAP on accelerometer data to distinguish among locomotor activities executed at different speeds. ( 0,734202692375658 )
Artif Intell Med - Improving the accuracy of suicide attempter classification. ( 0,727497742900768 )
Comput Biol Chem - Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm. ( 0,727130187067172 )
Comput. Biol. Med. - A novel class dependent feature selection method for cancer biomarker discovery. ( 0,723342705385518 )
Artif Intell Med - Selective voting in convex-hull ensembles improves classification accuracy. ( 0,721146814156599 )
Int J Neural Syst - Extraction of neural control commands using myoelectric pattern recognition: a novel application in adults with cerebral palsy. ( 0,717622850670204 )
Comput Methods Programs Biomed - A new hybrid intelligent system for accurate detection of Parkinson's disease. ( 0,715944320772749 )
IEEE Trans Image Process - Human detection in images via piecewise linear support vector machines. ( 0,714302135376019 )
Comput. Biol. Med. - Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. ( 0,705119103530291 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,704417584036015 )
Brief. Bioinformatics - Class-imbalanced classifiers for high-dimensional data. ( 0,697483181497908 )
J Biomed Inform - A fast gene selection method for multi-cancer classification using multiple support vector data description. ( 0,696420649951156 )
J Med Syst - Detection and localization of myocardial infarction using K-nearest neighbor classifier. ( 0,696149716134167 )
J Med Syst - A robust multi-class feature selection strategy based on Rotation Forest Ensemble algorithm for diagnosis of Erythemato-Squamous diseases. ( 0,694834343741154 )
Comput. Biol. Med. - An ensemble system for automatic sleep stage classification using single channel EEG signal. ( 0,694288425269363 )
Comput Methods Programs Biomed - An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms. ( 0,692748765756861 )
Comput Biol Chem - A novel divide-and-merge classification for high dimensional datasets. ( 0,688083452686715 )
J Biomed Inform - Automatic figure classification in bioscience literature. ( 0,687071326432925 )
Int J Comput Assist Radiol Surg - Building an ensemble system for diagnosing masses in mammograms. ( 0,686347786354724 )
J Med Syst - A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. ( 0,686338686686539 )
Comput Math Methods Med - An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images. ( 0,685611502249536 )
Comput Methods Programs Biomed - Understanding symptomatology of atherosclerotic plaque by image-based tissue characterization. ( 0,684345435690828 )
J Med Syst - Classification of speech dysfluencies using LPC based parameterization techniques. ( 0,683757390836989 )
Comput Methods Programs Biomed - Automatic cervical cell segmentation and classification in Pap smears. ( 0,683484321215099 )
Comput Math Methods Med - Discrimination between Alzheimer's disease and mild cognitive impairment using SOM and PSO-SVM. ( 0,683275578953463 )
Comput Methods Programs Biomed - Complex extreme learning machine applications in terahertz pulsed signals feature sets. ( 0,681947125143357 )
Comput. Biol. Med. - A classification study of kinematic gait trajectories in hip osteoarthritis. ( 0,681788979444469 )
J Am Med Inform Assoc - Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. ( 0,680579922323568 )
J Integr Bioinform - On the parameter optimization of Support Vector Machines for binary classification. ( 0,679875219929144 )
IEEE Trans Image Process - Efficient HIK SVM learning for image classification. ( 0,676855943209858 )
Artif Intell Med - Selection of effective features for ECG beat recognition based on nonlinear correlations. ( 0,672993958752121 )
Comput. Biol. Med. - Contourlet-based mammography mass classification using the SVM family. ( 0,672295227835891 )
IEEE J Biomed Health Inform - Recognizing common CT imaging signs of lung diseases through a new feature selection method based on Fisher criterion and genetic optimization. ( 0,671422903383897 )
Comput. Biol. Med. - Classification of Error-Related Negativity (ERN) and Positivity (Pe) potentials using kNN and Support Vector Machines. ( 0,670643699125361 )
J Med Syst - Enhanced cancer recognition system based on random forests feature elimination algorithm. ( 0,670267125321854 )
Neural Comput - An Infomax algorithm can perform both familiarity discrimination and feature extraction in a single network. ( 0,669722928338542 )
Med Biol Eng Comput - Decision support system for age-related macular degeneration using discrete wavelet transform. ( 0,669567365812479 )
Artif Intell Med - Subpopulation-specific confidence designation for more informative biomedical classification. ( 0,668471453506093 )
Comput. Biol. Med. - A new dataset evaluation method based on category overlap. ( 0,667301087304772 )
Comput. Biol. Med. - Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection. ( 0,667075872863628 )
J Med Syst - Sparse representation-based heartbeat classification using independent component analysis. ( 0,666217897109393 )
J Med Syst - Down syndrome diagnosis based on Gabor Wavelet Transform. ( 0,666072876080777 )
Comput Biol Chem - Derivation of an artificial gene to improve classification accuracy upon gene selection. ( 0,664690573327765 )
IEEE Trans Neural Netw Learn Syst - FREL: A Stable Feature Selection Algorithm. ( 0,663654961404559 )
Artif Intell Med - Classification of small lesions on dynamic breast MRI: Integrating dimension reduction and out-of-sample extension into CADx methodology. ( 0,662977784990484 )
Comput Methods Programs Biomed - A random forest classifier for lymph diseases. ( 0,661443833445471 )
Comput. Biol. Med. - Pairwise FCM based feature weighting for improved classification of vertebral column disorders. ( 0,660831028149075 )
Comput. Biol. Med. - Investigating the performance improvement of HRV Indices in CHF using feature selection methods based on backward elimination and statistical significance. ( 0,660324106150484 )
Artif Intell Med - An intelligent classifier for prognosis of cardiac resynchronization therapy based on speckle-tracking echocardiograms. ( 0,659420292661097 )
Artif Intell Med - Electrocardiogram analysis using a combination of statistical, geometric, and nonlinear heart rate variability features. ( 0,659244233520701 )
IEEE Trans Image Process - A novel technique for subpixel image classification based on support vector machine. ( 0,658932506005727 )
Neural Comput - The support feature machine: classification with the least number of features and application to neuroimaging data. ( 0,658424635656544 )
Comput. Biol. Med. - Region based stellate features combined with variable selection using AdaBoost learning in mammographic computer-aided detection. ( 0,65811093105914 )
J Digit Imaging - Computer-aided diagnosis of malignant mammograms using Zernike moments and SVM. ( 0,658011959567331 )
Comput Methods Programs Biomed - Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. ( 0,657122893472694 )
Int J Neural Syst - Combination of heterogeneous EEG feature extraction methods and stacked sequential learning for sleep stage classification. ( 0,654837764507673 )
J Am Med Inform Assoc - A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets. ( 0,654779852655566 )
Comput Methods Programs Biomed - Computer-supported diagnosis for endotension cases in endovascular aortic aneurysm repair evolution. ( 0,654186561357952 )
Med Biol Eng Comput - Wavelet-based sparse functional linear model with applications to EEGs seizure detection and epilepsy diagnosis. ( 0,653862765013382 )
Int J Neural Syst - Single-trial motor imagery classification using asymmetry ratio, phase relation, wavelet-based fractal, and their selected combination. ( 0,65317891654718 )
Artif Intell Med - Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. ( 0,652933733401409 )
Comput. Biol. Med. - Ant colony optimization-based feature selection method for surface electromyography signals classification. ( 0,649264926191416 )
J Med Syst - A three-stage expert system based on support vector machines for thyroid disease diagnosis. ( 0,648409215343791 )
Comput. Biol. Med. - A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation. ( 0,648018499173467 )
J Med Syst - Classification of normal and diseased liver shapes based on Spherical Harmonics coefficients. ( 0,647086843161756 )
Artif Intell Med - Evolutionary-driven support vector machines for determining the degree of liver fibrosis in chronic hepatitis C. ( 0,646818836579508 )
Comput. Biol. Med. - Retinal vessel extraction using Lattice Neural Networks with Dendritic Processing. ( 0,645062826241746 )
Comput Methods Programs Biomed - ECG beat classification using a cost sensitive classifier. ( 0,644530249369411 )
J Med Syst - Detection of carotid artery disease by using Learning Vector Quantization Neural Network. ( 0,643954318937329 )
IEEE J Biomed Health Inform - Classification of bacterial contamination using image processing and distributed computing. ( 0,643837194713915 )
Comput Methods Programs Biomed - Evaluation of different distortion correction methods and interpolation techniques for an automated classification of celiac disease. ( 0,643275131638202 )
Int J Neural Syst - Improved adaptive splitting and selection: the hybrid training method of a classifier based on a feature space partitioning. ( 0,641686701864246 )
IEEE J Biomed Health Inform - Computer-aided diagnosis in hysteroscopic imaging. ( 0,641541240774553 )
J Biomed Inform - Quality assessment of data discrimination using self-organizing maps. ( 0,640970271388952 )
Med Biol Eng Comput - Classification of multichannel EEG patterns using parallel hidden Markov models. ( 0,640676882403499 )
Artif Intell Med - Automatic detection of epileptic seizures on the intra-cranial electroencephalogram of rats using reservoir computing. ( 0,639162648644318 )
J Med Syst - Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification. ( 0,638803871911224 )
Comput. Biol. Med. - Neural system for heartbeats recognition using genetically integrated ensemble of classifiers. ( 0,638507159903271 )
Comput Math Methods Med - Feature selection in classification of eye movements using electrooculography for activity recognition. ( 0,63792368880779 )
Comput Methods Programs Biomed - Performance comparison of machine learning methods for prognosis of hormone receptor status in breast cancer tissue samples. ( 0,63780807240079 )
Comput. Biol. Med. - A new feature extraction framework based on wavelets for breast cancer diagnosis. ( 0,636497247521303 )
Neural Comput - Adaptive classification on brain-computer interfaces using reinforcement signals. ( 0,636323348493126 )
IEEE J Biomed Health Inform - Automatic detection of atrial fibrillation in cardiac vibration signals. ( 0,635559296745823 )
J Med Syst - Comparison of statistical, LBP, and multi-resolution analysis features for breast mass classification. ( 0,63448358566722 )
Comput Methods Programs Biomed - Drug/nondrug classification using Support Vector Machines with various feature selection strategies. ( 0,63238034264602 )
J Med Syst - A new expert system for diagnosis of lung cancer: GDA-LS_SVM. ( 0,632008321152607 )
IEEE Trans Image Process - Walsh-Hadamard transform kernel-based feature vector for shot boundary detection. ( 0,631521960486902 )
Comput. Biol. Med. - A hybrid feature selection method for DNA microarray data. ( 0,630910883368526 )
Int J Neural Syst - Automated diagnosis of epilepsy using CWT, HOS and texture parameters. ( 0,629539289859478 )
Comput Biol Chem - Multi objective SNP selection using pareto optimality. ( 0,628229952314186 )
Comput Methods Programs Biomed - Denoised P300 and machine learning-based concealed information test method. ( 0,628093433886305 )