J Chem Inf Model - Pre-processing feature selection for improved C&RT models for oral absorption.

Topics

{ featur(3375) classif(2383) classifi(1994) }
{ model(2656) set(1616) predict(1553) }
{ compound(1573) activ(1297) structur(1058) }
{ cost(1906) reduc(1198) effect(832) }
{ featur(1941) imag(1645) propos(1176) }
{ imag(2675) segment(2577) method(1081) }
{ can(981) present(881) function(850) }
{ method(1557) propos(1049) approach(1037) }
{ result(1111) use(1088) new(759) }
{ learn(2355) train(1041) set(1003) }
{ can(774) often(719) complex(702) }
{ framework(1458) process(801) describ(734) }
{ imag(2830) propos(1344) filter(1198) }
{ assess(1506) score(1403) qualiti(1306) }
{ error(1145) method(1030) estim(1020) }
{ high(1669) rate(1365) level(1280) }
{ use(1733) differ(960) four(931) }
{ decis(3086) make(1611) patient(1517) }
{ measur(2081) correl(1212) valu(896) }
{ patient(2315) diseas(1263) diabet(1191) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ search(2224) databas(1162) retriev(909) }
{ risk(3053) factor(974) diseas(938) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ group(2977) signific(1463) compar(1072) }
{ activ(1138) subject(705) human(624) }
{ structur(1116) can(940) graph(676) }
{ implement(1333) system(1263) develop(1122) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ network(2748) neural(1063) input(814) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

There are currently thousands of molecular descriptors that can be calculated to represent a chemical compound. Utilizing all molecular descriptors in Quantitative Structure-Activity Relationships (QSAR) modeling can result in overfitting, decreased interpretability, and thus reduced model performance. Feature selection methods can overcome some of these problems by drastically reducing the number of molecular descriptors and selecting the molecular descriptors relevant to the property being predicted. In particular, decision trees such as C&RT, although they have an embedded feature selection algorithm, can be inadequate since further down the tree there are fewer compounds available for descriptor selection, and therefore descriptors may be selected which are not optimal. In this work we compare two broad approaches for feature selection: (1) a "two-stage" feature selection procedure, where a pre-processing feature selection method selects a subset of descriptors, and then classification and regression trees (C&RT) selects descriptors from this subset to build a decision tree; (2) a "one-stage" approach where C&RT is used as the only feature selection technique. These methods were applied in order to improve prediction accuracy of QSAR models for oral absorption. Additionally, this work utilizes misclassification costs in model building to overcome the problem of the biased oral absorption data sets with more highly absorbed than poorly absorbed compounds. In most cases the two-stage feature selection with pre-processing approach had higher model accuracy compared with the one-stage approach. Using the top 20 molecular descriptors from the random forest predictor importance method gave the most accurate C&RT classification model. The molecular descriptors selected by the five filter feature selection methods have been compared in relation to oral absorption. 
In conclusion, the use of filter pre-processing feature selection methods together with misclassification costs produces models with better interpretability and predictability for oral absorption.
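The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. A simple univariate class-separation score stands in for the random forest predictor importance filter, and a single cost-weighted Gini split stands in for a full C&RT tree. The synthetic data, the descriptor count, the function names (`filter_rank`, `weighted_gini`, `best_split`) and the 4:1 cost ratio are all assumptions made for illustration.

```python
import random
import statistics


def filter_rank(X, y, k):
    """Stage 1 (filter): score each descriptor on its own and keep the top k.

    Assumption: the score is |mean(class 1) - mean(class 0)| / overall std,
    a stand-in for the random forest importance ranking named in the abstract.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = statistics.pstdev(pos + neg) or 1.0
        gap = abs(statistics.mean(pos) - statistics.mean(neg))
        scores.append((gap / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def node_weight(labels, cost):
    """Total weight of a node: each compound counts with its class cost."""
    return sum(cost[label] for label in labels)


def weighted_gini(labels, cost):
    """Two-class Gini impurity computed on cost-weighted class proportions."""
    total = node_weight(labels, cost)
    p1 = sum(cost[label] for label in labels if label == 1) / total
    return 2 * p1 * (1 - p1)


def best_split(X, y, features, cost):
    """Stage 2: one C&RT-style split searched only over pre-selected descriptors.

    Returns (weighted impurity, descriptor index, threshold).
    """
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            wl, wr = node_weight(left, cost), node_weight(right, cost)
            impurity = (wl * weighted_gini(left, cost)
                        + wr * weighted_gini(right, cost)) / (wl + wr)
            if best is None or impurity < best[0]:
                best = (impurity, j, t)
    return best


# Hypothetical, synthetic data: 160 highly absorbed (1) vs 40 poorly absorbed (0)
# compounds, 10 descriptors; only descriptor 0 carries signal, the rest are noise.
rng = random.Random(0)
y = [1] * 160 + [0] * 40
X = [[label + rng.uniform(-0.3, 0.3)] + [rng.gauss(0, 1) for _ in range(9)]
     for label in y]

# Assumed misclassification costs: errors on the minority class weigh 4x more.
cost = {0: 4, 1: 1}

selected = filter_rank(X, y, k=3)          # stage 1: pre-processing filter
split = best_split(X, y, selected, cost)   # stage 2: tree split on the subset
print("selected descriptors:", selected)
print("split (impurity, descriptor, threshold):", split)
```

Restricting the split search to the filtered subset reflects the motivation given in the abstract: a node deep in the tree sees few compounds, so a descriptor chosen there from the full pool may be a poor choice. The cost dictionary shows one simple way to stop the majority class from dominating the impurity measure. A one-stage baseline under the same assumptions would call `best_split` with all descriptor indices instead of `selected`.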

Cleaned Abstract

current thousand molecular descriptor can calcul repres chemic compound util molecular descriptor quantit structureact relationship qsar model can result overfit decreas interpret thus reduc model perform featur select method can overcom problem drastic reduc number molecular descriptor select molecular descriptor relev properti predict particular decis tree crt although embed featur select algorithm can inadequ sinc tree fewer compound avail descriptor select therefor descriptor may select optim work compar two broad approach featur select twostag featur select procedur preprocess featur select method select subset descriptor classif regress tree crt select descriptor subset build decis tree onestag approach crt use featur select techniqu method appli order improv predict accuraci qsar model oral absorpt addit work util misclassif cost model build overcom problem bias oral absorpt data set high absorb poor absorb compound case twostag featur select preprocess approach higher model accuraci compar onestag approach use top molecular descriptor random forest predictor import method gave accur crt classif model molecular descriptor select five filter featur select method compar relat oral absorpt conclus use filter preprocess featur select method misclassif cost produc model better interpret predict predict oral absorpt

Similar Abstracts

J Chem Inf Model - Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. ( 0,779039380841788 )
Comput. Biol. Med. - Extracting predictive SNPs in Crohn's disease using a vacillating genetic algorithm and a neural classifier in case-control association studies. ( 0,766313782509972 )
IEEE J Biomed Health Inform - Recognizing common CT imaging signs of lung diseases through a new feature selection method based on Fisher criterion and genetic optimization. ( 0,760323979417368 )
Comput Methods Programs Biomed - A random forest classifier for lymph diseases. ( 0,758987826846171 )
Int J Neural Syst - Assessment of feature selection and classification approaches to enhance information from overnight oximetry in the context of apnea diagnosis. ( 0,758868899216672 )
Comput Math Methods Med - Principal feature analysis: a multivariate feature selection method for fMRI data. ( 0,749278294842578 )
J Chem Inf Model - Choosing feature selection and learning algorithms in QSAR. ( 0,742026704521962 )
Comput. Biol. Med. - An ensemble system for automatic sleep stage classification using single channel EEG signal. ( 0,738457175242596 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,737040086855104 )
Int J Neural Syst - On the segmentation and classification of hand radiographs. ( 0,736593710116972 )
Comput Biol Chem - A novel divide-and-merge classification for high dimensional datasets. ( 0,736233517604901 )
Comput. Biol. Med. - In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. ( 0,73549093615116 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,735093964312707 )
Comput. Biol. Med. - Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. ( 0,734481383960125 )
J Med Syst - SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. ( 0,732300503832379 )
Comput Math Methods Med - An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images. ( 0,731795682861645 )
Comput. Biol. Med. - Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines. ( 0,730673750286836 )
Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,730398401127046 )
J Biomed Inform - Automatic figure classification in bioscience literature. ( 0,730034282687238 )
Comput Methods Programs Biomed - Drug/nondrug classification using Support Vector Machines with various feature selection strategies. ( 0,727368161392336 )
Comput Methods Programs Biomed - Automatic cervical cell segmentation and classification in Pap smears. ( 0,726583016197798 )
Int J Neural Syst - Single-trial motor imagery classification using asymmetry ratio, phase relation, wavelet-based fractal, and their selected combination. ( 0,724920826078016 )
J Med Syst - Classification of speech dysfluencies using LPC based parameterization techniques. ( 0,7248335892357 )
Int J Comput Assist Radiol Surg - Building an ensemble system for diagnosing masses in mammograms. ( 0,723097101741033 )
J Med Syst - A three-stage expert system based on support vector machines for thyroid disease diagnosis. ( 0,718762165259498 )
IEEE Trans Image Process - Maximum Margin Correlation Filter: a new approach for localization and classification. ( 0,717457594375047 )
IEEE Trans Image Process - A novel technique for subpixel image classification based on support vector machine. ( 0,716253463598091 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,714371948175531 )
Comput. Biol. Med. - Contourlet-based mammography mass classification using the SVM family. ( 0,714050153982051 )
Comput. Biol. Med. - Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection. ( 0,713833027907471 )
J Chem Inf Model - Classifying molecules using a sparse probabilistic kernel binary classifier. ( 0,713293010178343 )
Artif Intell Med - Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. ( 0,713268858788166 )
Comput Math Methods Med - Comparison of different EHG feature selection methods for the detection of preterm labor. ( 0,713157225416076 )
Comput. Biol. Med. - A threshold fuzzy entropy based feature selection for medical database classification. ( 0,713015361719099 )
IEEE J Biomed Health Inform - Computer-aided diagnosis in hysteroscopic imaging. ( 0,711871234006274 )
J Med Syst - A robust multi-class feature selection strategy based on Rotation Forest Ensemble algorithm for diagnosis of Erythemato-Squamous diseases. ( 0,710442228195674 )
J Med Syst - An integrated index for the identification of diabetic retinopathy stages using texture parameters. ( 0,709144649078072 )
Comput. Biol. Med. - Pairwise FCM based feature weighting for improved classification of vertebral column disorders. ( 0,70835197457099 )
IEEE Trans Image Process - Maximum margin projection subspace learning for visual data analysis. ( 0,706929580375673 )
Comput. Biol. Med. - A new feature extraction framework based on wavelets for breast cancer diagnosis. ( 0,705717913829809 )
Int J Comput Assist Radiol Surg - Multimodality GPU-based computer-assisted diagnosis of breast cancer using ultrasound and digital mammography images. ( 0,704282371217762 )
Comput. Biol. Med. - Heartbeat classification using disease-specific feature selection. ( 0,703917114634368 )
Comput Methods Programs Biomed - Operator functional state classification using least-square support vector machine based recursive feature elimination technique. ( 0,700631674260551 )
Artif Intell Med - An intelligent classifier for prognosis of cardiac resynchronization therapy based on speckle-tracking echocardiograms. ( 0,700167749313353 )
Comput Biol Chem - Derivation of an artificial gene to improve classification accuracy upon gene selection. ( 0,700002466925093 )
BMC Med Inform Decis Mak - Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. ( 0,699430095495914 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,697270012278447 )
J Med Syst - A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. ( 0,697022303693727 )
Comput. Biol. Med. - A novel class dependent feature selection method for cancer biomarker discovery. ( 0,694417279060707 )
J Med Syst - A new expert system for diagnosis of lung cancer: GDA-LS_SVM. ( 0,693720031536854 )
J Med Syst - Automated diagnosis of Alzheimer disease using the scale-invariant feature transforms in magnetic resonance images. ( 0,693462859980469 )
Comput Math Methods Med - SVM versus MAP on accelerometer data to distinguish among locomotor activities executed at different speeds. ( 0,693012826683058 )
J Am Med Inform Assoc - Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. ( 0,691917995204511 )
Comput. Biol. Med. - A new dataset evaluation method based on category overlap. ( 0,689917084869836 )
Artif Intell Med - A supervised method to assist the diagnosis and monitor progression of Alzheimer's disease using data from an fMRI experiment. ( 0,68988344344106 )
Comput Methods Programs Biomed - A hybrid system based on information gain and principal component analysis for the classification of transcranial Doppler signals. ( 0,689842781252087 )
J Med Syst - An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method. ( 0,688485448798252 )
J Biomed Inform - A fast gene selection method for multi-cancer classification using multiple support vector data description. ( 0,686775334353208 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,686503512903482 )
Comput. Biol. Med. - Gene expression microarray classification using PCA-BEL. ( 0,684911299991283 )
Comput Methods Programs Biomed - ECG beat classification using a cost sensitive classifier. ( 0,684837679919555 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,683770444677676 )
J Chem Inf Model - GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design. ( 0,683179788097548 )
Comput Math Methods Med - Discrimination between Alzheimer's disease and mild cognitive impairment using SOM and PSO-SVM. ( 0,683012340545703 )
Neural Comput - High-dimensional cluster analysis with the masked EM algorithm. ( 0,682702123541285 )
Comput. Biol. Med. - Bispectral analysis and genetic algorithm for congestive heart failure recognition based on heart rate variability. ( 0,68190751357612 )
J Am Med Inform Assoc - A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets. ( 0,681621566534186 )
J Biomed Inform - A biological continuum based approach for efficient clinical classification. ( 0,67890570061181 )
J Med Syst - Symptomatic vs. asymptomatic plaque classification in carotid ultrasound. ( 0,678057731816302 )
J Med Syst - Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification. ( 0,677343186987257 )
Int J Comput Assist Radiol Surg - Brain tumor classification on intraoperative contrast-enhanced ultrasound. ( 0,67731148201824 )
Int J Neural Syst - Improved adaptive splitting and selection: the hybrid training method of a classifier based on a feature space partitioning. ( 0,677268326404657 )
Comput Methods Programs Biomed - A new hybrid intelligent system for accurate detection of Parkinson's disease. ( 0,676818036890647 )
J Med Syst - Statistical analysis of textural features for improved classification of oral histopathological images. ( 0,676570091153864 )
J Med Syst - Enhanced cancer recognition system based on random forests feature elimination algorithm. ( 0,675006720708539 )
J Med Syst - Luminance sticker based facial expression recognition using discrete wavelet transform for physically disabled persons. ( 0,672946215837497 )
Med Biol Eng Comput - Wavelet-based sparse functional linear model with applications to EEGs seizure detection and epilepsy diagnosis. ( 0,672423831301859 )
IEEE Trans Image Process - Walsh-Hadamard transform kernel-based feature vector for shot boundary detection. ( 0,669028848031857 )
J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,668752581593106 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,667957107959986 )
AMIA Annu Symp Proc - Automatic Prediction of Conversion from Mild Cognitive Impairment to Probable Alzheimer's Disease using Structural Magnetic Resonance Imaging. ( 0,666259848530571 )
IEEE Trans Image Process - Efficient HIK SVM learning for image classification. ( 0,664325571578679 )
Comput. Biol. Med. - Ensemble selection for feature-based classification of diabetic maculopathy images. ( 0,662513190701228 )
Comput Methods Programs Biomed - Computer-supported diagnosis for endotension cases in endovascular aortic aneurysm repair evolution. ( 0,662510720324928 )
IEEE Trans Image Process - Human detection in images via piecewise linear support vector machines. ( 0,660702527132727 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,659522583106269 )
Comput. Biol. Med. - Neurocognitive disorder detection based on feature vectors extracted from VBM analysis of structural MRI. ( 0,657911347628165 )
J Chem Inf Model - A binary ant colony optimization classifier for molecular activities. ( 0,657323663117201 )
Comput. Biol. Med. - Using machine learning techniques and genomic/proteomic information from known databases for defining relevant features for PPI classification. ( 0,657243436922212 )
J Chem Inf Model - Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. ( 0,656316383086317 )
Int J Neural Syst - Combination of heterogeneous EEG feature extraction methods and stacked sequential learning for sleep stage classification. ( 0,655337329203417 )
Comput Methods Programs Biomed - Performance comparison of machine learning methods for prognosis of hormone receptor status in breast cancer tissue samples. ( 0,65338002005 )
Comput Biol Chem - Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm. ( 0,652175524380448 )
Comput Math Methods Med - Comparison of the data classification approaches to diagnose spinal cord injury. ( 0,651462559307972 )
Comput Math Methods Med - An intelligent system approach for asthma prediction in symptomatic preschool children. ( 0,650135932888313 )
Artif Intell Med - Classification of small lesions on dynamic breast MRI: Integrating dimension reduction and out-of-sample extension into CADx methodology. ( 0,64922671175196 )
J Biomed Inform - An efficient statistical feature selection approach for classification of gene expression data. ( 0,649122654513974 )
Artif Intell Med - Improving the accuracy of suicide attempter classification. ( 0,647774790055175 )
J Med Syst - Computer aided diagnosis system for breast cancer based on color Doppler flow imaging. ( 0,647591146903146 )
Int J Comput Assist Radiol Surg - Degree of contribution (DoC) feature selection algorithm for structural brain MRI volumetric features in depression detection. ( 0,647395678733471 )