Comput Biol Chem - Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions.

Topics

{ model(2341) predict(2261) use(1141) }
{ featur(3375) classif(2383) classifi(1994) }
{ error(1145) method(1030) estim(1020) }
{ bind(1733) structur(1185) ligand(1036) }
{ can(774) often(719) complex(702) }
{ problem(2511) optim(1539) algorithm(950) }
{ search(2224) databas(1162) retriev(909) }
{ data(2317) use(1299) case(1017) }
{ method(1219) similar(1157) match(930) }
{ patient(2315) diseas(1263) diabet(1191) }
{ general(901) number(790) one(736) }
{ system(1976) rule(880) can(841) }
{ sequenc(1873) structur(1644) protein(1328) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ learn(2355) train(1041) set(1003) }
{ design(1359) user(1324) use(1319) }
{ method(984) reconstruct(947) comput(926) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2656) set(1616) predict(1553) }
{ can(981) present(881) function(850) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Among protein pairs, interacting pairs are usually far outnumbered by non-interacting ones, so the prediction of protein-protein interactions (PPIs) faces an imbalanced data problem. In this article, we introduce two ensemble methods to address this problem. Both methods combine a cluster-based under-sampling technique with fusion classifiers. We evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs; we also draw some useful conclusions about the performance of ensemble methods for PPI prediction on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html.
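The general idea in the abstract (under-sample the majority class cluster by cluster, train one classifier per balanced subset, fuse the outputs, and score with AUC) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the base classifiers, or the fusion rule, so everything concrete here is an assumption made for illustration: k-means clustering, a nearest-centroid base classifier, fusion by score averaging, synthetic two-dimensional data instead of DIP, and a single held-out split instead of 10-fold cross-validation. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed clustering step); returns k lists of points."""
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the previous center when a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

def cluster_undersample(majority, n_keep, k, rng):
    """Keep roughly n_keep majority points, drawn from each cluster in proportion to its size."""
    kept = []
    for c in kmeans(majority, k, rng):
        if c:
            quota = max(1, round(n_keep * len(c) / len(majority)))
            kept.extend(rng.sample(c, min(quota, len(c))))
    return kept

def train_ensemble(pos, neg, n_members=5, k=3, seed=0):
    """Each member is a nearest-centroid model fitted on all positives plus one under-sample."""
    rng = random.Random(seed)
    return [(mean(pos), mean(cluster_undersample(neg, len(pos), k, rng)))
            for _ in range(n_members)]

def ensemble_score(models, x):
    """Fusion by averaging member scores (assumed rule); higher means more likely positive."""
    return sum(sq_dist(x, neg_c) - sq_dist(x, pos_c) for pos_c, neg_c in models) / len(models)

def auc(scores, labels):
    """Probability that a random positive outscores a random negative; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n_pos, n_neg, rng):
    """Synthetic stand-in data: one positive blob, two negative blobs (10:1 imbalance below)."""
    pos = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_pos)]
    neg = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg // 2)]
    neg += [[rng.gauss(-2, 1), rng.gauss(1, 1)] for _ in range(n_neg - n_neg // 2)]
    return pos, neg

rng = random.Random(42)
train_pos, train_neg = make_data(50, 500, rng)
test_pos, test_neg = make_data(50, 500, rng)
models = train_ensemble(train_pos, train_neg)
scores = [ensemble_score(models, x) for x in test_pos + test_neg]
labels = [1] * len(test_pos) + [0] * len(test_neg)
test_auc = auc(scores, labels)
print(f"held-out AUC: {test_auc:.3f}")
```

Sampling from every cluster, rather than uniformly from the whole majority class, keeps each region of the non-interacting class represented in the balanced subset. Repeating the draw for each ensemble member lets the fused score use more of the majority data than any single balanced subset would.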

Cleaned Abstract

protein number interact pair usual much smaller number noninteract one imbalanc data problem will aris field proteinprotein interact ppis predict articl introduc two ensembl method solv imbalanc data problem ensembl method combin basedclust undersampl techniqu fusion classifi evalu ensembl method use dataset databas interact protein dip fold cross valid predict model achiev area receiv oper characterist curv auc valu result show ensembl classifi quit effect predict ppis also gain valuabl conclus perform ensembl method ppis imbalanc data predict softwar dataset employ work can obtain free httpcicscueducnbioinformaticsensembleppisindexhtml

Similar Abstracts

BMC Med Inform Decis Mak - A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study. ( 0,747642174366939 )
J Biomed Inform - An empirical approach to model selection through validation for censored survival data. ( 0,747501896232399 )
Methods Inf Med - An experimental evaluation of boosting methods for classification. ( 0,746032717359818 )
J Chem Inf Model - Homology modeling of human muscarinic acetylcholine receptors. ( 0,719061590887424 )
J. Comput. Biol. - Prediction of siRNA potency using sparse logistic regression. ( 0,71432796585795 )
Comput Math Methods Med - Iterative reweighted noninteger norm regularizing SVM for gene expression data classification. ( 0,71280680684903 )
J Am Med Inform Assoc - A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data. ( 0,711285708930557 )
Comput Biol Chem - An ensemble method for prediction of conformational B-cell epitopes from antigen sequences. ( 0,71113217700542 )
J Med Syst - A new approach: role of data mining in prediction of survival of burn patients. ( 0,702497251589202 )
Comput Methods Programs Biomed - Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods. ( 0,699554602828593 )
J Chem Inf Model - Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. ( 0,697486876615658 )
J Med Syst - Diagnosing breast masses in digital mammography using feature selection and ensemble methods. ( 0,69633152890877 )
J Biomed Inform - Partial least squares and logistic regression random-effects estimates for gene selection in supervised classification of gene expression data. ( 0,691876165197425 )
Neural Comput - An extension of the receiver operating characteristic curve and AUC-optimal classification. ( 0,679597987318545 )
Med Decis Making - A comparison of methods for converting DCE values onto the full health-dead QALY scale. ( 0,677985743687388 )
Comput Methods Programs Biomed - ThyroScreen system: high resolution ultrasound thyroid image characterization into benign and malignant classes using novel combination of texture and discrete wavelet transform. ( 0,677082771711602 )
Comput. Biol. Med. - Pre-operative prediction of surgical morbidity in children: comparison of five statistical models. ( 0,672316629141861 )
J Am Med Inform Assoc - An improved model for predicting postoperative nausea and vomiting in ambulatory surgery patients using physician-modifiable risk factors. ( 0,668486254687242 )
AMIA Annu Symp Proc - Clinical risk prediction by exploring high-order feature correlations. ( 0,664129554691752 )
Comput Math Methods Med - Modified logistic regression models using gene coexpression and clinical features to predict prostate cancer progression. ( 0,661534263837356 )
Comput Methods Programs Biomed - Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. ( 0,660736456698189 )
Comput Math Methods Med - Prediction of BP reactivity to talking using hybrid soft computing approaches. ( 0,659723989746425 )
Int J Med Inform - Application of data mining to the identification of critical factors in patient falls using a web-based reporting system. ( 0,659310075190975 )
IEEE J Biomed Health Inform - Novel fractal feature-based multiclass glaucoma detection and progression prediction. ( 0,658473246654126 )
J Am Med Inform Assoc - Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. ( 0,657703535228302 )
BMC Med Inform Decis Mak - Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population. ( 0,655366712568648 )
J Biomed Inform - Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches. ( 0,651036748511165 )
J Chem Inf Model - Capturing the crystal: prediction of enthalpy of sublimation, crystal lattice energy, and melting points of organic compounds. ( 0,650062074863295 )
Int J Health Geogr - Prediction of high-risk areas for visceral leishmaniasis using socioeconomic indicators and remote sensing data. ( 0,650046890831185 )
Appl Clin Inform - Comparing predictions made by a prediction model, clinical score, and physicians: pediatric asthma exacerbations in the emergency department. ( 0,649461987033679 )
J Biomed Inform - Protein contact map prediction using multi-stage hybrid intelligence inference systems. ( 0,648325906197767 )
Med Decis Making - Application of an artificial neural network to predict postinduction hypotension during general anesthesia. ( 0,646058683271463 )
J Clin Monit Comput - Effect of concurrent oxygen therapy on accuracy of forecasting imminent postoperative desaturation. ( 0,64516677116922 )
J Med Syst - Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing. ( 0,644473880744577 )
Lifetime Data Anal - Understanding increments in model performance metrics. ( 0,64248893310271 )
BMC Med Inform Decis Mak - Non-linear dynamical signal characterization for prediction of defibrillation success through machine learning. ( 0,63816093854862 )
IEEE Trans Image Process - Efficient image classification via multiple rank regression. ( 0,637399821683117 )
Comput. Biol. Med. - Statistical model based 3D shape prediction of postoperative trunks for non-invasive scoliosis surgery planning. ( 0,637307898634963 )
Comput Math Methods Med - Variable selection in ROC regression. ( 0,63569686162709 )
Spat Spatiotemporal Epidemiol - Assessment of land use factors associated with dengue cases in Malaysia using Boosted Regression Trees. ( 0,63444629952484 )
J Med Syst - Classifying hospitals as mortality outliers: logistic versus hierarchical logistic models. ( 0,630736073718361 )
Comput. Biol. Med. - A ternary model of decompression sickness in rats. ( 0,629632017643028 )
BMC Med Inform Decis Mak - Evaluation of prediction models for the staging of prostate cancer. ( 0,627773863202028 )
J Clin Monit Comput - Use of genetic programming, logistic regression, and artificial neural nets to predict readmission after coronary artery bypass surgery. ( 0,625281472749407 )
J Chem Inf Model - Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions. ( 0,624865751235666 )
Comput. Biol. Med. - A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks. ( 0,624222538096128 )
IEEE Trans Image Process - Network-based H.264/AVC whole frame loss visibility model and frame dropping methods. ( 0,624098612402525 )
IEEE J Biomed Health Inform - The effect of sample age and prediction resolution on myocardial infarction risk prediction. ( 0,622006121693935 )
BMC Med Inform Decis Mak - Use of outcomes to evaluate surveillance systems for bioterrorist attacks. ( 0,621892039951161 )
J Am Med Inform Assoc - From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. ( 0,621705968617449 )
BMC Med Inform Decis Mak - Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. ( 0,620672768931891 )
BMC Med Inform Decis Mak - Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups. ( 0,619689169114217 )
Comput Methods Programs Biomed - Prediction of postprandial blood glucose under uncertainty and intra-patient variability in type 1 diabetes: a comparative study of three interval models. ( 0,618265554100093 )
Artif Intell Med - White box radial basis function classifiers with component selection for clinical prediction models. ( 0,61745270208555 )
Brief. Bioinformatics - Critical assessment of high-throughput standalone methods for secondary structure prediction. ( 0,616808566193535 )
Comput Methods Programs Biomed - Comparative evaluation of support vector machines for computer aided diagnosis of lung cancer in CT based on a multi-dimensional data set. ( 0,615521460975819 )
Comput Methods Programs Biomed - Single stage and multistage classification models for the prediction of liver fibrosis degree in patients with chronic hepatitis C infection. ( 0,615513113135296 )
AMIA Annu Symp Proc - Predicting Surgical Risk: How Much Data is Enough? ( 0,612120511786839 )
AMIA Annu Symp Proc - Application of Bayesian logistic regression to mining biomedical data. ( 0,610077715012081 )
J Chem Inf Model - Two new parameters based on distances in a receiver operating characteristic chart for the selection of classification models. ( 0,609145437966015 )
Methods Inf Med - Classification of postural profiles among mouth-breathing children by learning vector quantization. ( 0,608636207502673 )
Med Biol Eng Comput - Mortality prediction of rats in acute hemorrhagic shock using machine learning techniques. ( 0,605962346596754 )
J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. ( 0,605441071826949 )
J Biomed Inform - Statistical process control for validating a classification tree model for predicting mortality--a novel approach towards temporal validation. ( 0,603909649210159 )
AMIA Annu Symp Proc - Decision path models for patient-specific modeling of patient outcomes. ( 0,602897151515725 )
IEEE J Biomed Health Inform - Classification of color images of dermatological ulcers. ( 0,600304186245036 )
BMC Med Inform Decis Mak - Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach. ( 0,598781848518611 )
Brief. Bioinformatics - Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). ( 0,597864518697486 )
J Med Syst - Comparison of artificial neural networks with logistic regression for detection of obesity. ( 0,596294848684016 )
J Chem Inf Model - dREL: a relational expression language for dictionary methods. ( 0,594949231569154 )
Comput. Biol. Med. - Breast-cancer identification using HMM-fuzzy approach. ( 0,594503229626289 )
Med Decis Making - Performance of a mathematical model to forecast lives saved from HIV treatment expansion in resource-limited settings. ( 0,593871071184152 )
Spat Spatiotemporal Epidemiol - Modeling habitat suitability for occurrence of highly pathogenic avian influenza virus H5N1 in domestic poultry in Asia: a spatial multicriteria decision analysis approach. ( 0,592986613998488 )
IEEE J Biomed Health Inform - Computer-aided staging of lymphoma patients with FDG PET/CT imaging based on textural information. ( 0,592538335507714 )
J Am Med Inform Assoc - Predicting complications of percutaneous coronary intervention using a novel support vector method. ( 0,591648287035264 )
Artif Intell Med - Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. ( 0,591622826078762 )
Med Decis Making - Contrasting two frameworks for ROC analysis of ordinal ratings. ( 0,591487520851768 )
Med Decis Making - Adaptation of clinical prediction models for application in local settings. ( 0,588284968512921 )
BMC Med Inform Decis Mak - Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model. ( 0,587571529908965 )
Methods Inf Med - Limited sampling strategies to estimate the area under the concentration-time curve. Biases and a proposed more accurate method. ( 0,58640885349184 )
BMC Med Inform Decis Mak - Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model. ( 0,584102383259544 )
IEEE Trans Neural Netw Learn Syst - Retargeted Least Squares Regression Algorithm. ( 0,582890024270139 )
J Biomed Inform - Data mining methods for classification of Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) using non-derivatized tandem MS neonatal screening data. ( 0,582739395668005 )
Artif Intell Med - Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method. ( 0,58226813219554 )
Lifetime Data Anal - Estimating improvement in prediction with matched case-control designs. ( 0,581670998595877 )
BMC Med Inform Decis Mak - A method for managing re-identification risk from small geographic areas in Canada. ( 0,57932855567459 )
Comput Math Methods Med - SNP selection in genome-wide association studies via penalized support vector machine with MAX test. ( 0,578603072151377 )
BMC Med Inform Decis Mak - Prediction of adverse cardiac events in emergency department patients with chest pain using machine learning for variable selection. ( 0,578152663092832 )
Comput Methods Programs Biomed - Exploring an optimal vector autoregressive model for multi-channel pulmonary sound data. ( 0,57776112291001 )
BMC Med Inform Decis Mak - Predicting disease risks from highly imbalanced data using random forest. ( 0,576941768265084 )
Comput Methods Programs Biomed - Computer-aided diagnosis of breast masses using quantified BI-RADS findings. ( 0,576363629442887 )
Comput Methods Programs Biomed - Development of a daily mortality probability prediction model from Intensive Care Unit patients using a discrete-time event history analysis. ( 0,573988822873477 )
J Med Syst - Applying cybernetic technology to diagnose human pulmonary sounds. ( 0,573154042203667 )
Artif Intell Med - PMirP: a pre-microRNA prediction method based on structure-sequence hybrid features. ( 0,572799764888042 )
Artif Intell Med - Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers. ( 0,572787481741077 )
Comput Biol Chem - Predicting deleterious non-synonymous single nucleotide polymorphisms in signal peptides based on hybrid sequence attributes. ( 0,57176602257743 )
Artif Intell Med - Supervised machine learning-based classification of oral malodor based on the microbiota in saliva samples. ( 0,569797259317663 )
J Biomed Inform - Prediction of influenza vaccination outcome by neural networks and logistic regression. ( 0,566999609904402 )
J Chem Inf Model - Develop and test a solvent accessible surface area-based model in conformational entropy calculations. ( 0,566222099414284 )
Comput Biol Chem - Improved homology model of cyclohexanone monooxygenase from Acinetobacter calcoaceticus based on multiple templates. ( 0,565860196763887 )