Comput. Biol. Med. - Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ signal(2180) analysi(812) frequenc(800) }
{ result(1111) use(1088) new(759) }
{ method(2212) result(1239) propos(1039) }
{ method(1219) similar(1157) match(930) }
{ activ(1138) subject(705) human(624) }
{ use(976) code(926) identifi(902) }
{ learn(2355) train(1041) set(1003) }
{ model(2220) cell(1177) simul(1124) }
{ patient(1821) servic(1111) care(1106) }
{ detect(2391) sensit(1101) algorithm(908) }
{ measur(2081) correl(1212) valu(896) }
{ algorithm(1844) comput(1787) effici(935) }
{ featur(1941) imag(1645) propos(1176) }
{ first(2504) two(1366) second(1323) }
{ high(1669) rate(1365) level(1280) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ network(2748) neural(1063) input(814) }
{ error(1145) method(1030) estim(1020) }
{ model(2341) predict(2261) use(1141) }
{ state(1844) use(1261) util(961) }
{ group(2977) signific(1463) compar(1072) }
{ take(945) account(800) differ(722) }
{ care(1570) inform(1187) nurs(1089) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ model(3480) simul(1196) paramet(876) }
{ model(2656) set(1616) predict(1553) }
{ sampl(1606) size(1419) use(1276) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ use(1733) differ(960) four(931) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }

Abstract

Remote protein homology detection and fold recognition refer to the detection of structural homology in proteins that share little or no sequence similarity. To detect protein structural classes from primary sequence information, homology-based methods have been developed, which can be divided into three types: discriminative classifiers, generative models for protein families, and pairwise sequence comparisons. Support Vector Machines (SVM) and Neural Networks (NN) are two popular discriminative methods. Recent studies have shown that SVMs train faster and are more accurate and efficient than NNs. We present a comprehensive method based on two-layer classifiers. The first layer performs detection up to the superfamily and family levels of the SCOP hierarchy using optimized binary SVM classification rules. It uses a kernel function known as the Bio-kernel, which incorporates biological information into the classification process. The second layer uses a discriminative SVM algorithm with a string kernel to perform detection up to the protein fold level of the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP scores, and their significance was tested with a pairwise t-test. Experimental results show that our approach significantly improves the performance of remote protein homology detection and fold recognition on all three versions of the SCOP dataset (1.53, 1.67 and 1.73). Compared with the results produced by well-known methods, we achieved improvements in mean ROC of 4.19% on SCOP 1.53, 4.75% on SCOP 1.67 and 4.03% on SCOP 1.73. The combination of the first and second layers of BioSVM-2L performs well in remote homology detection and fold recognition across all three versions of the dataset.
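
The evaluation described in the abstract (a ROC score and an MRFP score per protein family, then a paired t-test between methods) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' code: the function names `roc_auc`, `mrfp` and `paired_t` are hypothetical, the ROC score is taken to be the full area under the ROC curve, and MRFP is assumed to follow the usual definition of median rate of false positives (the fraction of negative test sequences scoring at least as high as the median positive score).

```python
from math import sqrt
from statistics import mean, median, stdev


def _split(scores, labels):
    """Separate classifier scores into positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def roc_auc(scores, labels):
    """Area under the ROC curve: probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos, neg = _split(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrfp(scores, labels):
    """Assumed MRFP definition: share of negatives scoring >= the median positive."""
    pos, neg = _split(scores, labels)
    cutoff = median(pos)
    return sum(1 for n in neg if n >= cutoff) / len(neg)


def paired_t(a, b):
    """Paired t statistic for per-family scores of two methods (n - 1 degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

In use, `roc_auc` and `mrfp` would be computed once per family, averaged over families to give the mean ROC and mean MRFP, and the two per-family ROC lists of the competing methods passed to `paired_t`. Converting the t statistic into a p-value requires the Student t distribution, which is left out here to keep the sketch dependency-free.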

Cleaned Abstract

remot protein homolog detect fold recognit refer detect structur homolog protein small similar sequenc detect protein structur class protein primari sequenc inform homologybas method develop can divid three type discrimin classifi generat model protein famili pairwis sequenc comparison support vector machin svm neural network nn two popular discrimin method recent studi shown svm fast speed train accur effici compar nn present comprehens method base twolay classifi st layer use detect superfamili famili scop hierarchi use optim binari svm classif rule use kernel function known biokernel incorpor biolog inform classif process nd layer use discrimin svm algorithm string kernel will detect protein fold level scop hierarchi result obtain evalu use mean roc mean mrfp signific result produc pairwis ttest test experiment result show approach signific improv perform remot protein homolog detect fold recognit three differ version scop dataset achiev improv term mean roc scop scop scop dataset compar result produc wellknown method combin first layer second layer biosvml perform well remot homolog detect fold recognit even three differ version dataset

Similar Abstracts

Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,813444500340614 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,78734240538548 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,7867260781786 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,785450637367347 )
Comput Biol Chem - Prediction of protein modification sites of gamma-carboxylation using position specific scoring matrices based evolutionary information. ( 0,772200957168119 )
Comput Biol Chem - Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. ( 0,751202510894623 )
J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,738979796682483 )
Comput Math Methods Med - Naïve Bayes classifier with feature selection to identify phage virion proteins. ( 0,731987335666151 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,725157571594899 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,720118689716174 )
Comput Math Methods Med - Identification of antioxidants from sequence information using naïve Bayes. ( 0,71250638224939 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,712434345199458 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers I: proteins. ( 0,711386496671253 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,707686685884573 )
Comput Math Methods Med - Identification of DNA-binding proteins using support vector machine with sequence information. ( 0,698040205380816 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,697697884540111 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,696084040943391 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,690894019074056 )
Comput Biol Chem - newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. ( 0,690502873854523 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,689151385205363 )
Comput Methods Programs Biomed - Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. ( 0,685842159112865 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,68464037347195 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,68102430216442 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,674028117879307 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,672792384568151 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,672157230791374 )
Comput. Biol. Med. - Remote homology detection incorporating the context of physicochemical properties. ( 0,671082992452923 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,666111280420771 )
Comput Math Methods Med - Knee joint vibration signal analysis with matching pursuit decomposition and dynamic weighted classifier fusion. ( 0,662703704883263 )
Comput Methods Programs Biomed - Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models. ( 0,660089942038081 )
Comput. Biol. Med. - FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds. ( 0,655593362152507 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,654667871309861 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,652434374858084 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,650907276789337 )
Comput. Biol. Med. - Prediction of methylation CpGs and their methylation degrees in human DNA sequences. ( 0,649164073769727 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,648684816100643 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,6471942627437 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,641701040468385 )
J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,641402061538981 )
Med Biol Eng Comput - Classification of multichannel EEG patterns using parallel hidden Markov models. ( 0,641189757308946 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,634194288600419 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,633881372707861 )
J Integr Bioinform - Predicting genes involved in human cancer using network contextual information. ( 0,633571210487592 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,633094166508334 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,631333892284173 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,63128813294569 )
Comput. Biol. Med. - Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines. ( 0,63085820815919 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,630373179374369 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,630096786250041 )
J. Comput. Biol. - The generating function approach for Peptide identification in spectral networks. ( 0,629993489712602 )
Comput. Biol. Med. - Identification of voltage-gated potassium channel subfamilies from sequence information using support vector machine. ( 0,628751177572976 )
Comput. Biol. Med. - Gene comparison based on the repetition of single-nucleotide structure patterns. ( 0,627443838636815 )
J. Comput. Biol. - A novel technique for detecting putative horizontal gene transfer in the sequence space. ( 0,626661949310226 )
Comput. Biol. Med. - Identification of human drug targets using machine-learning algorithms. ( 0,626510010053611 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,626347502427904 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,626045689911167 )
J Biomed Inform - Protein contact map prediction using multi-stage hybrid intelligence inference systems. ( 0,623416086622979 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,623397975101975 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,622762772019481 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,622139519836598 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,62051894005917 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,620078860107923 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,619828069248406 )
BMC Med Inform Decis Mak - Efficient techniques for genotype-phenotype correlational analysis. ( 0,617951737216829 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,617614304808218 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,617364701244483 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,616027711900979 )
Brief. Bioinformatics - DRISEE overestimates errors in metagenomic sequencing data. ( 0,615391414876523 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,61356337730462 )
Comput. Biol. Med. - HIV-1 CRF01_AE coreceptor usage prediction using kernel methods based logistic model trees. ( 0,612062715551514 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,611293910929249 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,61013195194496 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,609712148323342 )
Med Biol Eng Comput - Characterization and prediction of mRNA polyadenylation sites in human genes. ( 0,609471285338513 )
Artif Intell Med - Supervised machine learning-based classification of oral malodor based on the microbiota in saliva samples. ( 0,609358032484927 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,608317044682735 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,605970025454996 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,60589391453589 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,605464377293857 )
Comput Biol Chem - CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition. ( 0,603318460986847 )
Comput. Biol. Med. - ProClusEnsem: predicting membrane protein types by fusing different modes of pseudo amino acid composition. ( 0,60232624775916 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,601084965711842 )
Comput Biol Chem - Systematic analysis of an amidase domain CHAP in 12 Staphylococcus aureus genomes and 44 staphylococcal phage genomes. ( 0,599706628229039 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,599098261106217 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,598714495603598 )
Comput Methods Programs Biomed - Clustering technique-based least square support vector machine for EEG signal classification. ( 0,597354590834039 )
J. Comput. Biol. - Statistical significance of normalized global alignment. ( 0,597194215386792 )
Comput Methods Programs Biomed - Can computational biology improve the phylogenetic analysis of insulin? ( 0,597079789775781 )
IEEE J Biomed Health Inform - Extracting and Selecting Distinctive EEG Features for Efficient Epileptic Seizure Prediction. ( 0,596240395027868 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,596095243393107 )
Int J Neural Syst - Automated diagnosis of epilepsy using CWT, HOS and texture parameters. ( 0,595910186899708 )
Med Biol Eng Comput - Wavelet-based sparse functional linear model with applications to EEGs seizure detection and epilepsy diagnosis. ( 0,595732766923667 )
Comput. Biol. Med. - A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation. ( 0,595455704523624 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,595057670055641 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,595047200909057 )
J. Comput. Biol. - Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. ( 0,594273633149969 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,592665825770616 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,592067340380822 )
J. Comput. Biol. - Tracing the most parsimonious indel history. ( 0,591647652170632 )
Comput. Biol. Med. - Keratin protein property based classification of mammals and non-mammals using machine learning techniques. ( 0,591470106251754 )