Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ model(2341) predict(2261) use(1141) }
{ use(1733) differ(960) four(931) }
{ process(1125) use(805) approach(778) }
{ method(2212) result(1239) propos(1039) }
{ method(1219) similar(1157) match(930) }
{ featur(1941) imag(1645) propos(1176) }
{ gene(2352) biolog(1181) express(1162) }
{ learn(2355) train(1041) set(1003) }
{ first(2504) two(1366) second(1323) }
{ result(1111) use(1088) new(759) }
{ measur(2081) correl(1212) valu(896) }
{ take(945) account(800) differ(722) }
{ ehr(2073) health(1662) electron(1139) }
{ signal(2180) analysi(812) frequenc(800) }
{ analysi(2126) use(1163) compon(1037) }
{ activ(1452) weight(1219) physic(1104) }
{ model(3404) distribut(989) bayesian(671) }
{ network(2748) neural(1063) input(814) }
{ error(1145) method(1030) estim(1020) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ model(2220) cell(1177) simul(1124) }
{ visual(1396) interact(850) tool(830) }
{ activ(1138) subject(705) human(624) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

About 15% of all proteins in a genome contain a signal peptide (SP) sequence at the N-terminus that targets the protein to intracellular secretory pathways. Once the protein has been targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains because both are rich in hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method takes a dual-phase approach: an SVM acts as the primary classifier to discriminate SP sequences from Non-SP sequences, and NNs then predict the most suitable cleavage site candidates. In phase one, the SVM classifier uses hydrophobic propensities as its primary feature vector, extracted by symmetric sliding-window analysis of the amino-acid sequence, to discriminate SP from Non-SP. In phase two, the NN classifier uses asymmetric sliding-window sequence analysis to predict the cleavage site. The proposed SVM-NN method was tested on UniProt non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and Non-SP discrimination, as measured by Matthews Correlation Coefficient (MCC) tests using the SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model.
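The sliding-window feature extraction and the MCC score mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which hydrophobicity scale, window sizes, or padding were used, so the Kyte-Doolittle scale, the `window_features` helper, its `left`/`right` parameters, and the zero padding are all assumptions made for the example.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not name
# the hydrophobicity scale; this one is used purely for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq, center, left, right, pad=0.0):
    """Return hydropathy values for positions center-left .. center+right.

    left == right gives a symmetric window; left != right gives an
    asymmetric one. Positions outside the sequence, and residues not in
    the table, are filled with `pad` (a hypothetical choice).
    """
    feats = []
    for i in range(center - left, center + right + 1):
        if 0 <= i < len(seq):
            feats.append(KD.get(seq[i], pad))
        else:
            feats.append(pad)
    return feats


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    seq = "MKAL"  # toy N-terminal fragment, not real data
    print(window_features(seq, center=1, left=1, right=1))  # symmetric
    print(window_features(seq, center=0, left=2, right=1))  # asymmetric
    print(mcc(tp=45, tn=45, fp=5, fn=5))
```

Vectors like these would be the inputs to the phase-one SVM (symmetric windows) and the phase-two NN (asymmetric windows around a candidate cleavage position); the classifiers themselves are omitted here because the abstract gives no details of their kernels, architectures, or training settings.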

Cleaned Abstract

protein genom contain signal peptid sp sequenc nterminus target protein intracellular secretori pathway protein target correct cell sp cleav releas matur protein accur predict presenc short aminoacid sp chain crucial model topolog membran protein sinc sp sequenc can confus transmembran domain due similar composit hydrophob amino acid paper present cascad support vector machin svmneural network nn classif methodolog sp discrimin cleavag site identif propos method utilis dual phase classif approach use svm primari classifi discrimin sp sequenc nonsp methodolog employ nns predict suitabl cleavag site candid phase one svm classif utilis hydrophob propens primari featur vector extract use symmetr slide window aminoacid sequenc analysi discrimin sp nonsp phase two nn classif use asymmetr slide window sequenc analysi predict cleavag site identif propos svmnn method test use uniprot nonredund dataset eukaryot prokaryot protein sp nonsp ntermini comput simul result demonstr overal accuraci sp nonsp discrimin base matthew correl coeffici mcc test use svm sp cleavag site predict overal accuraci base crossvalid test use novel svmnn model

Similar Abstracts

J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,83190573350616 )
J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,829909639543875 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,824560307538876 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,818853830420975 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,817529336768283 )
Comput. Biol. Med. - Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. ( 0,813444500340614 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,80390245932968 )
Comput Biol Chem - Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. ( 0,798613758463891 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,796439703288904 )
Comput Methods Programs Biomed - Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. ( 0,789193834581337 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,787074051449627 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,786467453792218 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,786397218729539 )
Comput. Biol. Med. - Remote homology detection incorporating the context of physicochemical properties. ( 0,777411673079391 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,775437416244657 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,771960116096393 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,766284323824066 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,766104593368734 )
Comput Biol Chem - Prediction of protein modification sites of gamma-carboxylation using position specific scoring matrices based evolutionary information. ( 0,765698803527359 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,763490986572716 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,762146194337237 )
J Biomed Inform - Protein contact map prediction using multi-stage hybrid intelligence inference systems. ( 0,757043456746091 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,756821192671213 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,753971130759901 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,752873470725566 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,74966581832417 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,748417015616182 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,748242843345109 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,747070150552341 )
Comput Math Methods Med - Identification of antioxidants from sequence information using naïve Bayes. ( 0,746763333700765 )
Comput Math Methods Med - Naïve Bayes classifier with feature selection to identify phage virion proteins. ( 0,743537493925889 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,740206848985957 )
Comput Biol Chem - Systematic analysis of an amidase domain CHAP in 12 Staphylococcus aureus genomes and 44 staphylococcal phage genomes. ( 0,737715529924102 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,736796037119642 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,73670165938768 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,734940074666771 )
Comput Methods Programs Biomed - Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models. ( 0,734572242679241 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,732253566870989 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,73223140997238 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,731526698219073 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,729849271800231 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,725391225597398 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,724417932624443 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,723559215616749 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,723523589497746 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,723130505978216 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers I: proteins. ( 0,722157760145709 )
Comput. Biol. Med. - Prediction of protein functions based on function-function correlation relations. ( 0,722100880116151 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,721958715825441 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,721387869141016 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,721159127449313 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,721061422677532 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,718186484430651 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,71574458686382 )
Comput Biol Chem - newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. ( 0,715125304855488 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,714914935449864 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,714516540887167 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,714316910569628 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,712097807756012 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,71207007657151 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,708063325404052 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,70682651689445 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,703482427330187 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,701156963202006 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,699692979521546 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,69964613901797 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,696850609715692 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,696272992530099 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,69541369652495 )
J. Comput. Biol. - A novel technique for detecting putative horizontal gene transfer in the sequence space. ( 0,695083507587372 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,692784977756858 )
Comput. Biol. Med. - Improving protein complex classification accuracy using amino acid composition profile. ( 0,692673450179695 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,692570965630824 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,692241779289915 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,690323529447479 )
J. Comput. Biol. - Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. ( 0,688234671655776 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,684242835011588 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,68271804741734 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,681834095329978 )
Comput Biol Chem - PPM-Dom: a novel method for domain position prediction. ( 0,681173575540129 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,678717308561287 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,678507969785048 )
Comput. Biol. Med. - MitProt-Pred: Predicting mitochondrial proteins of Plasmodium falciparum parasite using diverse physiochemical properties and ensemble classification. ( 0,677238304178322 )
Comput Biol Chem - Predicting protein-protein interactions using graph invariants and a neural network. ( 0,67664368658347 )
Comput Biol Chem - Gene cloning, homology comparison and analysis of the main functional structure domains of beta estrogen receptor in Jining Gray goat. ( 0,675905236114334 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,675052740555614 )
Comput. Biol. Med. - Gene comparison based on the repetition of single-nucleotide structure patterns. ( 0,674910971823974 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,673757341680751 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,672222318275944 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,672136792357689 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,670465860415105 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,667499348145486 )
Comput Biol Chem - In silico characterization and evolutionary analyses of CCAAT binding proteins in the lycophyte plant Selaginella moellendorffii genome: a growing comparative genomics resource. ( 0,66354415920815 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,662788746974793 )
Comput Biol Chem - Large replication skew domains delimit GC-poor gene deserts in human. ( 0,662660582226614 )
Comput Math Methods Med - ADLD: a novel graphical representation of protein sequences and its application. ( 0,662211112539783 )
J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,661975097033482 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,661448504642489 )
Comput. Biol. Med. - HIV-1 CRF01_AE coreceptor usage prediction using kernel methods based logistic model trees. ( 0,660183816635339 )
Comput Math Methods Med - Identification of DNA-binding proteins using support vector machine with sequence information. ( 0,655156588303981 )