J Integr Bioinform - A hierarchical approach to protein fold prediction.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ can(774) often(719) complex(702) }
{ general(901) number(790) one(736) }
{ imag(1057) registr(996) error(939) }
{ first(2504) two(1366) second(1323) }
{ framework(1458) process(801) describ(734) }
{ compound(1573) activ(1297) structur(1058) }
{ method(1219) similar(1157) match(930) }
{ howev(809) still(633) remain(590) }
{ state(1844) use(1261) util(961) }
{ signal(2180) analysi(812) frequenc(800) }
{ result(1111) use(1088) new(759) }
{ take(945) account(800) differ(722) }
{ learn(2355) train(1041) set(1003) }
{ patient(2837) hospit(1953) medic(668) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ high(1669) rate(1365) level(1280) }
{ model(3404) distribut(989) bayesian(671) }
{ system(1976) rule(880) can(841) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ model(2220) cell(1177) simul(1124) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ import(1318) role(1303) understand(862) }
{ perform(1367) use(1326) method(1137) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ model(2656) set(1616) predict(1553) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ time(1939) patient(1703) rate(768) }
{ structur(1116) can(940) graph(676) }
{ use(976) code(926) identifi(902) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ method(2212) result(1239) propos(1039) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ bind(1733) structur(1185) ligand(1036) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Fold recognition, the assignment of novel proteins to known structures, forms an important component of the overall protein structure discovery process. The available methods for protein fold recognition are limited by low fold coverage, low prediction accuracy, or both. We describe here a new Support Vector Machine (SVM)-based method for protein fold prediction with high prediction accuracy and high fold coverage. To make the method suitable for large-scale fold prediction, it was trained and tested on a large number of folds. However, the large number of folds in the training set made the classification task difficult, because the complexity of the binary classifications performed by SVMs increases with the number of classes. To overcome this complexity, we adopted a hierarchical approach in which fold prediction is made in two steps. In the first step, the structural class of the query is predicted; in the second step, the fold is predicted within the predicted structural class. This decreased the complexity of the classification problem and also improved the overall fold prediction accuracy. To the best of our knowledge, this is the first taxonomic fold recognition method to cover over 700 protein folds, and it gives a prediction accuracy of around 70% on a benchmark dataset. Because the new method delivers state-of-the-art prediction performance, it can be very useful for the structural characterization of proteins discovered in various genomes.
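
The two-step scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `HierarchicalFoldClassifier`, the toy 2-D feature vectors, and the class and fold labels are all hypothetical. To keep the example dependency-free, a simple nearest-centroid classifier stands in for the SVM; any classifier exposing `fit` and `predict` (for instance an SVM from a machine-learning library) could presumably be passed in through `make_classifier` instead.

```python
from collections import defaultdict


class NearestCentroid:
    """Dependency-free stand-in for a multi-class SVM (NOT the paper's classifier)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        # One mean feature vector per label.
        self.centroids_ = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def nearest(x):
            # Label whose centroid has the smallest squared Euclidean distance.
            return min(
                self.centroids_,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids_[c])),
            )

        return [nearest(x) for x in X]


class HierarchicalFoldClassifier:
    """Step 1: predict the structural class. Step 2: predict the fold within it."""

    def __init__(self, make_classifier=NearestCentroid):
        self.make_classifier = make_classifier  # factory for any fit/predict model

    def fit(self, X, classes, folds):
        # Top-level model separates structural classes only.
        self.class_model_ = self.make_classifier().fit(X, classes)
        # One fold-level model per structural class, trained on that class's samples.
        by_class = defaultdict(lambda: ([], []))
        for x, c, f in zip(X, classes, folds):
            by_class[c][0].append(x)
            by_class[c][1].append(f)
        self.fold_models_ = {
            c: self.make_classifier().fit(xs, fs) for c, (xs, fs) in by_class.items()
        }
        return self

    def predict(self, X):
        predicted_classes = self.class_model_.predict(X)
        # Each query is routed to the fold model of its predicted class.
        return [
            (c, self.fold_models_[c].predict([x])[0])
            for x, c in zip(X, predicted_classes)
        ]


# Hypothetical toy data: two structural classes, two folds in each.
X_train = [
    [0.0, 0.0], [0.2, 0.1],  # class "alpha", fold "a.1"
    [0.0, 2.0], [0.1, 2.2],  # class "alpha", fold "a.2"
    [5.0, 0.0], [5.1, 0.2],  # class "beta",  fold "b.1"
    [5.0, 2.0], [5.2, 2.1],  # class "beta",  fold "b.2"
]
train_classes = ["alpha"] * 4 + ["beta"] * 4
train_folds = ["a.1", "a.1", "a.2", "a.2", "b.1", "b.1", "b.2", "b.2"]

model = HierarchicalFoldClassifier().fit(X_train, train_classes, train_folds)
predictions = model.predict([[0.1, 1.9], [4.9, 0.1]])
print(predictions)
```

Each fold-level model only has to separate the folds of one structural class, which is the reduction in complexity the abstract refers to. The trade-off in a design like this is that a wrong class prediction in step 1 cannot be corrected in step 2.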

Cleaned Abstract

fold recognit assign novel protein known structur form import compon overal protein structur discoveri process avail method protein fold recognit limit low foldcoverag andor low predict accuraci describ new support vector machin svmbase method protein fold predict high predict accuraci high foldcoverag new method fold predict high foldcoverag develop train test larg number fold order make method suitabl larg scale fold predict howev presenc larg number fold train set made classif task difficult consequ increas complex involv binari classif svms order overcom complex adopt hierarch approach foldpredict made two step first step structur class queri predict second step fold predict within predict structur class decreas complex classif problem also improv overal fold predict accuraci best knowledg first taxonom fold recognit method cover proteinfold give predict accuraci around benchmark dataset sinc new method give rise state art predict perform henc can use structur character protein discov various genom

Similar Abstracts

J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,890608120476409 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,862947624266301 )
J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,840806244816733 )
Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,831905733506159 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,827640689016946 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,823944691880021 )
Comput Biol Chem - Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. ( 0,815158626414588 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,798915109045906 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,788473694881108 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,787668685304398 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,787666964030927 )
Comput. Biol. Med. - Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. ( 0,785450637367347 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,785388697831454 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,785196496519464 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,784758276348307 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,781112646568323 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,780525189453105 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,778563689485493 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,778519237124646 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,777658880394425 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,777490308479253 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers I: proteins. ( 0,776720128781419 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,773770571691447 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,771522971694644 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,76873342004957 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,768328333359188 )
Comput Methods Programs Biomed - Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models. ( 0,768060173348855 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,767367721882215 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,765744152001035 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,764086646203538 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,761734075072721 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,761418727167292 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,760083974765546 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,75695614071229 )
Comput Math Methods Med - Naïve Bayes classifier with feature selection to identify phage virion proteins. ( 0,756302359846184 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,755437603558239 )
Comput. Biol. Med. - Remote homology detection incorporating the context of physicochemical properties. ( 0,753419798273827 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,753261464105227 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,750122877798854 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,749727358777918 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,749257151002319 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,749116677766426 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,745241308194254 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,743331793632709 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,740346243358408 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,739406511636667 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,739386148313831 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,737120596777453 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,736095091601166 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,732441151981356 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,731128068738889 )
Comput. Biol. Med. - Improving protein complex classification accuracy using amino acid composition profile. ( 0,730815783440202 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,729973095184034 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,727965014774668 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,727929266804038 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,726945862572592 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,724252121352655 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,724026378814536 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,720681427837646 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,720653600430853 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,719982711888634 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,719594838248537 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,718726854242224 )
Comput Methods Programs Biomed - Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. ( 0,714055859098819 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,713376487508253 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,713056405386025 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,711553081040246 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,710924456488126 )
Curr Protoc Bioinformatics - Comparative Protein Structure Modeling Using MODELLER. ( 0,710061911084908 )
Brief. Bioinformatics - DRISEE overestimates errors in metagenomic sequencing data. ( 0,70739915530745 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,707037917769425 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,703467196450813 )
J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,702683844102888 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,701867724186608 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,700040443973313 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,698924869614558 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,696853095623544 )
J Integr Bioinform - Predicting genes involved in human cancer using network contextual information. ( 0,696548702871737 )
J. Comput. Biol. - Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. ( 0,695795389384534 )
Comput Math Methods Med - Identification of antioxidants from sequence information using naïve Bayes. ( 0,695202889786499 )
J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,694137342081188 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,693742415319105 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,693651471246612 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,69111496101681 )
Comput Biol Chem - Prediction of protein modification sites of gamma-carboxylation using position specific scoring matrices based evolutionary information. ( 0,689649612276671 )
Brief. Bioinformatics - Base-calling for next-generation sequencing platforms. ( 0,685947116645249 )
J. Comput. Biol. - A novel technique for detecting putative horizontal gene transfer in the sequence space. ( 0,684909924815974 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,681618641930482 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,681401615471825 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,679823694649864 )
J. Comput. Biol. - Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. ( 0,679575494603149 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,678183534731133 )
J Chem Inf Model - Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress. ( 0,674661239473839 )
J. Comput. Biol. - Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. ( 0,674073309300221 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,671935398105757 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,670928616271251 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,669362359539696 )
J Biomed Inform - Protein contact map prediction using multi-stage hybrid intelligence inference systems. ( 0,668471029034058 )
Comput. Biol. Med. - Gene comparison based on the repetition of single-nucleotide structure patterns. ( 0,667513087660214 )
Comput Biol Chem - Systematic analysis of an amidase domain CHAP in 12 Staphylococcus aureus genomes and 44 staphylococcal phage genomes. ( 0,667384670823787 )