Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ method(1219) similar(1157) match(930) }
{ bind(1733) structur(1185) ligand(1036) }
{ howev(809) still(633) remain(590) }
{ method(984) reconstruct(947) comput(926) }
{ data(3963) clinic(1234) research(1004) }
{ research(1218) medic(880) student(794) }
{ state(1844) use(1261) util(961) }
{ use(976) code(926) identifi(902) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ measur(2081) correl(1212) valu(896) }
{ framework(1458) process(801) describ(734) }
{ spatial(1525) area(1432) region(1030) }
{ group(2977) signific(1463) compar(1072) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }
{ inform(2794) health(2639) internet(1427) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ control(1307) perform(991) simul(935) }
{ general(901) number(790) one(736) }
{ sampl(1606) size(1419) use(1276) }
{ high(1669) rate(1365) level(1280) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }

Abstract

Several learning approaches have been used to predict RNA-binding amino acids in a protein sequence, but there has been little attempt to predict protein-binding nucleotides in an RNA sequence. One of the reasons is that the differences between nucleotides in their interaction propensity are much smaller than those between amino acids. Another reason is that RNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding RNA nucleotides is much harder than predicting RNA-binding amino acids. We developed a new method that removes data redundancy in a training set of sequences based on their features. The new method constructs a larger and more informative training set than the standard redundancy removal method based on sequence similarity, and the constructed dataset is guaranteed to be redundancy-free. We computed the interaction propensity (IP) of nucleotide triplets by applying a new definition of IP to an extensive dataset of protein-RNA complexes, and developed a support vector machine (SVM) model to predict protein binding sites in RNA sequences. In a 5-fold cross-validation with 812 RNA sequences, the SVM model predicted protein-binding nucleotides with an accuracy of 86.4%, an F-measure of 84.8%, and a Matthews correlation coefficient of 0.66. With an independent dataset of 56 RNA sequences that were not used in training, the resulting accuracy was 68.1% with an F-measure of 71.7% and a Matthews correlation coefficient of 0.35. To the best of our knowledge, this is the first attempt to predict protein-binding RNA nucleotides in a given RNA sequence from the sequence data alone. The SVM model and datasets are freely available for academics at http://bclab.inha.ac.kr/primer.
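
As a rough illustration of two ideas in the abstract, the sketch below computes a per-triplet interaction propensity and the three reported evaluation measures (accuracy, F-measure, Matthews correlation coefficient). The abstract does not state the paper's new definition of IP, so `triplet_propensity` uses an assumed stand-in: the frequency of a triplet among binding-centred triplets divided by its frequency among all triplets. The function names, the `"0"`/`"1"` label strings, and the toy sequence are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def triplet_propensity(sequences, labels):
    """Assumed propensity, NOT the paper's definition.

    For each triplet, returns (share among binding-centred triplets)
    divided by (share among all triplets). A triplet is counted as
    binding when its central nucleotide is labelled "1".
    """
    bound = Counter()
    total = Counter()
    for seq, lab in zip(sequences, labels):
        # Slide a window of three nucleotides centred on position i.
        for i in range(1, len(seq) - 1):
            triplet = seq[i - 1:i + 2]
            total[triplet] += 1
            if lab[i] == "1":
                bound[triplet] += 1
    n_bound = sum(bound.values())
    n_total = sum(total.values())
    if n_bound == 0:
        return {t: 0.0 for t in total}
    return {t: (bound[t] / n_bound) / (total[t] / n_total) for t in total}


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F-measure and Matthews correlation coefficient
    from the four cells of a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc


if __name__ == "__main__":
    # Toy data: one RNA sequence with per-nucleotide binding labels.
    ip = triplet_propensity(["AUGCA"], ["01100"])
    print(ip)
    print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

Under this assumed definition, a value above 1 means the triplet is over-represented at binding positions. In a pipeline like the one described, such values could serve as per-nucleotide features for an SVM, although the actual feature encoding is not given in the abstract.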

Cleaned Abstract

sever learn approach use predict rnabind amino acid protein sequenc littl attempt predict proteinbind nucleotid rna sequenc one reason differ nucleotid interact propens much smaller amino acid anoth reason rna exhibit less divers sequenc pattern protein therefor predict proteinbind rna nucleotid much harder predict rnabind amino acid develop new method remov data redund train set sequenc base featur new method construct larger inform train set standard redund remov method base sequenc similar construct dataset guarante redundancyfre comput interact propens ip nucleotid triplet appli new definit ip extens dataset proteinrna complex develop support vector machin svm model predict protein bind site rna sequenc fold crossvalid rna sequenc svm model predict proteinbind nucleotid accuraci fmeasur matthew correl coeffici independ dataset rna sequenc use train result accuraci fmeasur matthew correl coeffici best knowledg first attempt predict proteinbind rna nucleotid given rna sequenc sequenc data alon svm model dataset freeli avail academ httpbclabinhaackrprim

Similar Abstracts

J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,876476999910517 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,865017152836157 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,8629476242663 )
Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,818853830420975 )
Comput Biol Chem - Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. ( 0,818321604725866 )
Comput. Biol. Med. - Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. ( 0,786726078178601 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,781427194631731 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,779767340916057 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,779008396742264 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,763043171551637 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers I: proteins. ( 0,761277571971054 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,760079124010902 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,758047039249712 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,757792138485612 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,756505377733851 )
Comput Math Methods Med - Identification of antioxidants from sequence information using naïve Bayes. ( 0,754650146491351 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,751970950958224 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,751736982716418 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,749832853559391 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,738133135383196 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,73573416677347 )
Comput. Biol. Med. - Remote homology detection incorporating the context of physicochemical properties. ( 0,735483141715136 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,734998675784829 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,734626808266007 )
Comput Math Methods Med - Naïve Bayes classifier with feature selection to identify phage virion proteins. ( 0,732133172617927 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,728177241111012 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,727213189268618 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,727132306192659 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,725360759044375 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,724753708029534 )
Comput Methods Programs Biomed - Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models. ( 0,724236653616235 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,724062645159472 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,723717951000333 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,723543532863419 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,721807673847737 )
J Biomed Inform - Protein contact map prediction using multi-stage hybrid intelligence inference systems. ( 0,721760212398631 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,721329470034243 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,719137041298688 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,719041244024482 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,718984085919025 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,718547061407316 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,715827941422525 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,715198472273198 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,714852162185675 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,71193487274151 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,711318285160452 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,707588893988329 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,706283296788365 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,705202725799722 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,704010514236938 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,703953160850025 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,701312491122469 )
Comput Biol Chem - Prediction of protein modification sites of gamma-carboxylation using position specific scoring matrices based evolutionary information. ( 0,701005616963423 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,700483274503551 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,6947616866986 )
J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,693066134751811 )
Comput Biol Chem - newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. ( 0,692957455127927 )
Comput. Biol. Med. - Improving protein complex classification accuracy using amino acid composition profile. ( 0,691257399429204 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,690112197563372 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,690045980720573 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,686322115434706 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,686177729444215 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,68505800576025 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,683668027338035 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,680834627042714 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,680096029052949 )
Comput Methods Programs Biomed - Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. ( 0,678419425142841 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,674647958579931 )
J Chem Inf Model - Functional prediction of binding pockets. ( 0,674333044488709 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,673002561105574 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,672192740506785 )
Comput. Biol. Med. - Gene comparison based on the repetition of single-nucleotide structure patterns. ( 0,670722920384785 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,670682223637949 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,669701536681325 )
Comput. Biol. Med. - Haemophilus influenzae Genome Database (HIGDB): a single point web resource for Haemophilus influenzae. ( 0,669490634192203 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,66877169663013 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,668396097298146 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,667779295429708 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,66503228148104 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,664459547417892 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,663599602255682 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,660738993817821 )
J Integr Bioinform - Predicting genes involved in human cancer using network contextual information. ( 0,660221212877254 )
J Chem Inf Model - PocketAlign a novel algorithm for aligning binding sites in protein structures. ( 0,659774649225025 )
Comput. Biol. Med. - Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines. ( 0,658980259093119 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,658257221634207 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,657305899187684 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,654124153946718 )
Brief. Bioinformatics - Alpha shape and Delaunay triangulation in studies of protein-related interactions. ( 0,654102469907294 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,654075337920062 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,653815205352381 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,650918740934417 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,650006955295816 )
J. Comput. Biol. - Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. ( 0,648564470570866 )
Comput Biol Chem - Systematic analysis of an amidase domain CHAP in 12 Staphylococcus aureus genomes and 44 staphylococcal phage genomes. ( 0,648062363208417 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,645071186854587 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,644570202366487 )
Comput. Biol. Med. - FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds. ( 0,644558879124118 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,644041334730625 )
Brief. Bioinformatics - Base-calling for next-generation sequencing platforms. ( 0,643815243665224 )