Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ model(2341) predict(2261) use(1141) }
{ model(3404) distribut(989) bayesian(671) }
{ estim(2440) model(1874) function(577) }
{ method(1219) similar(1157) match(930) }
{ assess(1506) score(1403) qualiti(1306) }
{ featur(1941) imag(1645) propos(1176) }
{ perform(999) metric(946) measur(919) }
{ implement(1333) system(1263) develop(1122) }
{ detect(2391) sensit(1101) algorithm(908) }
{ imag(2830) propos(1344) filter(1198) }
{ general(901) number(790) one(736) }
{ howev(809) still(633) remain(590) }
{ import(1318) role(1303) understand(862) }
{ compound(1573) activ(1297) structur(1058) }
{ measur(2081) correl(1212) valu(896) }
{ featur(3375) classif(2383) classifi(1994) }
{ take(945) account(800) differ(722) }
{ problem(2511) optim(1539) algorithm(950) }
{ method(984) reconstruct(947) comput(926) }
{ first(2504) two(1366) second(1323) }
{ high(1669) rate(1365) level(1280) }
{ studi(2440) review(1878) systemat(933) }
{ concept(1167) ontolog(924) domain(897) }
{ health(3367) inform(1360) care(1135) }
{ state(1844) use(1261) util(961) }
{ data(3008) multipl(1320) sourc(1022) }
{ patient(1821) servic(1111) care(1106) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ method(1969) cluster(1462) data(1082) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }

Abstract

Mutual information (MI) is commonly used to estimate the evolutionary correlation between two amino acid sites. Although several MI methods exist, prior to our contribution no systematic method had been developed to assess their performance or to establish numerical thresholds for detecting co-evolving amino acid sites. In the current study, we ran a Markov chain Monte Carlo (MCMC) algorithm on influenza viral sequences to capture their evolutionary characteristics. A consensus maximum clade credibility (MCC) tree was estimated from the samples, together with their amino acid substitution statistics, from which we generated synthetic sequences of known dependent and independent paired amino acid sites. A pair-to-pair, influenza-specific amino acid substitution matrix (P2PFLU) incorporated into Bayesian Evolutionary Analysis Sampling Trees (BEAST) enumerated these synthetic sequences. The sequences inherited evolutionary features and co-varying characteristics from the real viral sequences, rendering these synthetic data ideal for exploring their co-evolving features. For the MI measure, we proposed a novel metric called the empirical MI (MI(Em)), which outperformed other MI measures in receiver operating characteristic (ROC) analysis. We applied our approach to 1086 all-time PB2 sequences of influenza A H5N1 viruses, in which we found 97 sites exhibiting co-evolutionary substitution with one or more amino acid sites. In particular, PB2 451, along with eight other PB2 sites of various MI(Em) scores, was found to co-evolve with PB2 627, a known species-associated amino acid residue that plays a critical role in influenza virus replication.
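
As a rough illustration of the quantity the abstract builds on, the sketch below computes the standard plug-in Shannon mutual information between two alignment columns. It is not the empirical MI (MI(Em)) proposed in the paper, which the abstract does not define; the function name `mutual_information` and the toy residue columns are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Plug-in Shannon MI (in bits) between two alignment columns.

    col_i and col_j hold the residue observed at site i and site j
    in each sequence, in the same sequence order.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must cover the same sequences")
    n = len(col_i)
    if n == 0:
        raise ValueError("columns must not be empty")

    count_i = Counter(col_i)               # marginal counts at site i
    count_j = Counter(col_j)               # marginal counts at site j
    count_ij = Counter(zip(col_i, col_j))  # joint counts of residue pairs

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_i[a] * count_j[b]))
    return mi


# Toy columns (hypothetical residues, four sequences each).
site_a = "EEKK"
site_b = "TTAA"  # changes together with site_a
site_c = "TATA"  # varies independently of site_a

mi_dependent = mutual_information(site_a, site_b)
mi_independent = mutual_information(site_a, site_c)
print(mi_dependent, mi_independent)
```

Real alignments are much noisier than this toy case: finite sample size and shared phylogeny both inflate raw MI, which is one reason a study like this one calibrates thresholds on synthetic sequences with known dependent and independent site pairs.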

Cleaned Abstract

mutual inform mi approach common use estim evolutionari correl amino acid site although sever mi method exist prior contribut systemat method develop assess perform establish numer threshold detect coevolv amino acid site current studi perform markov chain mont carlo mcmc algorithm influenza viral sequenc captur evolutionari characterist consensus maximum clade credibl mcc tree estim sampl togeth amino acid substitut statist generat synthet sequenc known depend independ pair amino acid site pairtopair influenzaspecif amino acid substitut matrix ppflu incorpor bayesian evolutionari analysi sampl tree beast enumer synthet sequenc sequenc inherit evolutionari featur covari characterist real viral sequenc render synthet data ideal explor coevolv featur mi measur propos novel metric call empir mi miem outperform mi measur analysi receiv oper characterist roc implement approach alltim pb sequenc influenza hn virus found site exhibit coevolutionari substitut one amino acid site particular pb along eight pb site various miem score found coevolv pb known speciesassoci amino acid residu play critic role influenza virus replic

Similar Abstracts

J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,805761545838846 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,803261992441242 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,803162882396125 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,796703462452437 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,794057518292139 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,784427993027675 )
Comput. Biol. Med. - Prediction of protein functions based on function-function correlation relations. ( 0,782704650867763 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,779897833506384 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,778117257172992 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,767075974303251 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,766239178144449 )
Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,766104593368734 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,765523619696464 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,764768353584528 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,762940829652802 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,759567630326903 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,759498048254054 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,759259085539553 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,758855330432735 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,758472277051533 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,756775493597887 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,756314517808712 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,755271719681082 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,749248575771142 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,747292492827586 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,747124191666063 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,744569317228464 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,744281763981596 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,74403876771655 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,743581370065952 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,740205611248628 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,739520434714999 )
Comput Biol Chem - Systematic analysis of an amidase domain CHAP in 12 Staphylococcus aureus genomes and 44 staphylococcal phage genomes. ( 0,737822310129763 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,735931759777952 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,73519345344805 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,733850488481346 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,733609315932718 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,733025407623754 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,729891360714847 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,729092955113832 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,727793164886465 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,726837916029029 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,724445808912949 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,724024005720368 )
Comput Biol Chem - A balance-evolution artificial bee colony algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. ( 0,72281632838562 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,72064260958103 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,719982711888634 )
Comput Biol Chem - Predicting protein-protein interactions using graph invariants and a neural network. ( 0,719444407986382 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,719041244024482 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,718626990669617 )
J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,717940155449522 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,716580817117514 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,7159072635765 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,71582287065747 )
J. Comput. Biol. - Computational techniques for human genome resequencing using mated gapped reads. ( 0,714642011803882 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,713910519330458 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,712330131109358 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,712054141636237 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,710931790511255 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,710670490792623 )
Comput Biol Chem - Investigating long range correlation in DNA sequences using significance tests of conditional mutual information. ( 0,709246990023509 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,708937659768757 )
J. Comput. Biol. - Statistical significance of normalized global alignment. ( 0,700729337522156 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,6995251141431 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,698932904010532 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,69870477345626 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,698049625135343 )
Comput. Biol. Med. - Remote homology detection incorporating the context of physicochemical properties. ( 0,697021472006047 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,692726617683333 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,692225543847778 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,691357731330664 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,690981914424488 )
Comput Biol Chem - PPM-Dom: a novel method for domain position prediction. ( 0,690886357355964 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,687587256250072 )
Comput. Biol. Med. - LRRsearch: An asynchronous server-based application for the prediction of leucine-rich repeat motifs and an integrative database of NOD-like receptors. ( 0,686717216216418 )
J. Comput. Biol. - A probabilistic model for sequence alignment with context-sensitive indels. ( 0,685840760891712 )
Comput. Biol. Med. - Improving protein complex classification accuracy using amino acid composition profile. ( 0,685210917947446 )
Brief. Bioinformatics - Alpha shape and Delaunay triangulation in studies of protein-related interactions. ( 0,684981802246085 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,684811736468365 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,684680961348891 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,683297297160585 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,682464713501373 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,681558551280467 )
Curr Protoc Bioinformatics - An Overview of RNA Sequence Analyses: Structure Prediction, ncRNA Gene Identification, and RNAi Design. ( 0,681377013043583 )
Comput Biol Chem - Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: the role of mixing statistics and frame shift of neighboring genes. ( 0,68054465367162 )
Comput Biol Chem - Understanding the general packing rearrangements required for successful template based modeling of protein structure from a CASP experiment. ( 0,679820621826484 )
Brief. Bioinformatics - Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes). ( 0,679686593800886 )
J. Comput. Biol. - Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. ( 0,677438417592464 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,676974710444488 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,676751741327629 )
Comput Math Methods Med - Identification of antioxidants from sequence information using naïve Bayes. ( 0,676121359579223 )
Brief. Bioinformatics - Identify drug repurposing candidates by mining the protein data bank. ( 0,675601594702782 )
Comput. Biol. Med. - Structural alphabet motif discovery and a structural motif database. ( 0,674801558131828 )
J. Comput. Biol. - Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. ( 0,672741467530167 )
J Biomed Inform - A kinetic model-based algorithm to classify NGS short reads by their allele origin. ( 0,672621770156288 )
J. Comput. Biol. - The distribution of word matches between Markovian sequences with periodic boundary conditions. ( 0,67073561762535 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,669955951588456 )
J. Comput. Biol. - Accurate estimations of evolutionary times in the context of strong CpG hypermutability. ( 0,668868911607523 )
J Chem Inf Model - PocketAlign a novel algorithm for aligning binding sites in protein structures. ( 0,668808465295898 )
J. Comput. Biol. - Alignment-free sequence comparison (II): theoretical power of comparison statistics. ( 0,668575914353422 )