J. Comput. Biol. - Nonparametric combinatorial sequence models.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ model(3404) distribut(989) bayesian(671) }
{ method(1219) similar(1157) match(930) }
{ method(1557) propos(1049) approach(1037) }
{ howev(809) still(633) remain(590) }
{ model(2341) predict(2261) use(1141) }
{ state(1844) use(1261) util(961) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ imag(1947) propos(1133) code(1026) }
{ bind(1733) structur(1185) ligand(1036) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ data(2317) use(1299) case(1017) }
{ sampl(1606) size(1419) use(1276) }
{ use(976) code(926) identifi(902) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ network(2748) neural(1063) input(814) }
{ studi(2440) review(1878) systemat(933) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ clinic(1479) use(1117) guidelin(835) }
{ search(2224) databas(1162) retriev(909) }
{ data(3963) clinic(1234) research(1004) }
{ system(1050) medic(1026) inform(1018) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ model(3480) simul(1196) paramet(876) }
{ cost(1906) reduc(1198) effect(832) }
{ analysi(2126) use(1163) compon(1037) }
{ high(1669) rate(1365) level(1280) }
{ use(1733) differ(960) four(931) }
{ estim(2440) model(1874) function(577) }
{ can(774) often(719) complex(702) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ perform(1367) use(1326) method(1137) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions in the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology, but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences that allows combinatorial structures to emerge and that induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families, which indicate that combinatorial structures are indeed present and that combinatorial sequence models can describe them more succinctly than simpler mixture models. We conclude with an application to MHC binding prediction that highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating over the posterior, our method compares favorably to leading binding predictors.
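
The idea of "linked" positions that covary as one unit can be illustrated with a toy generator. This is only a sketch under stated assumptions, not the nonparametric prior from the article: the group assignments (`GROUPS`), the per-group patterns (`PATTERNS`), and the sequence length are all hypothetical and fixed by hand, whereas the article infers such structure from data.

```python
import random
from collections import Counter

# Hypothetical toy setup: 6 aligned positions split into two linked groups.
GROUPS = [(0, 1, 2), (3, 4, 5)]
# Each group picks one joint pattern (one residue per position in the group).
PATTERNS = [
    ["ACD", "GHK"],         # patterns available to the first group
    ["LMN", "PQR", "STV"],  # patterns available to the second group
]

def sample_sequence(rng):
    """Draw one sequence by choosing a pattern independently for each group."""
    seq = [""] * 6
    for positions, patterns in zip(GROUPS, PATTERNS):
        pattern = rng.choice(patterns)
        for pos, residue in zip(positions, pattern):
            seq[pos] = residue
    return "".join(seq)

def column_pairs(seqs, i, j):
    """Count the residue pairs observed at aligned positions i and j."""
    return Counter((s[i], s[j]) for s in seqs)

rng = random.Random(0)
seqs = [sample_sequence(rng) for _ in range(200)]
within = column_pairs(seqs, 0, 2)  # same group: residues always move together
across = column_pairs(seqs, 0, 3)  # different groups: patterns combine freely
```

By construction, positions 0 and 2 can show at most two distinct residue pairs, while positions 0 and 3 can show up to six. In this toy case a plain mixture over whole sequences would need one component per combination (2 × 3 = 6), whereas the factorized description stores only 2 + 3 = 5 patterns; the gap widens multiplicatively as groups are added, which is the sense in which a combinatorial model can be more succinct.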

Cleaned Abstract

work consid biolog sequenc exhibit combinatori structur composit group posit align sequenc link covari one unit across sequenc multipl group exist complex interact can emerg sequenc kind aris frequent biolog methodolog analyz still develop articl present nonparametr prior sequenc allow combinatori structur emerg induc posterior distribut factor sequenc represent carri experi three biolog sequenc famili indic combinatori structur inde present combinatori sequenc model can succinct describ simpler mixtur model conclud applic mhc bind predict highlight util posterior distribut sequenc represent induc prior integr posterior method compar favor lead bind predictor
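
The cleaned text above looks like the abstract after lowercasing, punctuation and stopword removal, and stemming. The sketch below shows only the first steps, as one plausible way such text could be produced; the actual pipeline behind this page is not documented here. The `STOPWORDS` set is a deliberately tiny, assumed list, and the stemming step is left out because it would need an external stemmer (the truncated forms such as "sequenc" and "biolog" are consistent with a Porter-family stemmer, but that is an assumption).

```python
import re

# Assumed, deliberately small stopword list; the list actually used is unknown.
STOPWORDS = {
    "this", "that", "of", "the", "in", "their", "are", "and", "as",
    "if", "such", "between", "them", "but", "for", "on", "which",
    "to", "a", "an", "we", "with", "by", "our", "over", "than",
}

def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

tokens = clean("This work considers biological sequences that exhibit "
               "combinatorial structures in their composition.")
```

A stemmer applied to each remaining token would then shorten words such as "considers" and "biological" to the truncated forms seen in the cleaned abstract.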

Similar Abstracts

Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,847051381917431 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,846221393923081 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,837111651729914 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,830278087225996 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,829824981094084 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,825522699279888 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,825046230668688 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,821851882220112 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,821000945913882 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,816892655119197 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,815084784625686 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,813226315744243 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,813088805925286 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,81004151239816 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,80893076574406 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,80617502252268 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,805897076179556 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,805761545838846 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,805424308457886 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,803342947534584 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,801214883008172 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,798727703394393 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,798477673704995 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,796438702897015 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,793658655933609 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,792104382066181 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,790827398083287 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,79037778678806 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,790060720832691 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,788516537548805 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,787546202721522 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,784680543376195 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,783962057119227 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,783259497695038 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,783005559664755 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,782068542823299 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,779613515949898 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,778555099367879 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,777886279847449 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,777343398066825 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,773565934694095 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,769265363849022 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,766414170059424 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,766198849972366 )
Comput. Biol. Med. - Prediction of protein functions based on function-function correlation relations. ( 0,764864818014099 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,764041728569116 )
Comput Biol Chem - Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: the role of mixing statistics and frame shift of neighboring genes. ( 0,761739419548774 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,761284884851275 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,760566755790505 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,759606086995934 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,758130107431383 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,757427198171793 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,757328197287029 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,754516897887764 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,751283825550244 )
J. Comput. Biol. - A probabilistic model for sequence alignment with context-sensitive indels. ( 0,750605843619517 )
Comput Biol Chem - Predicting protein-protein interactions using graph invariants and a neural network. ( 0,749125438988762 )
Comput Biol Chem - In silico characterization and evolutionary analyses of CCAAT binding proteins in the lycophyte plant Selaginella moellendorffii genome: a growing comparative genomics resource. ( 0,749101882448082 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,74706761520552 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,74704273324483 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,746981330873026 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,746659118541143 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,743419329655057 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,741725149385313 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,740986606906186 )
Comput Biol Chem - Genome-wide analysis and evolutionary study of sucrose non-fermenting 1-related protein kinase 2 (SnRK2) gene family members in Arabidopsis and Oryza. ( 0,740231002300466 )
J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,737481482355264 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,737217646254658 )
Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,736796037119642 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,735747986875208 )
Brief. Bioinformatics - Base-calling for next-generation sequencing platforms. ( 0,734181086389654 )
Sci Data - A draft genome for the African crocodilian trypanosome Trypanosoma grayi. ( 0,733153461261937 )
J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,732197065440093 )
J Chem Inf Model - Structural effects of pH and deacylation on surfactant protein C in an organic solvent mixture: a constant-pH MD study. ( 0,729808031219925 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,729313948727783 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,729212745017466 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,728639132172355 )
Comput Math Methods Med - ADLD: a novel graphical representation of protein sequences and its application. ( 0,728135457916219 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,727938826975546 )
Comput Biol Chem - Large replication skew domains delimit GC-poor gene deserts in human. ( 0,724569208448051 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,721329470034243 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,718726854242223 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,718112537478861 )
Brief. Bioinformatics - Identify drug repurposing candidates by mining the protein data bank. ( 0,717282175687565 )
Comput Biol Chem - Gene cloning, homology comparison and analysis of the main functional structure domains of beta estrogen receptor in Jining Gray goat. ( 0,716810165734312 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,715677544297623 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,713486381521055 )
Comput Biol Chem - Semantically predicting protein functions based on protein functional connectivity. ( 0,712248359139948 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,710405163178254 )
Comput Biol Chem - Protein folding simulations of 2D HP model by the genetic algorithm based on optimal secondary structures. ( 0,708996334532199 )
J. Comput. Biol. - A novel technique for detecting putative horizontal gene transfer in the sequence space. ( 0,70874292088617 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,703692978393297 )
Comput Biol Chem - A balance-evolution artificial bee colony algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. ( 0,7032565855964 )
IEEE Trans Image Process - Pattern masking estimation in image with structural uncertainty. ( 0,703200989992882 )
Comput Biol Chem - Systematic analysis of an amidase domain CHAP in 12 Staphylococcus aureus genomes and 44 staphylococcal phage genomes. ( 0,702369953080613 )
Comput. Biol. Med. - Structural alphabet motif discovery and a structural motif database. ( 0,700854883973179 )
Comput Methods Programs Biomed - Pinda: a web service for detection and analysis of intraspecies gene duplication events. ( 0,700168610246633 )
J. Comput. Biol. - Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. ( 0,699447054757928 )
Comput Biol Chem - Comparison of linear gap penalties and profile-based variable gap penalties in profile-profile alignments. ( 0,699426287591361 )
Brief. Bioinformatics - Genome variation discovery with high-throughput sequencing data. ( 0,699114120774954 )