J. Comput. Biol. - A Bayesian sampler for optimization of protein domain hierarchies.

Topics

{ model(3404) distribut(989) bayesian(671) }
{ sequenc(1873) structur(1644) protein(1328) }
{ structur(1116) can(940) graph(676) }
{ problem(2511) optim(1539) algorithm(950) }
{ perform(999) metric(946) measur(919) }
{ data(1737) use(1416) pattern(1282) }
{ featur(3375) classif(2383) classifi(1994) }
{ can(981) present(881) function(850) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ extract(1171) text(1153) clinic(932) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ first(2504) two(1366) second(1323) }
{ analysi(2126) use(1163) compon(1037) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ measur(2081) correl(1212) valu(896) }
{ method(1219) similar(1157) match(930) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ treatment(1704) effect(941) patient(846) }
{ general(901) number(790) one(736) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ blood(1257) pressur(1144) flow(957) }
{ model(3480) simul(1196) paramet(876) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ data(3008) multipl(1320) sourc(1022) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ implement(1333) system(1263) develop(1122) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

The process of identifying and modeling functionally divergent subgroups for a specific protein domain class and arranging these subgroups hierarchically has, thus far, largely been done via manual curation. How to accomplish this automatically and optimally is an unsolved statistical and algorithmic problem that is addressed here via Markov chain Monte Carlo sampling. Taking as input a (typically very large) multiple-sequence alignment, the sampler creates and optimizes a hierarchy by adding and deleting leaf nodes, by moving nodes and subtrees up and down the hierarchy, by inserting or deleting internal nodes, and by redefining the sequences and conserved patterns associated with each node. All such operations are based on a probability distribution that models the conserved and divergent patterns defining each subgroup. When we view these patterns as sequence determinants of protein function, each node or subtree in such a hierarchy corresponds to a subgroup of sequences with similar biological properties. The sampler can be applied either de novo or to an existing hierarchy. When applied to 60 protein domains from multiple starting points in this way, it converged on similar solutions with nearly identical log-likelihood ratio scores, suggesting that it typically finds the optimal peak in the posterior probability distribution. Similarities and differences between independently generated, nearly optimal hierarchies for a given domain help distinguish robust from statistically uncertain features. Thus, a future application of the sampler is to provide confidence measures for various features of a domain hierarchy.
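The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' sampler: it assumes a fixed set of nodes, implements only one of the listed operations (moving a node together with its subtree), and replaces the paper's probability model with a hypothetical toy score that penalizes pattern mismatches between a node and its parent. The names `log_score`, `propose_move`, and `metropolis` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def hamming(a, b):
    # Number of positions at which two equal-length patterns differ.
    return sum(x != y for x, y in zip(a, b))


def log_score(parent, patterns):
    # Toy stand-in for a log-likelihood (an assumption, not the paper's model):
    # each edge is penalized by the pattern mismatch between child and parent.
    return -sum(hamming(patterns[c], patterns[p])
                for c, p in enumerate(parent) if p is not None)


def subtree(parent, v):
    # All nodes in the subtree rooted at v, including v itself.
    nodes = {v}
    changed = True
    while changed:
        changed = False
        for c, p in enumerate(parent):
            if p in nodes and c not in nodes:
                nodes.add(c)
                changed = True
    return nodes


def propose_move(parent, rng):
    # Pick a non-root node uniformly and reattach it, with its subtree, to a
    # uniformly chosen node outside that subtree. The subtree is unchanged by
    # the move, so the reverse move is equally likely (symmetric proposal).
    v = rng.choice([i for i, p in enumerate(parent) if p is not None])
    blocked = subtree(parent, v)
    targets = [u for u in range(len(parent)) if u not in blocked]
    new_parent = list(parent)
    new_parent[v] = rng.choice(targets)
    return new_parent


def metropolis(patterns, steps=2000, seed=0):
    rng = random.Random(seed)
    n = len(patterns)
    parent = [None] + [0] * (n - 1)  # start from a star rooted at node 0
    current = log_score(parent, patterns)
    best, best_score = parent, current
    for _ in range(steps):
        cand = propose_move(parent, rng)
        cand_score = log_score(cand, patterns)
        # Metropolis rule for a symmetric proposal:
        # accept with probability min(1, exp(cand_score - current)).
        if cand_score >= current or rng.random() < math.exp(cand_score - current):
            parent, current = cand, cand_score
            if current > best_score:
                best, best_score = parent, current
    return best, best_score


if __name__ == "__main__":
    toy_patterns = ["AAAA", "AAAT", "AATT", "CCCC", "CCCG"]
    tree, score = metropolis(toy_patterns, steps=500, seed=1)
    print(tree, score)
```

Running the sketch from several seeds and comparing the best scores mirrors, in a very reduced form, the convergence check the abstract reports for multiple starting points. Adding or deleting nodes, as the abstract describes, changes the dimension of the state and would need a proposal correction that this sketch deliberately omits.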

Cleaned Abstract

process identifi model function diverg subgroup specif protein domain class arrang subgroup hierarch thus far larg done via manual curat accomplish automat optim unsolv statist algorithm problem address via markov chain mont carlo sampl take input typic larg multiplesequ align sampler creat optim hierarchi ad delet leaf node move node subtre hierarchi insert delet intern node redefin sequenc conserv pattern associ node oper base probabl distribut model conserv diverg pattern defin subgroup view pattern sequenc determin protein function node subtre hierarchi correspond subgroup sequenc similar biolog properti sampler can appli either de novo exist hierarchi appli protein domain multipl start point way converg similar solut near ident loglikelihood ratio score suggest typic find optim peak posterior probabl distribut similar differ independ generat near optim hierarchi given domain help distinguish robust statist uncertain featur thus futur applic sampler provid confid measur various featur domain hierarchi

Similar Abstracts

J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,706241305136482 )
IEEE Trans Pattern Anal Mach Intell - Conditional Alignment Random Fields for Multiple Motion Sequence Alignment. ( 0,68146266233796 )
J. Comput. Biol. - A theoretical model for whole genome alignment. ( 0,679363119823279 )
J. Comput. Biol. - Parallel continuous flow: a parallel suffix tree construction tool for whole genomes. ( 0,665993219137561 )
IEEE Trans Image Process - Shape-based normalized cuts using spectral relaxation for biomedical segmentation. ( 0,658827507781709 )
Brief. Bioinformatics - Fighting against uncertainty: an essential issue in bioinformatics. ( 0,654350466908901 )
J. Comput. Biol. - Computational techniques for human genome resequencing using mated gapped reads. ( 0,641521160558724 )
J. Comput. Biol. - An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure. ( 0,633562880304499 )
J. Comput. Biol. - The 5'-3' distance of RNA secondary structures. ( 0,633140749495787 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,624815169668323 )
Comput Biol Chem - A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction. ( 0,621435582261155 )
J. Comput. Biol. - On the inference of Dirichlet mixture priors for protein sequence comparison. ( 0,619117696773194 )
Comput Biol Chem - Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: the role of mixing statistics and frame shift of neighboring genes. ( 0,619035693250364 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,617553088477429 )
J. Comput. Biol. - Detection of structural variants involving repetitive regions in the reference genome. ( 0,615648680115389 )
Lifetime Data Anal - Bayesian semiparametric modeling for stochastic precedence, with applications in epidemiology and survival analysis. ( 0,612456104562929 )
J. Comput. Biol. - The distribution of word matches between Markovian sequences with periodic boundary conditions. ( 0,612192482962487 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,611259174925809 )
Comput Biol Chem - Entropy and long-range correlations in DNA sequences. ( 0,610501231840137 )
Artif Intell Med - On the interplay of machine learning and background knowledge in image interpretation by Bayesian networks. ( 0,609058784318225 )
IEEE Trans Image Process - Hyperspectral image representation and processing with binary partition trees. ( 0,607758374126972 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,607173836190108 )
Brief. Bioinformatics - Structural mapping: how to study the genetic architecture of a phenotypic trait through its formation mechanism. ( 0,603513768304443 )
J. Comput. Biol. - Random matrix approach to the distribution of genomic distance. ( 0,603015393540726 )
Comput Biol Chem - Heuristic energy landscape paving for protein folding problem in the three-dimensional HP lattice model. ( 0,602649093610349 )
J. Comput. Biol. - Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly. ( 0,598321953375939 )
Brief. Bioinformatics - Iteratively reweighted LASSO for mapping multiple quantitative trait loci. ( 0,596991609955425 )
Med Decis Making - The choice of a noninformative prior on between-study variance strongly affects predictions of future treatment effect. ( 0,596289761032385 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers I: proteins. ( 0,591688158289771 )
IEEE Trans Pattern Anal Mach Intell - C^4: Exploring Multiple Solutions in Graphical Models by Cluster Sampling. ( 0,591483642467992 )
IEEE Trans Image Process - Bayesian robust principal component analysis. ( 0,587817819531959 )
J. Comput. Biol. - Phylogeny inference based on spectral graph clustering. ( 0,586249395182045 )
Comput Biol Chem - Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction. ( 0,5845580833111 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,583428274567526 )
IEEE Trans Vis Comput Graph - Flow Visualization with Quantified Spatial and Temporal Errors Using Edge Maps. ( 0,582942017396724 )
J. Comput. Biol. - Simultaneous folding of alternative RNA structures with mutual constraints: an application to next-generation sequencing-based RNA structure probing. ( 0,580637618538716 )
Comput Biol Chem - Exploring the limits of fold discrimination by structural alignment: a large scale benchmark using decoys of known fold. ( 0,578237201932996 )
J. Comput. Biol. - Expectation-maximization algorithm for determining natural selection of Y-linked genes through two-sex branching processes. ( 0,577796124171879 )
IEEE Trans Vis Comput Graph - Dynamic Network Visualization with Extended Massive Sequence Views. ( 0,577269998482479 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,576754069799072 )
J. Comput. Biol. - Emergent protein folding modeled with evolved neural cellular automata using the 3D HP model. ( 0,576149618968305 )
J. Comput. Biol. - Smoothing 3D protein structure motifs through graph mining and amino acid similarities. ( 0,57590367200929 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,574886980533824 )
IEEE Trans Image Process - Graph cuts for curvature based image denoising. ( 0,574431366935334 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,573789974610877 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,571978673515872 )
Comput Biol Chem - Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm. ( 0,571906502454745 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,571134162952607 )
IEEE Trans Pattern Anal Mach Intell - Are Gibbs-Type Priors the Most Natural Generalization of the Dirichlet Process? ( 0,570947467291661 )
J. Comput. Biol. - A polynomial-time algorithm computing lower and upper bounds of the rooted subtree prune and regraft distance. ( 0,567479058154481 )
J. Comput. Biol. - Exploiting genome structure in association analysis. ( 0,567311852836074 )
J Chem Inf Model - Context-based features enhance protein secondary structure prediction accuracy. ( 0,566637486099828 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,566460483980856 )
Comput. Biol. Med. - Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data. ( 0,56580896099759 )
Comput. Biol. Med. - A protein mapping method based on physicochemical properties and dimension reduction. ( 0,564373460822733 )
IEEE Trans Image Process - Bayesian inference of models and hyperparameters for robust optical-flow estimation. ( 0,563680789878423 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,563288862886168 )
Comput Math Methods Med - Structural complexity of DNA sequence. ( 0,563244788199817 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,563129367576615 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,56189324879473 )
Comput Biol Chem - ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics. ( 0,56121002684871 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,560780001708222 )
IEEE Trans Pattern Anal Mach Intell - Trinary-Projection Trees for Approximate Nearest Neighbor Search. ( 0,559482430444437 )
Comput Biol Chem - Large replication skew domains delimit GC-poor gene deserts in human. ( 0,558331399363721 )
IEEE Trans Neural Netw Learn Syst - Robust Novelty Detection via Worst Case CVaR Minimization. ( 0,557629243417207 )
J. Comput. Biol. - Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. ( 0,557402154567664 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,557075509434716 )
Comput Biol Chem - Metabolic network motifs can provide novel insights into evolution: The evolutionary origin of Eukaryotic organelles as a case study. ( 0,555741444920041 )
Neural Comput - Exploitation of pairwise class distances for ordinal classification. ( 0,553889441099311 )
Comput Biol Chem - A balance-evolution artificial bee colony algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. ( 0,553204375636778 )
J Chem Inf Model - Structural effects of pH and deacylation on surfactant protein C in an organic solvent mixture: a constant-pH MD study. ( 0,550695317999735 )
IEEE Trans Neural Netw Learn Syst - Kernel reconstruction ICA for sparse representation. ( 0,550383682364587 )
IEEE Trans Image Process - Minimization of monotonically levelable higher order MRF energies via graph cuts. ( 0,550289071274729 )
J. Comput. Biol. - Alignment-free sequence comparison (II): theoretical power of comparison statistics. ( 0,549713388465948 )
Neural Comput - A network of spiking neurons for computing sparse representations in an energy-efficient way. ( 0,549279626712988 )
IEEE Trans Neural Netw Learn Syst - On recursive edit distance kernels with application to time series classification. ( 0,547549739456916 )
IEEE Trans Image Process - Bayesian nonparametric dictionary learning for compressed sensing MRI. ( 0,547094437690643 )
J. Comput. Biol. - Shapes of RNA pseudoknot structures. ( 0,546717521050869 )
Artif Intell Med - Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle. ( 0,546134297282547 )
J. Comput. Biol. - A probabilistic model for sequence alignment with context-sensitive indels. ( 0,54580671528822 )
IEEE Trans Pattern Anal Mach Intell - The Sum-over-Forests Density Index: Identifying Dense Regions in a Graph. ( 0,545794659269337 )
Brief. Bioinformatics - Computational methods for Gene Orthology inference. ( 0,544793769201437 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,544374283018713 )
IEEE Trans Pattern Anal Mach Intell - A Robust O(n) Solution to the Perspective-n-Point Problem. ( 0,54382476863274 )
Med Decis Making - Calibration of complex models through Bayesian evidence synthesis: a demonstration and tutorial. ( 0,543203129321288 )
Neural Comput - A semiparametric Bayesian model for detecting synchrony among multiple neurons. ( 0,54318407096058 )
Lifetime Data Anal - Bayesian local influence for survival models. ( 0,542996775984177 )
Comput. Biol. Med. - Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches. ( 0,541593414778788 )
J. Comput. Biol. - The approximability of shortest path-based graph orientations of protein-protein interaction networks. ( 0,538397888535575 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,53745276501624 )
J. Comput. Biol. - Statistical significance of normalized global alignment. ( 0,537327167669827 )
Artif Intell Med - Scalable approximate policies for Markov decision process models of hospital elective admissions. ( 0,536755810896438 )
J Chem Inf Model - Idealized models of protofilaments of human islet amyloid polypeptide. ( 0,535916622977165 )
Comput Biol Chem - Direct correlation analysis improves fold recognition. ( 0,535783170629852 )
Comput Methods Programs Biomed - Pinda: a web service for detection and analysis of intraspecies gene duplication events. ( 0,535636523683002 )
J. Comput. Biol. - Maximum parsimony, substitution model, and probability phylogenetic trees. ( 0,534645655149279 )
Neural Comput - Simple neural-like P systems for maximal independent set selection. ( 0,53460022074204 )
Brief. Bioinformatics - Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. ( 0,533900748054794 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,533452020406871 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,532163350840999 )