Brief. Bioinformatics - Ultrafast clustering algorithms for metagenomic sequence analysis.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ method(1969) cluster(1462) data(1082) }
{ structur(1116) can(940) graph(676) }
{ implement(1333) system(1263) develop(1122) }
{ imag(2830) propos(1344) filter(1198) }
{ data(3008) multipl(1320) sourc(1022) }
{ error(1145) method(1030) estim(1020) }
{ survey(1388) particip(1329) question(1065) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ age(1611) year(1155) adult(843) }
{ sampl(1606) size(1419) use(1276) }
{ estim(2440) model(1874) function(577) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ howev(809) still(633) remain(590) }
{ research(1085) discuss(1038) issu(1018) }
{ visual(1396) interact(850) tool(830) }
{ model(3480) simul(1196) paramet(876) }
{ state(1844) use(1261) util(961) }
{ medic(1828) order(1363) alert(1069) }
{ group(2977) signific(1463) compar(1072) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ activ(1452) weight(1219) physic(1104) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ featur(3375) classif(2383) classifi(1994) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Rapid advances in high-throughput sequencing technologies have dramatically accelerated metagenomic studies of microbial communities in a wide range of environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations, as well as their functions and interactions. However, the sheer quantity and complexity of these sequence data pose tremendous challenges for data analysis, including but not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. Clustering also addresses the analysis challenges: a large redundant data set can be represented by a small non-redundant set in which each cluster is represented by a single entry or a consensus; artifacts can be rapidly detected through clustering; and errors can be identified, filtered or corrected using the consensus of the sequences within a cluster.
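
The idea of collapsing a redundant set into representatives and a per-cluster consensus can be illustrated with a minimal Python sketch. It is not the algorithm of any specific tool covered by the article: the greedy, longest-first strategy, the k-mer Jaccard similarity (a crude stand-in for alignment-based identity), the threshold value and all function names are assumptions made for illustration only.

```python
from collections import Counter


def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Hypothetical greedy incremental clustering, longest sequences first.

    Each sequence joins the first cluster whose representative it matches
    at or above `threshold`; otherwise it becomes a new representative.
    """
    clusters = []  # list of (representative, members) pairs
    for seq in sorted(seqs, key=len, reverse=True):
        kmers = kmer_set(seq, k)
        for rep, members in clusters:
            if jaccard(kmers, kmer_set(rep, k)) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((seq, [seq]))
    return clusters


def consensus(members):
    """Column-wise majority vote.

    Assumes the members are already aligned and of equal length.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*members))


# Toy reads: the third read carries a single substitution error.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "TTGGCCAATT", "TTGGCCAATT"]
for rep, members in greedy_cluster(reads):
    print(rep, len(members), consensus(members))
```

On the toy input, the five reads collapse into two clusters, and the majority vote in the first cluster overrides the single substitution. Real tools replace the k-mer comparison with far more efficient filters and alignment, and the consensus step requires a proper multiple alignment.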

Cleaned Abstract

rapid advanc highthroughput sequenc technolog dramat prompt metagenom studi microbi communiti exist various environ fundament question metagenom includ ident composit dynam microbi popul function interact howev massiv quantiti comprehens complex sequenc data pose tremend challeng data analysi challeng includ limit everincreas comput demand bias sequenc sampl sequenc error sequenc artifact novel sequenc sequenc cluster method can direct answer mani fundament question group similar sequenc famili addit cluster analysi also address challeng metagenom thus larg redund data set can repres small nonredund set cluster can repres singl entri consensus artifact can rapid detect cluster error can identifi filter correct use consensus sequenc within cluster

Similar Abstracts

J. Comput. Biol. - A theoretical model for whole genome alignment. ( 0,811279856997718 )
J. Comput. Biol. - Detection of structural variants involving repetitive regions in the reference genome. ( 0,808205209377113 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,7569050116126 )
J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,75190804736942 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,748079695286066 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,745140861138252 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,73488763795939 )
J Biomed Inform - A kinetic model-based algorithm to classify NGS short reads by their allele origin. ( 0,734794301673288 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,733296417968231 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,729388099774123 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,728402512835442 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,726009844046339 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,723355250284452 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,722237593439013 )
J. Comput. Biol. - Combinatorics of -structures. ( 0,721413009394189 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,720357576561829 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,718699709258477 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,718629592723819 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,718494889890188 )
Comput Biol Chem - Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm. ( 0,71802345612235 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,713620982603377 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,712980639744098 )
Comput Biol Chem - A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction. ( 0,709501876682698 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,708189537821077 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,707845855401651 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,70688418956672 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,704837153225512 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,702930254185463 )
Brief. Bioinformatics - Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes). ( 0,702318284197092 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,702306497070048 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,701905869349423 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,701375993659233 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,698832161674839 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,696349203794776 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,693312208302882 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,69267010284685 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,691904889297192 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze proteins. ( 0,690981724981101 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,690157506068742 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,690116750563993 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,687828882321968 )
J. Comput. Biol. - Parallel continuous flow: a parallel suffix tree construction tool for whole genomes. ( 0,68764207002527 )
J Am Med Inform Assoc - HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. ( 0,681406278316418 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,680373780615466 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,680229173539905 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,676796741552214 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,67618915924877 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,674950176970129 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,674424216881415 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,673482994567518 )
Comput Math Methods Med - ADLD: a novel graphical representation of protein sequences and its application. ( 0,672408964839738 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,671354308517501 )
J Chem Inf Model - Cavities tell more than sequences: exploring functional relationships of proteases via binding pockets. ( 0,671076971368343 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,67043012692507 )
J. Comput. Biol. - Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences. ( 0,670416954610135 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,668650377690521 )
Comput Methods Programs Biomed - Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. ( 0,668210380479116 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,667749340094554 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,667260193363108 )
Brief. Bioinformatics - Functional assignment of metagenomic data: challenges and applications. ( 0,667197275925555 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,666451780333294 )
J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data. ( 0,664591497720953 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,663128932718819 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,662958767786742 )
Comput Biol Chem - An efficient similarity search based on indexing in large DNA databases. ( 0,662365831581819 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,661444765630038 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,661265340260212 )
J. Comput. Biol. - Computational techniques for human genome resequencing using mated gapped reads. ( 0,65986607240955 )
J. Comput. Biol. - Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. ( 0,657733091888597 )
Comput Methods Programs Biomed - Can computational biology improve the phylogenetic analysis of insulin? ( 0,65335721542262 )
Comput. Biol. Med. - Structural alphabet motif discovery and a structural motif database. ( 0,653311648007666 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,653165131935624 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,652982984552815 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,65214103455877 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,651162934440681 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,649948127368171 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,64815448770581 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,647522580916287 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,64697158415137 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,646534070391126 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,646101258014313 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,645641271059405 )
Artif Intell Med - Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle. ( 0,645152890349794 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,644127162971862 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,643100990500199 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,6428326389901 )
Comput Math Methods Med - Analyzing effects of naturally occurring missense mutations. ( 0,642135446911982 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,639559082565145 )
Brief. Bioinformatics - Base-calling for next-generation sequencing platforms. ( 0,638159337803289 )
Comput Methods Programs Biomed - Pinda: a web service for detection and analysis of intraspecies gene duplication events. ( 0,637516037253775 )
Comput Biol Chem - Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction. ( 0,636459062878089 )
Sci Data - A draft genome for the African crocodilian trypanosome Trypanosoma grayi. ( 0,635154537044005 )
Comput. Biol. Med. - A protein mapping method based on physicochemical properties and dimension reduction. ( 0,629274629908038 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,626918972942346 )
Brief. Bioinformatics - Computational challenges of sequence classification in microbiomic data. ( 0,626586133241728 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,626016903222722 )
Med Biol Eng Comput - Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines. ( 0,625046818341426 )
J. Comput. Biol. - Emergent protein folding modeled with evolved neural cellular automata using the 3D HP model. ( 0,622987854534561 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,622779744315017 )
Comput Biol Chem - Large replication skew domains delimit GC-poor gene deserts in human. ( 0,622683301792541 )