Brief. Bioinformatics - Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ implement(1333) system(1263) develop(1122) }
{ data(3008) multipl(1320) sourc(1022) }
{ method(2212) result(1239) propos(1039) }
{ gene(2352) biolog(1181) express(1162) }
{ model(3480) simul(1196) paramet(876) }
{ result(1111) use(1088) new(759) }
{ activ(1452) weight(1219) physic(1104) }
{ imag(2675) segment(2577) method(1081) }
{ treatment(1704) effect(941) patient(846) }
{ concept(1167) ontolog(924) domain(897) }
{ case(1353) use(1143) diagnosi(1136) }
{ model(2656) set(1616) predict(1553) }
{ problem(2511) optim(1539) algorithm(950) }
{ compound(1573) activ(1297) structur(1058) }
{ first(2504) two(1366) second(1323) }
{ use(976) code(926) identifi(902) }
{ can(774) often(719) complex(702) }
{ data(1737) use(1416) pattern(1282) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ framework(1458) process(801) describ(734) }
{ extract(1171) text(1153) clinic(932) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ research(1218) medic(880) student(794) }
{ sampl(1606) size(1419) use(1276) }
{ intervent(3218) particip(2042) group(1664) }
{ can(981) present(881) function(850) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, which are now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
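The linkage idea in the abstract can be illustrated with a small sketch. This is not the published ESPRIT implementation: the `Hit` record, the `candidate_links` function, the identifiers in the example, and the `max_overlap` threshold are all hypothetical, chosen only to show one plausible way of pairing predicted peptides from different genomic segments when they align to adjacent, largely non-overlapping parts of the same reference protein.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a predicted peptide to a reference protein (hypothetical record)."""
    peptide: str      # ID of the predicted peptide
    contig: str       # unassembled genomic segment carrying the peptide
    ref_protein: str  # reference protein the peptide aligns to
    ref_start: int    # 1-based alignment start on the reference protein
    ref_end: int      # inclusive alignment end on the reference protein


def candidate_links(hits, max_overlap=10):
    """Return (left_peptide, right_peptide, ref_protein) tuples for possible split genes.

    Two peptides are proposed as parts of one gene when they sit on different
    contigs, hit the same reference protein, and their aligned intervals on
    that protein are consecutive with at most `max_overlap` shared residues.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    by_ref = defaultdict(list)
    for hit in hits:
        by_ref[hit.ref_protein].append(hit)

    links = []
    for ref, group in by_ref.items():
        # Order hits along the reference protein, then compare neighbours.
        ordered = sorted(group, key=lambda h: (h.ref_start, h.ref_end))
        for left, right in zip(ordered, ordered[1:]):
            if left.contig == right.contig:
                continue  # same segment: nothing to link
            overlap = left.ref_end - right.ref_start + 1
            if overlap <= max_overlap:
                links.append((left.peptide, right.peptide, ref))
    return links


example_hits = [
    Hit("pepA", "contig1", "REF_P1", 1, 120),
    Hit("pepB", "contig2", "REF_P1", 118, 300),  # continues where pepA ends
    Hit("pepC", "contig3", "REF_P2", 1, 200),
    Hit("pepD", "contig4", "REF_P2", 50, 220),   # overlaps pepC heavily: likely a paralog
]
example_links = candidate_links(example_hits)
```

In this toy input only `pepA` and `pepB` are linked, because they cover adjacent parts of `REF_P1`. `pepC` and `pepD` cover the same region of `REF_P2`, which suggests two distinct genes rather than one split gene. A real pipeline would also need to weigh alignment scores and the kinds of validation the abstract lists.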

Cleaned Abstract

recent develop deep sequenc technolog facilit de novo genom sequenc project now conduct even individu laboratori howev will yield genom sequenc well assembl will hinder thorough annot close relat refer genom avail one challeng issu identif proteincod sequenc split multipl unassembl genom segment can confound ortholog assign various laboratori experi requir identif individu gene studi use genom cartilagin fish callorhinchus milii test case perform gene predict use model specif train genom implement algorithm design esprit identifi possibl linkag multipl proteincod portion deriv singl genom locus split multipl unassembl genom segment develop valid framework base artifici fragment human genom improv earli recent mous genom assembl comparison experiment valid sequenc genbank phylogenet analys strategi provid insight practic solut effici annot partial sequenc lowcoverag genom knowledg studi first formul method link unassembl genom segment base proteom relat distant relat speci refer

Similar Abstracts

Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,764194903767451 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,757295914993582 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,747306073833892 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,744392223235365 )
Brief. Bioinformatics - Computational challenges of sequence classification in microbiomic data. ( 0,744340467435492 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,743628839389051 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,74089215514231 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,737203555806253 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,735312830870849 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,734576010446045 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,733550616929721 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,733177135215457 )
Comput Methods Programs Biomed - Pinda: a web service for detection and analysis of intraspecies gene duplication events. ( 0,73315156322921 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,732773111752588 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,732604472787761 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,732145347548629 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,724763415431396 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,724018051147305 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,721611168899464 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,719986308047714 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,718422491734469 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,717486778677112 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,716482043353351 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,71188614710113 )
Brief. Bioinformatics - Functional assignment of metagenomic data: challenges and applications. ( 0,710995398668821 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,71089826563723 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,709734128274921 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,706495832623521 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,705261260406946 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,70284495075456 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,702473334246034 )
Brief. Bioinformatics - Ultrafast clustering algorithms for metagenomic sequence analysis. ( 0,702318284197092 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,702005067469901 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,701583958240849 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,699625714004339 )
J. Comput. Biol. - A novel technique for detecting putative horizontal gene transfer in the sequence space. ( 0,698637587847509 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,697702685480479 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,697495507415562 )
Brief. Bioinformatics - Bioinformatics tools and challenges in structural analysis of lipidomics MS/MS data. ( 0,697334607829529 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,696782314335051 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,696562972775697 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,696047034916248 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,694215126592763 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,692852210450836 )
Comput Biol Chem - Gene expression regulation of the PF00480 or PF14340 domain proteins suggests their involvement in sulfur metabolism. ( 0,689878101212244 )
Med Biol Eng Comput - The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. ( 0,689766328490172 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,688360047016184 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,686154658171674 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,685297081741193 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,685103572522566 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,685062768859333 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,684382005571141 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,682849233572621 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,682825344906378 )
Comput Methods Programs Biomed - Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks. ( 0,681013582882032 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,680609625906197 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,679686593800886 )
Comput Biol Chem - Gene cloning, homology comparison and analysis of the main functional structure domains of beta estrogen receptor in Jining Gray goat. ( 0,678026795001499 )
Comput. Biol. Med. - The possible role of HSPs on Behçet's disease: a bioinformatic approach. ( 0,67750131560062 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,676828054162265 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,676204560398076 )
J. Comput. Biol. - Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. ( 0,674719951200767 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,674290474635207 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,673820473578365 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,671332459602539 )
J Chem Inf Model - Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. ( 0,664943289392723 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,664749638906642 )
Comput Biol Chem - Large replication skew domains delimit GC-poor gene deserts in human. ( 0,66411073405787 )
Comput Methods Programs Biomed - Quantitative thermodynamic predication of interactions between nucleic acid and non-nucleic acid species using Microsoft excel. ( 0,662008107711113 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,661004851473573 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,660762637945002 )
Comput Biol Chem - Genome-wide analysis and evolutionary study of sucrose non-fermenting 1-related protein kinase 2 (SnRK2) gene family members in Arabidopsis and Oryza. ( 0,660422309524254 )
Brief. Bioinformatics - Application of second-generation sequencing to cancer genomics. ( 0,659322853887708 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,659078639748511 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,653949002758403 )
J. Comput. Biol. - Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. ( 0,653752929927181 )
J. Comput. Biol. - Catching the genomic wave in oligonucleotide single-nucleotide polymorphism arrays by modeling sequence binding. ( 0,653492999224884 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,651719512286446 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,651014436335834 )
J. Comput. Biol. - Tracing the most parsimonious indel history. ( 0,648413629526989 )
Comput Biol Chem - Predicting protein-protein interactions using graph invariants and a neural network. ( 0,647247972617787 )
Comput Biol Chem - In silico characterization and evolutionary analyses of CCAAT binding proteins in the lycophyte plant Selaginella moellendorffii genome: a growing comparative genomics resource. ( 0,646328771299325 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,645926965005106 )
Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,645538573280136 )
Comput. Biol. Med. - Prediction of methylation CpGs and their methylation degrees in human DNA sequences. ( 0,644667789372218 )
Sci Data - A draft genome for the African crocodilian trypanosome Trypanosoma grayi. ( 0,643954449947917 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,643599798603154 )
Comput Math Methods Med - Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality. ( 0,642454614979668 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,642272936086136 )
J. Comput. Biol. - Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. ( 0,638447611950573 )
J Chem Inf Model - Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress. ( 0,637384772972619 )
Artif Intell Med - Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle. ( 0,636891383255177 )
Brief. Bioinformatics - Benchmarking of viral haplotype reconstruction programmes: an overview of the capacities and limitations of currently available programmes. ( 0,636668808808428 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,635971434465993 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,634750377821378 )
Comput Biol Chem - A balance-evolution artificial bee colony algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. ( 0,634662432932481 )
Comput Biol Chem - Semantically predicting protein functions based on protein functional connectivity. ( 0,633581500356203 )
Brief. Bioinformatics - Applications of alignment-free methods in epigenomics. ( 0,633147832798277 )
Brief. Bioinformatics - Base-calling for next-generation sequencing platforms. ( 0,633121716974231 )
Brief. Bioinformatics - Identify drug repurposing candidates by mining the protein data bank. ( 0,629435350942343 )