Comput Biol Chem - An efficient similarity search based on indexing in large DNA databases.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ search(2224) databas(1162) retriev(909) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1969) cluster(1462) data(1082) }
{ imag(1057) registr(996) error(939) }
{ first(2504) two(1366) second(1323) }
{ structur(1116) can(940) graph(676) }
{ perform(999) metric(946) measur(919) }
{ motion(1329) object(1292) video(1091) }
{ perform(1367) use(1326) method(1137) }
{ can(981) present(881) function(850) }
{ use(976) code(926) identifi(902) }
{ measur(2081) correl(1212) valu(896) }
{ featur(3375) classif(2383) classifi(1994) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ import(1318) role(1303) understand(862) }
{ spatial(1525) area(1432) region(1030) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ data(3008) multipl(1320) sourc(1022) }
{ analysi(2126) use(1163) compon(1037) }
{ high(1669) rate(1365) level(1280) }
{ process(1125) use(805) approach(778) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Index-based search algorithms are an important part of genomic search, and index construction is the key to using such an algorithm to compute the similarity between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We then select the likely subsequences by computing their Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experiments show that the algorithm runs faster than other heuristic algorithms based on an index structure, while being as accurate as those algorithms.
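
The pipeline described in the abstract (windowing, Hamming-distance filtering, frequency-plus-position vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how positional information is encoded, so the mean position of each base is used here purely as an assumption. The window length is also assumed to equal the query length, and the names `windows`, `hamming`, `feature_vector`, `candidates`, and `max_dist` are hypothetical.

```python
from typing import List, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet


def windows(seq: str, w: int) -> List[Tuple[int, str]]:
    """Split seq into consecutive, non-overlapping windows of length w.

    A trailing remainder shorter than w is dropped.
    """
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def feature_vector(sub: str) -> List[float]:
    """Map a subsequence to an 8-dimensional vector.

    First four entries: count of each base in ALPHABET order.
    Last four entries: mean 0-based position of each base (0.0 if absent).
    The positional encoding is an assumption, not taken from the paper.
    """
    counts = [sub.count(c) for c in ALPHABET]
    means = []
    for c, n in zip(ALPHABET, counts):
        positions = [i for i, ch in enumerate(sub) if ch == c]
        means.append(sum(positions) / n if n else 0.0)
    return [float(n) for n in counts] + means


def candidates(db_seq: str, query: str, max_dist: int):
    """Return (start, subsequence, vector) for windows near the query."""
    w = len(query)  # assumption: window length equals query length
    return [
        (start, sub, feature_vector(sub))
        for start, sub in windows(db_seq, w)
        if hamming(sub, query) <= max_dist
    ]
```

For example, `candidates("ACGTTTTTACGA", "ACGT", 1)` keeps the windows starting at offsets 0 and 8 and discards the middle window `TTTT`, whose Hamming distance to the query is 3. In a real system the vectors would be stored in a multidimensional index rather than returned in a list.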

Cleaned Abstract

indexbas search algorithm import part genom search construct indic key indexbas search algorithm comput similar two dna sequenc paper propos effici queri process method use special transform construct index use small storag rapid find similar two sequenc dna sequenc databas first sequenc partit equal length window select like subsequ comput ham distanc queri sequenc algorithm transform subsequ window multidimension vector space index frequenc charact includ posit inform charact subsequ result experi show algorithm faster run time heurist algorithm base index structur also algorithm accur heurist algorithm

Similar Abstracts

J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,713483829902202 )
J Integr Bioinform - Parallel Niche Pareto AlineaGA--an evolutionary multiobjective approach on multiple sequence alignment. ( 0,697219518385839 )
J. Comput. Biol. - Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. ( 0,671251458523272 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,666522426543776 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,663804292518494 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,663031016331335 )
Brief. Bioinformatics - Ultrafast clustering algorithms for metagenomic sequence analysis. ( 0,662365831581819 )
J Integr Bioinform - On comparison of SimTandem with state-of-the-art peptide identification tools, efficiency of precursor mass filter and dealing with variable modifications. ( 0,657896730594058 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,649811735047383 )
J. Comput. Biol. - Parallel continuous flow: a parallel suffix tree construction tool for whole genomes. ( 0,648674973140802 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,638728169713142 )
J Integr Bioinform - High performance pattern matching on heterogeneous platform. ( 0,636920176270738 )
J. Comput. Biol. - Detection of structural variants involving repetitive regions in the reference genome. ( 0,631649882769354 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,630728214334148 )
J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data. ( 0,629052356340098 )
Comput Biol Chem - Heuristic-based tabu search algorithm for folding two-dimensional AB off-lattice model proteins. ( 0,628522321938952 )
Comput. Biol. Med. - GPU-based acceleration of an RNA tertiary structure prediction algorithm. ( 0,625517537203484 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,609790908363741 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,603729743841747 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,600973361707057 )
Comput. Biol. Med. - A data parallel strategy for aligning multiple biological sequences on multi-core computers. ( 0,60020294669683 )
J Biomed Inform - A kinetic model-based algorithm to classify NGS short reads by their allele origin. ( 0,594400352817536 )
Comput. Biol. Med. - An ant colony optimization based algorithm for identifying gene regulatory elements. ( 0,594179011037443 )
Brief. Bioinformatics - Comparative analysis of methods for identifying somatic copy number alterations from deep sequencing data. ( 0,591434302048117 )
J Chem Inf Model - Searching for likeness in a database of macromolecular complexes. ( 0,588428961957705 )
J. Comput. Biol. - A theoretical model for whole genome alignment. ( 0,586503905379537 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,585896953725325 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,576982027460246 )
Brief. Bioinformatics - Pattern recognition and probabilistic measures in alignment-free sequence analysis. ( 0,575692021819576 )
Brief. Bioinformatics - Base-calling for next-generation sequencing platforms. ( 0,57452425971407 )
Artif Intell Med - Memetic algorithms for de novo motif-finding in biomedical sequences. ( 0,572254836796314 )
Comput Biol Chem - Protein folding simulations of 2D HP model by the genetic algorithm based on optimal secondary structures. ( 0,571224710154879 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,570609096698775 )
Comput Biol Chem - Investigating long range correlation in DNA sequences using significance tests of conditional mutual information. ( 0,56929347826087 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,566425626947741 )
Med Biol Eng Comput - A unified procedure for detecting, quantifying, and validating electrocardiogram T-wave alternans. ( 0,565695023023324 )
J. Comput. Biol. - Smoothing 3D protein structure motifs through graph mining and amino acid similarities. ( 0,565574122554466 )
Comput Biol Chem - A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction. ( 0,564487908873792 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,564383638591193 )
J Chem Inf Model - Proteins as sponges: a statistical journey along protein structure organization principles. ( 0,560202838333777 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,559176597655845 )
J Chem Inf Model - Cavities tell more than sequences: exploring functional relationships of proteases via binding pockets. ( 0,558862206335326 )
J. Comput. Biol. - Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences. ( 0,557925104245189 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,556690759372983 )
Comput Math Methods Med - ADLD: a novel graphical representation of protein sequences and its application. ( 0,556301040625421 )
J Am Med Inform Assoc - HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. ( 0,555978379224698 )
J. Comput. Biol. - Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. ( 0,553751410833417 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,553343196221519 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,553167346690252 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,552862241491092 )
Comput Biol Chem - Identification of potential drug targets by subtractive genome analysis of Bacillus anthracis A0248: An in silico approach. ( 0,552793231817157 )
Comput. Biol. Med. - A fast hierarchical clustering algorithm for large-scale protein sequence data sets. ( 0,552532536762763 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,552394907861893 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,552242118351814 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,550104551300124 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,549366326893146 )
J Integr Bioinform - GMB: an efficient query processor for biological data. ( 0,548998247041921 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,548969631787862 )
Comput Biol Chem - Parallel molecular computation of modular-multiplication with two same inputs over finite field GF(2^n) using self-assembly of DNA tiles. ( 0,548299331103504 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,547714013663085 )
Brief. Bioinformatics - DRISEE overestimates errors in metagenomic sequencing data. ( 0,547490877260764 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,546599747394802 )
Brief. Bioinformatics - Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance. ( 0,545050999545238 )
J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,543340319236111 )
J. Med. Internet Res. - Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical searches. ( 0,542821445471759 )
J Chem Inf Model - Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress. ( 0,53947099126295 )
J Chem Inf Model - Subpocket analysis method for fragment-based drug discovery. ( 0,538867336988291 )
IEEE Trans Vis Comput Graph - Moving Least-Squares Reconstruction of Large Models with GPUs. ( 0,538725901465775 )
Comput. Biol. Med. - Structural alphabet motif discovery and a structural motif database. ( 0,537863249348754 )
Comput Biol Chem - Predicting protein-protein interactions using graph invariants and a neural network. ( 0,536091411287554 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,534347738055743 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,534224175755123 )
Comput Math Methods Med - Identification of DNA-binding proteins using support vector machine with sequence information. ( 0,534051505488018 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,533481122026018 )
Comput Biol Chem - Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm. ( 0,533265440933618 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,533083849506107 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,531489970155408 )
Brief. Bioinformatics - A practical guide for the computational selection of residues to be experimentally characterized in protein families. ( 0,530663840964477 )
Comput. Biol. Med. - A protein mapping method based on physicochemical properties and dimension reduction. ( 0,530155766088489 )
J Chem Inf Model - Efficient substructure searching of large chemical libraries: the ABCD chemical cartridge. ( 0,528848300133007 )
J. Comput. Biol. - Computing the probability of RNA hairpin and multiloop formation. ( 0,528112339038882 )
J. Comput. Biol. - A geometric arrangement algorithm for structure determination of symmetric protein homo-oligomers from NOEs and RDCs. ( 0,527333686821609 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,526352102043952 )
Comput Biol Chem - Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction. ( 0,526031148576895 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,524977439470834 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,523889669116106 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,523504557879108 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,52339736834914 )
J Biomed Inform - Reflective random indexing for semi-automatic indexing of the biomedical literature. ( 0,522547247885694 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,521664234778994 )
J Med Syst - An efficient automated algorithm to detect ocular surface temperature on sequence of thermograms using snake and target tracing function. ( 0,520839201844872 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,519886363636364 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,519742097352418 )
J Chem Inf Model - String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins. ( 0,519584498535331 )
J Am Med Inform Assoc - Efficient sequential and parallel algorithms for record linkage. ( 0,518894840599268 )
Comput Biol Chem - Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure. ( 0,51878713456921 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,518512301416688 )
Brief. Bioinformatics - Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees. ( 0,518188359707514 )
Brief. Bioinformatics - Reference databases for taxonomic assignment in metagenomics. ( 0,518136959780818 )
Brief. Bioinformatics - Classification of metagenomic sequences: methods and challenges. ( 0,517944783864987 )