J. Comput. Biol. - Separating significant matches from spurious matches in DNA sequences.

Topics

{ method(1219) similar(1157) match(930) }
{ sequenc(1873) structur(1644) protein(1328) }
{ model(3404) distribut(989) bayesian(671) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ search(2224) databas(1162) retriev(909) }
{ perform(1367) use(1326) method(1137) }
{ can(774) often(719) complex(702) }
{ compound(1573) activ(1297) structur(1058) }
{ model(2656) set(1616) predict(1553) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ use(2086) technolog(871) perceiv(783) }
{ result(1111) use(1088) new(759) }
{ method(1969) cluster(1462) data(1082) }
{ data(1737) use(1416) pattern(1282) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ method(1557) propos(1049) approach(1037) }
{ method(984) reconstruct(947) comput(926) }
{ howev(809) still(633) remain(590) }
{ model(2341) predict(2261) use(1141) }
{ state(1844) use(1261) util(961) }
{ age(1611) year(1155) adult(843) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ intervent(3218) particip(2042) group(1664) }
{ health(1844) social(1437) communiti(874) }
{ survey(1388) particip(1329) question(1065) }
{ activ(1452) weight(1219) physic(1104) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Word matches are widely used to compare genomic sequences. Complete-genome alignment methods often use matches as anchors for building their alignments, and various alignment-free approaches that characterize similarities between large sequences are based on word matches. Among the matches retrieved from the comparison of two genomic sequences, some may be spurious matches (SMs), that is, matches obtained by chance rather than through homologous relationships. The number of SMs depends on the minimal match length (l) that must be set in the algorithm used to retrieve them. If l is too small, many matches are recovered but most of them are SMs. Conversely, if l is too large, fewer matches are retrieved and many shorter significant matches are missed. To date, the choice of l has mostly relied on empirical threshold values rather than on robust statistical methods. To overcome this problem, we propose a statistical approach that uses a mixture model of geometric distributions to characterize the distribution of the lengths of matches obtained from the comparison of two genomic sequences.
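
The abstract names a mixture of geometric distributions over match lengths but gives no further detail, so the following is only a minimal sketch of how such a model could be fitted, not the authors' procedure. It assumes two components (spurious and significant), a shifted parameterisation P(x) = p(1 - p)^x with x = length - minimal length, an expectation-maximisation fit, and a 0.5 posterior cut-off for choosing a length threshold. The function names, starting values, and simulated data are all hypothetical.

```python
import math
import random


def fit_geometric_mixture(lengths, min_len, n_iter=200):
    """EM fit of a two-component geometric mixture to match lengths.

    Assumed parameterisation (not taken from the paper): with
    x = length - min_len, each component has P(x) = p * (1 - p) ** x
    for x = 0, 1, 2, ...
    Returns (w_spurious, p_spurious, p_significant).
    """
    # Work on a histogram of shifted lengths so each EM pass is cheap.
    counts = {}
    for length in lengths:
        x = length - min_len
        counts[x] = counts.get(x, 0) + 1
    n = len(lengths)

    # Hypothetical starting values: spurious matches stop quickly (large p),
    # significant matches keep extending (small p).
    w, p_sp, p_sg = 0.5, 0.5, 0.05
    for _ in range(n_iter):
        r_sum = rx_sum = s_sum = sx_sum = 0.0
        for x, c in counts.items():
            a = w * p_sp * (1.0 - p_sp) ** x          # spurious component
            b = (1.0 - w) * p_sg * (1.0 - p_sg) ** x  # significant component
            r = a / (a + b)                           # E-step: P(spurious | x)
            r_sum += c * r
            rx_sum += c * r * (x + 1)
            s_sum += c * (1.0 - r)
            sx_sum += c * (1.0 - r) * (x + 1)
        # M-step: weighted maximum-likelihood updates, clamped away from 0 and 1.
        w = r_sum / n
        p_sp = min(max(r_sum / rx_sum, 1e-9), 1.0 - 1e-9)
        p_sg = min(max(s_sum / sx_sum, 1e-9), 1.0 - 1e-9)
    return w, p_sp, p_sg


def length_threshold(w, p_sp, p_sg, min_len, max_len=10_000):
    """Smallest length at which 'significant' is at least as likely as 'spurious'."""
    for x in range(max_len - min_len + 1):
        a = w * p_sp * (1.0 - p_sp) ** x
        b = (1.0 - w) * p_sg * (1.0 - p_sg) ** x
        if b >= a:
            return min_len + x
    return None


def sample_geometric(rng, p):
    """Draw x in {0, 1, 2, ...} with P(x) = p * (1 - p) ** x (inverse transform)."""
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids log(0)
    return int(math.log(u) / math.log(1.0 - p))


# Simulated data only: 80% spurious (p = 0.75) and 20% significant (p = 0.1).
rng = random.Random(0)
MIN_LEN = 8  # hypothetical minimal match length used to retrieve matches
lengths = [
    MIN_LEN + sample_geometric(rng, 0.75 if rng.random() < 0.8 else 0.1)
    for _ in range(20000)
]

w, p_sp, p_sg = fit_geometric_mixture(lengths, MIN_LEN)
threshold = length_threshold(w, p_sp, p_sg, MIN_LEN)
print(f"spurious weight={w:.3f}, p_spurious={p_sp:.3f}, p_significant={p_sg:.3f}")
print(f"suggested minimal match length: {threshold}")
```

Fitting on a histogram of lengths rather than on individual matches keeps each EM pass proportional to the number of distinct lengths, which matters when a genome comparison yields millions of matches. On real data the components may be less well separated than in this simulation, and more than two components might be needed.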

Cleaned Abstract

word match wide use compar genom sequenc complet genom align method often reli use match anchor build align various alignmentfre approach character similar larg sequenc base word match among match retriev comparison two genom sequenc part may correspond spurious match sms match obtain chanc rather homolog relationship number sms depend minim match length l set algorithm use retriev inde l small lot match recov sms convers l larg fewer match retriev mani smaller signific match certain ignor date choic l most depend empir threshold valu rather robust statist method overcom problem propos statist approach base use mixtur model geometr distribut character distribut length match obtain comparison two genom sequenc

Similar Abstracts

Comput Biol Chem - Understanding the general packing rearrangements required for successful template based modeling of protein structure from a CASP experiment. ( 0,850865148425068 )
Comput. Biol. Med. - Prediction of protein functions based on function-function correlation relations. ( 0,738521238967982 )
J Chem Inf Model - Kink characterization and modeling in transmembrane protein structures. ( 0,716710787729008 )
Comput. Biol. Med. - Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches. ( 0,713917050666686 )
J Chem Inf Model - MetalS2: a tool for the structural alignment of minimal functional sites in metal-binding proteins and nucleic acids. ( 0,696130730179376 )
J. Comput. Biol. - The distribution of word matches between Markovian sequences with periodic boundary conditions. ( 0,682327670187387 )
IEEE Trans Neural Netw Learn Syst - A Unified Framework for Data Visualization and Coclustering. ( 0,674092360292031 )
J Chem Inf Model - Build-up algorithm for atomic correspondence between chemical structures. ( 0,665242388279638 )
Comput Methods Programs Biomed - Multiple sequence alignment with affine gap by using multi-objective genetic algorithm. ( 0,657353805805554 )
J Integr Bioinform - Complementarity of network and sequence information in homologous proteins. ( 0,656565517253762 )
Comput Math Methods Med - Structural complexity of DNA sequence. ( 0,653173951394355 )
Comput Math Methods Med - Identification of antioxidants from sequence information using naïve Bayes. ( 0,651146920138676 )
J. Comput. Biol. - A probabilistic model for sequence alignment with context-sensitive indels. ( 0,649950019152682 )
IEEE Trans Image Process - A uniform grid structure to speed up example-based photometric stereo. ( 0,632382587026162 )
J. Comput. Biol. - Smoothing 3D protein structure motifs through graph mining and amino acid similarities. ( 0,621428245685403 )
Comput Biol Chem - Predicting protein-protein interactions using graph invariants and a neural network. ( 0,621047849995863 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,617222039393162 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,615014997286925 )
J Chem Inf Model - Exploiting structural information in patent specifications for key compound prediction. ( 0,611300378070012 )
IEEE Trans Image Process - Flexible Image Similarity Computation Using Hyper-Spatial Matching. ( 0,607610173814385 )
Methods Inf Med - The utility of imputed matched sets. Analyzing probabilistically linked databases in a low information setting. ( 0,599213720735662 )
Comput Biol Chem - A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. ( 0,598550235068811 )
IEEE Trans Image Process - Robust pairwise matching of interest points with complex wavelets. ( 0,587850422106005 )
J. Comput. Biol. - Optimization of profile-to-profile alignment parameters for one-dimensional threading. ( 0,587418384656943 )
Comput Biol Chem - Computational model for analyzing the evolutionary patterns of the neuraminidase gene of influenza A/H1N1. ( 0,584798323062667 )
Artif Intell Med - Multi-marker tagging single nucleotide polymorphism selection using estimation of distribution algorithms. ( 0,583895942529892 )
Comput. Biol. Med. - Application of 2D graphic representation of protein sequence based on Huffman tree method. ( 0,577636866511417 )
J. Comput. Biol. - Optimization of combinatorial mutagenesis. ( 0,576884976754724 )
J. Comput. Biol. - The generating function approach for Peptide identification in spectral networks. ( 0,576286068794315 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,574260441717217 )
J Integr Bioinform - Predicting protein distance maps according to physicochemical properties. ( 0,569678901994302 )
IEEE Trans Pattern Anal Mach Intell - Conditional Alignment Random Fields for Multiple Motion Sequence Alignment. ( 0,568980147109201 )
Methods Inf Med - A simplification and implementation of random-effects meta-analyses based on the exact distribution of Cochran's Q. ( 0,563904005557979 )
Int J Med Inform - Content analysis of physical examination templates in electronic health records using SNOMED CT. ( 0,560064320327669 )
J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,558438988022385 )
IEEE Trans Image Process - Correlation-coefficient-based fast template matching through partial elimination. ( 0,558070547004755 )
Comput Math Methods Med - Space constrained homology modelling: the paradigm of the RNA-dependent RNA polymerase of dengue (type II) virus. ( 0,557549249064862 )
Comput Math Methods Med - Analyzing effects of naturally occurring missense mutations. ( 0,554091638503641 )
IEEE Trans Pattern Anal Mach Intell - On Kleinberg's Stochastic Discrimination Procedure. ( 0,553133980780464 )
IEEE Trans Image Process - A marked point process for modeling lidar waveforms. ( 0,551843792974598 )
J Chem Inf Model - Large-scale mining for similar protein binding pockets: with RAPMAD retrieval on the fly becomes real. ( 0,55124326223671 )
IEEE Trans Neural Netw Learn Syst - Incremental Generalized Discriminative Common Vectors for Image Classification. ( 0,550822632755244 )
IEEE Trans Image Process - Interval-valued fuzzy sets applied to stereo matching of color images. ( 0,550447139729888 )
J Biomed Inform - Clustering clinical models from local electronic health records based on semantic similarity. ( 0,54985770321829 )
Comput. Biol. Med. - Cutaneous amyloidoses: a minimum common denominator in their amino acid sequence. ( 0,549536491970257 )
Comput. Biol. Med. - A bilateral analysis scheme for false positive reduction in mammogram mass detection. ( 0,548176526718239 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,547344081859342 )
Brief. Bioinformatics - Ortholog identification in the presence of domain architecture rearrangement. ( 0,546447854363493 )
Brief. Bioinformatics - BamView: visualizing and interpretation of next-generation sequencing read alignments. ( 0,544740951478932 )
J Chem Inf Model - COSMOsim3D: 3D-similarity and alignment based on COSMO polarization charge densities. ( 0,541927033358613 )
J Chem Inf Model - Ligand-based target prediction with signature fingerprints. ( 0,540455719097292 )
IEEE Trans Image Process - A multiscale wavelet-based test for isotropy of random fields on a regular lattice. ( 0,540139289175183 )
Comput Methods Programs Biomed - Can computational biology improve the phylogenetic analysis of insulin? ( 0,540051583377314 )
Int J Comput Assist Radiol Surg - Multi-contrast unbiased MRI atlas of a Parkinson's disease population. ( 0,53932708480804 )
J. Comput. Biol. - Maximum parsimony, substitution model, and probability phylogenetic trees. ( 0,53663013170112 )
J Biomed Inform - Using genetic algorithm in reconstructing single individual haplotype with minimum error correction. ( 0,534863518119834 )
Comput Biol Chem - A balance-evolution artificial bee colony algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. ( 0,532000869116484 )
Comput Methods Programs Biomed - Automated detection of fovea in fundus images based on vessel-free zone and adaptive Gaussian template. ( 0,531749698459865 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,530138113234454 )
IEEE Trans Image Process - General subspace learning with corrupted training data via graph embedding. ( 0,528231281540004 )
J Chem Inf Model - PocketAlign a novel algorithm for aligning binding sites in protein structures. ( 0,527873831682551 )
J Chem Inf Model - Reading PDB: perception of molecules from 3D atomic coordinates. ( 0,525858888977769 )
Comput Biol Chem - Entropy and long-range correlations in DNA sequences. ( 0,524010103930805 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,522696948544265 )
Telemed J E Health - Measuring the effect of telecare on medical expenditures without bias using the propensity score matching method. ( 0,522057770578768 )
J. Comput. Biol. - Computational techniques for human genome resequencing using mated gapped reads. ( 0,521158790916142 )
Comput. Biol. Med. - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling. ( 0,51986477583323 )
IEEE Trans Image Process - Robust feature point matching with sparse model. ( 0,519483053871813 )
J Chem Inf Model - Extraction of protein binding pockets in close neighborhood of bound ligands makes comparisons simple due to inherent shape similarity. ( 0,518751587130773 )
J. Comput. Biol. - Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. ( 0,517264461692083 )
Comput. Biol. Med. - A similarity matrix-based hybrid algorithm for the contact map overlaps problem. ( 0,517065473075634 )
J Chem Inf Model - Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. ( 0,516915013049389 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,516346126976251 )
J Biomed Inform - A kinetic model-based algorithm to classify NGS short reads by their allele origin. ( 0,51622811953335 )
Curr Protoc Bioinformatics - Comparative Protein Structure Modeling Using MODELLER. ( 0,514504773077087 )
Comput Math Methods Med - Image segmentation and identification of paired antibodies in breast tissue. ( 0,512966057487571 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,512740091698666 )
Comput Biol Chem - Probabilistic model based error correction in a set of various mutant sequences analyzed by next-generation sequencing. ( 0,511225357978602 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,511169628454338 )
J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,509379089381866 )
J. Comput. Biol. - Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. ( 0,508965814575672 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,507370641317695 )
Comput Biol Chem - Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: the role of mixing statistics and frame shift of neighboring genes. ( 0,507203242075755 )
J. Comput. Biol. - Combinatorics of -structures. ( 0,506869030401272 )
IEEE Trans Image Process - Establishing point correspondence of 3D faces via sparse facial deformable model. ( 0,506162866027278 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,505968028117716 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,505879478453293 )
J. Comput. Biol. - Accurate estimations of evolutionary times in the context of strong CpG hypermutability. ( 0,504992824419438 )
Comput Biol Chem - Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. ( 0,504087970128636 )
Comput Math Methods Med - Iterative methods for obtaining energy-minimizing parametric snakes with applications to medical imaging. ( 0,497346416768581 )
Brief. Bioinformatics - Identify drug repurposing candidates by mining the protein data bank. ( 0,495181663300502 )
Methods Inf Med - Optimal two-stage designs for single-arm phase II oncology trials with two binary endpoints. ( 0,494210900174743 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,492552076105435 )
Artif Intell Med - An ontology-based comparative anatomy information system. ( 0,491653512691026 )
J. Comput. Biol. - The 5'-3' distance of RNA secondary structures. ( 0,49087467894732 )
Neural Comput - Robust observer-based tracking control of hodgkin-huxley neuron systems under environmental disturbances. ( 0,490573276095443 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,490260120573985 )
J Integr Bioinform - Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features. ( 0,49024640187308 )
J Chem Inf Model - Searching for likeness in a database of macromolecular complexes. ( 0,490186012301456 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,488745734556024 )