J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data.

Topics

{ method(1969) cluster(1462) data(1082) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(2212) result(1239) propos(1039) }
{ spatial(1525) area(1432) region(1030) }
{ signal(2180) analysi(812) frequenc(800) }
{ chang(1828) time(1643) increas(1301) }
{ algorithm(1844) comput(1787) effici(935) }
{ model(2656) set(1616) predict(1553) }
{ error(1145) method(1030) estim(1020) }
{ clinic(1479) use(1117) guidelin(835) }
{ model(2220) cell(1177) simul(1124) }
{ high(1669) rate(1365) level(1280) }
{ result(1111) use(1088) new(759) }
{ model(3404) distribut(989) bayesian(671) }
{ featur(3375) classif(2383) classifi(1994) }
{ network(2748) neural(1063) input(814) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ design(1359) user(1324) use(1319) }
{ method(984) reconstruct(947) comput(926) }
{ case(1353) use(1143) diagnosi(1136) }
{ research(1085) discuss(1038) issu(1018) }
{ model(2341) predict(2261) use(1141) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ survey(1388) particip(1329) question(1065) }
{ activ(1452) weight(1219) physic(1104) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Genomic sequencing techniques introduce experimental errors into reads, which can mislead sequence assembly efforts and complicate the diagnostic process. Here we present a method for detecting and removing sequencing errors from reads generated in genomic shotgun sequencing projects prior to sequence assembly. For each input read, the set of all length k substrings (k-mers) it contains is calculated. The read is evaluated based on the frequency with which each k-mer occurs in the complete data set (k-count). For each read, k-mers are clustered using the variable-bandwidth mean-shift algorithm. Based on the k-count of the cluster center, clusters are classified as error regions or non-error regions. For the 23 real and simulated data sets tested (454 and Solexa), our algorithm detected error regions that cover 99% of all errors. A heuristic algorithm is then applied to detect the location of errors in each putative error region. A read is corrected by removing the errors, thereby creating two or more smaller, error-free read fragments. After performing error removal, the error rate for all data sets tested decreased (~35-fold reduction, on average). EDAR has accuracy comparable to methods that correct rather than remove errors, and it performs better on simulated data sets when the error rate is greater than 3%. The performance of the Velvet assembler is generally better with error-removed data. However, for short reads, splitting at the location of errors can be problematic. Following error detection with error correction, rather than removal, may improve the assembly results.
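
The k-count idea in the abstract can be illustrated with a minimal Python sketch. It is not the EDAR implementation: a fixed threshold (the hypothetical parameter `min_count`) stands in for both the variable-bandwidth mean-shift clustering and the heuristic error-location step, and all function names are assumptions made for illustration. It counts every k-mer across the data set, builds the k-count profile of a read, and keeps only the stretches covered by well-supported k-mers, so a read with an error is split into smaller fragments.

```python
from collections import Counter


def kmers(read, k):
    """Return all length-k substrings of a read, in order of start position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Build the k-count table: occurrences of each k-mer in the whole data set."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def kcount_profile(read, counts, k):
    """k-count of each k-mer in the read, indexed by start position."""
    return [counts[km] for km in kmers(read, k)]


def split_read(read, counts, k, min_count):
    """Split a read into fragments covered only by k-mers with k-count >= min_count.

    Assumption: a simple threshold replaces the mean-shift clustering and the
    error-location heuristic described in the abstract.
    """
    fragments = []
    start = None  # start of the current well-supported stretch, if any
    end = 0       # exclusive end of the bases covered by that stretch
    for i, c in enumerate(kcount_profile(read, counts, k)):
        if c >= min_count:
            if start is None:
                start = i
            end = i + k
        elif start is not None:
            fragments.append(read[start:end])
            start = None
    if start is not None:
        fragments.append(read[start:end])
    return fragments


if __name__ == "__main__":
    # Five error-free copies plus one read with a single substitution (C -> G).
    reads = ["ACGTACGTAC"] * 5 + ["ACGTAGGTAC"]
    counts = count_kmers(reads, 4)
    print(kcount_profile("ACGTAGGTAC", counts, 4))  # [11, 11, 1, 1, 1, 1, 11]
    print(split_read("ACGTAGGTAC", counts, 4, 2))   # ['ACGTA', 'GTAC']
```

In this toy data set the four k-mers overlapping the substituted base each occur once, so the read is split into two fragments on either side of it. The second fragment is only four bases long, which illustrates why the abstract notes that splitting can be problematic for short reads.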

Cleaned Abstract

genom sequenc techniqu introduc experiment error read can mislead sequenc assembl effort complic diagnost process present method detect remov sequenc error read generat genom shotgun sequenc project prior sequenc assembl input read set length k substr kmer contain calcul read evalu base frequenc kmer occur complet data set kcount read kmer cluster use variablebandwidth meanshift algorithm base kcount cluster center cluster classifi error region nonerror region real simul data set test solexa algorithm detect error region cover error heurist algorithm appli detect locat error putat error region read correct remov error therebi creat two smaller errorfre read fragment perform error remov errorr data set test decreas fold reduct averag edar compar accuraci method correct rather remov error error rate greater simul data set perform better perform velvet assembl general better errorremov data howev short read split locat error can problemat follow error detect error correct rather remov may improv assembl result

Similar Abstracts

Int J Health Geogr - A binary-based approach for detecting irregularly shaped clusters. ( 0,751562899757973 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,743566645018641 )
Int J Health Geogr - Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: a case study on murine typhus in south Texas. ( 0,724505228850039 )
Int J Health Geogr - Detecting activity locations from raw GPS data: a novel kernel-based algorithm. ( 0,723080037227035 )
IEEE Trans Neural Netw Learn Syst - Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms. ( 0,707137361087365 )
Spat Spatiotemporal Epidemiol - Optimal selection of the spatial scan parameters for cluster detection: a simulation study. ( 0,707065908077416 )
J Chem Inf Model - Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. ( 0,702175364716911 )
Comput. Aided Surg. - The Equidistant Method - a novel hip joint simulation algorithm for detection of femoroacetabular impingement. ( 0,696874683809358 )
IEEE Trans Image Process - On averaging multiview relations for 3D scan registration. ( 0,69448032535492 )
AMIA Annu Symp Proc - Using hierarchical mixture of experts model for fusion of outbreak detection methods. ( 0,687745592184615 )
IEEE Trans Image Process - Nonrigid registration of 2-D and 3-D dynamic cell nuclei images for improved classification of subcellular particle motion. ( 0,672045198633806 )
Comput Math Methods Med - Decimative spectral estimation with unconstrained model order. ( 0,669924939059559 )
Int J Health Geogr - Voronoi distance based prospective space-time scans for point data sets: a dengue fever cluster analysis in a southeast Brazilian town. ( 0,665156096267147 )
J Biomed Inform - A kinetic model-based algorithm to classify NGS short reads by their allele origin. ( 0,66488091659604 )
Brief. Bioinformatics - Ultrafast clustering algorithms for metagenomic sequence analysis. ( 0,664591497720953 )
Med Decis Making - Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials. ( 0,663793882432398 )
Int J Comput Assist Radiol Surg - Deformable registration of preoperative MR, pre-resection ultrasound, and post-resection ultrasound images of neurosurgery. ( 0,6613850016505 )
J. Comput. Biol. - Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data. ( 0,656624554706595 )
Int J Health Geogr - Detection of clusters of a rare disease over a large territory: performance of cluster detection methods. ( 0,650132562026771 )
J. Comput. Biol. - Detection of structural variants involving repetitive regions in the reference genome. ( 0,648726585597262 )
Comput Methods Programs Biomed - Fuzzy and hard clustering analysis for thyroid disease. ( 0,645585103179674 )
IEEE Trans Vis Comput Graph - GPU-based Multilevel Clustering. ( 0,64340811179793 )
Comput Math Methods Med - A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation. ( 0,641204198468894 )
J Chem Inf Model - Investigation of the use of spectral clustering for the analysis of molecular data. ( 0,639260014526062 )
Comput Biol Chem - An efficient similarity search based on indexing in large DNA databases. ( 0,629052356340098 )
J Integr Bioinform - An evolutionary and visual framework for clustering of DNA microarray data. ( 0,628430156381128 )
Neural Comput - Spontaneous clustering via minimum γ-divergence. ( 0,627405417120158 )
Comput. Biol. Med. - Evaluation of automatic feature detection algorithms in EEG: application to interburst intervals. ( 0,626502006601645 )
IEEE Trans Pattern Anal Mach Intell - A Link-Based Approach to the Cluster Ensemble Problem. ( 0,62610986965949 )
J. Comput. Biol. - Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences. ( 0,624555229573493 )
Comput. Biol. Med. - Analysis of adductors angle measurement in Hammersmith infant neurological examinations using mean shift segmentation and feature point based object tracking. ( 0,623366368355341 )
Med Decis Making - Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: an application to cluster randomized trials. ( 0,614162926114682 )
IEEE Trans Pattern Anal Mach Intell - Semi-Supervised Kernel Mean Shift Clustering. ( 0,611079815839963 )
J Med Syst - Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets. ( 0,608789128630612 )
J. Med. Internet Res. - Security analysis and improvements to the PsychoPass method. ( 0,608429026164122 )
Int J Med Robot - Coordinated control and experimentation of the dental arch generator of the tooth-arrangement robot. ( 0,606386779129178 )
J Chem Inf Model - String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins. ( 0,606341640000873 )
Comput Biol Chem - piClust: a density based piRNA clustering algorithm. ( 0,604964531047397 )
J Integr Bioinform - Parallel Niche Pareto AlineaGA--an evolutionary multiobjective approach on multiple sequence alignment. ( 0,601916480593884 )
J Chem Inf Model - Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors. ( 0,600917633204911 )
J. Comput. Biol. - A geometric clustering algorithm with applications to structural data. ( 0,597950042190052 )
IEEE Trans Image Process - Fast transforms for acoustic imaging--part I: theory. ( 0,596531604794957 )
Spat Spatiotemporal Epidemiol - Performance of cancer cluster Q-statistics for case-control residential histories. ( 0,595673946347318 )
J Integr Bioinform - Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. ( 0,590143554255506 )
Artif Intell Med - Weighted spherical 1-mean with phase shift and its application in electrocardiogram discord detection. ( 0,589647933746276 )
Neural Comput - A nonparametric clustering algorithm with a quantile-based likelihood estimator. ( 0,585827444824908 )
J Chem Inf Model - Consensus methods for combining multiple clusterings of chemical structures. ( 0,584121493942424 )
J Chem Inf Model - Cavities tell more than sequences: exploring functional relationships of proteases via binding pockets. ( 0,581150497996158 )
Methods Inf Med - Application of microarray analysis on computer cluster and cloud platforms. ( 0,579652766740406 )
J Biomed Inform - Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values. ( 0,573997530987805 )
Int J Health Geogr - Using statistical methods and genotyping to detect tuberculosis outbreaks. ( 0,571035394836307 )
AMIA Annu Symp Proc - Survival prediction and treatment recommendation with Bayesian techniques in lung cancer. ( 0,569164842708665 )
Comput Math Methods Med - White blood cell segmentation by circle detection using electromagnetism-like optimization. ( 0,568267637631752 )
Med Biol Eng Comput - Detection of swallows with silent aspiration using swallowing and breath sound analysis. ( 0,566800384508134 )
Brief. Bioinformatics - A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. ( 0,56650947919118 )
IEEE J Biomed Health Inform - LMI-Based Approaches for the Calibration of Continuous Glucose Measurement Sensors. ( 0,566262530244964 )
BMC Med Inform Decis Mak - Efficient algorithms for fast integration on large data sets from multiple sources. ( 0,565474962856468 )
Comput Methods Programs Biomed - Adaptive marker-free registration using a multiple point strategy for real-time and robust endoscope electromagnetic navigation. ( 0,561414174912729 )
Comput. Biol. Med. - Prediction of flavin mono-nucleotide binding sites using modified PSSM profile and ensemble support vector machine. ( 0,559682756506465 )
J Chem Inf Model - Benchmark data sets for structure-based computational target prediction. ( 0,559055211848992 )
Int J Comput Assist Radiol Surg - Preclinical feasibility of a technology framework for MRI-guided iliac angioplasty. ( 0,558379071948216 )
J Biomed Inform - Clustering clinical models from local electronic health records based on semantic similarity. ( 0,557848492862207 )
Int J Comput Assist Radiol Surg - CT dataset anisotropy management for oral implantology planning software. ( 0,557477957086997 )
Artif Intell Med - Missing data imputation using statistical and machine learning methods in a real breast cancer problem. ( 0,556652261556517 )
J Am Med Inform Assoc - Privacy-preserving heterogeneous health data sharing. ( 0,555968151441947 )
Comput Methods Programs Biomed - OLYMPUS: an automated hybrid clustering method in time series gene expression. Case study: host response after Influenza A (H1N1) infection. ( 0,553598249687443 )
Int J Health Geogr - Spatial heterogeneity of type I error for local cluster detection tests. ( 0,551872761287022 )
IEEE Trans Image Process - Simultaneous multiresolution strategies for nonrigid image registration. ( 0,548201526027072 )
Artif Intell Med - Vicinal support vector classifier using supervised kernel-based clustering. ( 0,54808839153157 )
IEEE Trans Image Process - Local affine image matching and synthesis based on structural patterns. ( 0,547154620826274 )
Int J Neural Syst - A genetic graph-based approach for partitional clustering. ( 0,545767009378372 )
IEEE J Biomed Health Inform - Optimization of heartbeat detection in fiber-optic unobtrusive measurements by using maximum a posteriori probability estimation. ( 0,545492865313612 )
J Biomed Inform - Quantifying the determinants of outbreak detection performance through simulation and machine learning. ( 0,544506048004046 )
Comput. Biol. Med. - A straightforward approach to computer-aided polyp detection using a polyp-specific volumetric feature in CT colonography. ( 0,544328631773468 )
Int J Med Robot - Non-orthogonal tool/flange and robot/world calibration. ( 0,541124572052057 )
Int J Health Geogr - Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters. ( 0,538713735232622 )
Med Decis Making - Cost-saving tree-structured survival analysis for hip fracture of study of osteoporotic fractures data. ( 0,537657937756636 )
AMIA Annu Symp Proc - Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles. ( 0,537043036139769 )
Brief. Bioinformatics - Comparative analysis of methods for identifying somatic copy number alterations from deep sequencing data. ( 0,533260008663216 )
Brief. Bioinformatics - A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. ( 0,531196799707943 )
Comput Methods Programs Biomed - Algorithm for registration of full Scanning Laser Ophthalmoscope video sequences. ( 0,53054099697565 )
AMIA Annu Symp Proc - Patient clustering with uncoded text in electronic medical records. ( 0,528948139703777 )
Int J Comput Assist Radiol Surg - Fast lung nodule detection in chest CT images using cylindrical nodule-enhancement filter. ( 0,527573535246813 )
Brief. Bioinformatics - Data construction for phosphorylation site prediction. ( 0,526457510682605 )
IEEE J Biomed Health Inform - Red blood cell cluster separation from digital images for use in sickle cell disease. ( 0,525783895058242 )
Comput Math Methods Med - Feature selection for better identification of subtypes of Guillain-Barré syndrome. ( 0,525553047457692 )
IEEE Trans Image Process - Dynamically removing false features in pyramidal lucas-kanade registration. ( 0,524101365668254 )
Comput Biol Chem - Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure. ( 0,523521078165089 )
J. Comput. Biol. - A theoretical model for whole genome alignment. ( 0,518590445533562 )
Comput Math Methods Med - Naïve Bayes classifier with feature selection to identify phage virion proteins. ( 0,515510498753442 )
IEEE Trans Image Process - In-plane rotation and scale invariant clustering using dictionaries. ( 0,515279509657792 )
Comput Biol Chem - Mode of action classification of chemicals using multi-concentration time-dependent cellular response profiles. ( 0,514941305576345 )
IEEE Trans Image Process - Robust through-the-wall radar image classification using a target-model alignment procedure. ( 0,514705313426775 )
Int J Comput Assist Radiol Surg - The NifTK software platform for image-guided interventions: platform overview and NiftyLink messaging. ( 0,514625209029116 )
Brief. Bioinformatics - Assessing protein conformational sampling methods based on bivariate lag-distributions of backbone angles. ( 0,513484258554405 )
AMIA Annu Symp Proc - Approaching the limits of knowledge: the influence of priming on error detection in simulated clinical rounds. ( 0,512653699998658 )
Comput Math Methods Med - Novel harmonic regularization approach for variable selection in Cox's proportional hazards model. ( 0,512443096635292 )
J. Comput. Biol. - Biological cluster evaluation for gene function prediction. ( 0,511828873138578 )
Comput Math Methods Med - Investigation of attenuation correction for small-animal single photon emission computed tomography. ( 0,511485319024889 )
Comput. Biol. Med. - Multiple texture mapping of alveolar bone area for implant treatment in prosthetic dentistry. ( 0,511137403902361 )