J Biomed Inform - Practical approach to determine sample size for building logistic prediction models using high-throughput data.

Topics

{ sampl(1606) size(1419) use(1276) }
{ method(1219) similar(1157) match(930) }
{ assess(1506) score(1403) qualiti(1306) }
{ data(3008) multipl(1320) sourc(1022) }
{ structur(1116) can(940) graph(676) }
{ method(1557) propos(1049) approach(1037) }
{ model(2341) predict(2261) use(1141) }
{ cost(1906) reduc(1198) effect(832) }
{ control(1307) perform(991) simul(935) }
{ model(2656) set(1616) predict(1553) }
{ use(1733) differ(960) four(931) }
{ featur(3375) classif(2383) classifi(1994) }
{ studi(2440) review(1878) systemat(933) }
{ algorithm(1844) comput(1787) effici(935) }
{ model(3480) simul(1196) paramet(876) }
{ signal(2180) analysi(812) frequenc(800) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ motion(1329) object(1292) video(1091) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ howev(809) still(633) remain(590) }
{ perform(999) metric(946) measur(919) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ implement(1333) system(1263) develop(1122) }
{ take(945) account(800) differ(722) }
{ design(1359) user(1324) use(1319) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ treatment(1704) effect(941) patient(846) }
{ problem(2511) optim(1539) algorithm(950) }
{ learn(2355) train(1041) set(1003) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ medic(1828) order(1363) alert(1069) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

An empirical method of sample size determination for building prediction models was proposed recently. The permutation method used in this procedure is commonly applied to address overfitting during cross-validation when evaluating the performance of prediction models constructed from microarray data. However, a major drawback of such methods, which include bootstrapping and full permutation, is the prohibitively high computational cost of calculating the sample size. In this paper, we propose, using both simulated and real data sets, that a single representative null distribution can be used instead of a full permutation. In simulation, we used a dataset with zero effect size and confirmed that the empirical type I error approaches 0.05. Hence, this method can be confidently applied to reduce the overfitting problem during cross-validation. We observed that a pilot data set generated by random sampling from real data could be used successfully for sample size determination. We present results from an experiment repeated 300 times, which produced results comparable to those of the full permutation method. Because we eliminate full permutation, sample size estimation time is not a function of pilot data size; in our experiment, the process took around 30 min. With the increasing number of clinical studies, developing efficient sample size determination methods for building prediction models is critical, but empirical methods using bootstrap and permutation usually involve high computing costs. In this study, we propose a method that can drastically reduce the required computing time by using a representative null distribution of permutations. We use data from pilot experiments to apply this method to the efficient design of clinical studies with high-throughput data.
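The idea of reusing one null distribution can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a simple statistic (the largest absolute two-sample t-statistic across features) in place of a cross-validated logistic prediction model, and the function names, sample size, feature count, and permutation count are all hypothetical choices made for illustration. It builds a single permutation null from one zero-effect reference dataset, takes its upper 5% quantile as a critical value, and then checks how often fresh zero-effect datasets exceed that value.

```python
import numpy as np

def max_abs_t(X, y):
    """Largest absolute two-sample t-statistic across features (assumed statistic)."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)) / se)

def representative_null(X, y, n_perm, rng):
    """Permutation null built once, from a single reference dataset."""
    return np.array([max_abs_t(X, rng.permutation(y)) for _ in range(n_perm)])

def empirical_type1_error(n_samples=40, n_features=50, n_perm=1000,
                          n_reps=300, alpha=0.05, seed=0):
    """Fraction of zero-effect datasets rejected using the shared critical value."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)            # balanced class labels
    # Reference data with zero effect size: features are pure noise.
    X_ref = rng.standard_normal((n_samples, n_features))
    null = representative_null(X_ref, y, n_perm, rng)
    crit = np.quantile(null, 1 - alpha)              # reused for every replicate
    rejections = 0
    for _ in range(n_reps):
        # Each replicate is a new zero-effect dataset; no re-permutation is done.
        X = rng.standard_normal((n_samples, n_features))
        rejections += int(max_abs_t(X, y) > crit)
    return rejections / n_reps

if __name__ == "__main__":
    print("empirical type I error:", empirical_type1_error())
```

In this toy setting the permutations are computed once, so the cost of evaluating each additional replicate does not include a fresh permutation loop. If the shared null is a reasonable stand-in for per-dataset permutation, the printed rate should sit near the nominal 0.05, up to simulation noise.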

Cleaned Abstract

empir method sampl size determin build predict model propos recent permut method use procedur common use method address problem overfit crossvalid evalu perform predict model construct microarray data major drawback method includ bootstrap full permut prohibit high cost comput requir calcul sampl size paper propos singl repres null distribut can use instead full permut use simul real data set simul use dataset zero effect size confirm empir type error approach henc method can confid appli reduc overfit problem crossvalid observ pilot data set generat random sampl real data success use sampl size determin present result use experi repeat time produc result compar full permut method sinc elimin full permut sampl size estim time function pilot data size experi observ process take around min increas number clinic studi develop effici sampl size determin method build predict model critic empir method use bootstrap permut usual involv high comput cost studi propos method can reduc requir comput time drastic use repres null distribut permut use data pilot experi appli method design clinic studi effici high throughput data

Similar Abstracts

J Chem Inf Model - Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing. ( 0,57168814778038 )
Neural Comput - Mapping of visual receptive fields by tomographic reconstruction. ( 0,561758975374386 )
Spat Spatiotemporal Epidemiol - Estimation of district-level under-5 mortality in Zambia using birth history data, 1980-2010. ( 0,536468496700067 )
Comput Math Methods Med - The number of candidate variants in exome sequencing for Mendelian disease under no genetic heterogeneity. ( 0,535474358651553 )
Artif Intell Med - An automated methodology for levodopa-induced dyskinesia: assessment based on gyroscope and accelerometer signals. ( 0,528590360512448 )
AMIA Annu Symp Proc - Expressing observations from electronic medical record flowsheets in an i2b2 based clinical data repository to support research and quality improvement. ( 0,527927837209581 )
Lifetime Data Anal - Goodness-of-fit tests for additive mean residual life model under right censoring. ( 0,524225097845946 )
Comput Math Methods Med - A note regarding problems with interaction and varying block sizes in a comparison of endotracheal tubes. ( 0,5222493771291 )
Int J Comput Assist Radiol Surg - Liver tumors segmentation from CTA images using voxels classification and affinity constraint propagation. ( 0,519953228872335 )
AMIA Annu Symp Proc - Managing Medical Vocabulary Updates in a Clinical Data Warehouse: An RxNorm Case Study. ( 0,516932564678308 )
Telemed J E Health - Measuring the effect of telecare on medical expenditures without bias using the propensity score matching method. ( 0,513677858356978 )
J Chem Inf Model - Template CoMFA: the 3D-QSAR Grail? ( 0,510662497407006 )
J. Comput. Biol. - PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences. ( 0,504866233623577 )
IEEE Trans Pattern Anal Mach Intell - Free Energy Score Spaces: Using Generative Information in Discriminative Classifiers. ( 0,503737892497451 )
Comput Methods Programs Biomed - A bootstrap approach for lower injury levels of the risk curves. ( 0,503733472452461 )
Med Decis Making - Predictive Modeling of Implantation Outcome in an In Vitro Fertilization Setting: An Application of Machine Learning Methods. ( 0,501026473420692 )
IEEE Trans Neural Netw Learn Syst - Large-scale Nyström kernel matrix approximation using randomized SVD. ( 0,500648430944965 )
Brief. Bioinformatics - The impact of taxon sampling on phylogenetic inference: a review of two decades of controversy. ( 0,500526231641746 )
IEEE Trans Image Process - Correlation-coefficient-based fast template matching through partial elimination. ( 0,499732430736079 )
Comput Methods Programs Biomed - Computer simulation of the activity of the elderly person living independently in a Health Smart Home. ( 0,497369604113589 )
AMIA Annu Symp Proc - Optimized dual threshold entity resolution for electronic health record databases--training set size and active learning. ( 0,49686604080801 )
J Biomed Inform - Limestone: high-throughput candidate phenotype generation via tensor factorization. ( 0,495672815290779 )
J Biomed Inform - Sample size estimation in diagnostic test studies of biomedical informatics. ( 0,49103319793672 )
Comput Methods Programs Biomed - On the development of a computer-based handwriting assessment tool to objectively quantify handwriting proficiency in children. ( 0,489392215588369 )
BMC Med Inform Decis Mak - Predicting sample size required for classification performance. ( 0,488137569812772 )
BMC Med Inform Decis Mak - A straightforward approach to designing a scoring system for predicting length-of-stay of cardiac surgery patients. ( 0,486425973032344 )
J Med Syst - The effect of socio-cultural characteristics on the effectiveness of teamwork: a study in the Gülhane Military Medical Faculty Training Hospital. ( 0,485279568399589 )
Appl Clin Inform - The false security of blind dates: chrononymization's lack of impact on data privacy of laboratory data. ( 0,48517904938452 )
J Chem Inf Model - Build-up algorithm for atomic correspondence between chemical structures. ( 0,485065037330302 )
Res Synth Methods - Trial sequential methods for meta-analysis. ( 0,480445467538878 )
Methods Inf Med - A simplification and implementation of random-effects meta-analyses based on the exact distribution of Cochran's Q. ( 0,477524510828205 )
J. Med. Internet Res. - Does self-selection affect samples' representativeness in online surveys? An investigation in online video game research. ( 0,476254450936155 )
J. Med. Internet Res. - Internet Addiction Test (IAT): which is the best factorial solution? ( 0,476083284383566 )
Brief. Bioinformatics - Mutational analysis in RNAs: comparing programs for RNA deleterious mutation prediction. ( 0,475198269264381 )
IEEE Trans Image Process - Retina verification system based on biometric graph matching. ( 0,474994016683148 )
AMIA Annu Symp Proc - Graphical methods for reducing, visualizing and analyzing large data sets using hierarchical terminologies. ( 0,474158704180634 )
Comput. Biol. Med. - Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data. ( 0,473674512328833 )
Int J Comput Assist Radiol Surg - Full-field digital mammography image data storage reduction using a crop tool. ( 0,472439204991457 )
BMC Med Inform Decis Mak - Fast PCA for processing calcium-imaging data from the brain of Drosophila melanogaster. ( 0,471999624338011 )
IEEE Trans Pattern Anal Mach Intell - Simplified Computation for Nonparametric Windows Method of Probability Density Function Estimation. ( 0,470597232552891 )
Brief. Bioinformatics - Performance evaluation of DNA copy number segmentation methods. ( 0,468246455338044 )
Med Decis Making - Hospital variation in patient-reported outcomes at the level of EQ-5D dimensions: evidence from England. ( 0,466691979724537 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,466318033040899 )
IEEE Trans Image Process - Monotonic regression: a new way for correlating subjective and objective ratings in image quality research. ( 0,465401583807238 )
Med Biol Eng Comput - Peculiarities of extracellular potentials produced by deep muscles. Part 2: motor unit potentials. ( 0,465225969115913 )
IEEE Trans Neural Netw Learn Syst - A new method for data stream mining based on the misclassification error. ( 0,464187397047713 )
J Chem Inf Model - Conformer generation with OMEGA: learning from the data set and the analysis of failures. ( 0,463239426794548 )
J Chem Inf Model - Stereochemically consistent reaction mapping and identification of multiple reaction mechanisms through integer linear optimization. ( 0,463091832752064 )
Neural Comput - A connection between score matching and denoising autoencoders. ( 0,461791919654541 )
J Am Med Inform Assoc - Calibrating predictive model estimates to support personalized medicine. ( 0,461269726017646 )
Neural Comput - Likelihood methods for point processes with refractoriness. ( 0,461109966582552 )
J Biomed Inform - Feature-expression heat maps--a new visual method to explore complex associations between two variable sets. ( 0,460559725785372 )
J Am Med Inform Assoc - A study in transfer learning: leveraging data from multiple hospitals to enhance hospital-specific predictions. ( 0,459297179154679 )
IEEE Trans Pattern Anal Mach Intell - Matching by Tone Mapping: Photometric Invariant Template Matching. ( 0,458283466113766 )
IEEE Trans Image Process - Histogram contextualization. ( 0,458205723949907 )
Comput Methods Programs Biomed - Multiple sequence alignment with affine gap by using multi-objective genetic algorithm. ( 0,457194027088561 )
Int J Neural Syst - Retrieval of noisy fingerprint patterns using metric attractor networks. ( 0,456002889243459 )
IEEE J Biomed Health Inform - Component-Level Tuning of Kinematic Features from Composite Therapist Impressions of Movement Quality. ( 0,45598705553491 )
Med Decis Making - Mapping QLQ-C30, HAQ, and MSIS-29 on EQ-5D. ( 0,455785626233077 )
J Am Med Inform Assoc - Usability-driven pruning of large ontologies: the case of SNOMED CT. ( 0,454499390027748 )
IEEE Trans Pattern Anal Mach Intell - Trinary-Projection Trees for Approximate Nearest Neighbor Search. ( 0,451523592062615 )
J Am Med Inform Assoc - A multi-part matching strategy for mapping LOINC with laboratory terminologies. ( 0,451143284165093 )
IEEE Trans Pattern Anal Mach Intell - Building Development Monitoring in Multitemporal Remotely Sensed Image Pairs with Stochastic Birth-Death Dynamics. ( 0,450353779495096 )
J. Comput. Biol. - A polynomial-time algorithm computing lower and upper bounds of the rooted subtree prune and regraft distance. ( 0,450127209032609 )
J Chem Inf Model - Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models. ( 0,449571781616348 )
IEEE Trans Vis Comput Graph - GosperMap: Using a Gosper Curve for Laying out Hierarchical Data. ( 0,449515660757958 )
J. Comput. Biol. - The approximability of shortest path-based graph orientations of protein-protein interaction networks. ( 0,448232816113016 )
Med Decis Making - The use of rasch analysis in reducing a large condition-specific instrument for preference valuation: the case of moving from AQLQ to AQL-5D. ( 0,448041862552231 )
J. Med. Internet Res. - Applying computer adaptive testing to optimize online assessment of suicidal behavior: a simulation study. ( 0,447638000934068 )
Neural Comput - On criticality in high-dimensional data. ( 0,447301859318442 )
J Chem Inf Model - Determination of toxicant mode of action by augmented top priority fragment class. ( 0,447156623195744 )
J Chem Inf Model - Mapping monomeric threading to protein-protein structure prediction. ( 0,445770984527915 )
IEEE Trans Image Process - Paramer mismatch-based spectral gamut mapping. ( 0,444875465933429 )
J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0,444112100145319 )
Methods Inf Med - Blinded sample size reestimation with negative binomial counts in superiority and non-inferiority trials. ( 0,443789138599141 )
Comput. Biol. Med. - 3D surface roughness measurement for scaliness scoring of psoriasis lesions. ( 0,44213378079973 )
J. Med. Internet Res. - Applying computerized adaptive testing to the Negative Acts Questionnaire-Revised: Rasch analysis of workplace bullying. ( 0,442091302600344 )
IEEE Trans Image Process - Discretization of parametrizable signal manifolds. ( 0,44167413803797 )
IEEE Trans Vis Comput Graph - SuperMatching: Feature Matching Using Supersymmetric Geometric Constraints. ( 0,441434585553578 )
Methods Inf Med - Limited sampling strategies to estimate the area under the concentration-time curve. Biases and a proposed more accurate method. ( 0,441294548467801 )
Comput Math Methods Med - Error analysis of deep sequencing of phage libraries: peptides censored in sequencing. ( 0,441014329110211 )
Comput Math Methods Med - Three-dimensional identification of microorganisms using a digital holographic microscope. ( 0,440101071852008 )
BMC Med Inform Decis Mak - Measuring decision quality: psychometric evaluation of a new instrument for breast cancer surgery. ( 0,438646033247508 )
Med Decis Making - Test result-based sampling: an efficient design for estimating the accuracy of patient safety indicators. ( 0,438560188658372 )
J. Comput. Biol. - Balancing the robustness and predictive performance of biomarkers. ( 0,438303500495293 )
Med Biol Eng Comput - Remote physiological and GPS data processing in evaluation of physical activities. ( 0,438042696576354 )
J. Comput. Biol. - Shape-based feature matching improves protein identification via LC-MS and tandem MS. ( 0,438012612022155 )
Res Synth Methods - Methodological quality of meta-analyses: matched-pairs comparison over time and between industry-sponsored and academic-sponsored reports. ( 0,437865813047556 )
J. Comput. Biol. - Exactly computing the parsimony scores on phylogenetic networks using dynamic programming. ( 0,437811697498677 )
BMC Med Inform Decis Mak - Decision-making in healthcare: a practical application of partial least square path modelling to coverage of newborn screening programmes. ( 0,437406082888256 )
Telemed J E Health - Improving the communication reliability of body sensor networks based on the IEEE 802.15.4 protocol. ( 0,434649601693914 )
J. Comput. Biol. - Efficient error-correcting pooling designs constructed from pseudo-symplectic spaces over a finite field. ( 0,434427687370188 )
Brief. Bioinformatics - A comparative analysis of biclustering algorithms for gene expression data. ( 0,434307360521735 )
Med Biol Eng Comput - Size matters: MEG empirical and simulation study on source localization of the earliest visual activity in the occipital cortex. ( 0,434133323505663 )
Appl Clin Inform - What big size you have! Using effect sizes to determine the impact of public health nursing interventions. ( 0,434081043669765 )
IEEE Trans Image Process - Image quality assessment using multi-method fusion. ( 0,434048289317624 )
IEEE Trans Neural Netw Learn Syst - A Distributed Approach Toward Discriminative Distance Metric Learning. ( 0,434025169320956 )
J Am Med Inform Assoc - Reliability and validity of the American Hospital Association's national longitudinal survey of health information technology adoption. ( 0,43366580540513 )
Med Decis Making - Predicting the EuroQol Group's EQ-5D index from CDC's Healthy Days in a US sample. ( 0,432941214904949 )
Int J Med Inform - Content analysis of physical examination templates in electronic health records using SNOMED CT. ( 0,431593137414401 )