Brief. Bioinformatics - Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies.


It is common and recommended practice in biomedical research to validate experimental or observational findings in a population different from the one in which the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical in high-throughput experiments, where the large number of tests increases the chance of observing false-positive results. In this article, we review common approaches to determining statistical thresholds for validation and describe the factors that influence the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as the rediscovery rate (RDR). In high-throughput studies, the RDR is a function of the false-positive rate and the power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation data have not yet been collected, the RDR can be used to choose the optimal combination of the proportion of findings taken to validation and the size of the validation study. Second, if a validation study has already been done, the RDR estimated from the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed.
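The abstract describes the RDR as a function of the false-positive rate and power in both samples. The sketch below illustrates that relationship under simplifying assumptions that are ours, not the article's: training and validation samples are independent, a fraction `pi0` of the `m` tests are true nulls, every true signal has the same standardized effect size, each test is a two-sided two-sample t-test with equal group sizes, and power uses a normal approximation to the t distribution (an exact version would use the noncentral t, for example `scipy.stats.nct`). A finding counts as rediscovered if it is significant in validation regardless of direction, and the validation threshold is assumed to be Bonferroni-corrected for the expected number of findings carried forward. The function names `power_two_sample`, `expected_rdr`, `rdr_grid` and `observed_rdr` are hypothetical; this is not the authors' online tool.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_two_sample(effect_size: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample t-test (equal group sizes).

    Normal approximation: under the alternative the statistic is centred at
    effect_size * sqrt(n_per_group / 2); the opposite rejection tail is ignored.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return STD_NORMAL.cdf(shift - z_crit)


def expected_rdr(pi0: float, alpha_train: float, power_train: float,
                 alpha_valid: float, power_valid: float) -> float:
    """Expected share of training hits that are significant again in validation.

    Numerator: expected fraction of tests significant in both samples (nulls by
    chance twice, true signals by power twice). Denominator: expected fraction
    significant in training. Assumes independent samples, equal power per signal.
    """
    hits_train = pi0 * alpha_train + (1 - pi0) * power_train
    hits_both = (pi0 * alpha_train * alpha_valid
                 + (1 - pi0) * power_train * power_valid)
    return hits_both / hits_train


def rdr_grid(m: int, pi0: float, effect_size: float, n_train: int,
             n_valid_options, alpha_train_options, fwer: float = 0.05):
    """Expected RDR for each (alpha_train, n_valid) design.

    Assumption of this sketch: the validation threshold is Bonferroni-corrected
    for the expected number of findings carried forward from training.
    """
    table = {}
    for alpha_train in alpha_train_options:
        power_train = power_two_sample(effect_size, n_train, alpha_train)
        # Expected number of training hits taken to validation.
        k = m * (pi0 * alpha_train + (1 - pi0) * power_train)
        alpha_valid = fwer / max(k, 1.0)
        for n_valid in n_valid_options:
            power_valid = power_two_sample(effect_size, n_valid, alpha_valid)
            table[(alpha_train, n_valid)] = expected_rdr(
                pi0, alpha_train, power_train, alpha_valid, power_valid)
    return table


def observed_rdr(n_replicated: int, n_taken_to_validation: int) -> float:
    """Observed RDR: replicated findings divided by findings taken to validation."""
    return n_replicated / n_taken_to_validation


if __name__ == "__main__":
    # Illustrative inputs, not values from the article: 1,000 metabolites,
    # 95% true nulls, effect size 0.5, 100 subjects per group in training.
    grid = rdr_grid(m=1000, pi0=0.95, effect_size=0.5, n_train=100,
                    n_valid_options=[50, 100, 200],
                    alpha_train_options=[0.001, 0.01, 0.05])
    for (alpha_train, n_valid), rdr in sorted(grid.items()):
        print(f"alpha_train={alpha_train:<6} n_valid={n_valid:<4} RDR={rdr:.3f}")
```

Scanning the grid mirrors the first application: a stricter `alpha_train` carries fewer findings forward, which relaxes the Bonferroni threshold in validation, so the expected RDR can be weighed against the validation sample size. For the second application, the expected RDR from the training data would be set against `observed_rdr` computed from the validation results.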

Similar Abstracts

Spat Spatiotemporal Epidemiol - Spatial modelling of disease using data- and knowledge-driven approaches. ( 0.706326203707148 )
AMIA Annu Symp Proc - Advanced proficiency EHR training: effect on physicians' EHR efficiency, EHR satisfaction and job satisfaction. ( 0.627899684759751 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0.623222063821722 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0.618128425596302 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0.617211921501771 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0.617036529786916 )
Int J Comput Assist Radiol Surg - Assessing performance in brain tumor resection using a novel virtual reality simulator. ( 0.614370639020726 )
Int J Health Geogr - Estimating the geographic distribution of human Tanapox and potential reservoirs using ecological niche modeling. ( 0.611029038943433 )
Methods Inf Med - Personal adaptive method to assess mental tension during daily life using heart rate variability. ( 0.603233159254751 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0.595408832237569 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0.593778459036643 )
AMIA Annu Symp Proc - Predicting the dengue incidence in Singapore using univariate time series models. ( 0.589945626350153 )
Comput Methods Programs Biomed - A predictive model of longitudinal, patient-specific colonoscopy results. ( 0.587089851726015 )
J Chem Inf Model - Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. ( 0.586989236284733 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0.586936344434561 )
Lifetime Data Anal - Analysis of cure rate survival data under proportional odds model. ( 0.586835732410282 )
IEEE Trans Image Process - Incremental N-mode SVD for large-scale multilinear generative models. ( 0.583054081222075 )
Comput. Biol. Med. - Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. ( 0.579230207254918 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0.579057788202754 )
J. Med. Internet Res. - A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives. ( 0.577127042891101 )
J. Comput. Biol. - An almost optimal algorithm for generalized threshold group testing with inhibitors. ( 0.575234900574599 )
Lifetime Data Anal - Bayesian inference of the fully specified subdistribution model for survival data with competing risks. ( 0.572558779684995 )
J Chem Inf Model - Rank order entropy: why one metric is not enough. ( 0.571718878934287 )
Artif Intell Med - Fuzzy model identification of dengue epidemic in Colombia based on multiresolution analysis. ( 0.568058769416611 )
Methods Inf Med - Towards a personalized and dynamic CRT-D. A computational cardiovascular model dedicated to therapy optimization. ( 0.565402464673538 )
Comput Methods Programs Biomed - Predicting body fat percentage based on gender, age and BMI by using artificial neural networks. ( 0.564168134933305 )
Med Decis Making - Developing a tuberculosis transmission model that accounts for changes in population health. ( 0.563063846154243 )
J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0.563033623253924 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0.560868893171161 )
BMC Med Inform Decis Mak - Concordance and predictive value of two adverse drug event data sets. ( 0.558465011119262 )
Int J Health Geogr - A linear programming model for preserving privacy when disclosing patient spatial information for secondary purposes. ( 0.557286217017975 )
J Am Med Inform Assoc - Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. ( 0.555323946232959 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0.552854685000287 )
AMIA Annu Symp Proc - Identifying Deviations from Usual Medical Care using a Statistical Approach. ( 0.550430824557917 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0.549561387477906 )
BMC Med Inform Decis Mak - Measuring preferences for analgesic treatment for cancer pain: how do African-Americans and Whites perform on choice-based conjoint (CBC) analysis experiments? ( 0.54811851641346 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0.547696161566949 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0.546589000927955 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0.545103963331246 )
Med Decis Making - Prediction of health preference values from CD4 counts in individuals with HIV. ( 0.545068899418652 )
J Chem Inf Model - Estimation of carcinogenicity using molecular fragments tree. ( 0.543986522612968 )
BMC Med Inform Decis Mak - Statistical process control for data without inherent order. ( 0.539935925710951 )
J Chem Inf Model - Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. ( 0.53862217638715 )
J. Med. Internet Res. - Guess who's not coming to dinner? Evaluating online restaurant reservations for disease surveillance. ( 0.533513917220515 )
Med Biol Eng Comput - Optimal design of clinical tests for the identification of physiological models of type 1 diabetes in the presence of model mismatch. ( 0.532595138089941 )
BMC Med Inform Decis Mak - Is it possible to identify cases of coronary artery bypass graft postoperative surgical site infection accurately from claims data? ( 0.531165600189534 )
AMIA Annu Symp Proc - Coverage of rare disease names in standard terminologies and implications for patients, providers, and research. ( 0.530619239552388 )
Int J Med Inform - Reporting systems, reporting rates and completeness of data reported from primary healthcare to a Swedish quality register--the National Diabetes Register. ( 0.527071538260634 )
Med Decis Making - Predicting EQ-5D utility scores from the Seattle Angina Questionnaire in coronary artery disease: a mapping algorithm using a Bayesian framework. ( 0.526840932076863 )
Neural Comput - Hidden Markov models for the stimulus-response relationships of multistate neural systems. ( 0.526315973139499 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0.524695080725904 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0.522034022844905 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0.521489673355138 )
J Am Med Inform Assoc - Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. ( 0.521435901889229 )
Comput. Biol. Med. - Artificial neural network modelling of the results of tympanoplasty in chronic suppurative otitis media patients. ( 0.52025491434949 )
J Biomed Inform - Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. ( 0.516882935512485 )
Comput. Biol. Med. - Prediction of metabolic syndrome using artificial neural network system based on clinical data including insulin resistance index and serum adiponectin. ( 0.516458330089204 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0.513768402287544 )
Brief. Bioinformatics - A unifying framework for bivalent multilocus linkage analysis of allotetraploids. ( 0.513239920345171 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0.512021744740151 )
Comput Math Methods Med - Locomotor development prediction based on statistical model parameters identification. ( 0.511260211723174 )
Int J Health Geogr - A validation of ground ambulance pre-hospital times modeled using geographic information systems. ( 0.509314020894015 )
AMIA Annu Symp Proc - Ontology-based federated data access to human studies information. ( 0.50843499060786 )
BMC Med Inform Decis Mak - Prediction of gastrointestinal disease with over-the-counter diarrheal remedy sales records in the San Francisco Bay Area. ( 0.507675025155715 )
J Biomed Inform - Markov blanket-based approach for learning multi-dimensional Bayesian network classifiers: an application to predict the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). ( 0.507056072041684 )
BMC Med Inform Decis Mak - A hybrid seasonal prediction model for tuberculosis incidence in China. ( 0.505928090052033 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0.505776836772315 )
Int J Health Geogr - Incorporating geographical factors with artificial neural networks to predict reference values of erythrocyte sedimentation rate. ( 0.505640154648372 )
Comput Methods Programs Biomed - A new surveillance and spatio-temporal visualization tool SIMID: SIMulation of infectious diseases using random networks and GIS. ( 0.505246416531096 )
J. Med. Internet Res. - Preparing facilitators from community-based organizations for evidence-based intervention training in Second Life. ( 0.504746588330002 )
IEEE J Biomed Health Inform - Tensor-based methods for handling missing data in quality-of-life questionnaires. ( 0.503047079663544 )
J Chem Inf Model - Conformer generation with OMEGA: learning from the data set and the analysis of failures. ( 0.502894044570001 )
Int J Health Geogr - MosquitoMap and the Mal-area calculator: new web tools to relate mosquito species distribution with vector borne disease. ( 0.502525710465424 )
Comput Math Methods Med - Multiscale autoregressive identification of neuroelectrophysiological systems. ( 0.502158986697313 )
Int J Health Geogr - Modelling zoonotic diseases in humans: comparison of methods for hantavirus in Sweden. ( 0.499181795550431 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0.497848699831282 )
Med Biol Eng Comput - Application of the RIMARC algorithm to a large data set of action potentials and clinical parameters for risk prediction of atrial fibrillation. ( 0.497729447163041 )
J Chem Inf Model - Template CoMFA: the 3D-QSAR Grail? ( 0.497555563464796 )
Comput Math Methods Med - Time-course analysis of main markers of primary infection in cats with the feline immunodeficiency virus. ( 0.497039424113919 )
Int J Health Geogr - Comparative analysis of remotely-sensed data products via ecological niche modeling of avian influenza case occurrences in Middle Eastern poultry. ( 0.496286057963204 )
J Chem Inf Model - A multiscale simulation system for the prediction of drug-induced cardiotoxicity. ( 0.496209183906321 )
Int J Health Geogr - Using Google Street View for systematic observation of the built environment: analysis of spatio-temporal instability of imagery dates. ( 0.495401588789514 )
J. Med. Internet Res. - Use of a text message-based pharmacovigilance tool in Cambodia: pilot study. ( 0.492935239334144 )
J Med Syst - Utilization of electronic medical records to build a detection model for surveillance of healthcare-associated urinary tract infections. ( 0.492563677947872 )
IEEE Trans Image Process - A Unified Methodology for Computing Accurate Quaternion Color Moments and Moment Invariants. ( 0.492059789361135 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0.491894845594539 )
J Chem Inf Model - Design of novel FLT-3 inhibitors based on dual-layer 3D-QSAR model and fragment-based compounds in silico. ( 0.491559434344979 )
J. Med. Internet Res. - Outsourcing medical data analyses: can technology overcome legal, privacy, and confidentiality issues? ( 0.490612374808656 )
Appl Clin Inform - Information needs for the OR and PACU electronic medical record. ( 0.489997670974215 )
J Chem Inf Model - Criterion for evaluating the predictive ability of nonlinear regression models without cross-validation. ( 0.486547997016083 )
J Biomed Inform - Improving record linkage performance in the presence of missing linkage data. ( 0.486047192145762 )
BMC Med Inform Decis Mak - Developing model-based algorithms to identify screening colonoscopies using administrative health databases. ( 0.485609165064073 )
J Biomed Inform - MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (G). ( 0.485487961945385 )
IEEE J Biomed Health Inform - Early Index for Detection of Pediatric Emergency Department Crowding. ( 0.485279425153317 )
Comput. Aided Surg. - Evaluation of a computational model to predict elbow range of motion. ( 0.48478423352267 )
Comput. Biol. Med. - A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. ( 0.484538851482651 )
BMC Med Inform Decis Mak - A simulation model of colorectal cancer surveillance and recurrence. ( 0.484363376613282 )
Curr Comput Aided Drug Des - QSAR Models for the Reactivation of Sarin Inhibited AChE by Quaternary Pyridinium Oximes Based on Monte Carlo Method. ( 0.483700664437057 )
Med Decis Making - Survival analysis and extrapolation modeling of time-to-event clinical trial data for economic evaluation: an alternative approach. ( 0.483359037054194 )
BMC Med Inform Decis Mak - Establishing a web-based integrated surveillance system for early detection of infectious disease epidemic in rural China: a field experimental study. ( 0.482538400760564 )