J Chem Inf Model - Determining the degree of randomness of descriptors in linear regression equations with respect to the data size.

Topics

{ measur(2081) correl(1212) valu(896) }
{ sampl(1606) size(1419) use(1276) }
{ can(981) present(881) function(850) }
{ estim(2440) model(1874) function(577) }
{ method(1219) similar(1157) match(930) }
{ featur(1941) imag(1645) propos(1176) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1119) effect(1106) posit(819) }
{ can(774) often(719) complex(702) }
{ studi(2440) review(1878) systemat(933) }
{ state(1844) use(1261) util(961) }
{ model(2656) set(1616) predict(1553) }
{ detect(2391) sensit(1101) algorithm(908) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ network(2748) neural(1063) input(814) }
{ patient(2315) diseas(1263) diabet(1191) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ learn(2355) train(1041) set(1003) }
{ algorithm(1844) comput(1787) effici(935) }
{ model(2220) cell(1177) simul(1124) }
{ general(901) number(790) one(736) }
{ howev(809) still(633) remain(590) }
{ risk(3053) factor(974) diseas(938) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ model(3480) simul(1196) paramet(876) }
{ cost(1906) reduc(1198) effect(832) }
{ high(1669) rate(1365) level(1280) }
{ survey(1388) particip(1329) question(1065) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }

Abstract

Linear regression equations suffer from the curse of dimensionality, which leads to overfitting and accidental correlation, particularly for small data sets and when many variables are present. As a result, descriptors based on random numbers can exhibit higher correlations than actual descriptors. This study therefore investigated how high the degree of accidental correlation of a single descriptor can be as a function of the number of observations. On the basis of computer simulations for data sizes ranging from 7 to 500 observations, a formula was derived that expresses the degree of randomness (in percent) of a chosen descriptor as a function of its correlation coefficient and the size of the data set. This allows one to determine a correlation cutoff below which descriptors can be discarded because of a high risk of chance correlation. In this way, the number of eligible variables for the regression analysis can be reduced substantially. Corresponding applications are reported for several QSAR data sets of various sizes.
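
The abstract does not reproduce the derived formula, so the following is only a minimal Monte Carlo sketch of the underlying idea, not the authors' method. It estimates how often a purely random descriptor reaches a given absolute correlation with a response for a given number of observations. The function names `pearson_r` and `chance_fraction`, the Gaussian stand-in response, and the trial count are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chance_fraction(n_obs, r_cutoff, n_trials=2000, seed=0):
    """Estimate the fraction of random descriptors with |r| >= r_cutoff.

    Assumption: both the response and the descriptors are drawn from a
    standard normal distribution; real QSAR data may behave differently.
    """
    rng = random.Random(seed)
    # Stand-in response vector (hypothetical, not taken from the paper).
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_trials):
        # One descriptor made of random numbers only.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if abs(pearson_r(x, y)) >= r_cutoff:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # Data sizes chosen from within the 7 to 500 range stated in the abstract.
    for n_obs in (7, 20, 100, 500):
        print(n_obs, chance_fraction(n_obs, r_cutoff=0.5))
```

Under these assumptions, the estimated fraction falls sharply as the number of observations grows, which is the qualitative effect the abstract describes: a correlation that is unremarkable for 7 observations becomes very unlikely to arise by chance for several hundred.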

Cleaned Abstract

linear regress equat suffer curs dimension lead overfit accident correl particular small data set mani variabl present can lead case descriptor base random number exhibit higher correl actual descriptor studi therefor investig high degre accident correl singl descriptor can respect number observ basi comput simul data size rang observ formula deriv express degre random percent chosen descriptor depend correl coeffici size data set allow one determin cutoff correl descriptor can discard due high risk chanc correl number elig variabl regress analysi can reduc substanti correspond applic report sever qsar data set various size

Similar Abstracts

Res Synth Methods - Inaccuracy of regression results in replacing bivariate correlations. ( 0,803504994183657 )
Res Synth Methods - Meta-regression approximations to reduce publication selection bias. ( 0,668947126260182 )
Med Decis Making - Strategies for efficient computation of the expected value of partial perfect information. ( 0,619303002246625 )
Res Synth Methods - The problem of natural funnel asymmetries: a simulation analysis of meta-analysis in macroeconomics. ( 0,613994681315066 )
Comput Methods Programs Biomed - A bootstrap approach for lower injury levels of the risk curves. ( 0,602674020326634 )
Methods Inf Med - Chronological bias in randomized clinical trials arising from different types of unobserved time trends. ( 0,597830670202765 )
J Clin Monit Comput - Masseter muscle oxygen saturation is associated with central venous oxygen saturation in patients with severe sepsis. ( 0,584815640698112 )
Comput Methods Programs Biomed - Designing group sequential randomized clinical trials with time to event end points using a R function. ( 0,566266213663496 )
Neural Comput - Nonparametric estimation of Kullback-Leibler divergence. ( 0,565143432885777 )
Comput. Biol. Med. - Approach for streamlining measurement of complex physiological phenotypes of upper airway collapsibility. ( 0,564254667206703 )
Methods Inf Med - Assessment of pain expression in infant cry signals using empirical mode decomposition. ( 0,559742902660753 )
Lifetime Data Anal - Events per variable for risk differences and relative risks using pseudo-observations. ( 0,547161686195924 )
IEEE Trans Pattern Anal Mach Intell - Accuracy of Pseudo-Inverse Covariance Learning-A Random Matrix Theory Analysis. ( 0,53896089764054 )
Methods Inf Med - Influence of selection bias on the test decision. A simulation study. ( 0,537956767876926 )
Res Synth Methods - Trial sequential methods for meta-analysis. ( 0,530811548202745 )
Comput Biol Chem - Effective sample size: Quick estimation of the effect of related samples in genetic case-control association analyses. ( 0,527948412939763 )
Neural Comput - Reliability of information-based integration of EEG and fMRI data: a simulation study. ( 0,52756151523177 )
Comput Methods Programs Biomed - Poisson regression models outperform the geometrical model in estimating the peak-to-trough ratio of seasonal variation: a simulation study. ( 0,527042631285219 )
Comput Math Methods Med - Power analysis of C-TDT for small sample size genome-wide association studies by the joint use of case-parent trios and pairs. ( 0,52631429663999 )
Neural Comput - Impact of correlated neural activity on decision-making performance. ( 0,522418004494075 )
Med Decis Making - Common scale valuations across different preference-based measures: estimation using rank data. ( 0,519387601406522 )
Methods Inf Med - Sample size reassessment in non-inferiority trials. Internal pilot study designs with ANCOVA. ( 0,517403842151892 )
J Clin Monit Comput - The importance of using the correct bounds on the Bland-Altman limits of agreement when multiple measurements are recorded per patient. ( 0,516794395405018 )
J Med Syst - The effect of socio-cultural characteristics on the effectiveness of teamwork: a study in the Gülhane Military Medical Faculty Training Hospital. ( 0,512871189949549 )
Comput. Biol. Med. - A comparison of multivariate causality based measures of effective connectivity. ( 0,510996697254147 )
Med Decis Making - Predicting the EuroQol Group's EQ-5D index from CDC's Healthy Days in a US sample. ( 0,510841127249357 )
Comput Math Methods Med - A new ratio for protocol categorization. ( 0,508689654139461 )
Res Synth Methods - Robust variance estimation in meta-regression with binary dependent effects. ( 0,507691958483083 )
J Chem Inf Model - Sampling multiple scoring functions can improve protein loop structure prediction accuracy. ( 0,507439892025094 )
Methods Inf Med - Measuring inter-observer agreement in contour delineation of medical imaging in a dummy run using Fleiss' kappa. ( 0,506108045727578 )
IEEE Trans Vis Comput Graph - Filtering Non-Linear Transfer Functions on Surfaces. ( 0,502432705012943 )
J Clin Monit Comput - The ability of the Vigileo-FloTrac system to measure cardiac output and track cardiac output changes during one-lung ventilation. ( 0,502297482160238 )
Comput. Biol. Med. - Gait variability and stability measures: minimum number of strides and within-session reliability. ( 0,501187444901016 )
Res Synth Methods - Semiparametric hazard function estimation in meta-analysis for time to event data. ( 0,499658908141634 )
Comput Methods Programs Biomed - Extracting more information from EEG recordings for a better description of sleep. ( 0,497990461615377 )
IEEE Trans Image Process - Balanced multiwavelets with interpolatory property. ( 0,497627698297281 )
Perspect Health Inf Manag - Projected impact of the ICD-10-CM/PCS conversion on longitudinal data and the Joint Commission Core Measures. ( 0,497052701359811 )
Med Biol Eng Comput - Novel parameters indicate significant differences in severity of obstructive sleep apnea with patients having similar apnea-hypopnea index. ( 0,495840577166013 )
J Med Syst - Throughput and delay analysis of IEEE 802.15.6-based CSMA/CA protocol. ( 0,495759709123369 )
Med Decis Making - Development of an EORTC-8D utility algorithm for Sri Lanka. ( 0,494249413794229 )
Comput Methods Programs Biomed - Estimation of the concordance correlation coefficient for repeated measures using SAS and R. ( 0,493363194057274 )
Res Synth Methods - Robust variance estimation in meta-regression with dependent effect size estimates. ( 0,492387600049228 )
Med Decis Making - Valuing SF-6D Health States Using a Discrete Choice Experiment. ( 0,49079097859414 )
Brief. Bioinformatics - A comparative study of statistical methods used to identify dependencies between gene expression signals. ( 0,490325376754099 )
Med Biol Eng Comput - Dynamic cerebral autoregulation: different signal processing methods without influence on results and reproducibility. ( 0,488326891204702 )
IEEE Trans Image Process - Random phase textures: theory and synthesis. ( 0,488323025268177 )
J. Med. Internet Res. - Estimation of physical activity levels using cell phone questionnaires: a comparison with accelerometry for evaluation of between-subject and within-subject variations. ( 0,48815814404128 )
Neural Comput - Mapping of visual receptive fields by tomographic reconstruction. ( 0,487243527029029 )
J Chem Inf Model - Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. ( 0,486008715760706 )
J Biomed Inform - PRIM versus CART in subgroup discovery: when patience is harmful. ( 0,484817863044837 )
Med Decis Making - Standard error of measurement of 5 health utility indexes across the range of health for use in estimating reliability and responsiveness. ( 0,484392034018507 )
Lifetime Data Anal - Regression analysis of informative current status data with the additive hazards model. ( 0,484387772129173 )
Lifetime Data Anal - Competing risks with missing covariates: effect of haplotypematch on hematopoietic cell transplant patients. ( 0,483981600179651 )
Int J Health Geogr - Do measures matter? Comparing surface-density-derived and census-tract-derived measures of racial residential segregation. ( 0,483862919722095 )
Methods Inf Med - Blinded sample size reestimation with negative binomial counts in superiority and non-inferiority trials. ( 0,483428384804804 )
Int J Neural Syst - Behind the magical numbers: hierarchical chunking and the human working memory capacity. ( 0,482846409436762 )
Lifetime Data Anal - On collapsibility and confounding bias in Cox and Aalen regression models. ( 0,482561291957258 )
Med Decis Making - Test result-based sampling: an efficient design for estimating the accuracy of patient safety indicators. ( 0,481333416563531 )
BMC Med Inform Decis Mak - Predicting sample size required for classification performance. ( 0,479924608138329 )
Lifetime Data Anal - A competing risks model for correlated data based on the subdistribution hazard. ( 0,479510821751697 )
J Biomed Inform - Sample size estimation in diagnostic test studies of biomedical informatics. ( 0,479466479759449 )
J Clin Monit Comput - Non-invasive measurement of cardiac output in obese children and adolescents: comparison of electrical cardiometry and transthoracic Doppler echocardiography. ( 0,478225869633539 )
Neural Comput - Least-squares independent component analysis. ( 0,475545782075368 )
Comput Methods Programs Biomed - Automated segmentation of optic disc region on retinal fundus photographs: Comparison of contour modeling and pixel classification methods. ( 0,47527224701824 )
Lifetime Data Anal - Likelihood ratio procedures and tests of fit in parametric and semiparametric copula models with censored data. ( 0,475139217203488 )
Med Biol Eng Comput - Intra-protocol repeatability and inter-protocol agreement for the analysis of scapulo-humeral coordination. ( 0,474959694275323 )
Comput. Biol. Med. - The precision of resting blood pressure measurement. ( 0,474816049265425 )
J. Comput. Biol. - Shape-based feature matching improves protein identification via LC-MS and tandem MS. ( 0,474679873970601 )
Neural Comput - Likelihood methods for point processes with refractoriness. ( 0,474309305981891 )
IEEE Trans Image Process - Generalizing the majority voting scheme to spatially constrained voting. ( 0,472617092105313 )
Res Synth Methods - The impact of multiple endpoint dependency on Q and I(2) in meta-analysis. ( 0,472251033681096 )
IEEE Trans Vis Comput Graph - Facial Performance Transfer via Deformable Models and Parametric Correspondence. ( 0,470825959164171 )
Lifetime Data Anal - Checking Fine and Gray subdistribution hazards model with cumulative sums of residuals. ( 0,470200312663193 )
Neural Comput - Least squares estimation without priors or supervision. ( 0,469996743895282 )
J Clin Monit Comput - Comparison of SNAP II and BIS Vista indices during normothermic cardiopulmonary bypass under isoflurane anesthesia. ( 0,469117432577618 )
J Am Med Inform Assoc - Usability-driven pruning of large ontologies: the case of SNOMED CT. ( 0,467612131638852 )
J Biomed Inform - Clustering clinical trials with similar eligibility criteria features. ( 0,464363434823034 )
IEEE Trans Image Process - Characterization of electrophotographic print artifacts: banding, jitter, and ghosting. ( 0,463681564824956 )
Lifetime Data Anal - Simple estimation procedures for regression analysis of interval-censored failure time data under the proportional hazards model. ( 0,463362870634548 )
Med Decis Making - Mapping a patient-reported functional outcome measure to a utility measure for comparative effectiveness and economic evaluations in older adults with low back pain. ( 0,461499009851696 )
Brief. Bioinformatics - On the validity of time-dependent AUC estimators. ( 0,461425877340475 )
Lifetime Data Anal - Robust methods to improve efficiency and reduce bias in estimating survival curves in randomized clinical trials. ( 0,461201439904278 )
Med Decis Making - US valuation of the SF-6D. ( 0,461152233473326 )
J. Med. Internet Res. - Does self-selection affect samples' representativeness in online surveys? An investigation in online video game research. ( 0,460609376442295 )
J. Comput. Biol. - Improved harmonic mean estimator for phylogenetic model evidence. ( 0,46058468683488 )
Comput Methods Programs Biomed - Increased variation of the response index of nociception during noxious stimulation in patients during general anaesthesia. ( 0,460102256308921 )
Comput Methods Programs Biomed - Ultrasound IMT measurement on a multi-ethnic and multi-institutional database: our review and experience using four fully automated and one semi-automated methods. ( 0,45934729727957 )
Lifetime Data Anal - Missing genetic information in case-control family data with general semi-parametric shared frailty model. ( 0,459235836728188 )
IEEE Trans Image Process - Sampling optimization for printer characterization by direct search. ( 0,458571827703402 )
J Clin Monit Comput - The effect of desflurane versus propofol on regional cerebral oxygenation in the sitting position for shoulder arthroscopy. ( 0,458308139909937 )
Neural Comput - Distinguishing the causes of firing with the membrane potential slope. ( 0,458044955111883 )
IEEE Trans Image Process - Wavelet modeling using finite mixtures of generalized gaussian distributions: application to texture discrimination and retrieval. ( 0,457638346105767 )
Comput Math Methods Med - Methodological framework for estimating the correlation dimension in HRV signals. ( 0,456210027147483 )
J Clin Monit Comput - Peripheral tissue oximetry: comparing three commercial near-infrared spectroscopy oximeters on the forearm. ( 0,455681572434856 )
J Clin Monit Comput - Comparison of ear and chest probes in transcutaneous carbon dioxide pressure measurements during general anesthesia in adults. ( 0,455563019911968 )
IEEE Trans Image Process - Fast bilateral filter with arbitrary range and domain kernels. ( 0,455311662171115 )
IEEE Trans Pattern Anal Mach Intell - Maximum Likelihood Estimation of Depth Maps Using Photometric Stereo. ( 0,455305234694989 )
J Clin Monit Comput - Cardiac index measurements by transcutaneous Doppler ultrasound and transthoracic echocardiography in adult and pediatric emergency patients. ( 0,453414691389342 )
Res Synth Methods - Performance of a proportion-based approach to meta-analytic moderator estimation: results from Monte Carlo simulations. ( 0,453114663065018 )
Methods Inf Med - Evaluation of imbalance in stratified blocked randomization: some remarks on the range of validity of the model by Hallstrom and Davis. ( 0,452603990485219 )