Brief. Bioinformatics - Random forest Gini importance favours SNPs with large minor allele frequency: impact, sources and recommendations.

Topics

{ measur(2081) correl(1212) valu(896) }
{ studi(2440) review(1878) systemat(933) }
{ gene(2352) biolog(1181) express(1162) }
{ import(1318) role(1303) understand(862) }
{ method(1969) cluster(1462) data(1082) }
{ studi(1410) differ(1259) use(1210) }
{ analysi(2126) use(1163) compon(1037) }
{ use(976) code(926) identifi(902) }
{ clinic(1479) use(1117) guidelin(835) }
{ model(2341) predict(2261) use(1141) }
{ can(774) often(719) complex(702) }
{ group(2977) signific(1463) compar(1072) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ take(945) account(800) differ(722) }
{ search(2224) databas(1162) retriev(909) }
{ studi(1119) effect(1106) posit(819) }
{ state(1844) use(1261) util(961) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ result(1111) use(1088) new(759) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ network(2748) neural(1063) input(814) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ method(1557) propos(1049) approach(1037) }
{ control(1307) perform(991) simul(935) }
{ howev(809) still(633) remain(590) }
{ research(1085) discuss(1038) issu(1018) }
{ model(3480) simul(1196) paramet(876) }
{ intervent(3218) particip(2042) group(1664) }
{ use(2086) technolog(871) perceiv(783) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

The use of random forests is increasingly common in genetic association studies. The variable importance measure (VIM) that is automatically calculated as a by-product of the algorithm is often used to rank polymorphisms with respect to their ability to predict the investigated phenotype. Here, we investigate a characteristic of this methodology that may be considered an important pitfall, namely that common variants are systematically favoured by the widely used Gini VIM. As a consequence, researchers may overlook rare variants that contribute to the missing heritability. The goal of the present article is threefold: (i) to assess this effect quantitatively using simulation studies for different types of random forests (classical random forests and conditional inference forests, which employ unbiased variable selection criteria) as well as for different importance measures (Gini and permutation-based); (ii) to explore the trees and to compare the behaviour of random forests and the standard logistic regression model in order to understand the statistical mechanisms behind the preference for common variants; and (iii) to summarize these results and previously investigated properties of random forest VIMs in the context of genetic association studies and to make practical recommendations regarding the choice of the random forest and variable importance type. All our analyses can be reproduced using R code available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/ginibias/.
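
The preference described above can be explored at the level of a single tree node. The following is a minimal sketch, not the authors' R code or simulation design. It assumes a balanced binary phenotype, SNP genotypes coded 0/1/2 and drawn under Hardy-Weinberg equilibrium, and no association between SNP and phenotype (the null case). All function names, the sample size, the number of replicates and the seed are hypothetical choices made for illustration. For each minor allele frequency (MAF) it averages the largest Gini impurity decrease that one node could obtain from the SNP, which is the quantity a Gini VIM accumulates over the splits of a forest.

```python
import random


def gini(cases, n):
    """Gini impurity of a node holding `cases` positives among `n` samples."""
    if n == 0:
        return 0.0
    q = cases / n
    return 2.0 * q * (1.0 - q)


def best_split_decrease(genotypes, phenotype):
    """Largest Gini impurity decrease over the two ordered splits g <= 0 and g <= 1."""
    n = len(genotypes)
    total_cases = sum(phenotype)
    parent = gini(total_cases, n)
    best = 0.0
    for threshold in (0, 1):
        n_left = 0
        cases_left = 0
        for g, y in zip(genotypes, phenotype):
            if g <= threshold:
                n_left += 1
                cases_left += y
        n_right = n - n_left
        if n_left == 0 or n_right == 0:
            continue  # this split is unavailable (no sample on one side)
        children = (
            n_left * gini(cases_left, n_left)
            + n_right * gini(total_cases - cases_left, n_right)
        ) / n
        best = max(best, parent - children)
    return best


def simulate_genotypes(maf, n, rng):
    """Draw n genotypes (0, 1 or 2 minor alleles) under Hardy-Weinberg equilibrium."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def mean_null_decrease(maf, n=200, reps=2000, seed=1):
    """Average best-split Gini decrease for a SNP unrelated to the phenotype."""
    rng = random.Random(seed)
    # Assumption: fixed, balanced case-control phenotype; genotypes are independent of it.
    phenotype = [1] * (n // 2) + [0] * (n - n // 2)
    total = 0.0
    for _ in range(reps):
        total += best_split_decrease(simulate_genotypes(maf, n, rng), phenotype)
    return total / reps


if __name__ == "__main__":
    for maf in (0.05, 0.25, 0.5):
        print(f"MAF={maf:.2f}  mean null Gini decrease={mean_null_decrease(maf):.5f}")
```

Comparing the printed averages across MAF values shows whether uninformative SNPs with a larger MAF tend to receive a larger impurity decrease at one node. A full forest adds bootstrap sampling, random predictor subsets and many deeper nodes, so this single-node view is only a partial picture of the behaviour the abstract describes.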

Cleaned Abstract

use random forest increas common genet associ studi variabl import measur vim automat calcul byproduct algorithm often use rank polymorph respect abil predict investig phenotyp investig characterist methodolog may consid import pitfal name common variant systemat favour wide use gini vim consequ research may overlook rare variant contribut miss herit goal present articl fold assess effect quantit use simul studi differ type random forest classic random forest condit infer forest employ unbias variabl select criteria well differ import measur gini permut base ii explor tree compar behaviour random forest standard logist regress model order understand statist mechan behind prefer common variant iii summar result previous investig properti random forest vim context genet associ studi make practic recommend regard choic random forest variabl import type analys can reproduc use r code avail companion websit httpwwwibemedunimuenchendeorganisationmitarbeiterprofessurenboulesteixginibia

Similar Abstracts

Comput Math Methods Med - Exploratory bioinformatics study of lncRNAs in Alzheimer's disease mRNA sequences with application to drug development. ( 0,719704666791677 )
Brief. Bioinformatics - A comparative study of statistical methods used to identify dependencies between gene expression signals. ( 0,701741224899007 )
Brief. Bioinformatics - An efficient approach to large-scale genotype-phenotype association analyses. ( 0,647233813073009 )
Res Synth Methods - Practicalities of using a modified version of the Cochrane Collaboration risk of bias tool for randomised and non-randomised study designs applied in a health technology assessment setting. ( 0,63608556777085 )
J. Med. Internet Res. - Effectiveness of web-based interventions on patient empowerment: a systematic review and meta-analysis. ( 0,609585026087922 )
J Am Med Inform Assoc - Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis. ( 0,596287798733132 )
Perspect Health Inf Manag - Projected impact of the ICD-10-CM/PCS conversion on longitudinal data and the Joint Commission Core Measures. ( 0,581248315256399 )
J Biomed Inform - A machine-learned knowledge discovery method for associating complex phenotypes with complex genotypes. Application to pain. ( 0,580992893774127 )
Comput Math Methods Med - Genomic and functional analysis of the toxic effect of tachyplesin I on the embryonic development of zebrafish. ( 0,576607549224587 )
Med Biol Eng Comput - Intra-protocol repeatability and inter-protocol agreement for the analysis of scapulo-humeral coordination. ( 0,576216038776282 )
J. Med. Internet Res. - Assessment of a new web-based sexual concurrency measurement tool for men who have sex with men. ( 0,573507637656248 )
Res Synth Methods - Synthesizing regression results: a factored likelihood method. ( 0,571235829034831 )
J Clin Monit Comput - Comparison of ear and chest probes in transcutaneous carbon dioxide pressure measurements during general anesthesia in adults. ( 0,570233156951228 )
J Clin Monit Comput - Evaluation of the estimated continuous cardiac output monitoring system in adults and children undergoing kidney transplant surgery: a pilot study. ( 0,5643755546032 )
Comput. Biol. Med. - Gait variability and stability measures: minimum number of strides and within-session reliability. ( 0,561577813072416 )
J Clin Monit Comput - Comparison of SNAP II and BIS Vista indices during normothermic cardiopulmonary bypass under isoflurane anesthesia. ( 0,560857530593819 )
Med Biol Eng Comput - Quantitative evaluation of upper-limb motor control in robot-aided rehabilitation. ( 0,55981710963426 )
J Biomed Inform - Clustering clinical trials with similar eligibility criteria features. ( 0,557502523176901 )
J Clin Monit Comput - Cardiac index measurements by transcutaneous Doppler ultrasound and transthoracic echocardiography in adult and pediatric emergency patients. ( 0,554413884524955 )
J Clin Monit Comput - Masseter muscle oxygen saturation is associated with central venous oxygen saturation in patients with severe sepsis. ( 0,554111386957249 )
Int J Comput Assist Radiol Surg - Acetabular orientation variability and symmetry based on CT scans of adults. ( 0,553348819467569 )
J Clin Monit Comput - Peripheral tissue oximetry: comparing three commercial near-infrared spectroscopy oximeters on the forearm. ( 0,552992634569721 )
Comput Math Methods Med - Transcriptional protein-protein cooperativity in POU/HMG/DNA complexes revealed by normal mode analysis. ( 0,552033166468576 )
Comput. Aided Surg. - Non-invasive quantification of lower limb mechanical alignment in flexion. ( 0,549360726638028 )
IEEE Trans Vis Comput Graph - How Can Visual Analytics Assist Investigative Analysis? Design Implications from an Evaluation. ( 0,548359722248747 )
Comput Math Methods Med - The approach to steady state using homogeneous and Cartesian coordinates. ( 0,547464767705537 )
AMIA Annu Symp Proc - When you can't tell when it hurts: a preliminary algorithm to assess pain in patients who can't communicate. ( 0,541503404074163 )
Brief. Bioinformatics - Ranking prognosis markers in cancer genomic studies. ( 0,539903333508494 )
J Integr Bioinform - miReg: a resource for microRNA regulation. ( 0,53897912558313 )
J Clin Monit Comput - Reliability of the volatile agent consumption display in the Draeger Primus anesthesia machine. ( 0,536092989483986 )
J Clin Monit Comput - Non-invasive measurement of cardiac output in obese children and adolescents: comparison of electrical cardiometry and transthoracic Doppler echocardiography. ( 0,535954380918387 )
Comput. Biol. Med. - Approach for streamlining measurement of complex physiological phenotypes of upper airway collapsibility. ( 0,534803032166336 )
Comput Methods Programs Biomed - Estimation of the concordance correlation coefficient for repeated measures using SAS and R. ( 0,534560352487481 )
J Integr Bioinform - BacillusRegNet: a transcriptional regulation database and analysis platform for Bacillus species. ( 0,533851572691845 )
Brief. Bioinformatics - Transcription factor and microRNA co-regulatory loops: important regulatory motifs in biological processes and diseases. ( 0,533233104193316 )
J Biomed Inform - Transcriptional networks characterize ventricular dysfunction after myocardial infarction: a proof-of-concept investigation. ( 0,532033723700688 )
J. Comput. Biol. - A stationary wavelet entropy-based clustering approach accurately predicts gene expression. ( 0,531084483681323 )
Int J Neural Syst - A cluster merging method for time series microarray with production values. ( 0,5296394155254 )
Comput. Biol. Med. - Multi-stage filtering for improving confidence level and determining dominant clusters in clustering algorithms of gene expression data. ( 0,529359274860446 )
Comput Biol Chem - A novel k-word relative measure for sequence comparison. ( 0,528234200322881 )
Res Synth Methods - Performance of a proportion-based approach to meta-analytic moderator estimation: results from Monte Carlo simulations. ( 0,527017989446882 )
Comput Biol Chem - Prediction and verification of microRNAs related to proline accumulation under drought stress in potato. ( 0,521738551137634 )
Comput Biol Chem - Identification of miR159s and their target genes and expression analysis under drought stress in potato. ( 0,521660818450244 )
J Clin Monit Comput - Validation of indirect calorimetry for measurement of energy expenditure in healthy volunteers undergoing pressure controlled non-invasive ventilation support. ( 0,52095341830965 )
Brief. Bioinformatics - Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression. ( 0,519052005684265 )
J Am Med Inform Assoc - Ability of pharmacy clinical decision-support software to alert users about clinically important drug-drug interactions. ( 0,516674429860456 )
Comput Math Methods Med - Long-term prediction of emergency department revenue and visitor volume using autoregressive integrated moving average model. ( 0,516183867278361 )
Brief. Bioinformatics - GO-function: deriving biologically relevant functions from statistically significant functions. ( 0,515665287924428 )
Comput. Biol. Med. - Inter- and intra-observer variability analysis of completely automated cIMT measurement software (AtheroEdge) and its benchmarking against commercial ultrasound scanner and expert Readers. ( 0,515243906110904 )
Res Synth Methods - Issues relating to selective reporting when including non-randomized studies in systematic reviews on the effects of healthcare interventions. ( 0,514698260690543 )
J Integr Bioinform - Using surveys of Affymetrix GeneChips to study antisense expression. ( 0,51381649427997 )
J. Comput. Biol. - In silico approach to study adaptive divergence in nucleotide composition of the 16S rRNA gene among bacteria thriving under different temperature regimes. ( 0,513167766480117 )
Brief. Bioinformatics - Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. ( 0,512950569739094 )
Med Decis Making - Where is the evidence? A systematic review of shared decision making and patient outcomes. ( 0,511347146721139 )
J Integr Bioinform - Uncovering the expression patterns of chimeric transcripts using surveys of affymetrix GeneChips. ( 0,511037402918689 )
Int J Comput Assist Radiol Surg - New method for internal anal sphincter measurements: feasibility study. ( 0,510136427339743 )
Comput Biol Chem - Analysis of the NCI-60 dataset for cancer-related microRNA and mRNA using expression profiles. ( 0,508944220109671 )
J Med Syst - A knowledge based search tool for performance measures in health care systems. ( 0,508716962129995 )
Med Decis Making - Standard error of measurement of 5 health utility indexes across the range of health for use in estimating reliability and responsiveness. ( 0,508458704289153 )
J Med Syst - Automatic quantification of spinal curvature in scoliotic radiograph using image processing. ( 0,508326946165194 )
Int J Health Geogr - Comparing the accuracy of two secondary food environment data sources in the UK across socio-economic and urban/rural divides. ( 0,50651662979749 )
Comput Math Methods Med - Paraxial ocular measurements and entries in spectral and modal matrices: analogy and application. ( 0,506263726293646 )
J Integr Bioinform - Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features. ( 0,506041298498513 )
Methods Inf Med - Assessment of pain expression in infant cry signals using empirical mode decomposition. ( 0,505354630450165 )
IEEE J Biomed Health Inform - Identification of microsatellites in DNA using adaptive S-transform. ( 0,504635087201028 )
Methods Inf Med - Measuring inter-observer agreement in contour delineation of medical imaging in a dummy run using Fleiss' kappa. ( 0,504499878980268 )
J Clin Monit Comput - The effect of desflurane versus propofol on regional cerebral oxygenation in the sitting position for shoulder arthroscopy. ( 0,502661818185894 )
Comput. Biol. Med. - Time frequency power profile of QRS complex obtained with wavelet transform in spontaneously hypertensive rats. ( 0,500483757043903 )
J Chem Inf Model - Calculation of aqueous solubility of crystalline un-ionized organic chemicals and drugs based on structural similarity and physicochemical descriptors. ( 0,499676446120636 )
Comput. Biol. Med. - Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples. ( 0,498618501546761 )
AMIA Annu Symp Proc - From simply inaccurate to complex and inaccurate: complexity in standards-based quality measures. ( 0,498239312234057 )
Int J Med Robot - Intraoperative measurement of femoral antetorsion using the anterior cortical angle method: a novel use for smartphones. ( 0,497136927267749 )
J Biomed Inform - Enabling enrichment analysis with the Human Disease Ontology. ( 0,496886255413928 )
Neural Comput - System identification of mGluR-dependent long-term depression. ( 0,496634573163814 )
Brief. Bioinformatics - Causes, consequences and solutions of phylogenetic incongruence. ( 0,495104959560178 )
Med Decis Making - Interventions to improve patient comprehension in informed consent for medical and surgical procedures: a systematic review. ( 0,495027337545731 )
Comput Biol Chem - Meta-analysis of microarray data: The case of imatinib resistance in chronic myelogenous leukemia. ( 0,493713491061997 )
J. Comput. Biol. - Triplex DNA:RNA, 3'-to-5' inverted RNA and protein coding in mitochondrial genomes. ( 0,492512724638948 )
J Biomed Inform - Inferring cell cycle feedback regulation from gene expression data. ( 0,492235882190597 )
Brief. Bioinformatics - OrthoDisease: tracking disease gene orthologs across 100 species. ( 0,491907229475652 )
Comput. Biol. Med. - Development and evaluation of an algorithm for computer analysis of maternal heart rate during labor. ( 0,490584227980787 )
J Am Med Inform Assoc - Marco Ramoni: an appreciation of academic achievement. ( 0,490152775585633 )
Comput Math Methods Med - Hypoxia in head and neck cancer in theory and practice: a PET-based imaging approach. ( 0,48902821841295 )
Comput. Biol. Med. - Revealing pathway maps of renal cell carcinoma by gene expression change. ( 0,487515955020986 )
Comput Methods Programs Biomed - Extracting more information from EEG recordings for a better description of sleep. ( 0,486704796792049 )
Brief. Bioinformatics - A case-control design for testing and estimating epigenetic effects on complex diseases. ( 0,486178651167515 )
Res Synth Methods - Inaccuracy of regression results in replacing bivariate correlations. ( 0,485425359804757 )
BMC Med Inform Decis Mak - ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials. ( 0,485042591905617 )
J Biomed Inform - CPAS: a trans-omics pathway analysis tool for jointly analyzing DNA copy number variations and mRNA expression profiles data. ( 0,482900939180255 )
J Chem Inf Model - Improving the selectivity of antimicrobial peptides from anuran skin. ( 0,482656888987526 )
Comput Biol Chem - Mode of action classification of chemicals using multi-concentration time-dependent cellular response profiles. ( 0,482552708596115 )
Brief. Bioinformatics - The genomic and functional characteristics of disease genes. ( 0,481764270159043 )
Med Decis Making - Decisive evidence on a smaller-than-you-think phenomenon: revisiting the 1-in-X effect on subjective medical probabilities. ( 0,4807850813087 )
Brief. Bioinformatics - Gene set enrichment analysis: performance evaluation and usage guidelines. ( 0,480042023925735 )
J Clin Monit Comput - SNAP II versus BIS VISTA monitor comparison during general anesthesia. ( 0,479976549900123 )
Med Biol Eng Comput - Dynamic cerebral autoregulation: different signal processing methods without influence on results and reproducibility. ( 0,479694857472173 )
Comput. Biol. Med. - Statistical evaluation of coherence estimated from optimally beamformed signals. ( 0,47875290073673 )
Sci Data - miRNA expression atlas in male rat. ( 0,478141280473207 )
Med Biol Eng Comput - Patient-specific generation of the Purkinje network driven by clinical measurements of a normal propagation. ( 0,476944473530128 )
J Clin Monit Comput - Evaluation of point-of-care analyzers' ability to reduce bias in conductivity-based hematocrit measurement during cardiopulmonary bypass. ( 0,476287687679943 )