Artif Intell Med - Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

Topics

{ method(1969) cluster(1462) data(1082) }
{ model(2341) predict(2261) use(1141) }
{ learn(2355) train(1041) set(1003) }
{ cancer(2502) breast(956) screen(824) }
{ group(2977) signific(1463) compar(1072) }
{ error(1145) method(1030) estim(1020) }
{ measur(2081) correl(1212) valu(896) }
{ structur(1116) can(940) graph(676) }
{ state(1844) use(1261) util(961) }
{ network(2748) neural(1063) input(814) }
{ compound(1573) activ(1297) structur(1058) }
{ patient(2837) hospit(1953) medic(668) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ data(1714) softwar(1251) tool(1186) }
{ care(1570) inform(1187) nurs(1089) }
{ case(1353) use(1143) diagnosi(1136) }
{ perform(1367) use(1326) method(1137) }
{ data(1737) use(1416) pattern(1282) }
{ imag(1057) registr(996) error(939) }
{ imag(2830) propos(1344) filter(1198) }
{ take(945) account(800) differ(722) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ search(2224) databas(1162) retriev(909) }
{ data(3963) clinic(1234) research(1004) }
{ perform(999) metric(946) measur(919) }
{ record(1888) medic(1808) patient(1693) }
{ age(1611) year(1155) adult(843) }
{ activ(1138) subject(705) human(624) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ use(1733) differ(960) four(931) }
{ estim(2440) model(1874) function(577) }
{ method(2212) result(1239) propos(1039) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ detect(2391) sensit(1101) algorithm(908) }
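
Each topic line above follows the same pattern: a brace-delimited list of stemmed terms, each followed by a number in parentheses. The page does not say what the numbers measure, so the sketch below treats them as opaque integer weights. The names `TERM_PATTERN` and `parse_topic_line` are hypothetical and only illustrate how such a line could be read into a structure.

```python
import re

# Matches one "stem(weight)" pair, e.g. "cancer(2502)".
TERM_PATTERN = re.compile(r"(\w+)\((\d+)\)")


def parse_topic_line(line: str) -> list[tuple[str, int]]:
    """Return the (stem, weight) pairs found in one topic line."""
    return [(stem, int(weight)) for stem, weight in TERM_PATTERN.findall(line)]


topic = parse_topic_line("{ cancer(2502) breast(956) screen(824) }")
print(topic)  # [('cancer', 2502), ('breast', 956), ('screen', 824)]
```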

Abstract

OBJECTIVES: Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set.

MATERIALS AND METHODS: Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organisation maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained with listwise deletion (LD). The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracy of predictions of early cancer relapse was measured using artificial neural networks (ANNs), with a separate ANN estimated on each data set with imputed missing values.

RESULTS: The imputation methods based on machine learning algorithms outperformed the statistical imputation methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model.

CONCLUSION: The methods based on machine learning techniques were the best suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures.
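
To make the contrast between listwise deletion, a statistical method (mean imputation) and a machine learning method (KNN imputation) concrete, here is a minimal pure-Python sketch. It is not the study's implementation. The function names, the toy data and the distance definition (Euclidean over the columns observed in both rows) are assumptions chosen for illustration, and every column is assumed to have at least one observed value.

```python
import math


def listwise_delete(rows):
    """Keep only the rows that have no missing (None) value."""
    return [r for r in rows if None not in r]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=1):
    """Replace each None with the column mean over the k nearest donor rows.

    Assumed distance: root mean squared difference over the columns that
    are observed in both rows. Only rows with the target column observed
    can act as donors.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = [r for idx, r in enumerate(rows)
                          if idx != i and r[j] is not None]
                donors.sort(key=lambda r: distance(row, r))
                nearest = donors[:k]
                new_row[j] = sum(r[j] for r in nearest) / len(nearest)
        completed.append(new_row)
    return completed


# Toy data (made up): the last record is missing its second feature.
records = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [5.2, None]]

print(len(listwise_delete(records)))      # 3 records survive deletion
print(mean_impute(records)[3][1])         # column mean of 2.0, 2.2 and 10.0
print(knn_impute(records, k=1)[3][1])     # 10.0, copied from the nearest row
```

In this toy case the mean pulls the missing value toward the two dissimilar records, while the nearest neighbour supplies a value from the one similar record. That is the kind of difference the comparison in the abstract is probing, although the toy data says nothing about the actual results.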

Cleaned Abstract

jectiv miss data imput import task case crucial use avail data discard record miss valu work evalu perform sever statist machin learn imput method use predict recurr patient extens real breast cancer data setmateri method imput method base statist techniqu eg mean hotdeck multipl imput machin learn techniqu eg multilay perceptron mlp selforganis map som knearest neighbour knn appli data collect el lamoi project result compar obtain listwis delet ld imput method databas includ demograph therapeut recurrencesurviv inform women oper invas breast cancer diagnos differ hospit belong spanish breast cancer research group geicam accuraci predict earli cancer relaps measur use artifici neural network ann differ ann estim use data set imput miss valuesresult imput method base machin learn algorithm outperform imput statist method predict patient outcom friedman test reveal signific differ p observ area roc curv auc valu pairwis comparison test show auc mlp knn som signific higher p p p respect auc ldbase prognosi modelconclus method base machin learn techniqu suit imput miss valu led signific enhanc prognosi accuraci compar imput method base statist procedur
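
The cleaned text above looks like the result of lowercasing, stripping punctuation and digits, removing stopwords and stemming. The page does not document the actual pipeline, so the following is only a sketch of the first three steps under that assumption. `STOPWORDS` is a tiny made-up list, `clean_text` is a hypothetical name, and stemming (which would need something like a Porter stemmer) is deliberately left out.

```python
import re

# Tiny illustrative stopword list; the list used for the page is not given.
STOPWORDS = {"a", "an", "the", "is", "in", "it", "to", "of", "and", "not",
             "all", "with", "where", "that", "were", "this"}


def clean_text(text: str) -> list[str]:
    """Lowercase, drop every character that is not a-z or whitespace,
    then remove stopwords."""
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    return [tok for tok in letters_only.split() if tok not in STOPWORDS]


print(clean_text("Missing data imputation is an important task"))
# ['missing', 'data', 'imputation', 'important', 'task']
print(clean_text("e.g., hot-deck (p=0.0091)"))
# ['eg', 'hotdeck', 'p']
```

Dropping punctuation without inserting a space would also explain fused tokens such as `setmateri` in the cleaned text: when a full stop directly joins two words, the two words merge once it is removed.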

Similar Abstracts

AMIA Annu Symp Proc - Survival prediction and treatment recommendation with Bayesian techniques in lung cancer. ( 0,756165601813703 )
J Chem Inf Model - Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. ( 0,705196587040226 )
Med Decis Making - Cost-saving tree-structured survival analysis for hip fracture of study of osteoporotic fractures data. ( 0,700186418641729 )
Int J Health Geogr - A binary-based approach for detecting irregularly shaped clusters. ( 0,688707079216005 )
Spat Spatiotemporal Epidemiol - Optimal selection of the spatial scan parameters for cluster detection: a simulation study. ( 0,679464729890112 )
AMIA Annu Symp Proc - Patient clustering with uncoded text in electronic medical records. ( 0,678142589006458 )
Int J Health Geogr - Detecting activity locations from raw GPS data: a novel kernel-based algorithm. ( 0,667248992056166 )
Int J Health Geogr - Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: a case study on murine typhus in south Texas. ( 0,660804033995916 )
IEEE Trans Vis Comput Graph - GPU-based Multilevel Clustering. ( 0,66007363792042 )
Artif Intell Med - Weighted spherical 1-mean with phase shift and its application in electrocardiogram discord detection. ( 0,655230193703005 )
IEEE Trans Pattern Anal Mach Intell - Semi-Supervised Kernel Mean Shift Clustering. ( 0,653433356919253 )
Comput. Biol. Med. - A methodology to identify consensus classes from clustering algorithms applied to immunohistochemical data from breast cancer patients. ( 0,65247681486164 )
J Biomed Inform - Quantifying the determinants of outbreak detection performance through simulation and machine learning. ( 0,651754249382071 )
Comput Math Methods Med - Novel harmonic regularization approach for variable selection in Cox's proportional hazards model. ( 0,647134708331792 )
AMIA Annu Symp Proc - Using hierarchical mixture of experts model for fusion of outbreak detection methods. ( 0,643709240449935 )
AMIA Annu Symp Proc - Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles. ( 0,643498658320738 )
J Chem Inf Model - String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins. ( 0,64258043795755 )
Artif Intell Med - Vicinal support vector classifier using supervised kernel-based clustering. ( 0,638402127569807 )
Comput Math Methods Med - A robust rerank approach for feature selection and its application to pooling-based GWA studies. ( 0,637045318934356 )
J Chem Inf Model - Investigation of the use of spectral clustering for the analysis of molecular data. ( 0,636815899645162 )
J Chem Inf Model - Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors. ( 0,631994562651185 )
IEEE Trans Neural Netw Learn Syst - Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms. ( 0,631324357622532 )
Comput Methods Programs Biomed - Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. ( 0,630729736158813 )
BMC Med Inform Decis Mak - Efficient algorithms for fast integration on large data sets from multiple sources. ( 0,624215305801299 )
Med Decis Making - Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials. ( 0,624154962106149 )
J Biomed Inform - Learning Bayesian networks from survival data using weighting censored instances. ( 0,620290023020685 )
Neural Comput - Spontaneous clustering via minimum γ-divergence. ( 0,618359900813954 )
BMC Med Inform Decis Mak - An evidential reasoning based model for diagnosis of lymph node metastasis in gastric cancer. ( 0,61747781096462 )
IEEE Trans Pattern Anal Mach Intell - A Link-Based Approach to the Cluster Ensemble Problem. ( 0,613324497176546 )
Int J Health Geogr - Detection of clusters of a rare disease over a large territory: performance of cluster detection methods. ( 0,60816260995372 )
Artif Intell Med - A classifier ensemble approach for the missing feature problem. ( 0,605925601967831 )
IEEE Trans Image Process - Subspaces indexing model on Grassmann manifold for image search. ( 0,600742000648519 )
J Med Syst - Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets. ( 0,59575056072586 )
Med Decis Making - Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: an application to cluster randomized trials. ( 0,593824335566124 )
Int J Comput Assist Radiol Surg - A Hessian-based filter for vascular segmentation of noisy hepatic CT scans. ( 0,582555845869112 )
J Chem Inf Model - Consensus methods for combining multiple clusterings of chemical structures. ( 0,581770667049184 )
Comput. Biol. Med. - A straightforward approach to computer-aided polyp detection using a polyp-specific volumetric feature in CT colonography. ( 0,580201422020024 )
J Integr Bioinform - An evolutionary and visual framework for clustering of DNA microarray data. ( 0,579984848446629 )
J Am Med Inform Assoc - Predicting complications of percutaneous coronary intervention using a novel support vector method. ( 0,579329844228044 )
Comput Math Methods Med - Comparison of semiparametric, parametric, and nonparametric ROC analysis for continuous diagnostic tests using a simulation study and acute coronary syndrome data. ( 0,578361679807353 )
Methods Inf Med - Extending statistical boosting. An overview of recent methodological developments. ( 0,577563257286784 )
Int J Health Geogr - Using statistical methods and genotyping to detect tuberculosis outbreaks. ( 0,571892262570891 )
Comput Methods Programs Biomed - Fuzzy and hard clustering analysis for thyroid disease. ( 0,569051738087923 )
J Integr Bioinform - Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. ( 0,568051416617728 )
J Biomed Inform - Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values. ( 0,565219596837755 )
J Med Syst - A new data preparation method based on clustering algorithms for diagnosis systems of heart and diabetes diseases. ( 0,564867795160685 )
J. Comput. Biol. - A geometric clustering algorithm with applications to structural data. ( 0,56235644159406 )
Comput Math Methods Med - A study of rough set approach in gastroenterology. ( 0,557683335546585 )
J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data. ( 0,556652261556517 )
J Chem Inf Model - Toward a better pharmacophore description of P-glycoprotein modulators, based on macrocyclic diterpenes from Euphorbia species. ( 0,555629146086321 )
Int J Med Robot - Coordinated control and experimentation of the dental arch generator of the tooth-arrangement robot. ( 0,55498070700236 )
Comput Methods Programs Biomed - Single stage and multistage classification models for the prediction of liver fibrosis degree in patients with chronic hepatitis C infection. ( 0,550537181314702 )
Comput Math Methods Med - A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation. ( 0,549147172240234 )
Artif Intell Med - Improved modeling of clinical data with kernel methods. ( 0,547641417069466 )
Comput. Biol. Med. - Evaluation of automatic feature detection algorithms in EEG: application to interburst intervals. ( 0,547165708863169 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,542854177681213 )
Med Biol Eng Comput - A mathematical method for constraint-based cluster analysis towards optimized constrictive diameter smoothing of saphenous vein grafts. ( 0,541363034673498 )
J Am Med Inform Assoc - Privacy-preserving heterogeneous health data sharing. ( 0,538785434575083 )
Comput. Biol. Med. - Predicting cardiac autonomic neuropathy category for diabetic data with missing values. ( 0,538517134054184 )
J Chem Inf Model - Benchmark data sets for structure-based computational target prediction. ( 0,537115271833523 )
Neural Comput - A nonparametric clustering algorithm with a quantile-based likelihood estimator. ( 0,536166487447265 )
Comput Math Methods Med - Decimative spectral estimation with unconstrained model order. ( 0,536098001349013 )
J. Comput. Biol. - Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data. ( 0,535699134561206 )
Comput Methods Programs Biomed - An attribute weight assignment and particle swarm optimization algorithm for medical database classifications. ( 0,535624004890751 )
Int J Neural Syst - A genetic graph-based approach for partitional clustering. ( 0,535102242883785 )
Comput. Aided Surg. - The Equidistant Method - a novel hip joint simulation algorithm for detection of femoroacetabular impingement. ( 0,533813823630555 )
IEEE Trans Image Process - Evaluating combinational illumination estimation methods on real-world images. ( 0,533323030685197 )
J Biomed Inform - Extension of the survival dimensionality reduction algorithm to detect epistasis in competing risks models (SDR-CR). ( 0,532842998674948 )
J Med Syst - Classification of juvenile myoclonic epilepsy data acquired through scanning electromyography with machine learning algorithms. ( 0,532485402814683 )
Int J Comput Assist Radiol Surg - CT dataset anisotropy management for oral implantology planning software. ( 0,531933922586176 )
Med Biol Eng Comput - Detection of swallows with silent aspiration using swallowing and breath sound analysis. ( 0,531134604869854 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,530189450962427 )
IEEE J Biomed Health Inform - LMI-Based Approaches for the Calibration of Continuous Glucose Measurement Sensors. ( 0,529231366209125 )
Int J Health Geogr - Voronoi distance based prospective space-time scans for point data sets: a dengue fever cluster analysis in a southeast Brazilian town. ( 0,529035595323279 )
Comput Methods Programs Biomed - Semi-automated and fully automated mammographic density measurement and breast cancer risk prediction. ( 0,528666563948819 )
Comput. Biol. Med. - Risk classification of cancer survival using ANN with gene expression data from multiple laboratories. ( 0,527704271222003 )
Int J Health Geogr - Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study. ( 0,52763035595907 )
J Biomed Inform - Statistical process control for validating a classification tree model for predicting mortality--a novel approach towards temporal validation. ( 0,527360295629162 )
Spat Spatiotemporal Epidemiol - Performance of cancer cluster Q-statistics for case-control residential histories. ( 0,524939431574257 )
IEEE Trans Neural Netw Learn Syst - Learning Stable Multilevel Dictionaries for Sparse Representations. ( 0,524244879551483 )
Neural Comput - Feature selection for ordinal text classification. ( 0,524149887964192 )
J Med Syst - Microwave tomography analysis system for breast tumor detection. ( 0,523828204744765 )
IEEE Trans Image Process - A comparative review of component tree computation algorithms. ( 0,523577852378561 )
AMIA Annu Symp Proc - Alignment and clustering of breast cancer patients by longitudinal treatment history. ( 0,523365649109359 )
Artif Intell Med - Multi-test decision tree and its application to microarray data classification. ( 0,521191400390297 )
Int J Neural Syst - A cluster merging method for time series microarray with production values. ( 0,519326364819672 )
Artif Intell Med - Missing data in medical databases: impute, delete or classify? ( 0,518948982386809 )
J Chem Inf Model - Visualization of molecular fingerprints. ( 0,517426815293204 )
Comput Math Methods Med - Investigation of attenuation correction for small-animal single photon emission computed tomography. ( 0,514103033195676 )
Int J Comput Assist Radiol Surg - Image feature evaluation in two new mammography CAD prototypes. ( 0,513574451124241 )
Brief. Bioinformatics - Accounting for noise when clustering biological data. ( 0,51248576633361 )
J Am Med Inform Assoc - Stochastic model search with binary outcomes for genome-wide association studies. ( 0,512028825491893 )
IEEE J Biomed Health Inform - Laryngeal Tumor Detection and Classification in Endoscopic Video. ( 0,511781140696384 )
Comput. Biol. Med. - Analysis of adductors angle measurement in Hammersmith infant neurological examinations using mean shift segmentation and feature point based object tracking. ( 0,509238406465667 )
J Med Syst - Employing post-DEA cross-evaluation and cluster analysis in a sample of Greek NHS hospitals. ( 0,508817899346447 )
J Chem Inf Model - Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. ( 0,508723935759582 )
J Am Med Inform Assoc - Automatic classification of mammography reports by BI-RADS breast tissue composition class. ( 0,508574880764964 )
Int J Health Geogr - Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters. ( 0,508219934929714 )
Int J Health Geogr - Interactive web-based mapping: bridging technology and data for health. ( 0,507500574790614 )
BMC Med Inform Decis Mak - CMDX©-based single source information system for simplified quality management and clinical research in prostate cancer. ( 0,505162749353364 )
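
Each entry in the list above has the shape `journal - title ( score )`, with the score written using a decimal comma. The page does not state which similarity measure produced the scores, so the sketch below only parses the entries. `ENTRY_PATTERN` and `parse_entry` are hypothetical names, and the pattern assumes that the first " - " in a line separates the journal from the title.

```python
import re

# "<journal> - <title> ( <score with decimal comma> )"
ENTRY_PATTERN = re.compile(
    r"^(?P<journal>.+?) - (?P<title>.+) \( (?P<score>\d+,\d+) \)$"
)


def parse_entry(line: str) -> tuple[str, str, float]:
    """Split one similar-abstract line into (journal, title, score)."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    # Convert the decimal comma to a decimal point before parsing.
    score = float(match.group("score").replace(",", "."))
    return match.group("journal"), match.group("title"), score


journal, title, score = parse_entry(
    "IEEE Trans Vis Comput Graph - GPU-based Multilevel Clustering. ( 0,66007363792042 )"
)
print(journal)  # IEEE Trans Vis Comput Graph
print(title)    # GPU-based Multilevel Clustering.
print(score)    # 0.66007363792042
```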