J Am Med Inform Assoc - Missing values in deduplication of electronic patient data.

Topics

{ learn(2355) train(1041) set(1003) }
{ featur(3375) classif(2383) classifi(1994) }
{ perform(1367) use(1326) method(1137) }
{ measur(2081) correl(1212) valu(896) }
{ cost(1906) reduc(1198) effect(832) }
{ inform(2794) health(2639) internet(1427) }
{ sequenc(1873) structur(1644) protein(1328) }
{ clinic(1479) use(1117) guidelin(835) }
{ design(1359) user(1324) use(1319) }
{ group(2977) signific(1463) compar(1072) }
{ structur(1116) can(940) graph(676) }
{ method(1969) cluster(1462) data(1082) }
{ treatment(1704) effect(941) patient(846) }
{ method(1557) propos(1049) approach(1037) }
{ model(3480) simul(1196) paramet(876) }
{ patient(1821) servic(1111) care(1106) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ data(1737) use(1416) pattern(1282) }
{ studi(2440) review(1878) systemat(933) }
{ problem(2511) optim(1539) algorithm(950) }
{ algorithm(1844) comput(1787) effici(935) }
{ data(1714) softwar(1251) tool(1186) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ howev(809) still(633) remain(590) }
{ risk(3053) factor(974) diseas(938) }
{ spatial(1525) area(1432) region(1030) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ age(1611) year(1155) adult(843) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ cancer(2502) breast(956) screen(824) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

INTRODUCTION: Systematic approaches to dealing with missing values in record linkage are still lacking. This article compares the ad-hoc treatment of unknown comparison values as 'unequal' with other, more sophisticated approaches. The methods were evaluated empirically on real-world data and on simulated data derived from them.

MATERIAL AND METHODS: Cancer registry data and artificial data with increased numbers of missing values in a relevant variable are used for empirical comparisons. Classification and regression trees were used as the classification method. On the resulting binary comparison patterns, the following strategies for dealing with missingness are considered: imputation with unique values, sample-based imputation, reduced-model classification and complete-case induction. These approaches are evaluated according to the amount of training data needed for induction and the F-scores achieved.

RESULTS: The evaluations reveal that unique value imputation leads to the best results. Imputation with zero is preferred to imputation with 0.5, although the latter shows the highest median F-scores. Imputation with zero needs considerably less training data, shows only slightly worse results, and simplifies the computation by maintaining the binary structure of the data.

CONCLUSIONS: The results support the ad-hoc solution for missing values, 'replace NA by the value of inequality'. This conclusion is based on a limited amount of data and on a specific deduplication method. Nevertheless, the authors are confident that their results should be confirmed by other empirical analyses and applications.
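The strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy comparison patterns, and the sum-threshold rule standing in for a trained classification tree are all assumptions made for illustration. Each pattern holds one entry per compared field (1 = equal, 0 = unequal, `None` = missing).

```python
from typing import List, Optional, Sequence, Tuple

# One comparison pattern per record pair: 1 = equal, 0 = unequal, None = NA.
Pattern = Sequence[Optional[int]]


def impute_unique(patterns: Sequence[Pattern], value: float = 0.0) -> List[List[float]]:
    """Unique value imputation: replace every missing comparison with `value`.

    value=0.0 treats NA as 'unequal' and keeps the data binary;
    value=0.5 marks NA as a separate, intermediate level.
    """
    return [[value if c is None else float(c) for c in p] for p in patterns]


def complete_cases(
    patterns: Sequence[Pattern], labels: Sequence[int]
) -> Tuple[List[Pattern], List[int]]:
    """Complete-case approach: keep only patterns without any missing comparison."""
    kept = [(p, y) for p, y in zip(patterns, labels) if None not in p]
    return [p for p, _ in kept], [y for _, y in kept]


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-score (harmonic mean of precision and recall) for the 'match' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four record pairs compared on three fields.
patterns = [[1, 1, None], [1, 0, 1], [None, 1, 1], [0, 0, 0]]
labels = [1, 0, 1, 0]  # 1 = true duplicate, 0 = distinct patients

# Stand-in decision rule (NOT a trained tree): a pair is a match
# when at least two fields agree after imputing NA as 'unequal'.
imputed = impute_unique(patterns, value=0.0)
predicted = [1 if sum(p) >= 2 else 0 for p in imputed]
score = f_score(labels, predicted)
```

In a fuller experiment the threshold rule would be replaced by a tree induced from labeled training pairs, and the F-score would be computed on held-out pairs for each missing-value strategy.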

Cleaned Abstract

troduct systemat approach deal miss valu record linkag still lack articl compar adhoc treatment unknown comparison valu unequ sophist approach empir evalu conduct method realworld data well simul data base themmateri method cancer registri data artifici data increas number miss valu relev variabl use empir comparison classif method classif regress tree use result binari comparison pattern follow strategi deal missing consid imput uniqu valu samplebas imput reducedmodel classif completecas induct approach evalu accord number train data need induct fscore achievedresult evalu reveal uniqu valu imput lead best result imput zero prefer imput although latter show highest median fscore imput zero need consider less train data show slight wors result simplifi comput maintain binari structur dataconclus result support adhoc solut miss valu replac na valu inequ conclus base limit amount data specif dedupl method nevertheless author confid result confirm empir analys applic

Similar Abstracts

Artif Intell Med - A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. ( 0,736092955227447 )
Comput. Biol. Med. - Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors. ( 0,712712973190001 )
J Integr Bioinform - On the parameter optimization of Support Vector Machines for binary classification. ( 0,708360750131372 )
J Chem Inf Model - Classifying large chemical data sets: using a regularized potential function method. ( 0,691266217172243 )
IEEE Trans Neural Netw Learn Syst - ML-Tree: a tree-structure-based approach to multilabel learning. ( 0,685671505388831 )
J Chem Inf Model - Atom environment kernels on molecules. ( 0,676842631693636 )
Int J Neural Syst - Aggregation of sparse linear discriminant analyses for event-related potential classification in brain-computer interface. ( 0,675823542482408 )
IEEE J Biomed Health Inform - Automatic detection of atrial fibrillation in cardiac vibration signals. ( 0,674356429242396 )
Neural Comput - Extended robust support vector machine based on financial risk minimization. ( 0,674133524761829 )
J Med Syst - A computer aided diagnosis system for thyroid disease using extreme learning machine. ( 0,666585208925043 )
Neural Comput - Large margin low rank tensor analysis. ( 0,650458091665601 )
IEEE Trans Image Process - Multiple-kernel, multiple-instance similarity features for efficient visual object detection. ( 0,650432611979058 )
Neural Comput - Divergence-based vector quantization. ( 0,649021223419874 )
Comput. Biol. Med. - Relabeling algorithm for retrieval of noisy instances and improving prediction quality. ( 0,648054985880259 )
J Am Med Inform Assoc - Practical implementation of an existing smoking detection pipeline and reduced support vector machine training corpus requirements. ( 0,643632579170326 )
J Biomed Inform - A medical diagnostic tool based on radial basis function classifiers and evolutionary simulated annealing. ( 0,633302579657392 )
Artif Intell Med - Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers. ( 0,631614354227305 )
J Am Med Inform Assoc - Supervised machine learning and active learning in classification of radiology reports. ( 0,630564655350358 )
J Am Med Inform Assoc - Applying active learning to high-throughput phenotyping algorithms for electronic health records data. ( 0,628418592678746 )
Comput Biol Chem - CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition. ( 0,625658674081441 )
Comput Biol Chem - A novel divide-and-merge classification for high dimensional datasets. ( 0,620363366893464 )
IEEE J Biomed Health Inform - Multiple kernel learning in the primal for multimodal Alzheimer's disease classification. ( 0,619870141588616 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,618886858749316 )
AMIA Annu Symp Proc - Predicting discharge mortality after acute ischemic stroke using balanced data. ( 0,615502276105989 )
Neural Comput - Online learning with (multiple) kernels: a review. ( 0,613706208879088 )
IEEE Trans Image Process - Task-specific image partitioning. ( 0,612774358730815 )
IEEE Trans Image Process - A linear support higher-order tensor machine for classification. ( 0,611601860733184 )
Neural Comput - Adaptive multiclass classification for brain computer interfaces. ( 0,611326688779255 )
IEEE Trans Pattern Anal Mach Intell - Learning Hierarchical Features for Scene Labeling. ( 0,61088374350705 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,610652416505435 )
J Am Med Inform Assoc - Discretization of continuous features in clinical datasets. ( 0,609992501160707 )
J Med Syst - Diagnosis of several diseases by using combined kernels with Support Vector Machine. ( 0,606980734654438 )
Comput Methods Programs Biomed - Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain-computer interface. ( 0,606011769861072 )
Artif Intell Med - A classifier ensemble approach for the missing feature problem. ( 0,604697165961871 )
Comput. Biol. Med. - Identification of voltage-gated potassium channel subfamilies from sequence information using support vector machine. ( 0,602099720863812 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,600710847664809 )
IEEE Trans Image Process - Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion. ( 0,599788667074227 )
Neural Comput - Reduction from cost-sensitive ordinal ranking to weighted binary classification. ( 0,599596220432438 )
J Am Med Inform Assoc - Applying active learning to supervised word sense disambiguation in MEDLINE. ( 0,598745970977693 )
J Med Syst - 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification. ( 0,598725618949366 )
Comput Methods Programs Biomed - Denoised P300 and machine learning-based concealed information test method. ( 0,596730082525433 )
Comput. Biol. Med. - Robust prediction of protein subcellular localization combining PCA and WSVMs. ( 0,596210618706892 )
Comput. Biol. Med. - Application of machine learning techniques to analyse the effects of physical exercise in ventricular fibrillation. ( 0,595771759285527 )
J Med Syst - Super wavelet for sEMG signal extraction during dynamic fatiguing contractions. ( 0,595585305549808 )
J Chem Inf Model - Classifying molecules using a sparse probabilistic kernel binary classifier. ( 0,594835406251336 )
Comput. Biol. Med. - Automated Marsh-like classification of celiac disease in children using local texture operators. ( 0,594743095732262 )
J Chem Inf Model - In silico prediction of chemical acute oral toxicity using multi-classification methods. ( 0,594232895931232 )
AMIA Annu Symp Proc - Improving predictions in imbalanced data using Pairwise Expanded Logistic Regression. ( 0,593564942127213 )
Med Biol Eng Comput - Single-trial classification of antagonistic oxyhemoglobin responses during mental arithmetic. ( 0,593564942127213 )
J Am Med Inform Assoc - Learning classification models with soft-label information. ( 0,592885957393166 )
J Med Syst - 3D matrix pattern based Support Vector Machines for identifying pulmonary cancer in CT scanned images. ( 0,592161908778342 )
IEEE Trans Neural Netw Learn Syst - The generalization ability of online SVM classification based on Markov sampling. ( 0,591528268187763 )
Comput Math Methods Med - Comparison of two methods forecasting binding rate of plasma protein. ( 0,588996289320026 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,588137322374895 )
AMIA Annu Symp Proc - Comparison and combination of several MeSH indexing approaches. ( 0,587749394675996 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,587695095283122 )
Neural Comput - Feature selection for ordinal text classification. ( 0,586331559052572 )
J Med Syst - Symptomatic vs. asymptomatic plaque classification in carotid ultrasound. ( 0,585894848792875 )
Int J Med Inform - An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. ( 0,585151559156903 )
IEEE Trans Pattern Anal Mach Intell - Good Practice in Large-Scale Learning for Image Classification. ( 0,582215612064887 )
IEEE Trans Neural Netw Learn Syst - Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects. ( 0,581483816994841 )
J Am Med Inform Assoc - A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction. ( 0,580307484437654 )
J Biomed Inform - Class proximity measures--dissimilarity-based classification and display of high-dimensional data. ( 0,5795531460166 )
BMC Med Inform Decis Mak - Decision tree-based learning to predict patient controlled analgesia consumption and readjustment. ( 0,579494799455746 )
Med Biol Eng Comput - A comparison of univariate, vector, bilinear autoregressive, and band power features for brain-computer interfaces. ( 0,578642033869148 )
Neural Comput - Adaptive metric learning vector quantization for ordinal classification. ( 0,578367432898312 )
J Biomed Inform - Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management. ( 0,577103651276349 )
IEEE Trans Pattern Anal Mach Intell - Distance-Based Image Classification: Generalizing to New Classes at Near Zero Cost. ( 0,575982071864926 )
Brief. Bioinformatics - Class-imbalanced classifiers for high-dimensional data. ( 0,575227140164528 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data. ( 0,574158129551073 )
IEEE Trans Image Process - Joint segmentation of images and scanned point cloud in large-scale street scenes with low-annotation cost. ( 0,573867945594252 )
Artif Intell Med - Evolutionary-driven support vector machines for determining the degree of liver fibrosis in chronic hepatitis C. ( 0,573424210593498 )
BMC Med Inform Decis Mak - Learning to improve medical decision making from imbalanced data without a priori cost. ( 0,57259132613813 )
J Med Syst - Classification of juvenile myoclonic epilepsy data acquired through scanning electromyography with machine learning algorithms. ( 0,572270908498539 )
IEEE J Biomed Health Inform - Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. ( 0,57022909625628 )
Comput Math Methods Med - On multilabel classification methods of incompletely labeled biomedical text data. ( 0,570205167161312 )
IEEE Trans Image Process - A novel technique for subpixel image classification based on support vector machine. ( 0,569897435655569 )
Artif Intell Med - Vicinal support vector classifier using supervised kernel-based clustering. ( 0,567956139744384 )
Int J Neural Syst - Structurally enhanced incremental neural learning for image classification with subgraph extraction. ( 0,567348931757987 )
J Chem Inf Model - A binary ant colony optimization classifier for molecular activities. ( 0,566726624037212 )
Comput Methods Programs Biomed - An attribute weight assignment and particle swarm optimization algorithm for medical database classifications. ( 0,566721864488336 )
IEEE Trans Image Process - Unified structured learning for simultaneous human pose estimation and garment attribute classification. ( 0,566363130656645 )
Int J Comput Assist Radiol Surg - Investigating machine learning techniques for MRI-based classification of brain neoplasms. ( 0,565356641789148 )
IEEE Trans Image Process - Multiview Hessian regularization for image annotation. ( 0,561186942245946 )
IEEE Trans Image Process - Enhancing training collections for image annotation: an instance-weighted mixture modeling approach. ( 0,557772578970527 )
IEEE Trans Image Process - Geodesic propagation for semantic labeling. ( 0,55601875482267 )
J Biomed Inform - Learning classification models from multiple experts. ( 0,555869408897448 )
Comput. Biol. Med. - Contourlet-based mammography mass classification using the SVM family. ( 0,554523032734888 )
Comput. Biol. Med. - A learning method for the class imbalance problem with medical data sets. ( 0,554209983820276 )
IEEE Trans Image Process - Real-time probabilistic covariance tracking with efficient model update. ( 0,552331858397619 )
J Am Med Inform Assoc - Active learning for clinical text classification: is it better than random sampling? ( 0,552319278382394 )
Comput Math Methods Med - Mixed-norm regularization for brain decoding. ( 0,552297566403148 )
Artif Intell Med - Suppressed fuzzy-soft learning vector quantization for MRI segmentation. ( 0,551619246674888 )
IEEE Trans Image Process - Hyperspectral image classification through bilayer graph-based learning. ( 0,550191147719058 )
Artif Intell Med - Prediction of intraoperative complexity from preoperative patient data for laparoscopic cholecystectomy. ( 0,550063411158144 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,549939401584753 )
J. Comput. Biol. - Imbalanced class learning in epigenetics. ( 0,549838641714844 )
IEEE Trans Image Process - Improving Web image search by bag-based reranking. ( 0,549029343427384 )
J Med Syst - Automated screening of arrhythmia using wavelet based machine learning techniques. ( 0,548455042988232 )
Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,548091143166923 )