J Biomed Inform - Active learning strategies for the deduplication of electronic patient data using classification trees.

Topics

{ learn(2355) train(1041) set(1003) }
{ perform(999) metric(946) measur(919) }
{ activ(1452) weight(1219) physic(1104) }
{ structur(1116) can(940) graph(676) }
{ data(1737) use(1416) pattern(1282) }
{ perform(1367) use(1326) method(1137) }
{ studi(2440) review(1878) systemat(933) }
{ record(1888) medic(1808) patient(1693) }
{ estim(2440) model(1874) function(577) }
{ detect(2391) sensit(1101) algorithm(908) }
{ featur(3375) classif(2383) classifi(1994) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ can(774) often(719) complex(702) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ model(3480) simul(1196) paramet(876) }
{ cost(1906) reduc(1198) effect(832) }
{ use(1733) differ(960) four(931) }
{ imag(1057) registr(996) error(939) }
{ take(945) account(800) differ(722) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ design(1359) user(1324) use(1319) }
{ general(901) number(790) one(736) }
{ howev(809) still(633) remain(590) }
{ studi(1410) differ(1259) use(1210) }
{ import(1318) role(1303) understand(862) }
{ monitor(1329) mobil(1314) devic(1160) }
{ patient(2837) hospit(1953) medic(668) }
{ age(1611) year(1155) adult(843) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ use(976) code(926) identifi(902) }
{ result(1111) use(1088) new(759) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }

Abstract

INTRODUCTION: Supervised record linkage methods often require a clerical review to obtain informative training data. Active learning actively prompts the user to label data with special characteristics in order to minimise review costs. We conducted an empirical evaluation to investigate whether a simple active learning strategy using binary comparison patterns is sufficient, or whether string metrics together with a more sophisticated algorithm are necessary to achieve high accuracy with a small training set.

MATERIAL AND METHODS: Based on medical registry data with different numbers of attributes, we used active learning to acquire training sets for classification trees, which were then used to classify the remaining data. Active learning for binary patterns means that every distinct comparison pattern represents a stratum from which one item is sampled. Active learning for patterns consisting of Levenshtein string metric values uses an iterative process in which the most informative and representative examples are added to the training set. In this context, we extended the active learning strategy of Sarawagi and Bhamidipaty (2002).

RESULTS: On the original data set, active learning based on binary comparison patterns leads to the best results. When four or six attributes are dropped, using string metrics leads to better results. In both cases, no more than 200 manually reviewed training examples are necessary.

CONCLUSIONS: In record linkage applications where only forename, name and birthday are available as attributes, we suggest the sophisticated active learning strategy based on string metrics in order to achieve highly accurate results. We recommend the simple strategy if more attributes are available, as in our study. In both cases, active learning significantly reduces the amount of manual involvement in training data selection compared to usual record linkage settings.
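The simple strategy described in the abstract (one reviewed item per distinct binary comparison pattern) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy records, the attribute names, and the helper names `binary_pattern`, `sample_one_per_pattern` and `levenshtein` are assumptions made for illustration only. The Levenshtein function shows the string metric named in the abstract; the iterative selection strategy extended from Sarawagi and Bhamidipaty is not sketched here.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical attribute names and toy records, for illustration only.
ATTRIBUTES = ("forename", "name", "birthday")
records = [
    {"forename": "Anna", "name": "Meier", "birthday": "1970-01-02"},
    {"forename": "Anna", "name": "Meyer", "birthday": "1970-01-02"},
    {"forename": "Hans", "name": "Meier", "birthday": "1970-01-02"},
    {"forename": "Anna", "name": "Meier", "birthday": "1970-01-02"},
]


def binary_pattern(rec_a, rec_b, attributes):
    """Return a tuple of 0/1 flags: 1 where both records agree exactly."""
    return tuple(int(rec_a[attr] == rec_b[attr]) for attr in attributes)


def sample_one_per_pattern(pairs, attributes, seed=0):
    """Treat each distinct comparison pattern as a stratum; draw one pair from each."""
    strata = defaultdict(list)
    for rec_a, rec_b in pairs:
        strata[binary_pattern(rec_a, rec_b, attributes)].append((rec_a, rec_b))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {pattern: rng.choice(members) for pattern, members in sorted(strata.items())}


def levenshtein(s, t):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        curr = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# All candidate record pairs; a real application would typically use blocking.
pairs = list(itertools.combinations(records, 2))

# One pair per distinct pattern is handed to the human reviewer for labelling.
review_queue = sample_one_per_pattern(pairs, ATTRIBUTES)
for pattern, (rec_a, rec_b) in review_queue.items():
    print(pattern, rec_a["forename"], rec_a["name"], "|", rec_b["forename"], rec_b["name"])
```

In this sketch the number of pairs sent for manual review is bounded by the number of distinct patterns (at most 2^k for k attributes), which is one way to see why the simple strategy keeps the training set small. The labelled pairs would then serve as training data for a classification tree.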

Cleaned Abstract

introduct supervis record linkag method often requir cleric review gain inform train data activ learn mean activ prompt user label data special characterist order minimis review cost conduct empir evalu investig whether simpl activ learn strategi use binari comparison pattern suffici string metric togeth sophist algorithm necessari achiev high accuraci small train set materi method base medic registri data differ number attribut use activ learn acquir train set classif tree use classifi remain data activ learn binari pattern mean everi distinct comparison pattern repres stratum one item sampl activ learn pattern consist levenshtein string metric valu use iter process inform repres exampl ad train set context extend activ learn strategi sarawagi bhamidipati result origin data set activ learn base binari comparison pattern lead best result drop four six attribut use string metric lead better result case manual review train exampl necessari conclus record linkag applic forenam name birthday avail attribut suggest sophist activ learn strategi base string metric order achiev high accur result recommend simpl strategi attribut avail studi case activ learn signific reduc amount manual involv train data select compar usual record linkag set

Similar Abstracts

IEEE Trans Neural Netw Learn Syst - A Kernel Classification Framework for Metric Learning. ( 0,753413924484909 )
IEEE Trans Image Process - Decomposition-based transfer distance metric learning for image classification. ( 0,736794834999993 )
IEEE Trans Pattern Anal Mach Intell - Online Multiple Kernel Similarity Learning for Visual Search. ( 0,728400980154972 )
Int J Neural Syst - Aggregation of sparse linear discriminant analyses for event-related potential classification in brain-computer interface. ( 0,717581821561901 )
IEEE Trans Image Process - Self-supervised online metric learning with low rank constraint for scene categorization. ( 0,715923912755671 )
IEEE Trans Pattern Anal Mach Intell - Weakly Supervised Recognition of Daily Life Activities with Wearable Sensors. ( 0,712202440809379 )
J Biomed Inform - Class proximity measures--dissimilarity-based classification and display of high-dimensional data. ( 0,708289286338985 )
IEEE Trans Image Process - Geodesic propagation for semantic labeling. ( 0,708234472009873 )
IEEE Trans Pattern Anal Mach Intell - Distance-Based Image Classification: Generalizing to New Classes at Near Zero Cost. ( 0,706629285224584 )
Neural Comput - Online learning with (multiple) kernels: a review. ( 0,702639035967009 )
Comput Math Methods Med - On multilabel classification methods of incompletely labeled biomedical text data. ( 0,696477257802504 )
Comput. Biol. Med. - Robust prediction of protein subcellular localization combining PCA and WSVMs. ( 0,688297110785031 )
J Biomed Inform - Incremental Gaussian Discriminant Analysis based on Graybill and Deal weighted combination of estimators for brain tumour diagnosis. ( 0,678302785418184 )
IEEE Trans Neural Netw Learn Syst - ML-Tree: a tree-structure-based approach to multilabel learning. ( 0,674282841300966 )
IEEE Trans Neural Netw Learn Syst - An efficient topological distance-based tree kernel. ( 0,673055275676438 )
IEEE Trans Image Process - Multiple-kernel, multiple-instance similarity features for efficient visual object detection. ( 0,673013360052798 )
J Biomed Inform - Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management. ( 0,67148978155638 )
J Biomed Inform - Reducing systematic review workload through certainty-based screening. ( 0,669689502592748 )
IEEE Trans Image Process - Task-specific image partitioning. ( 0,664879789441311 )
Comput Math Methods Med - Pulse waveform classification using support vector machine with Gaussian time warp edit distance kernel. ( 0,662930631297955 )
Neural Comput - Adaptive metric learning vector quantization for ordinal classification. ( 0,662545091408689 )
J Biomed Inform - Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms. ( 0,659489590287315 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data. ( 0,656955072084462 )
IEEE Trans Neural Netw Learn Syst - Partially shared latent factor learning with multiview data. ( 0,653531979454142 )
IEEE Trans Image Process - Artistic image analysis using graph-based learning approaches. ( 0,652923006691904 )
Neural Comput - Reduction from cost-sensitive ordinal ranking to weighted binary classification. ( 0,652923006691904 )
J. Comput. Biol. - Imbalanced class learning in epigenetics. ( 0,650876260099486 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,649827016357902 )
IEEE Trans Image Process - Saliency and gist features for target detection in satellite images. ( 0,649548698354942 )
IEEE Trans Image Process - Hyperspectral image classification through bilayer graph-based learning. ( 0,647915194578552 )
Int J Neural Syst - Structurally enhanced incremental neural learning for image classification with subgraph extraction. ( 0,641491957124921 )
Comput Methods Programs Biomed - Comparison of machine learning methods for classifying aphasic and non-aphasic speakers. ( 0,637482099738262 )
IEEE Trans Image Process - Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion. ( 0,636793577154632 )
IEEE Trans Pattern Anal Mach Intell - Representation Learning: A Review and New Perspectives. ( 0,635885344158727 )
IEEE J Biomed Health Inform - Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. ( 0,634795302133806 )
Neural Comput - Refined rademacher chaos complexity bounds with applications to the multikernel learning problem. ( 0,632147864205437 )
IEEE Trans Image Process - A linear support higher-order tensor machine for classification. ( 0,631722721565158 )
Neural Comput - Exploitation of pairwise class distances for ordinal classification. ( 0,629128677577554 )
J Med Syst - 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification. ( 0,627529480977221 )
IEEE Trans Image Process - Manifold regularized multitask learning for semi-supervised multilabel image classification. ( 0,626172705410933 )
Neural Comput - Computing sparse representations of multidimensional signals using Kronecker bases. ( 0,625660667620839 )
IEEE Trans Image Process - Multiview Hessian regularization for image annotation. ( 0,62540994628212 )
IEEE Trans Image Process - Joint segmentation of images and scanned point cloud in large-scale street scenes with low-annotation cost. ( 0,624381399174407 )
Methods Inf Med - Probability machines: consistent probability estimation using nonparametric learning machines. ( 0,622465623082749 )
IEEE Trans Image Process - Structured max-margin learning for inter-related classifier training and multilabel image annotation. ( 0,622403962164937 )
J Am Med Inform Assoc - Learning classification models with soft-label information. ( 0,621486323660921 )
Artif Intell Med - An evaluation of heuristics for rule ranking. ( 0,620246712164347 )
Neural Comput - Enhanced gradient for training restricted Boltzmann machines. ( 0,619773862359503 )
Comput. Biol. Med. - Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors. ( 0,618580947143788 )
IEEE J Biomed Health Inform - Identifying mammalian MicroRNA targets based on supervised distance metric learning. ( 0,617162556452767 )
Neural Comput - Improved similarity measures for small sets of spike trains. ( 0,613390452863131 )
Neural Comput - Divergence-based vector quantization. ( 0,612715778014221 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,61218496942521 )
Neural Comput - Multiple spectral kernel learning and a gaussian complexity computation. ( 0,611794860594083 )
IEEE Trans Image Process - Learning discriminative dictionary for group sparse representation. ( 0,610184163899811 )
J Chem Inf Model - Atom environment kernels on molecules. ( 0,60720288959451 )
Neural Comput - Metacognitive learning in a fully complex-valued radial basis function neural network. ( 0,60183452468403 )
IEEE Trans Image Process - Incremental training of a detector using online sparse eigendecomposition. ( 0,600712536646668 )
IEEE Trans Image Process - Fast semantic diffusion for large-scale context-based image and video annotation. ( 0,600361949480157 )
AMIA Annu Symp Proc - Comparison and combination of several MeSH indexing approaches. ( 0,59829341113418 )
Med Decis Making - The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes. ( 0,597565472327745 )
IEEE Trans Image Process - Improving Web image search by bag-based reranking. ( 0,595959995757254 )
J Biomed Inform - Learning classification models from multiple experts. ( 0,595798915494867 )
Int J Neural Syst - Span: spike pattern association neuron for learning spatio-temporal spike patterns. ( 0,594785859814057 )
Neural Comput - Adaptive multiclass classification for brain computer interfaces. ( 0,591102187543284 )
AMIA Annu Symp Proc - Sample-efficient learning with auxiliary class-label information. ( 0,590595812814537 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection and Kernel Learning for Local Learning-Based Clustering. ( 0,590239464943979 )
Comput Methods Programs Biomed - Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain-computer interface. ( 0,589562359440759 )
Int J Neural Syst - Online semi-supervised growing neural gas. ( 0,587146167326718 )
Neural Comput - Incremental learning by message passing in hierarchical temporal memory. ( 0,585438424083485 )
BMC Med Inform Decis Mak - Learning to improve medical decision making from imbalanced data without a priori cost. ( 0,58369861837251 )
IEEE Trans Pattern Anal Mach Intell - Label Consistent K-SVD: Learning A Discriminative Dictionary for Recognition. ( 0,583422162023407 )
J Chem Inf Model - Anatomy of high-performance 2D similarity calculations. ( 0,583317832959107 )
IEEE Trans Image Process - Person re-identification over camera networks using multi-task distance metric learning. ( 0,582982678324737 )
Artif Intell Med - A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. ( 0,580547138970086 )
IEEE Trans Image Process - Multiple kernel sparse representations for supervised and unsupervised learning. ( 0,577302709547946 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,577191952289154 )
Neural Comput - U-processes and preference learning. ( 0,576876254831963 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,576688431039529 )
J. Comput. Biol. - The irredundant class method for remote homology detection of protein sequences. ( 0,576195233997626 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,575130577858969 )
Comput. Biol. Med. - EEG-based emotion estimation using Bayesian weighted-log-posterior function and perceptron convergence algorithm. ( 0,574182389294861 )
IEEE J Biomed Health Inform - Content Based Image Retrieval by Metric Learning from Radiology Reports: Application to Interstitial Lung Diseases. ( 0,572023916595802 )
J Chem Inf Model - Note on naive Bayes based on binary descriptors in cheminformatics. ( 0,571302094083231 )
Neural Comput - Representing objects, relations, and sequences. ( 0,570329884588914 )
J Chem Inf Model - SCISSORS: practical considerations. ( 0,570124512374701 )
AMIA Annu Symp Proc - Classification of medication status change in clinical narratives. ( 0,568365827592804 )
IEEE Trans Image Process - Design of non-linear kernel dictionaries for object recognition. ( 0,567918350670582 )
Neural Comput - Large margin low rank tensor analysis. ( 0,567244996739675 )
Neural Comput - EEG data space adaptation to reduce intersession nonstationarity in brain-computer interface. ( 0,567223295644109 )
IEEE Trans Image Process - Real-time probabilistic covariance tracking with efficient model update. ( 0,566042392034111 )
J Am Med Inform Assoc - Active learning for clinical text classification: is it better than random sampling? ( 0,565382033362645 )
Artif Intell Med - A classifier ensemble approach for the missing feature problem. ( 0,56303587260146 )
Comput Methods Programs Biomed - Multistage approach for clustering and classification of ECG data. ( 0,562264271798757 )
J. Comput. Biol. - Locally learning biomedical data using diffusion frames. ( 0,561899182201243 )
J Am Med Inform Assoc - Supervised machine learning and active learning in classification of radiology reports. ( 0,558874358130917 )
IEEE Trans Pattern Anal Mach Intell - Facial Age Estimation by Learning from Label Distributions. ( 0,558226242136168 )
Int J Neural Syst - Linear time relational prototype based learning. ( 0,557931029596093 )
IEEE Trans Neural Netw Learn Syst - Ordinal Distance Metric Learning for Image Ranking. ( 0,557534385382758 )
IEEE Trans Image Process - Unsupervised amplitude and texture classification of SAR images with multinomial latent model. ( 0,556791230592383 )