BMC Med Inform Decis Mak - Efficient algorithms for fast integration on large data sets from multiple sources.

Topics

{ method(1969) cluster(1462) data(1082) }
{ record(1888) medic(1808) patient(1693) }
{ perform(999) metric(946) measur(919) }
{ learn(2355) train(1041) set(1003) }
{ data(3963) clinic(1234) research(1004) }
{ model(3480) simul(1196) paramet(876) }
{ imag(2675) segment(2577) method(1081) }
{ health(3367) inform(1360) care(1135) }
{ first(2504) two(1366) second(1323) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ patient(2315) diseas(1263) diabet(1191) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ model(2341) predict(2261) use(1141) }
{ monitor(1329) mobil(1314) devic(1160) }
{ can(981) present(881) function(850) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ search(2224) databas(1162) retriev(909) }
{ structur(1116) can(940) graph(676) }
{ use(976) code(926) identifi(902) }
{ survey(1388) particip(1329) question(1065) }
{ method(2212) result(1239) propos(1039) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ sequenc(1873) structur(1644) protein(1328) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ featur(1941) imag(1645) propos(1176) }
{ import(1318) role(1303) understand(862) }
{ studi(1119) effect(1106) posit(819) }
{ state(1844) use(1261) util(961) }
{ medic(1828) order(1363) alert(1069) }
{ group(2977) signific(1463) compar(1072) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ patient(1821) servic(1111) care(1106) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ detect(2391) sensit(1101) algorithm(908) }
{ model(3404) distribut(989) bayesian(671) }
{ data(1737) use(1416) pattern(1282) }
{ measur(2081) correl(1212) valu(896) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ time(1939) patient(1703) rate(768) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ activ(1452) weight(1219) physic(1104) }

Abstract

BACKGROUND: Recent large-scale deployments of health information technology have created opportunities to integrate patient medical records with disparate public health, human service, and educational databases to provide comprehensive information related to health and development. Data integration techniques, which identify records belonging to the same individual that reside in multiple data sets, are essential to these efforts. Several algorithms proposed in the literature are adept at integrating records from two different datasets. Our algorithms are aimed at integrating multiple (in particular, more than two) datasets efficiently.

METHODS: Hierarchical clustering based solutions are used to integrate multiple (in particular, more than two) datasets. Edit distance is used as the basic distance calculation, and distance calculation for common input errors is also studied. Several techniques have been applied to improve the algorithms in terms of both time and space: 1) Partial Construction of the Dendrogram (PCD), which ignores the levels above the threshold; 2) Ignoring the Dendrogram Structure (IDS); 3) Faster Computation of the Edit Distance (FCED), which predicts how the distance compares with the threshold by using upper bounds on the edit distance; and 4) a pre-processing blocking phase that limits dynamic computation to within each block.

RESULTS: We have experimentally validated our algorithms on large simulated as well as real data. Accuracy and completeness are defined stringently to show the performance of our algorithms. In addition, we employ a four-category analysis. Comparison with FEBRL shows the robustness of our approach.

CONCLUSIONS: In the experiments we conducted, the accuracy we observed exceeded 90% for the simulated data in most cases. Accuracies of 97.7% and 98.1% were achieved for the constant and proportional thresholds, respectively, in a real dataset of 1,083,878 records.
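The ideas named in the METHODS paragraph (thresholded edit distance, clustering without a full dendrogram, and blocking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes single-linkage clustering cut at a fixed threshold, which is equivalent to finding connected components, so no dendrogram is built. The early-exit distance routine is a generic stand-in for FCED rather than the bound described in the paper, and the names `bounded_edit_distance`, `link_records`, and the first-character `block_key` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def bounded_edit_distance(a: str, b: str, threshold: int) -> int:
    """Return the edit distance if it is <= threshold, else threshold + 1."""
    if abs(len(a) - len(b)) > threshold:
        return threshold + 1  # the length gap is a lower bound on the distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution or match
        if min(cur) > threshold:
            return threshold + 1  # row minima never decrease, so stop early
        prev = cur
    return min(prev[-1], threshold + 1)


def link_records(records, threshold=1, block_key=lambda r: r[:1]):
    """Group records linked by chains of pairs within the distance threshold."""
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Blocking (assumed key: first character): only compare within a block.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    for members in blocks.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                d = bounded_edit_distance(records[i], records[j], threshold)
                if d <= threshold:
                    parent[find(i)] = find(j)  # merge the two clusters

    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[find(idx)].append(rec)
    return sorted(sorted(c) for c in clusters.values())


records = ["john smith", "jon smith", "john smyth",
           "mary jones", "mary jone", "alan turing"]
result = link_records(records, threshold=1)
```

In this toy input, "jon smith" and "john smyth" are two edits apart, yet they end up in the same cluster because each is one edit from "john smith". That chaining is a property of the assumed single-linkage rule; a different linkage would group them differently.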

Cleaned Abstract

background recent larg scale deploy health inform technolog creat opportun integr patient medic record dispar public health human servic educ databas provid comprehens inform relat health develop data integr techniqu identifi record belong individu resid multipl data set essenti effort sever algorithm propos literatur adept integr record two differ dataset algorithm aim integr multipl particular two dataset effici method hierarch cluster base solut use integr multipl particular two dataset edit distanc use basic distanc calcul distanc calcul common input error also studi sever techniqu appli improv algorithm term time space partial construct dendrogram pcd ignor level threshold ignor dendrogram structur id faster comput edit distanc fced predict distanc threshold upper bound edit distanc preprocess block phase limit dynam comput within block result experiment valid algorithm larg simul well real data accuraci complet defin stringent show perform algorithm addit employ fourcategori analysi comparison febrl show robust approach conclus experi conduct accuraci observ exceed simul data case accuraci achiev constant proport threshold respect real dataset record

Similar Abstracts

AMIA Annu Symp Proc - Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles. ( 0,7208002892623 )
AMIA Annu Symp Proc - Using hierarchical mixture of experts model for fusion of outbreak detection methods. ( 0,712671479289008 )
IEEE Trans Image Process - Evaluating combinational illumination estimation methods on real-world images. ( 0,704009175938186 )
IEEE Trans Pattern Anal Mach Intell - A Link-Based Approach to the Cluster Ensemble Problem. ( 0,674302907846751 )
IEEE Trans Pattern Anal Mach Intell - Semi-Supervised Kernel Mean Shift Clustering. ( 0,673266094115286 )
Int J Neural Syst - Adaptive k-means algorithm for overlapped graph clustering. ( 0,672798581256796 )
Comput. Aided Surg. - The Equidistant Method - a novel hip joint simulation algorithm for detection of femoroacetabular impingement. ( 0,667811694502947 )
Int J Health Geogr - A binary-based approach for detecting irregularly shaped clusters. ( 0,666341683808062 )
Int J Health Geogr - Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: a case study on murine typhus in south Texas. ( 0,661460102525041 )
J Biomed Inform - Quantifying the determinants of outbreak detection performance through simulation and machine learning. ( 0,656691041748272 )
IEEE Trans Vis Comput Graph - GPU-based Multilevel Clustering. ( 0,654199793003244 )
Neural Comput - Spontaneous clustering via minimum γ-divergence. ( 0,652087595252632 )
Int J Health Geogr - Detection of clusters of a rare disease over a large territory: performance of cluster detection methods. ( 0,649601052194161 )
Comput Math Methods Med - A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation. ( 0,646993409662617 )
Int J Health Geogr - Detecting activity locations from raw GPS data: a novel kernel-based algorithm. ( 0,63757877983204 )
J Chem Inf Model - Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. ( 0,6332500053746 )
Comput Math Methods Med - Liver segmentation based on Snakes Model and improved GrowCut algorithm in abdominal CT image. ( 0,631036142478497 )
Spat Spatiotemporal Epidemiol - Optimal selection of the spatial scan parameters for cluster detection: a simulation study. ( 0,630795079742517 )
IEEE Trans Pattern Anal Mach Intell - Multi-Exemplar Affinity Propagation. ( 0,629758435406749 )
Artif Intell Med - Weighted spherical 1-mean with phase shift and its application in electrocardiogram discord detection. ( 0,627994544322578 )
Artif Intell Med - Missing data imputation using statistical and machine learning methods in a real breast cancer problem. ( 0,624215305801299 )
Comput. Biol. Med. - A novel region-based level set method initialized with mean shift clustering for automated medical image segmentation. ( 0,622617766318431 )
IEEE Trans Neural Netw Learn Syst - Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms. ( 0,620533183297531 )
Methods Inf Med - Visual clustering analysis of CIS logs to inform creation of a user-configurable Web CIS interface. ( 0,618303885115873 )
Artif Intell Med - Vicinal support vector classifier using supervised kernel-based clustering. ( 0,617559876863948 )
J Chem Inf Model - Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors. ( 0,606261056601463 )
J Am Med Inform Assoc - Privacy-preserving heterogeneous health data sharing. ( 0,60033681415439 )
Med Decis Making - Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials. ( 0,599390389553817 )
J Chem Inf Model - Investigation of the use of spectral clustering for the analysis of molecular data. ( 0,592571907617573 )
Int J Health Geogr - Using statistical methods and genotyping to detect tuberculosis outbreaks. ( 0,591229399406903 )
Int J Comput Assist Radiol Surg - A Hessian-based filter for vascular segmentation of noisy hepatic CT scans. ( 0,591129621265251 )
AMIA Annu Symp Proc - Patient clustering with uncoded text in electronic medical records. ( 0,586557626325278 )
Int J Comput Assist Radiol Surg - Fast lung nodule detection in chest CT images using cylindrical nodule-enhancement filter. ( 0,586019835919424 )
Comput Methods Programs Biomed - Fuzzy and hard clustering analysis for thyroid disease. ( 0,581794887714239 )
J Integr Bioinform - Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. ( 0,580202028200392 )
Med Decis Making - Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: an application to cluster randomized trials. ( 0,579386093277618 )
Comput Math Methods Med - Novel harmonic regularization approach for variable selection in Cox's proportional hazards model. ( 0,579299753134496 )
Neural Comput - A nonparametric clustering algorithm with a quantile-based likelihood estimator. ( 0,579154342947408 )
J Med Syst - Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets. ( 0,578425451172651 )
Comput. Biol. Med. - Evaluation of automatic feature detection algorithms in EEG: application to interburst intervals. ( 0,572460863878016 )
J Biomed Inform - Learning Bayesian networks from survival data using weighting censored instances. ( 0,572027715244282 )
Int J Neural Syst - A genetic graph-based approach for partitional clustering. ( 0,571321206320997 )
Comput Methods Programs Biomed - Comparison of machine learning methods for classifying aphasic and non-aphasic speakers. ( 0,571012427206617 )
Comput Math Methods Med - Decimative spectral estimation with unconstrained model order. ( 0,569056040681168 )
J Integr Bioinform - An evolutionary and visual framework for clustering of DNA microarray data. ( 0,568948299508882 )
AMIA Annu Symp Proc - Survival prediction and treatment recommendation with Bayesian techniques in lung cancer. ( 0,567437280551071 )
Neural Comput - Feature selection for ordinal text classification. ( 0,567204023501358 )
BMC Med Inform Decis Mak - CMDX-based single source information system for simplified quality management and clinical research in prostate cancer. ( 0,566533775946716 )
Comput Methods Programs Biomed - Generating correlated discrete ordinal data using R and SAS IML. ( 0,566531431809554 )
J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data. ( 0,565474962856468 )
J Biomed Inform - Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values. ( 0,565281643227121 )
Comput Math Methods Med - Recent progress on the factorization method for electrical impedance tomography. ( 0,564598647944615 )
Int J Comput Assist Radiol Surg - Preclinical feasibility of a technology framework for MRI-guided iliac angioplasty. ( 0,561773678161787 )
Comput. Biol. Med. - Analysis of adductors angle measurement in Hammersmith infant neurological examinations using mean shift segmentation and feature point based object tracking. ( 0,561190305087628 )
BMC Med Inform Decis Mak - The effect of improving task representativeness on capturing nurses' risk assessment judgements: a comparison of written case simulations and physical simulations. ( 0,560474581855233 )
J. Med. Internet Res. - Security analysis and improvements to the PsychoPass method. ( 0,556087110538415 )
J Chem Inf Model - Benchmark data sets for structure-based computational target prediction. ( 0,554626568545664 )
J Chem Inf Model - DEKOIS: demanding evaluation kits for objective in silico screening--a versatile tool for benchmarking docking programs and scoring functions. ( 0,554390648111081 )
J. Comput. Biol. - A geometric clustering algorithm with applications to structural data. ( 0,553581528754666 )
Med Decis Making - Cost-saving tree-structured survival analysis for hip fracture of study of osteoporotic fractures data. ( 0,553193798101134 )
IEEE Trans Image Process - Multiple kernel sparse representations for supervised and unsupervised learning. ( 0,552850399616782 )
Comput Methods Programs Biomed - OLYMPUS: an automated hybrid clustering method in time series gene expression. Case study: host response after Influenza A (H1N1) infection. ( 0,551468564770377 )
IEEE Trans Pattern Anal Mach Intell - A Framework for Automatic Modeling from Pointcloud Data. ( 0,551050364544193 )
J Chem Inf Model - String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins. ( 0,549573298213955 )
Artif Intell Med - A classifier ensemble approach for the missing feature problem. ( 0,549268543980985 )
Comput Math Methods Med - A new particle swarm optimization-based method for phase unwrapping of MRI data. ( 0,547746135239514 )
Int J Comput Assist Radiol Surg - CT dataset anisotropy management for oral implantology planning software. ( 0,546323090356663 )
IEEE Trans Image Process - Data-dependent hashing based on p-stable distribution. ( 0,546148974797325 )
IEEE Trans Image Process - Linear discriminant analysis based on L1-norm maximization. ( 0,54607190491261 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,544942024867564 )
IEEE Trans Image Process - In-plane rotation and scale invariant clustering using dictionaries. ( 0,544805259854158 )
J Biomed Inform - A semantic framework to protect the privacy of electronic health records with non-numerical attributes. ( 0,54210539246602 )
Comput Methods Programs Biomed - fMRI analysis on the GPU-possibilities and challenges. ( 0,541977690518417 )
IEEE J Biomed Health Inform - Red blood cell cluster separation from digital images for use in sickle cell disease. ( 0,540545689133793 )
J Biomed Inform - Visual grids for managing data completeness in clinical research datasets. ( 0,540375765296761 )
IEEE Trans Image Process - Subspaces indexing model on Grassmann manifold for image search. ( 0,538816038688087 )
Comput Methods Programs Biomed - Simple methods for segmentation and measurement of diabetic retinopathy lesions in retinal fundus images. ( 0,537360262205577 )
Comput. Biol. Med. - A methodology to identify consensus classes from clustering algorithms applied to immunohistochemical data from breast cancer patients. ( 0,535641086982912 )
Int J Comput Assist Radiol Surg - Rapid image recognition of body parts scanned in computed tomography datasets. ( 0,532350261781267 )
Comput Biol Chem - Mode of action classification of chemicals using multi-concentration time-dependent cellular response profiles. ( 0,531012114731152 )
Brief. Bioinformatics - A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. ( 0,530395287807025 )
Int J Med Robot - Coordinated control and experimentation of the dental arch generator of the tooth-arrangement robot. ( 0,527220460517372 )
Brief. Bioinformatics - A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. ( 0,52654178232442 )
Comput Methods Programs Biomed - A state of the art review on intima-media thickness (IMT) measurement and wall segmentation techniques for carotid ultrasound. ( 0,52499201414491 )
J Integr Bioinform - Parallel Niche Pareto AlineaGA--an evolutionary multiobjective approach on multiple sequence alignment. ( 0,521870898712979 )
Brief. Bioinformatics - Accounting for noise when clustering biological data. ( 0,521022364515353 )
J. Comput. Biol. - Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data. ( 0,520447943449704 )
Comput Biol Chem - Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure. ( 0,519094597890648 )
IEEE Trans Neural Netw Learn Syst - Fick's Law Assisted Propagation for Semisupervised Learning. ( 0,519042701158325 )
IEEE J Biomed Health Inform - Identifying Similar Cases in Document Networks using Cross-reference Structures. ( 0,518787646583145 )
Comput. Biol. Med. - Probing the existence of medium pulmonary crackles via model-based clustering. ( 0,51780748928994 )
IEEE Trans Neural Netw Learn Syst - Learning Stable Multilevel Dictionaries for Sparse Representations. ( 0,516164892764897 )
J Biomed Inform - Statistical file matching of flow cytometry data. ( 0,516076033036931 )
Spat Spatiotemporal Epidemiol - Performance of cancer cluster Q-statistics for case-control residential histories. ( 0,515301004211195 )
J Chem Inf Model - Visualization of molecular fingerprints. ( 0,514726297154871 )
IEEE J Biomed Health Inform - Content Based Image Retrieval by Metric Learning from Radiology Reports: Application to Interstitial Lung Diseases. ( 0,513889900420458 )
IEEE Trans Image Process - Efficiently learning a detection cascade with sparse eigenvectors. ( 0,512780973925132 )
Comput Methods Programs Biomed - Automated detection of endotracheal tubes in paediatric chest radiographs. ( 0,512583697128947 )
J Chem Inf Model - Consensus methods for combining multiple clusterings of chemical structures. ( 0,51247738992926 )
J Med Syst - Employing post-DEA cross-evaluation and cluster analysis in a sample of Greek NHS hospitals. ( 0,51150262240192 )