BMC Med Inform Decis Mak - Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach.


{ model(2341) predict(2261) use(1141) }
{ search(2224) databas(1162) retriev(909) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1219) similar(1157) match(930) }
{ first(2504) two(1366) second(1323) }
{ extract(1171) text(1153) clinic(932) }
{ howev(809) still(633) remain(590) }
{ health(3367) inform(1360) care(1135) }
{ data(3008) multipl(1320) sourc(1022) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }
{ imag(1947) propos(1133) code(1026) }
{ problem(2511) optim(1539) algorithm(950) }
{ general(901) number(790) one(736) }
{ studi(1410) differ(1259) use(1210) }
{ use(1733) differ(960) four(931) }
{ system(1976) rule(880) can(841) }
{ error(1145) method(1030) estim(1020) }
{ care(1570) inform(1187) nurs(1089) }
{ estim(2440) model(1874) function(577) }
{ can(774) often(719) complex(702) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ framework(1458) process(801) describ(734) }
{ risk(3053) factor(974) diseas(938) }
{ spatial(1525) area(1432) region(1030) }
{ use(976) code(926) identifi(902) }
{ survey(1388) particip(1329) question(1065) }
{ model(3404) distribut(989) bayesian(671) }
{ data(1737) use(1416) pattern(1282) }
{ bind(1733) structur(1185) ligand(1036) }
{ network(2748) neural(1063) input(814) }
{ patient(2315) diseas(1263) diabet(1191) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ import(1318) role(1303) understand(862) }
{ perform(1367) use(1326) method(1137) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ implement(1333) system(1263) develop(1122) }
{ measur(2081) correl(1212) valu(896) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }


BACKGROUND: Data requirements by governments, donors and the international community to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyse time trends and geographical patterns of demographic and health-related indicators. However, since not all datasets have the same structure, variable definitions and codes, they have to be harmonised before being submitted to statistical analysis. Manually searching, renaming and recoding variables is extremely tedious and error-prone, especially when the number of datasets and variables is large. This article presents an automated approach to harmonising variable names across several datasets, which optimises the search for variables, minimises manual input and reduces the risk of error.

RESULTS: Three consecutive algorithms are applied iteratively to search all datasets for each variable of interest for the analyses. The first search (A) captures particular cases that could not be solved in an automated way in the search iterations; the second search (B) is run if search A produced no hits and identifies variables whose labels contain certain key terms defined by the user. If this search produces no hits, a third one (C) is run to retrieve variables which have been identified in other surveys, as an illustration. For each variable of interest, the output of these engines can be (O1) a single best matching variable is found, (O2) more than one matching variable is found or (O3) no matching variables are found. Output O2 is resolved by user judgement. Examples using four variables are presented, showing that the searches reach 100% sensitivity and specificity after a second iteration.

CONCLUSION: Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets. This is especially relevant when the numbers of datasets or variables to be included are large.
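
The cascade of searches A, B and C and the outcomes O1 to O3 described in the abstract can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the representation of a dataset as a dictionary of variable names and labels, the sample survey, and the rule that a label must contain every key term are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def classify(matches: List[str]) -> str:
    """Map a list of candidate variables to the outcomes O1, O2 or O3."""
    if len(matches) == 1:
        return "O1"  # a single best matching variable
    if len(matches) > 1:
        return "O2"  # several candidates: left to user judgement
    return "O3"      # no matching variable


def find_variable(
    labels: Dict[str, str],
    override: Optional[str],
    key_terms: List[str],
    known_names: Set[str],
) -> Tuple[str, str, List[str]]:
    """Run searches A, B and C in order and stop at the first one with hits.

    labels      -- variable name -> variable label for one dataset
    override    -- variable fixed by hand for this dataset, if any (search A)
    key_terms   -- user-defined terms to look for in the labels (search B)
    known_names -- names matched in other surveys (search C)
    """
    # Search A: particular cases recorded by hand after earlier iterations.
    if override is not None and override in labels:
        return "A", "O1", [override]

    # Search B: labels containing every key term (case-insensitive; requiring
    # all terms rather than any of them is an assumption of this sketch).
    terms = [t.lower() for t in key_terms]
    hits = sorted(name for name, label in labels.items()
                  if all(t in label.lower() for t in terms))
    if hits:
        return "B", classify(hits), hits

    # Search C: names already identified for this variable in other surveys.
    hits = sorted(name for name in labels if name in known_names)
    return "C", classify(hits), hits


# Hypothetical dataset: variable names and labels invented for illustration.
survey = {
    "v012": "Current age of respondent",
    "v106": "Highest educational level",
    "v133": "Education in single years",
}
print(find_variable(survey, None, ["educat"], set()))
```

In this example the key term matches two labels, so search B returns outcome O2 and the choice between the candidates is left to the user. Recording that choice as an override for the next iteration corresponds to search A.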

Cleaned Abstract

background data requir govern donor intern communiti measur health develop achiev increas last decad dataset produc survey conduct sever countri year often combin analys time trend geograph pattern demograph health relat indic howev sinc dataset structur variabl definit code harmonis prior submit statist analys manual search renam recod variabl extrem tedious prone error task overal number dataset variabl larg articl present autom approach harmonis variabl name across sever dataset optimis search variabl minimis manual input reduc risk error result three consecut algorithm appli iter search variabl interest analys dataset first search captur particular case solv autom way search iter second search b run search produc hit identifi variabl label contain certain key term defin user search produc hit third one c run retriev variabl identifi survey illustr variabl interest output engin can o singl best match variabl found o one match variabl found o match variabl found output o solv user judgement exampl use four variabl present show search sensit specif second iter conclus effici test autom algorithm use support harmonis process need analys multipl dataset especi relev number dataset variabl includ larg

Similar Abstracts

AMIA Annu Symp Proc - Predicting clicks of PubMed articles. ( 0,66195484869402 )
Comput Methods Programs Biomed - Single stage and multistage classification models for the prediction of liver fibrosis degree in patients with chronic hepatitis C infection. ( 0,660978421125945 )
J Am Med Inform Assoc - DCMDSM: a DICOM decomposed storage model. ( 0,649899446523835 )
J Am Med Inform Assoc - An improved model for predicting postoperative nausea and vomiting in ambulatory surgery patients using physician-modifiable risk factors. ( 0,646278448912069 )
J Biomed Inform - Prediction of influenza vaccination outcome by neural networks and logistic regression. ( 0,639799775230747 )
Brief. Bioinformatics - myMIR: a genome-wide microRNA targets identification and annotation tool. ( 0,638869114674698 )
BMC Med Inform Decis Mak - A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study. ( 0,634491981091751 )
J Biomed Inform - Statistical process control for validating a classification tree model for predicting mortality--a novel approach towards temporal validation. ( 0,634387956594022 )
Artif Intell Med - Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. ( 0,633914473177837 )
J. Comput. Biol. - Prediction of siRNA potency using sparse logistic regression. ( 0,633286808366109 )
Appl Clin Inform - Comparing predictions made by a prediction model, clinical score, and physicians: pediatric asthma exacerbations in the emergency department. ( 0,630517135736273 )
J Biomed Inform - Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches. ( 0,622734364553789 )
BMC Med Inform Decis Mak - Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population. ( 0,622397125659201 )
J Chem Inf Model - dREL: a relational expression language for dictionary methods. ( 0,614475704111867 )
BMC Med Inform Decis Mak - Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups. ( 0,612693484808399 )
Spat Spatiotemporal Epidemiol - Modeling habitat suitability for occurrence of highly pathogenic avian influenza virus H5N1 in domestic poultry in Asia: a spatial multicriteria decision analysis approach. ( 0,612237758201885 )
Comput Math Methods Med - Variable selection in ROC regression. ( 0,612201845467426 )
Methods Inf Med - Limited sampling strategies to estimate the area under the concentration-time curve. Biases and a proposed more accurate method. ( 0,611026604180553 )
J Med Syst - Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing. ( 0,610398281561033 )
Lifetime Data Anal - Understanding increments in model performance metrics. ( 0,609066683253836 )
Med Decis Making - Application of an artificial neural network to predict postinduction hypotension during general anesthesia. ( 0,606260742560118 )
Med Decis Making - Constructing proper ROCs from ordinal response data using weighted power functions. ( 0,60238254437094 )
Artif Intell Med - Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers. ( 0,600998651100473 )
Comput Biol Chem - Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions. ( 0,598781848518611 )
Neural Comput - An extension of the receiver operating characteristic curve and AUC-optimal classification. ( 0,597421670812109 )
Int J Med Inform - Application of data mining to the identification of critical factors in patient falls using a web-based reporting system. ( 0,592571115462298 )
BMC Med Inform Decis Mak - Use of outcomes to evaluate surveillance systems for bioterrorist attacks. ( 0,591613523908261 )
Brief. Bioinformatics - Adjusting confounders in ranking biomarkers: a model-based ROC approach. ( 0,589122292660341 )
Comput Math Methods Med - Iterative reweighted noninteger norm regularizing SVM for gene expression data classification. ( 0,588292092694506 )
Comput. Biol. Med. - A ternary model of decompression sickness in rats. ( 0,586720291020479 )
IEEE Trans Image Process - DEB: definite error bounded tangent estimator for digital curves. ( 0,583944491116276 )
J Med Syst - Classifying hospitals as mortality outliers: logistic versus hierarchical logistic models. ( 0,583829206330797 )
Methods Inf Med - Technology-induced errors. The current use of frameworks and models from the biomedical and life sciences literatures. ( 0,581011832541253 )
IEEE J Biomed Health Inform - The effect of sample age and prediction resolution on myocardial infarction risk prediction. ( 0,580318326389201 )
BMC Med Inform Decis Mak - Decision curve analysis revisited: overall net benefit, relationships to ROC curve analysis, and application to case-control studies. ( 0,579424829232963 )
AMIA Annu Symp Proc - Predicting Surgical Risk: How Much Data is Enough? ( 0,579363019993569 )
BMC Med Inform Decis Mak - Evaluation of prediction models for the staging of prostate cancer. ( 0,579263532182056 )
Med Decis Making - A comparison of methods for converting DCE values onto the full health-dead QALY scale. ( 0,577722616330024 )
BMC Med Inform Decis Mak - BOSS: context-enhanced search for biomedical objects. ( 0,573725515023076 )
Methods Inf Med - Extending statistical boosting. An overview of recent methodological developments. ( 0,573076333834494 )
J Clin Monit Comput - Use of genetic programming, logistic regression, and artificial neural nets to predict readmission after coronary artery bypass surgery. ( 0,572790552504396 )
Med Decis Making - Contrasting two frameworks for ROC analysis of ordinal ratings. ( 0,571845425981559 )
J Chem Inf Model - Two new parameters based on distances in a receiver operating characteristic chart for the selection of classification models. ( 0,571694815918547 )
AMIA Annu Symp Proc - Comparing predictive models of glioblastoma multiforme built using multi-institutional and local data sources. ( 0,571458649052183 )
J Biomed Inform - Gene-disease association with literature based enrichment. ( 0,570831515651804 )
J Am Med Inform Assoc - Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods. ( 0,569620336569629 )
Methods Inf Med - Developing topic-specific search filters for PubMed with click-through data. ( 0,568399472572036 )
Comput. Biol. Med. - A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks. ( 0,566580417506672 )
Int J Med Inform - A systematic review of predictive modeling for bronchiolitis. ( 0,56643729545934 )
Med Decis Making - Lehmann family of ROC curves. ( 0,566175175052696 )
BMC Med Inform Decis Mak - A method for managing re-identification risk from small geographic areas in Canada. ( 0,565210549889909 )
J Biomed Inform - Small sum privacy and large sum utility in data publishing. ( 0,564869351026356 )
Comput Methods Programs Biomed - Prediction of postprandial blood glucose under uncertainty and intra-patient variability in type 1 diabetes: a comparative study of three interval models. ( 0,564783373022244 )
Comput. Biol. Med. - A leave-one-out cross-validation SAS macro for the identification of markers associated with survival. ( 0,564564198914499 )
Comput Math Methods Med - Modified logistic regression models using gene coexpression and clinical features to predict prostate cancer progression. ( 0,563581473299254 )
Comput Methods Programs Biomed - Development of a daily mortality probability prediction model from Intensive Care Unit patients using a discrete-time event history analysis. ( 0,562530124826939 )
AMIA Annu Symp Proc - Author keywords in biomedical journal articles. ( 0,562192334198569 )
Int J Health Geogr - Assessing the effects of variables and background selection on the capture of the tick climate niche. ( 0,561199985756802 )
Med Decis Making - A pilot study using machine learning and domain knowledge to facilitate comparative effectiveness review updating. ( 0,560896430124764 )
J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. ( 0,560096587877687 )
AMIA Annu Symp Proc - Clinical risk prediction by exploring high-order feature correlations. ( 0,559740322658795 )
Int J Health Geogr - Prediction of high-risk areas for visceral leishmaniasis using socioeconomic indicators and remote sensing data. ( 0,558788541519885 )
Comput. Biol. Med. - Pre-operative prediction of surgical morbidity in children: comparison of five statistical models. ( 0,557276052996619 )
Artif Intell Med - Operation room tool handling and miscommunication scenarios: an object-process methodology conceptual model. ( 0,555473565201813 )
Methods Inf Med - Sensor-based fall risk assessment--an expert 'to go'. ( 0,554945069270775 )
J Integr Bioinform - Classification methods for finding articles describing protein-protein interactions in PubMed. ( 0,554733234864302 )
AMIA Annu Symp Proc - Finding and accessing diagrams in biomedical publications. ( 0,552674190133883 )
Med Decis Making - Performance profiling in primary care: does the choice of statistical model matter? ( 0,550939374189317 )
Comput Biol Chem - An ensemble method for prediction of conformational B-cell epitopes from antigen sequences. ( 0,550137741759348 )
BMC Med Inform Decis Mak - Prediction of adverse cardiac events in emergency department patients with chest pain using machine learning for variable selection. ( 0,549375664664637 )
Int J Comput Assist Radiol Surg - Controlling motion prediction errors in radiotherapy with relevance vector machines. ( 0,547412606187764 )
IEEE Trans Image Process - Network-based H.264/AVC whole frame loss visibility model and frame dropping methods. ( 0,547238395188114 )
J Chem Inf Model - Efficient substructure searching of large chemical libraries: the ABCD chemical cartridge. ( 0,546386067086265 )
J. Med. Internet Res. - Cumulative query method for influenza surveillance using search engine data. ( 0,546310762645369 )
J Am Med Inform Assoc - Automating annotation of information-giving for analysis of clinical conversation. ( 0,543763477308748 )
IEEE Trans Pattern Anal Mach Intell - On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. ( 0,54374844144573 )
J Am Med Inform Assoc - A large-scale quantitative analysis of latent factors and sentiment in online doctor reviews. ( 0,542874509692756 )
J Med Syst - Comparison of artificial neural networks with logistic regression for detection of obesity. ( 0,539938752030734 )
J Am Med Inform Assoc - Calibrating predictive model estimates to support personalized medicine. ( 0,539833336201451 )
IEEE Trans Image Process - Fast bi-directional prediction selection in H.264/MPEG-4 AVC temporal scalable video coding. ( 0,539241269937976 )
Brief. Bioinformatics - Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees. ( 0,539142849424521 )
Artif Intell Med - Machine learning of clinical performance in a pancreatic cancer database. ( 0,538498669274982 )
Int J Health Geogr - Modeling larval malaria vector habitat locations using landscape features and cumulative precipitation measures. ( 0,537717409460658 )
J Chem Inf Model - Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing. ( 0,537650486068638 )
Med Decis Making - Adaptation of clinical prediction models for application in local settings. ( 0,537246200834012 )
J Am Med Inform Assoc - A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data. ( 0,53578828401331 )
Comput Methods Programs Biomed - Exploring an optimal vector autoregressive model for multi-channel pulmonary sound data. ( 0,534085061311422 )
J Integr Bioinform - The LAILAPS search engine: a feature model for relevance ranking in life science databases. ( 0,533672395992203 )
Methods Inf Med - Chi-square-based scoring function for categorization of MEDLINE citations. ( 0,533287755779955 )
J Chem Inf Model - Homology modeling of human muscarinic acetylcholine receptors. ( 0,532589034604375 )
Med Biol Eng Comput - A dynamic Bayesian network for estimating the risk of falls from real gait data. ( 0,532500754063751 )
J Am Med Inform Assoc - Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology. ( 0,532182214583446 )
Med Decis Making - Modeling and validating the cost and clinical pathway of colorectal cancer. ( 0,531625224520397 )
J Biomed Inform - An empirical approach to model selection through validation for censored survival data. ( 0,531224681488724 )
Methods Inf Med - A probabilistic model to investigate the properties of prognostic tools for falls. ( 0,531033251708729 )
Med Decis Making - Appropriate evidence sources for populating decision analytic models within health technology assessment (HTA): a systematic review of HTA manuals and health economic guidelines. ( 0,529565163272112 )
Spat Spatiotemporal Epidemiol - Assessment of land use factors associated with dengue cases in Malaysia using Boosted Regression Trees. ( 0,529511935862575 )
IEEE Trans Image Process - Spatial sparsity-induced prediction (SIP) for images and video: a simple way to reject structured interference. ( 0,528704608442031 )
J Biomed Inform - Supporting retrieval of diverse biomedical data using evidence-aware queries. ( 0,528296292446719 )
BMC Med Inform Decis Mak - Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model. ( 0,528101270794345 )