BMC Med Inform Decis Mak - Predicting sample size required for classification performance.

Topics

{ sampl(1606) size(1419) use(1276) }
{ model(3480) simul(1196) paramet(876) }
{ method(1219) similar(1157) match(930) }
{ learn(2355) train(1041) set(1003) }
{ data(3963) clinic(1234) research(1004) }
{ extract(1171) text(1153) clinic(932) }
{ model(2341) predict(2261) use(1141) }
{ error(1145) method(1030) estim(1020) }
{ model(2656) set(1616) predict(1553) }
{ activ(1452) weight(1219) physic(1104) }
{ model(2220) cell(1177) simul(1124) }
{ state(1844) use(1261) util(961) }
{ time(1939) patient(1703) rate(768) }
{ decis(3086) make(1611) patient(1517) }
{ detect(2391) sensit(1101) algorithm(908) }
{ measur(2081) correl(1212) valu(896) }
{ algorithm(1844) comput(1787) effici(935) }
{ activ(1138) subject(705) human(624) }
{ inform(2794) health(2639) internet(1427) }
{ bind(1733) structur(1185) ligand(1036) }
{ patient(2315) diseas(1263) diabet(1191) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ design(1359) user(1324) use(1319) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ system(1050) medic(1026) inform(1018) }
{ studi(1119) effect(1106) posit(819) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ implement(1333) system(1263) develop(1122) }
{ method(2212) result(1239) propos(1039) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ process(1125) use(805) approach(778) }
{ method(1969) cluster(1462) data(1082) }

Abstract

BACKGROUND: Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. METHODS: We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness-of-fit measures. As a control, we used an unweighted fitting method. RESULTS: A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 and 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline unweighted method (p < 0.05). CONCLUSIONS: This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an unweighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
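The fitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a learning curve of the form score = a - b * size^(-c), which is one common way to write an inverse power law, and it assumes weights that grow with the position of the point on the curve. Instead of a general nonlinear solver, it uses a simpler substitute: a grid search over the exponent c, with a and b solved in closed form by weighted linear least squares for each candidate c. The function names (`fit_inverse_power_law`, `predict`, `rmse`) are hypothetical.

```python
import math


def fit_inverse_power_law(sizes, scores, weights=None):
    """Fit score ~ a - b * size**(-c) by weighted least squares.

    For a fixed c the model is linear in (a, b), so those two are solved
    in closed form; c is chosen by a grid search over 0.01 .. 3.00.
    """
    n = len(sizes)
    if weights is None:
        # Assumption: later (larger-sample) points are trusted more.
        weights = [(i + 1) / n for i in range(n)]
    best = None
    for step in range(1, 301):
        c = step / 100.0
        z = [-(x ** -c) for x in sizes]  # model becomes a + b * z
        sw = sum(weights)
        swz = sum(w * zi for w, zi in zip(weights, z))
        swy = sum(w * yi for w, yi in zip(weights, scores))
        swzz = sum(w * zi * zi for w, zi in zip(weights, z))
        swzy = sum(w * zi * yi for w, zi, yi in zip(weights, z, scores))
        det = sw * swzz - swz * swz
        if abs(det) < 1e-18:
            continue  # degenerate design for this c; skip it
        b = (sw * swzy - swz * swy) / det
        a = (swy - b * swz) / sw
        sse = sum(w * (yi - (a + b * zi)) ** 2
                  for w, zi, yi in zip(weights, z, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(params, size):
    """Predicted performance at a given annotated sample size."""
    a, b, c = params
    return a - b * size ** (-c)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted scores."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

In use, one would fit on the first few points of a learning curve, call `predict` for larger sample sizes, and compare the predictions with the observed scores using `rmse` once those scores become available.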

Cleaned Abstract

background supervis learn method need annot data order generat effici model annot data howev relat scarc resourc can expens obtain passiv activ learn method need estim size annot sampl requir reach perform target method design implement method fit invers power law model point given learn curv creat use small annot train set fit carri use nonlinear weight least squar optim fit model use predict classifi perform confid interv larger sampl size evalu nonlinear weight curv fit method appli set learn curv generat use clinic text waveform classif task activ passiv sampl method predict valid use standard good fit measur control use unweight fit method result total model fit model predict compar observ perform depend data set sampl method took annot sampl achiev mean averag root mean squar error result also show weight fit method outperform baselin unweight method p conclus paper describ simpl effect sampl size predict algorithm conduct weight fit learn curv algorithm outperform unweight algorithm describ previous literatur can help research determin annot sampl size supervis machin learn

Similar Abstracts

IEEE Trans Image Process - Fast bilateral filter with arbitrary range and domain kernels. ( 0,599436009530406 )
Int J Med Inform - De-identification of clinical narratives through writing complexity measures. ( 0,585808283009648 )
J Biomed Inform - A method for determining the number of documents needed for a gold standard corpus. ( 0,573222868625496 )
Res Synth Methods - Trial sequential methods for meta-analysis. ( 0,557891134265093 )
Appl Clin Inform - What big size you have! Using effect sizes to determine the impact of public health nursing interventions. ( 0,557089899574402 )
Brief. Bioinformatics - On the classification of microarray gene-expression data. ( 0,542450371404658 )
IEEE Trans Pattern Anal Mach Intell - Weakly Supervised Recognition of Daily Life Activities with Wearable Sensors. ( 0,538990059177486 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,538054290028475 )
Med Biol Eng Comput - Power type strain energy function model and prediction of the anisotropic mechanical properties of skin using uniaxial extension data. ( 0,532239000102855 )
J Am Med Inform Assoc - Usability-driven pruning of large ontologies: the case of SNOMED CT. ( 0,529077777416165 )
J Biomed Inform - Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. ( 0,527375693084986 )
J Am Med Inform Assoc - BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing. ( 0,522758844932671 )
Comput Math Methods Med - Discrete-state stochastic models of calcium-regulated calcium influx and subspace dynamics are not well-approximated by ODEs that neglect concentration fluctuations. ( 0,520729467844131 )
Neural Comput - A connection between score matching and denoising autoencoders. ( 0,520290154631565 )
J Chem Inf Model - Testing physical models of passive membrane permeation. ( 0,512823641576093 )
Comput Biol Chem - Stochastic synchronization of interacting pathways in testosterone model. ( 0,503737668233059 )
J. Comput. Biol. - Rich parameterization improves RNA structure prediction. ( 0,503509524791446 )
J Biomed Inform - Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. ( 0,499243572579677 )
Med Decis Making - Test result-based sampling: an efficient design for estimating the accuracy of patient safety indicators. ( 0,493668605044186 )
AMIA Annu Symp Proc - Optimized dual threshold entity resolution for electronic health record databases--training set size and active learning. ( 0,492169531419528 )
J Chem Inf Model - Atom environment kernels on molecules. ( 0,49076112117233 )
Neural Comput - Input statistics and Hebbian cross-talk effects. ( 0,490495003715068 )
Neural Comput - Mapping of visual receptive fields by tomographic reconstruction. ( 0,488378403859769 )
J Biomed Inform - Practical approach to determine sample size for building logistic prediction models using high-throughput data. ( 0,488137569812772 )
Comput. Biol. Med. - Large eddy simulation of the FDA benchmark nozzle for a Reynolds number of 6500. ( 0,487952580779237 )
Comput Math Methods Med - Error analysis of deep sequencing of phage libraries: peptides censored in sequencing. ( 0,486441024175537 )
Brief. Bioinformatics - Investigating biocomplexity through the agent-based paradigm. ( 0,486227041602007 )
Comput Biol Chem - Effective sample size: Quick estimation of the effect of related samples in genetic case-control association analyses. ( 0,483751321452634 )
IEEE Trans Image Process - Unsupervised amplitude and texture classification of SAR images with multinomial latent model. ( 0,482663639135242 )
Methods Inf Med - Influence of selection bias on the test decision. A simulation study. ( 0,48199236278096 )
J Chem Inf Model - Determining the degree of randomness of descriptors in linear regression equations with respect to the data size. ( 0,479924608138329 )
IEEE Trans Image Process - Retina verification system based on biometric graph matching. ( 0,479249099054077 )
IEEE Trans Pattern Anal Mach Intell - Animated Pose Templates for Modelling and Detecting Human Actions. ( 0,478031206437485 )
AMIA Annu Symp Proc - Automated disambiguation of acronyms and abbreviations in clinical texts: window and training size considerations. ( 0,476697270436773 )
Neural Comput - Expectation propagation with factorizing distributions: a Gaussian approximation and performance results for simple models. ( 0,474496863952607 )
IEEE Trans Pattern Anal Mach Intell - Learning Categories from Few Examples with Multi Model Knowledge Transfer. ( 0,47296631377798 )
BMC Med Inform Decis Mak - Learning to improve medical decision making from imbalanced data without a priori cost. ( 0,471347641203271 )
IEEE Trans Neural Netw Learn Syst - Application of Reinforcement Learning Algorithms for the Adaptive Computation of the Smoothing Parameter for Probabilistic Neural Network. ( 0,469150498108106 )
Telemed J E Health - Measuring the effect of telecare on medical expenditures without bias using the propensity score matching method. ( 0,468741143163099 )
Methods Inf Med - A complementary graphical method for reducing and analyzing large data sets. Case studies demonstrating thresholds setting and selection. ( 0,468251549274531 )
Artif Intell Med - Natural occurrence of nocturnal hypoglycemia detection using hybrid particle swarm optimized fuzzy reasoning model. ( 0,467904068698175 )
AMIA Annu Symp Proc - TextHunter--A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research. ( 0,467877400050786 )
Comput Methods Programs Biomed - Bagging, bumping, multiview, and active learning for record linkage with empirical results on patient identity data. ( 0,467257672919465 )
J Biomed Inform - Learning Bayesian networks for clinical time series analysis. ( 0,465699351215938 )
J Clin Monit Comput - Solutions to kinking of the side stream carbon dioxide sampling line. ( 0,465386455309718 )
J Am Med Inform Assoc - Using machine learning for concept extraction on clinical documents from multiple data sources. ( 0,464420881932504 )
J Biomed Inform - The need for harmonized structured documentation and chances of secondary use - results of a systematic analysis with automated form comparison for prostate and breast cancer. ( 0,462683991679578 )
AMIA Annu Symp Proc - Outlier Detection with One-Class SVMs: An Application to Melanoma Prognosis. ( 0,462196840413815 )
J Clin Monit Comput - Predictive data mining on monitoring data from the intensive care unit. ( 0,461716108767264 )
Neural Comput - Stability against fluctuations: scaling, bifurcations, and spontaneous symmetry breaking in stochastic models of synaptic plasticity. ( 0,461664990858785 )
J Biomed Inform - Identifying well-formed biomedical phrases in MEDLINE® text. ( 0,461590786325882 )
Comput. Biol. Med. - An implicit evolution scheme for active contours and surfaces based on IIR filtering. ( 0,4612615084678 )
Int J Comput Assist Radiol Surg - Liver tumors segmentation from CTA images using voxels classification and affinity constraint propagation. ( 0,45911256269287 )
Med Decis Making - Sample Size and Power When Designing a Randomized Trial for the Estimation of Treatment, Selection, and Preference Effects. ( 0,455657510824554 )
IEEE Trans Image Process - Sampling optimization for printer characterization by direct search. ( 0,454818525252949 )
AMIA Annu Symp Proc - Active Learning-based corpus annotation--the PathoJen experience. ( 0,453561621458437 )
IEEE Trans Image Process - Correlation-coefficient-based fast template matching through partial elimination. ( 0,452704075428217 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,452287418116175 )
Comput Methods Programs Biomed - Design of a framework for modeling, integration and simulation of physiological models. ( 0,450046971083647 )
Med Decis Making - The influence of graphic display format on the interpretations of quantitative risk information among adults with lower education and literacy: a randomized experimental study. ( 0,449080912993078 )
Neural Comput - Extended robust support vector machine based on financial risk minimization. ( 0,448571388543107 )
IEEE Trans Image Process - Additive log-logistic model for networked video quality assessment. ( 0,447399683465347 )
Int J Neural Syst - Aggregation of sparse linear discriminant analyses for event-related potential classification in brain-computer interface. ( 0,44712504544588 )
Comput Methods Programs Biomed - TGI-Simulator: a visual tool to support the preclinical phase of the drug discovery process by assessing in silico the effect of an anticancer drug. ( 0,4470370313897 )
Neural Comput - Toward unified hybrid simulation techniques for spiking neural networks. ( 0,446368877748893 )
Comput Methods Programs Biomed - The most precise computations using Euler's method in standard floating-point arithmetic applied to modelling of biological systems. ( 0,445494928802139 )
IEEE Trans Image Process - Sparse bayesian learning of filters for efficient image expansion. ( 0,443787140593343 )
J Am Med Inform Assoc - Calibrating predictive model estimates to support personalized medicine. ( 0,443708965728174 )
J Integr Bioinform - An advanced environment for hybrid modeling of biological systems based on modelica. ( 0,443646052238638 )
J Chem Inf Model - Electronic structure investigation and parametrization of biologically relevant iron-sulfur clusters. ( 0,442971923139476 )
Neural Comput - Statistical mechanics of reward-modulated learning in decision-making networks. ( 0,442814019010367 )
Comput Math Methods Med - Global hopf bifurcation on two-delays leslie-gower predator-prey system with a prey refuge. ( 0,44190118202202 )
Neural Comput - Learning the dynamics of objects by optimal functional interpolation. ( 0,441823593248702 )
Comput Math Methods Med - Numerical solutions for a model of tissue invasion and migration of tumour cells. ( 0,441356559165237 )
IEEE Trans Pattern Anal Mach Intell - Gaussian Process-Mixture Conditional Heteroscedasticity. ( 0,439159340385199 )
Comput Math Methods Med - General error analysis in the relationship between free thyroxine and thyrotropin and its clinical relevance. ( 0,438905647676321 )
Methods Inf Med - Prediction model for glucose metabolism based on lipid metabolism. ( 0,438718801351514 )
Comput. Biol. Med. - ODE/PDE analysis of corneal curvature. ( 0,438641789598588 )
Neural Comput - Integration of reinforcement learning and optimal decision-making theories of the basal ganglia. ( 0,437574642733331 )
Neural Comput - Dopamine ramps are a consequence of reward prediction errors. ( 0,437337470992463 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,436679028175522 )
Comput Methods Programs Biomed - Functionality of the baroreceptor nerves in heart rate regulation. ( 0,436674688296981 )
Neural Comput - On nonnegative matrix factorization algorithms for signal-dependent noise with application to electromyography data. ( 0,436582789720939 )
Comput Methods Programs Biomed - Alternatives to relational database: comparison of NoSQL and XML approaches for clinical data storage. ( 0,435921992748843 )
Artif Intell Med - Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children. ( 0,435753319708096 )
Comput. Biol. Med. - Feature selection for a cooperative coevolutionary classifier in liver fibrosis diagnosis. ( 0,435492801621488 )
Med Decis Making - The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes. ( 0,43547018437035 )
Comput Methods Programs Biomed - A bootstrap approach for lower injury levels of the risk curves. ( 0,435381310884237 )
J Chem Inf Model - Extraction of protein binding pockets in close neighborhood of bound ligands makes comparisons simple due to inherent shape similarity. ( 0,43498728131227 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,434900028103745 )
J Am Med Inform Assoc - Induced lexico-syntactic patterns improve information extraction from online medical forums. ( 0,434599854205286 )
Neural Comput - Learning invariance from natural images inspired by observations in the primary visual cortex. ( 0,43419463319628 )
IEEE Trans Neural Netw Learn Syst - Ordinal Distance Metric Learning for Image Ranking. ( 0,433081601439503 )
Comput Math Methods Med - A note regarding problems with interaction and varying block sizes in a comparison of endotracheal tubes. ( 0,43281830925619 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,432525919616755 )
J Chem Inf Model - Exploiting structural information in patent specifications for key compound prediction. ( 0,432139016715065 )
AMIA Annu Symp Proc - Comparing predictive models of glioblastoma multiforme built using multi-institutional and local data sources. ( 0,43188683081611 )
AMIA Annu Symp Proc - Part-of-speech tagging for clinical text: wall or bridge between institutions? ( 0,430857729715431 )
Methods Inf Med - An easily implemented method for abbreviation expansion for the medical domain in Japanese text. A preliminary study. ( 0,430563554633924 )
IEEE Trans Vis Comput Graph - Turbulence Simulation by Adaptive Multi-Relaxation Lattice Boltzmann Modeling. ( 0,430321361832156 )