J Am Med Inform Assoc - Active learning for clinical text classification: is it better than random sampling?

Topics

{ learn(2355) train(1041) set(1003) }
{ compound(1573) activ(1297) structur(1058) }
{ algorithm(1844) comput(1787) effici(935) }
{ measur(2081) correl(1212) valu(896) }
{ imag(2830) propos(1344) filter(1198) }
{ studi(1119) effect(1106) posit(819) }
{ group(2977) signific(1463) compar(1072) }
{ general(901) number(790) one(736) }
{ extract(1171) text(1153) clinic(932) }
{ model(3404) distribut(989) bayesian(671) }
{ case(1353) use(1143) diagnosi(1136) }
{ model(2341) predict(2261) use(1141) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ can(981) present(881) function(850) }
{ studi(1410) differ(1259) use(1210) }
{ record(1888) medic(1808) patient(1693) }
{ high(1669) rate(1365) level(1280) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ imag(1057) registr(996) error(939) }
{ studi(2440) review(1878) systemat(933) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ error(1145) method(1030) estim(1020) }
{ clinic(1479) use(1117) guidelin(835) }
{ method(1557) propos(1049) approach(1037) }
{ care(1570) inform(1187) nurs(1089) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ first(2504) two(1366) second(1323) }
{ patient(1821) servic(1111) care(1106) }
{ use(976) code(926) identifi(902) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ gene(2352) biolog(1181) express(1162) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

OBJECTIVE: This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks.
DESIGN: Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results.
MEASUREMENTS: Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences.
RESULTS: The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm.
CONCLUSION: For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
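The contrast the abstract draws between active and passive learning comes down to how the next examples to label are chosen. The sketch below is a minimal illustration, not the authors' implementation. It assumes that "distance-based" means querying the unlabeled examples closest to a classifier's decision boundary, which the abstract does not spell out. The function names (`select_by_distance`, `select_at_random`, `relative_entropy`) and the example scores are hypothetical. `relative_entropy` is the standard Kullback-Leibler divergence, included only because the abstract names relative entropy as its measure of dataset diversity and uncertainty; how the study built the distributions it compared is not stated here.

```python
import math
import random


def select_by_distance(scores, k):
    """Return indices of the k unlabeled examples closest to the boundary.

    `scores` holds signed distances to a (hypothetical) classifier's decision
    boundary; a small absolute value means the classifier is unsure.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return ranked[:k]


def select_at_random(n, k, seed=0):
    """Passive-learning baseline: draw k of n indices uniformly, no repeats."""
    return random.Random(seed).sample(range(n), k)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes p and q are discrete distributions of equal length and that
    q is non-zero wherever p is non-zero.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Made-up signed distances for five unlabeled documents.
    scores = [2.1, -0.1, 0.7, -1.5, 0.05]
    print(select_by_distance(scores, 2))   # [4, 1]
    print(select_at_random(len(scores), 2))
    print(relative_entropy([1.0, 0.0], [0.5, 0.5]))  # 1.0
```

In a full loop, the selected examples would be labeled, added to the training set, and the classifier retrained before the next round of selection.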

Cleaned Abstract

jectiv studi explor activ learn algorithm way reduc requir larg train set medic text classif tasksdesign three exist activ learn algorithm distancebas dist diversitybas div combin cmb use classifi text five dataset perform algorithm compar passiv learn five dataset conduct novel investig interact dataset characterist perform resultsmeasur classif accuraci area receiv oper characterist roc curv algorithm differ sampl size generat perform activ learn algorithm compar passiv learn use weight mean pair differ determin perform vari differ dataset measur divers uncertainti dataset use relat entropi correl result perform differencesresult dist cmb algorithm perform better passiv learn statist signific level set dist outperform passiv learn five dataset cmb found better passiv learn four dataset found strong correl dataset divers div perform well dataset uncertainti perform dist algorithmconclus medic text classif appropri activ learn algorithm can yield perform compar passiv learn consider smaller train set particular result suggest div perform better data higher divers dist data lower uncertainti
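The cleaned text above looks like the result of lowercasing, punctuation removal, stop-word filtering, and stemming. The sketch below covers only the first three steps, and it rests on assumptions: the stop-word list is a small hypothetical stand-in, stemming is omitted, and the page's actual pipeline is not documented here. It does show how deleting punctuation without inserting a space produces fused tokens such as `tasksdesign`.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the real list is unknown.
STOP_WORDS = {
    "a", "as", "the", "of", "to", "for", "in", "this", "that",
    "were", "was", "and", "both",
}


def clean(text):
    """Lowercase, delete non-letter characters, then drop stop words.

    Non-letters are deleted rather than replaced by a space, so
    "tasks.DESIGN" collapses into the single token "tasksdesign".
    """
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", "", lowered)
    return [tok for tok in letters_only.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    sample = "classification tasks.DESIGN: Three distance-based algorithms"
    print(clean(sample))
    # ['classification', 'tasksdesign', 'three', 'distancebased', 'algorithms']
```

Replacing punctuation with a space instead of deleting it would keep `tasks` and `design` apart, at the cost of splitting hyphenated terms such as `distance-based` in two.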

Similar Abstracts

IEEE Trans Image Process - A linear support higher-order tensor machine for classification. ( 0,810084382023412 )
IEEE Trans Image Process - Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion. ( 0,807296678588365 )
J Chem Inf Model - Modeling and benchmark data set for the inhibition of c-Jun N-terminal kinase-3. ( 0,801882073972887 )
J Med Syst - 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification. ( 0,793882429619027 )
Int J Neural Syst - Structurally enhanced incremental neural learning for image classification with subgraph extraction. ( 0,793768878720796 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,790471980595132 )
Neural Comput - Multiple spectral kernel learning and a gaussian complexity computation. ( 0,790235923402989 )
Neural Comput - Adaptive metric learning vector quantization for ordinal classification. ( 0,788370413747088 )
IEEE Trans Image Process - Hyperspectral image classification through bilayer graph-based learning. ( 0,786405935747612 )
AMIA Annu Symp Proc - Comparison and combination of several MeSH indexing approaches. ( 0,785873032023239 )
J Biomed Inform - Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management. ( 0,785772056785471 )
Comput Math Methods Med - On multilabel classification methods of incompletely labeled biomedical text data. ( 0,784161815392444 )
IEEE Trans Pattern Anal Mach Intell - Distance-Based Image Classification: Generalizing to New Classes at Near Zero Cost. ( 0,77717609463932 )
J. Comput. Biol. - Imbalanced class learning in epigenetics. ( 0,777025073633358 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,763946046255584 )
IEEE J Biomed Health Inform - Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. ( 0,759163080073395 )
IEEE Trans Neural Netw Learn Syst - A Kernel Classification Framework for Metric Learning. ( 0,757945393745719 )
J Am Med Inform Assoc - Learning classification models with soft-label information. ( 0,754744531200125 )
IEEE Trans Image Process - Manifold regularized multitask learning for semi-supervised multilabel image classification. ( 0,744656709646477 )
Neural Comput - Computing sparse representations of multidimensional signals using Kronecker bases. ( 0,743302974867411 )
Neural Comput - Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function. ( 0,742146228082455 )
Int J Neural Syst - Linear time relational prototype based learning. ( 0,741927851907708 )
Int J Neural Syst - Online semi-supervised growing neural gas. ( 0,741758275076886 )
Neural Comput - Online learning with (multiple) kernels: a review. ( 0,740597907811431 )
IEEE Trans Image Process - Joint segmentation of images and scanned point cloud in large-scale street scenes with low-annotation cost. ( 0,740528034678844 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,739758211999585 )
Int J Neural Syst - Span: spike pattern association neuron for learning spatio-temporal spike patterns. ( 0,737145515071517 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,732726895439945 )
Comput. Biol. Med. - Robust prediction of protein subcellular localization combining PCA and WSVMs. ( 0,732003387792114 )
IEEE Trans Image Process - Geodesic propagation for semantic labeling. ( 0,730883446680933 )
Neural Comput - Reduction from cost-sensitive ordinal ranking to weighted binary classification. ( 0,730220124772973 )
Comput Methods Programs Biomed - Multistage approach for clustering and classification of ECG data. ( 0,727463345581934 )
IEEE Trans Image Process - Structured max-margin learning for inter-related classifier training and multilabel image annotation. ( 0,719377170193025 )
Neural Comput - Representing objects, relations, and sequences. ( 0,718974459640578 )
IEEE Trans Image Process - Task-specific image partitioning. ( 0,718875220933076 )
J Chem Inf Model - Note on naive Bayes based on binary descriptors in cheminformatics. ( 0,71771635885314 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data. ( 0,716892833435914 )
J Biomed Inform - Learning classification models from multiple experts. ( 0,714004173802487 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,712933208105349 )
IEEE Trans Image Process - Multiview Hessian regularization for image annotation. ( 0,710379901216803 )
J Biomed Inform - Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms. ( 0,708092161902614 )
J Biomed Inform - Incremental Gaussian Discriminant Analysis based on Graybill and Deal weighted combination of estimators for brain tumour diagnosis. ( 0,706375783620111 )
IEEE Trans Image Process - Unsupervised amplitude and texture classification of SAR images with multinomial latent model. ( 0,70510499325927 )
Neural Comput - Metacognitive learning in a fully complex-valued radial basis function neural network. ( 0,702507054912283 )
J Biomed Inform - Class proximity measures--dissimilarity-based classification and display of high-dimensional data. ( 0,700951208321071 )
IEEE Trans Pattern Anal Mach Intell - Representation Learning: A Review and New Perspectives. ( 0,699658724043736 )
IEEE Trans Image Process - Improving Web image search by bag-based reranking. ( 0,699149793665796 )
IEEE Trans Pattern Anal Mach Intell - Weakly Supervised Recognition of Daily Life Activities with Wearable Sensors. ( 0,698957032990554 )
Neural Comput - Divergence-based vector quantization. ( 0,697874534097581 )
AMIA Annu Symp Proc - Learning medical diagnosis models from multiple experts. ( 0,696173667359691 )
Neural Comput - Mismatched training and test distributions can outperform matched ones. ( 0,692837602325992 )
Comput Methods Programs Biomed - Biomedical system based on the Discrete Hidden Markov Model using the Rocchio-Genetic approach for the classification of internal carotid artery Doppler signals. ( 0,690939822814472 )
J Chem Inf Model - Atom environment kernels on molecules. ( 0,690315967564124 )
IEEE Trans Neural Netw Learn Syst - ML-Tree: a tree-structure-based approach to multilabel learning. ( 0,687471089489572 )
Int J Neural Syst - Aggregation of sparse linear discriminant analyses for event-related potential classification in brain-computer interface. ( 0,683021430717291 )
Neural Comput - Incremental learning by message passing in hierarchical temporal memory. ( 0,676380230291063 )
BMC Med Inform Decis Mak - Learning to improve medical decision making from imbalanced data without a priori cost. ( 0,675270257399898 )
J. Comput. Biol. - Locally learning biomedical data using diffusion frames. ( 0,674122321614693 )
Med Decis Making - The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes. ( 0,67043011483555 )
IEEE Trans Image Process - Artistic image analysis using graph-based learning approaches. ( 0,670028444206003 )
IEEE Trans Pattern Anal Mach Intell - Facial Age Estimation by Learning from Label Distributions. ( 0,666324969239476 )
IEEE Trans Pattern Anal Mach Intell - Learning Categories from Few Examples with Multi Model Knowledge Transfer. ( 0,663154370016438 )
IEEE Trans Image Process - Learning discriminative dictionary for group sparse representation. ( 0,661844340350419 )
J Chem Inf Model - An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs. ( 0,661383658379342 )
J Chem Inf Model - Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening. ( 0,660664935738814 )
AMIA Annu Symp Proc - Outlier Detection with One-Class SVMs: An Application to Melanoma Prognosis. ( 0,660322876581851 )
Comput. Biol. Med. - Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors. ( 0,65935556770217 )
IEEE Trans Image Process - Supervised ordering in IRp: application to morphological processing of hyperspectral images. ( 0,654660679268822 )
IEEE Trans Image Process - Design of non-linear kernel dictionaries for object recognition. ( 0,65368790078255 )
J Biomed Inform - Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. ( 0,649506528282875 )
IEEE Trans Pattern Anal Mach Intell - The Effect of Model Misspecification on Semi-Supervised Classification. ( 0,647955612578366 )
Artif Intell Med - Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers. ( 0,646575086220687 )
J Chem Inf Model - Classifying large chemical data sets: using a regularized potential function method. ( 0,645270641785004 )
Neural Comput - On nonnegative matrix factorization algorithms for signal-dependent noise with application to electromyography data. ( 0,641397102263035 )
IEEE Trans Pattern Anal Mach Intell - Label Consistent K-SVD: Learning A Discriminative Dictionary for Recognition. ( 0,640055647555668 )
AMIA Annu Symp Proc - Classification of medication status change in clinical narratives. ( 0,639164310241634 )
Comput Methods Programs Biomed - Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain-computer interface. ( 0,637868182539866 )
Comput. Biol. Med. - EEG-based emotion estimation using Bayesian weighted-log-posterior function and perceptron convergence algorithm. ( 0,636680961738548 )
J Biomed Inform - Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: an empirical study. ( 0,633286031086132 )
IEEE Trans Neural Netw Learn Syst - Ordinal Distance Metric Learning for Image Ranking. ( 0,633052615717559 )
Neural Comput - Extended robust support vector machine based on financial risk minimization. ( 0,633041498133471 )
Int J Comput Assist Radiol Surg - Statistical shape model of a liver for autopsy imaging. ( 0,631375599454172 )
IEEE Trans Image Process - Self-supervised online metric learning with low rank constraint for scene categorization. ( 0,626871196099371 )
IEEE Trans Neural Netw Learn Syst - An efficient topological distance-based tree kernel. ( 0,625408483691749 )
IEEE Trans Image Process - Incremental training of a detector using online sparse eigendecomposition. ( 0,624163723008139 )
Artif Intell Med - A classifier ensemble approach for the missing feature problem. ( 0,623288240279013 )
IEEE Trans Image Process - Saliency and gist features for target detection in satellite images. ( 0,62102181825282 )
IEEE Trans Image Process - Fast bilateral filter with arbitrary range and domain kernels. ( 0,620091047972805 )
BMC Med Inform Decis Mak - Decision tree-based learning to predict patient controlled analgesia consumption and readjustment. ( 0,616120256604363 )
IEEE Trans Neural Netw Learn Syst - Partially shared latent factor learning with multiview data. ( 0,615936397919872 )
J Am Med Inform Assoc - Supervised machine learning and active learning in classification of radiology reports. ( 0,613996619353997 )
IEEE Trans Neural Netw Learn Syst - Application of Reinforcement Learning Algorithms for the Adaptive Computation of the Smoothing Parameter for Probabilistic Neural Network. ( 0,612453999964897 )
Neural Comput - Adaptive multiclass classification for brain computer interfaces. ( 0,612089567959478 )
J Chem Inf Model - Anatomy of high-performance 2D similarity calculations. ( 0,610556136449174 )
Artif Intell Med - Exploiting the systematic review protocol for classification of medical abstracts. ( 0,610321296924412 )
J Chem Inf Model - Introduction of a methodology for visualization and graphical interpretation of Bayesian classification models. ( 0,609818895080629 )
IEEE Trans Pattern Anal Mach Intell - Scene-Specific Pedestrian Detection for Static Video Surveillance. ( 0,60949294409932 )
AMIA Annu Symp Proc - Sample-efficient learning with auxiliary class-label information. ( 0,609125981581986 )
J Biomed Inform - Markov blanket-based approach for learning multi-dimensional Bayesian network classifiers: an application to predict the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). ( 0,609068970093641 )
J Am Med Inform Assoc - Evaluating the utility of syndromic surveillance algorithms for screening to detect potentially clonal hospital infection outbreaks. ( 0,607877845144773 )
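
To work with the list above programmatically, a small parser is enough. The sketch below assumes each entry has the shape `journal - title ( score )` with a decimal comma in the score, as the entries above do. The name `parse_entry` and the choice to split on the first ` - ` are assumptions of this example. The page does not state which similarity measure produced the scores, so the code treats them as opaque numbers.

```python
import re

# journal: shortest text before the first " - "; title: everything up to
# the trailing "( score )"; score: digits, a decimal comma, more digits.
ENTRY = re.compile(
    r"^(?P<journal>.+?) - (?P<title>.+) \( (?P<score>\d+,\d+) \)$"
)


def parse_entry(line):
    """Split one list entry into (journal, title, score as float)."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    score = float(match["score"].replace(",", "."))
    return match["journal"], match["title"], score


if __name__ == "__main__":
    line = (
        "IEEE Trans Neural Netw Learn Syst - "
        "Adaptive Batch Mode Active Learning. ( 0,790471980595132 )"
    )
    print(parse_entry(line))
```

Titles that themselves contain parentheses still parse, because only the final `( score )` group is anchored to the end of the line.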