Int J Med Inform - An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.

Topics

{ learn(2355) train(1041) set(1003) }
{ search(2224) databas(1162) retriev(909) }
{ featur(3375) classif(2383) classifi(1994) }
{ sampl(1606) size(1419) use(1276) }
{ design(1359) user(1324) use(1319) }
{ model(2341) predict(2261) use(1141) }
{ perform(1367) use(1326) method(1137) }
{ gene(2352) biolog(1181) express(1162) }
{ use(976) code(926) identifi(902) }
{ activ(1452) weight(1219) physic(1104) }
{ detect(2391) sensit(1101) algorithm(908) }
{ extract(1171) text(1153) clinic(932) }
{ case(1353) use(1143) diagnosi(1136) }
{ time(1939) patient(1703) rate(768) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ research(1218) medic(880) student(794) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ activ(1138) subject(705) human(624) }
{ can(774) often(719) complex(702) }
{ inform(2794) health(2639) internet(1427) }
{ chang(1828) time(1643) increas(1301) }
{ method(1557) propos(1049) approach(1037) }
{ care(1570) inform(1187) nurs(1089) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ record(1888) medic(1808) patient(1693) }
{ monitor(1329) mobil(1314) devic(1160) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ use(1733) differ(960) four(931) }
{ survey(1388) particip(1329) question(1065) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ import(1318) role(1303) understand(862) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }

Abstract

PURPOSE: Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles in addition to labeled relevant articles.

METHODS: Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant articles and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. The top-ranked articles from each classifier were pooled, and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers.

RESULTS: Daily averages of the areas under the ROC curves (AUCs) over the 15-day evaluation period were 0.841 for the naïve Bayes classifier and 0.836 for the SVM classifier. We consulted a database of disease outbreak reports to confirm that the evaluation data set produced by the pooling method indeed covered the incidents recorded in the database during the evaluation period.

CONCLUSIONS: The proposed text classification framework, which uses randomly sampled unlabeled articles, can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and articles in non-English languages.
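The training-and-pooling setup in the METHODS paragraph can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code and rests on stated assumptions: the randomly sampled unlabeled articles serve as the negative class during training, word features are selected by raw corpus frequency (the abstract does not state the selection criterion), and only multinomial naive Bayes is shown (the SVM is omitted). All function names and toy articles are hypothetical, and `auc` computes ROC AUC via the Mann-Whitney statistic.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (no stemming or stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs, k):
    """Hypothetical criterion: keep the k most frequent words in the training docs."""
    counts = Counter(w for d in docs for w in tokenize(d))
    return {w for w, _ in counts.most_common(k)}


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing.

    Assumption: the randomly sampled unlabeled articles stand in for the
    negative ("irrelevant") class during training.
    """

    def __init__(self, relevant, unlabeled, n_features):
        self.vocab = select_features(relevant + unlabeled, n_features)
        total = len(relevant) + len(unlabeled)
        self.log_prior, self.log_lik = {}, {}
        for label, docs in (("pos", relevant), ("neg", unlabeled)):
            counts = Counter(w for d in docs for w in tokenize(d) if w in self.vocab)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_prior[label] = math.log(len(docs) / total)
            self.log_lik[label] = {w: math.log((counts[w] + 1) / denom) for w in self.vocab}

    def score(self, text):
        """Log-odds of relevant vs. unlabeled; higher means more likely relevant."""
        s = self.log_prior["pos"] - self.log_prior["neg"]
        for w in tokenize(text):
            if w in self.vocab:
                s += self.log_lik["pos"][w] - self.log_lik["neg"][w]
        return s


def pool_top_ranked(classifiers, articles, top_k):
    """Union of each classifier's top_k article indices, for expert review."""
    pooled = set()
    for clf in classifiers:
        ranked = sorted(range(len(articles)), key=lambda i: clf.score(articles[i]), reverse=True)
        pooled.update(ranked[:top_k])
    return pooled


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic; tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_daily_auc(days):
    """Average of per-day AUCs; `days` is a list of (scores, labels) pairs."""
    return sum(auc(s, y) for s, y in days) / len(days)


# Toy usage: vary the feature count to get "diverse" classifiers, then pool.
relevant = ["cholera outbreak reported in village", "new influenza cases confirm outbreak"]
unlabeled = ["stock market rises", "football team wins cup", "influenza vaccine study published"]
classifiers = [NaiveBayes(relevant, unlabeled, n) for n in (5, 50)]
articles = ["measles outbreak spreads", "market closes higher", "team wins again"]
pooled = pool_top_ranked(classifiers, articles, top_k=1)
```

Varying `n_features` (and, in the same way, the size of the unlabeled sample) yields a set of diverse classifiers; pooling only their top-ranked articles keeps the set an analyst must label much smaller than the full article stream, which is the cost argument the abstract makes.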

Cleaned Abstract

rpose earli detect infecti diseas outbreak crucial protect public health societi onlin news articl provid time inform diseas outbreak worldwid studi investig autom detect articl relev diseas outbreak use machin learn classifi reallif set expens prepar train data set classifi usual consist manual label relev irrelev articl mitig challeng examin use random sampl unlabel articl well label relev articlesmethod nave bay support vector machin svm classifi train relev random sampl unlabel articl divers classifi train vari number sampl unlabel articl also number word featur train classifi appli thousand articl publish day toprank articl classifi pool result set articl review expert analyst evalu classifiersresult daili averag area roc curv auc day evalu period respect nave bay svm classifi referenc databas diseas outbreak report confirm evalu data set result pool method inde cover incid record databas evalu periodconclus propos text classif framework util random sampl unlabel articl can facilit costeffect approach train machin learn classifi reallif internetbas biosurveil project plan examin framework use larger data set use articl nonenglish languag

Similar Abstracts

J Integr Bioinform - The LAILAPS search engine: relevance ranking in life science databases. ( 0,728749361305888 )
IEEE Trans Image Process - Coaching the exploration and exploitation in active learning for interactive video retrieval. ( 0,699454637990666 )
J. Med. Internet Res. - Development and validation of filters for the retrieval of studies of clinical examination from Medline. ( 0,691503374070523 )
Brief. Bioinformatics - Fast and efficient searching of biological data resources--using EB-eye. ( 0,681676383105018 )
AMIA Annu Symp Proc - A bottom-up approach to MEDLINE indexing recommendations. ( 0,677346276919485 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,668629080897507 )
J Integr Bioinform - Classification methods for finding articles describing protein-protein interactions in PubMed. ( 0,667113713391718 )
J Integr Bioinform - The LAILAPS search engine: a feature model for relevance ranking in life science databases. ( 0,666618390500043 )
BMC Med Inform Decis Mak - Glomerular disease search filters for Pubmed, Ovid Medline, and Embase: a development and validation study. ( 0,663003564447272 )
Neural Comput - Divergence-based vector quantization. ( 0,652471671549904 )
Comput. Biol. Med. - Robust prediction of protein subcellular localization combining PCA and WSVMs. ( 0,642890671924972 )
J Chem Inf Model - Classifying large chemical data sets: using a regularized potential function method. ( 0,642198318988637 )
IEEE Trans Neural Netw Learn Syst - Kernel association for classification and prediction: a survey. ( 0,641139033575658 )
J Am Med Inform Assoc - Search filters to identify geriatric medicine in Medline. ( 0,638924384355044 )
J Am Med Inform Assoc - Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters. ( 0,634965542010277 )
Comput. Biol. Med. - Relabeling algorithm for retrieval of noisy instances and improving prediction quality. ( 0,634422847416279 )
IEEE J Biomed Health Inform - Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. ( 0,634211508494259 )
Health Info Libr J - Medical literature searches: a comparison of PubMed and Google Scholar. ( 0,630789302022179 )
Int J Neural Syst - Aggregation of sparse linear discriminant analyses for event-related potential classification in brain-computer interface. ( 0,630627111778265 )
J Biomed Inform - Reducing systematic review workload through certainty-based screening. ( 0,630306766200933 )
Comput Math Methods Med - On multilabel classification methods of incompletely labeled biomedical text data. ( 0,629814992394302 )
J Am Med Inform Assoc - Learning classification models with soft-label information. ( 0,627717982659152 )
Telemed J E Health - MEDLINE versus EMBASE and CINAHL for telemedicine searches. ( 0,627563484489729 )
AMIA Annu Symp Proc - Hyperdimensional computing approach to word sense disambiguation. ( 0,627355046917071 )
J Biomed Inform - On the query reformulation technique for effective MEDLINE document retrieval. ( 0,626643075020898 )
IEEE Trans Image Process - Improving Web image search by bag-based reranking. ( 0,626564009728769 )
IEEE Trans Image Process - Image search reranking with query-dependent click-based relevance feedback. ( 0,62569789283069 )
J Biomed Inform - Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management. ( 0,625015587660404 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data. ( 0,624481286871049 )
J Integr Bioinform - Evaluating the effect of unbalanced data in biomedical document classification. ( 0,622130316867824 )
J Biomed Inform - Reflective random indexing for semi-automatic indexing of the biomedical literature. ( 0,62126507739061 )
Neural Comput - Online learning with (multiple) kernels: a review. ( 0,617372351809873 )
J Med Syst - 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification. ( 0,617263503684325 )
IEEE Trans Pattern Anal Mach Intell - Learning Hierarchical Features for Scene Labeling. ( 0,615959672150089 )
Neural Comput - Extended robust support vector machine based on financial risk minimization. ( 0,615861965833032 )
IEEE Trans Neural Netw Learn Syst - ML-Tree: a tree-structure-based approach to multilabel learning. ( 0,615657106665213 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,613562262880674 )
AMIA Annu Symp Proc - Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts. ( 0,613376867883028 )
J Chem Inf Model - Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening. ( 0,6132684901615 )
Comput. Biol. Med. - Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors. ( 0,613203007299459 )
Int J Med Inform - An analysis of clinical queries in an electronic health record search utility. ( 0,61204430038851 )
Artif Intell Med - A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. ( 0,609049482709507 )
Neural Comput - Adaptive metric learning vector quantization for ordinal classification. ( 0,608316462196222 )
J. Med. Internet Res. - Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews. ( 0,608316282775672 )
Comput Methods Programs Biomed - Machine learning algorithms and forced oscillation measurements applied to the automatic identification of chronic obstructive pulmonary disease. ( 0,608052853849344 )
AMIA Annu Symp Proc - Search filter precision can be improved by NOTing out irrelevant content. ( 0,607393585046935 )
IEEE Trans Pattern Anal Mach Intell - Weakly Supervised Recognition of Daily Life Activities with Wearable Sensors. ( 0,60714242110278 )
IEEE Trans Image Process - Learning discriminative dictionary for group sparse representation. ( 0,606874683308821 )
AMIA Annu Symp Proc - Evaluation of automated term groupings for detecting anaphylactic shock signals for drugs. ( 0,606135007797079 )
IEEE Trans Image Process - Multiple-kernel, multiple-instance similarity features for efficient visual object detection. ( 0,605326174514271 )
J. Med. Internet Res. - Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical searches. ( 0,604640336158551 )
Health Info Libr J - Utilisation of search filters in systematic reviews of prognosis questions. ( 0,604385593488719 )
J Am Med Inform Assoc - A literature search tool for intelligent extraction of disease-associated genes. ( 0,602702453444956 )
J Am Med Inform Assoc - MEDLINE clinical queries are robust when searching in recent publishing years. ( 0,601228582152354 )
IEEE Trans Image Process - A linear support higher-order tensor machine for classification. ( 0,60052818680416 )
Methods Inf Med - Learning the preferences of physicians for the organization of result lists of medical evidence articles. ( 0,598923997721423 )
IEEE Trans Image Process - Task-specific image partitioning. ( 0,595702839905908 )
Comput. Biol. Med. - A threshold fuzzy entropy based feature selection for medical database classification. ( 0,595415903442539 )
J Chem Inf Model - Speeding up chemical searches using the inverted index: the convergence of chemoinformatics and text search methods. ( 0,594629854584232 )
IEEE Trans Neural Netw Learn Syst - A Kernel Classification Framework for Metric Learning. ( 0,594130151098576 )
Artif Intell Med - Prediction of intraoperative complexity from preoperative patient data for laparoscopic cholecystectomy. ( 0,593678670316996 )
Neural Comput - Feature selection for ordinal text classification. ( 0,591174656336499 )
Artif Intell Med - Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers. ( 0,591093592106339 )
Neural Comput - Reduction from cost-sensitive ordinal ranking to weighted binary classification. ( 0,59109144531576 )
J. Comput. Biol. - Imbalanced class learning in epigenetics. ( 0,59106271444053 )
AMIA Annu Symp Proc - Improving predictions in imbalanced data using Pairwise Expanded Logistic Regression. ( 0,590328412375714 )
IEEE Trans Image Process - Saliency and gist features for target detection in satellite images. ( 0,588994754550008 )
J. Med. Internet Res. - Using Internet search engines to obtain medical information: a comparative study. ( 0,588239071152064 )
BMC Med Inform Decis Mak - Boolean versus ranked querying for biomedical systematic reviews. ( 0,588231323931705 )
Int J Health Geogr - HEALTH GeoJunction: place-time-concept browsing of health publications. ( 0,587939679385668 )
J Biomed Inform - Determining the difficulty of Word Sense Disambiguation. ( 0,587099421835577 )
Artif Intell Med - Exploiting the systematic review protocol for classification of medical abstracts. ( 0,586995107528889 )
J Am Med Inform Assoc - Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study. ( 0,586573210915082 )
IEEE J Biomed Health Inform - Automatic detection of atrial fibrillation in cardiac vibration signals. ( 0,586372485563539 )
IEEE Trans Image Process - Geodesic propagation for semantic labeling. ( 0,585425490301572 )
J Am Med Inform Assoc - Missing values in deduplication of electronic patient data. ( 0,585151559156903 )
J Biomed Inform - MeSHy: Mining unanticipated PubMed information using frequencies of occurrences and concurrences of MeSH terms. ( 0,58414734149064 )
IEEE Trans Image Process - 3D object retrieval with multitopic model combining relevance feedback and LDA model. ( 0,581198522536574 )
Int J Med Inform - MEDRank: using graph-based concept ranking to index biomedical texts. ( 0,581076772141937 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,580681235694563 )
IEEE J Biomed Health Inform - Content Based Image Retrieval by Metric Learning from Radiology Reports: Application to Interstitial Lung Diseases. ( 0,579778930299475 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,579364217818641 )
Health Info Libr J - Facilitating access to evidence: Primary Health Care Search Filter. ( 0,578360086259318 )
Int J Neural Syst - Structurally enhanced incremental neural learning for image classification with subgraph extraction. ( 0,576278299678425 )
IEEE Trans Pattern Anal Mach Intell - A Bag-of-Features Framework to Classify Time Series. ( 0,576223800921925 )
IEEE Trans Image Process - Artistic image analysis using graph-based learning approaches. ( 0,575799751086553 )
IEEE Trans Pattern Anal Mach Intell - A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization. ( 0,574780962395505 )
BMC Med Inform Decis Mak - Publication trends of shared decision making in 15 high impact medical journals: a full-text review with bibliometric analysis. ( 0,574073541905917 )
Comput Methods Programs Biomed - Auto-adaptive robot-aided therapy using machine learning techniques. ( 0,574058960057212 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,571847787211061 )
BMC Med Inform Decis Mak - BOSS: context-enhanced search for biomedical objects. ( 0,571500968264031 )
BMC Med Inform Decis Mak - Learning to improve medical decision making from imbalanced data without a priori cost. ( 0,571010848729817 )
J Am Med Inform Assoc - Applying active learning to supervised word sense disambiguation in MEDLINE. ( 0,570818245485967 )
J Telemed Telecare - How to improve your PubMed/MEDLINE searches: 1. background and basic searching. ( 0,568334324958234 )
AMIA Annu Symp Proc - Synonym, topic model and predicate-based query expansion for retrieving clinical documents. ( 0,567680917688532 )
Int J Neural Syst - Span: spike pattern association neuron for learning spatio-temporal spike patterns. ( 0,567108880901157 )
Health Info Libr J - Developing a geographic search filter to identify randomised controlled trials in Africa: finding the optimal balance between sensitivity and precision. ( 0,566752583882747 )
AMIA Annu Symp Proc - An automated approach for ranking journals to help in clinician decision support. ( 0,566464535766181 )
BMC Med Inform Decis Mak - Performance evaluation of Unified Medical Language System®'s synonyms expansion to query PubMed. ( 0,565714936020456 )
Health Info Libr J - The performance of adverse effects search filters in MEDLINE and EMBASE. ( 0,565443330160875 )