J Integr Bioinform - Evaluating the effect of unbalanced data in biomedical document classification.

Topics

{ search(2224) databas(1162) retriev(909) }
{ concept(1167) ontolog(924) domain(897) }
{ ehr(2073) health(1662) electron(1139) }
{ can(774) often(719) complex(702) }
{ learn(2355) train(1041) set(1003) }
{ algorithm(1844) comput(1787) effici(935) }
{ studi(1119) effect(1106) posit(819) }
{ featur(3375) classif(2383) classifi(1994) }
{ framework(1458) process(801) describ(734) }
{ perform(1367) use(1326) method(1137) }
{ assess(1506) score(1403) qualiti(1306) }
{ extract(1171) text(1153) clinic(932) }
{ general(901) number(790) one(736) }
{ model(3404) distribut(989) bayesian(671) }
{ data(1737) use(1416) pattern(1282) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ method(1557) propos(1049) approach(1037) }
{ howev(809) still(633) remain(590) }
{ studi(1410) differ(1259) use(1210) }
{ model(2341) predict(2261) use(1141) }
{ compound(1573) activ(1297) structur(1058) }
{ monitor(1329) mobil(1314) devic(1160) }
{ gene(2352) biolog(1181) express(1162) }
{ measur(2081) correl(1212) valu(896) }
{ method(1219) similar(1157) match(930) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ data(1714) softwar(1251) tool(1186) }
{ method(984) reconstruct(947) comput(926) }
{ data(3963) clinic(1234) research(1004) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(2656) set(1616) predict(1553) }
{ activ(1138) subject(705) human(624) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ detect(2391) sensit(1101) algorithm(908) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ studi(2440) review(1878) systemat(933) }
{ treatment(1704) effect(941) patient(846) }
{ error(1145) method(1030) estim(1020) }
{ clinic(1479) use(1117) guidelin(835) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ model(3480) simul(1196) paramet(876) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(1733) differ(960) four(931) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }

Abstract

Document classification has become an active research field, partly because the growing volume of biomedical information in digital form must be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. Imbalanced data is a well-known problem in many practical applications of knowledge discovery, and it has a marked effect on the performance of standard classifiers. In this paper, we investigate the application of a Bayesian Network (BN) model to the triage of documents, which are represented by the association of different MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels. Moreover, we perform an extensive experimental evaluation to investigate whether classifying Medline documents with a BN classifier poses additional challenges under class-imbalanced prediction. The evaluation involves two methods, under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
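
The two balancing strategies named in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure, which this record does not describe: the function names, the toy MeSH-term documents, the class labels, and the misclassification costs are all assumptions chosen for illustration. The first function randomly under-samples the larger classes down to the size of the smallest one; the second applies a cost-sensitive decision rule to a classifier's estimated probability.

```python
import random
from collections import Counter


def undersample(docs, labels, seed=0):
    """Randomly drop documents from larger classes until every class
    has as many documents as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for doc in rng.sample(group, n_min):
            balanced.append((doc, label))
    rng.shuffle(balanced)
    return balanced


def cost_sensitive_label(p_relevant, cost_fn=5.0, cost_fp=1.0):
    """Pick the label with the lower expected cost.

    cost_fn: assumed cost of missing a relevant document.
    cost_fp: assumed cost of flagging an irrelevant document.
    """
    expected_cost_if_rejected = p_relevant * cost_fn
    expected_cost_if_accepted = (1.0 - p_relevant) * cost_fp
    if expected_cost_if_rejected >= expected_cost_if_accepted:
        return "relevant"
    return "irrelevant"


# Hypothetical toy data: each document is a set of MeSH terms.
docs = [{"Humans", "Neoplasms"}, {"Humans"}, {"Mice"}, {"Mice", "Genes"},
        {"Humans", "Genes"}, {"Neoplasms"}]
labels = ["relevant", "irrelevant", "irrelevant", "irrelevant",
          "relevant", "irrelevant"]

balanced = undersample(docs, labels)
counts = Counter(label for _, label in balanced)
```

Under-sampling changes the training set, whereas the cost-sensitive rule leaves the data untouched and shifts the decision threshold instead. With the assumed costs above, a document is labelled relevant once its estimated probability reaches 1/6.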

Cleaned Abstract

nowaday document classif becom interest research field part due increas avail biomed inform digit form necessari catalogu organ context machin learn techniqu usual appli text classif use general induct process automat build text classifi set preclassifi document relat domain imbalanc data wellknown problem mani practic applic knowledg discoveri effect perform standard classifi remark paper investig applic bayesian network bn model triag document repres associ differ mesh term result show bns adequ describ condit independ mesh term mesh ontolog valuabl resourc repres medlin document differ abstract level moreov perform extens experiment evalu investig classif medlin document use bn classifi pose addit challeng deal classimbalanc predict evalu involv two method undersampl costsensit learn conclud bn classifi sensit balanc strategi exist techniqu can improv overal perform

Similar Abstracts

J Biomed Inform - Improving MeSH classification of biomedical articles using citation contexts. ( 0,7338902120289 )
BMC Med Inform Decis Mak - Information discovery on electronic health records using authority flow techniques. ( 0,720733959924806 )
J Biomed Inform - Improving search over Electronic Health Records using UMLS-based query expansion through random walks. ( 0,672945606763521 )
J. Med. Internet Res. - A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French. ( 0,638487496672912 )
Brief. Bioinformatics - Fast and efficient searching of biological data resources--using EB-eye. ( 0,635549440799286 )
J Am Med Inform Assoc - Terminology challenges implementing the HL7 context-aware knowledge retrieval ('Infobutton') standard. ( 0,626370575064217 )
Comput Methods Programs Biomed - BiOSS: A system for biomedical ontology selection. ( 0,624034691029818 )
J Integr Bioinform - The LAILAPS search engine: a feature model for relevance ranking in life science databases. ( 0,622302455351494 )
Int J Med Inform - An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. ( 0,622130316867824 )
AMIA Annu Symp Proc - Semantic MEDLINE for discovery browsing: using semantic predications and the literature-based discovery paradigm to elucidate a mechanism for the obesity paradox. ( 0,61953483874795 )
AMIA Annu Symp Proc - A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. ( 0,618832873275366 )
AMIA Annu Symp Proc - Dialect topic modeling for improved consumer medical search. ( 0,617299733785438 )
J Biomed Inform - An approach for the semantic interoperability of ISO EN 13606 and OpenEHR archetypes. ( 0,613893818047556 )
Artif Intell Med - Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies. ( 0,613784450417342 )
Brief. Bioinformatics - Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees. ( 0,612477464842557 )
J Biomed Inform - On the query reformulation technique for effective MEDLINE document retrieval. ( 0,60283142149969 )
BMC Med Inform Decis Mak - Boolean versus ranked querying for biomedical systematic reviews. ( 0,597016809294356 )
J Integr Bioinform - Classification methods for finding articles describing protein-protein interactions in PubMed. ( 0,593752267787583 )
Int J Health Geogr - HEALTH GeoJunction: place-time-concept browsing of health publications. ( 0,593049006310547 )
BMC Med Inform Decis Mak - Mining biomarker information in biomedical literature. ( 0,592203540047577 )
AMIA Annu Symp Proc - Query log analysis of an electronic health record search engine. ( 0,592096453789738 )
BMC Med Inform Decis Mak - CDAPubMed: a browser extension to retrieve EHR-based biomedical literature. ( 0,59171227535085 )
Int J Med Inform - MEDRank: using graph-based concept ranking to index biomedical texts. ( 0,590397654568403 )
BMC Med Inform Decis Mak - BOSS: context-enhanced search for biomedical objects. ( 0,589854378663605 )
Methods Inf Med - Chi-square-based scoring function for categorization of MEDLINE citations. ( 0,589319281300677 )
J Integr Bioinform - The LAILAPS search engine: relevance ranking in life science databases. ( 0,58637970204302 )
AMIA Annu Symp Proc - A semantic and syntactic text simplification tool for health content. ( 0,583872891931547 )
J Biomed Inform - A semi-supervised approach to extract pharmacogenomics-specific drug-gene pairs from biomedical literature for personalized medicine. ( 0,583503012272798 )
BMC Med Inform Decis Mak - Glomerular disease search filters for Pubmed, Ovid Medline, and Embase: a development and validation study. ( 0,5833454131044 )
J Biomed Inform - Reflective random indexing for semi-automatic indexing of the biomedical literature. ( 0,580639242561241 )
AMIA Annu Symp Proc - An automated approach for ranking journals to help in clinician decision support. ( 0,579887872048022 )
J Biomed Inform - SNOMED CT module-driven clinical archetype management. ( 0,572864185137283 )
Health Info Libr J - Medical literature searches: a comparison of PubMed and Google Scholar. ( 0,571845347647304 )
AMIA Annu Symp Proc - Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet. ( 0,570972385226977 )
J Biomed Inform - A mutation-centric approach to identifying pharmacogenomic relations in text. ( 0,56983909387052 )
J Biomed Inform - Determining the difficulty of Word Sense Disambiguation. ( 0,567925450714031 )
AMIA Annu Symp Proc - Leveraging user query sessions to improve searching of medical literature. ( 0,567021104918295 )
AMIA Annu Symp Proc - Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts. ( 0,56618295334852 )
J Chem Inf Model - Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening. ( 0,565876515603176 )
J Biomed Inform - Supporting retrieval of diverse biomedical data using evidence-aware queries. ( 0,56542073722711 )
J Chem Inf Model - Identification of toxifying and detoxifying moieties for mutagenicity prediction by priority assessment. ( 0,563947794288458 )
Health Info Libr J - Assessment of indexing trends with specific and general terms for herbal medicine. ( 0,561187387139753 )
J Biomed Inform - MeSHy: Mining unanticipated PubMed information using frequencies of occurrences and concurrences of MeSH terms. ( 0,559337233840219 )
AMIA Annu Symp Proc - Evaluation of automated term groupings for detecting anaphylactic shock signals for drugs. ( 0,559062700464694 )
J. Med. Internet Res. - Web-based newborn screening system for metabolic diseases: machine learning versus clinicians. ( 0,557985221472131 )
Health Informatics J - Evaluation of ISO EN 13606 as a result of its implementation in XML. ( 0,556607385425216 )
AMIA Annu Symp Proc - Hyperdimensional computing approach to word sense disambiguation. ( 0,556194278144528 )
AMIA Annu Symp Proc - Search filter precision can be improved by NOTing out irrelevant content. ( 0,556119645105016 )
J Am Med Inform Assoc - A practical approach to achieve private medical record linkage in light of public resources. ( 0,555826713693422 )
J Biomed Inform - Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics. ( 0,55574395538442 )
BMC Med Inform Decis Mak - Towards case-based medical learning in radiological decision making using content-based image retrieval. ( 0,554770180770614 )
IEEE Trans Pattern Anal Mach Intell - On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. ( 0,552856280616598 )
J Integr Bioinform - A query suggestion workflow for life science IR-systems. ( 0,552833623965128 )
J Am Med Inform Assoc - A literature search tool for intelligent extraction of disease-associated genes. ( 0,552555777468301 )
J Am Med Inform Assoc - Automatically extracting sentences from Medline citations to support clinicians' information needs. ( 0,552412378634314 )
IEEE J Biomed Health Inform - On the seamless, harmonized use of ISO/IEEE11073 and openEHR. ( 0,552184316407283 )
J Biomed Inform - Interoperability of clinical decision-support systems and electronic health records using archetypes: a case study in clinical trial eligibility. ( 0,551327799474673 )
Int J Med Inform - FindZebra: a search engine for rare diseases. ( 0,550960453908214 )
J Biomed Inform - Using statistical text mining to supplement the development of an ontology. ( 0,55039532200245 )
Telemed J E Health - MEDLINE versus EMBASE and CINAHL for telemedicine searches. ( 0,549533095219332 )
BMC Med Inform Decis Mak - e-MIR2: a public online inventory of medical informatics resources. ( 0,548353804992807 )
J. Med. Internet Res. - Definition of Health 2.0 and Medicine 2.0: a systematic review. ( 0,547620066158654 )
J Am Med Inform Assoc - Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study. ( 0,545119103295306 )
AMIA Annu Symp Proc - Locating relevant patient information in electronic health record data using representations of clinical concepts and database structures. ( 0,544500333277243 )
Methods Inf Med - Technology-induced errors. The current use of frameworks and models from the biomedical and life sciences literatures. ( 0,543920524252978 )
J Am Med Inform Assoc - Improving image retrieval effectiveness via query expansion using MeSH hierarchical structure. ( 0,543851062144389 )
Methods Inf Med - Developing topic-specific search filters for PubMed with click-through data. ( 0,543284352271553 )
IEEE Trans Image Process - 3D object retrieval with multitopic model combining relevance feedback and LDA model. ( 0,541565450526475 )
AMIA Annu Symp Proc - Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis. ( 0,539229144722115 )
IEEE Trans Image Process - Coaching the exploration and exploitation in active learning for interactive video retrieval. ( 0,539187602268683 )
BMC Med Inform Decis Mak - Mapping turnaround times (TAT) to a generic timeline: a systematic review of TAT definitions in clinical domains. ( 0,536855460429828 )
AMIA Annu Symp Proc - Finding and accessing diagrams in biomedical publications. ( 0,536718609027485 )
J. Med. Internet Res. - Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews. ( 0,5362034190444 )
J Chem Inf Model - Chemical and biological properties of frequent screening hits. ( 0,536137175466663 )
Inform Health Soc Care - Readability of online health information: implications for health literacy. ( 0,535508616353135 )
J Am Med Inform Assoc - Search filters to identify geriatric medicine in Medline. ( 0,535419987539806 )
Health Info Libr J - Utilisation of search filters in systematic reviews of prognosis questions. ( 0,534920224298093 )
Methods Inf Med - Toward a formalization of the process to select IMIA Yearbook best papers. ( 0,534207224296495 )
J Am Med Inform Assoc - External phenome analysis enables a rational federated query strategy to detect changing rates of treatment-related complications associated with multiple myeloma. ( 0,533700454471318 )
J. Med. Internet Res. - Development and validation of filters for the retrieval of studies of clinical examination from Medline. ( 0,533058222756063 )
J Chem Inf Model - Speeding up chemical searches using the inverted index: the convergence of chemoinformatics and text search methods. ( 0,53202730094162 )
IEEE J Biomed Health Inform - Histology image retrieval in optimised multi-feature spaces. ( 0,530710893011846 )
Health Info Libr J - Where and how to search for information on the effectiveness of public health interventions - a case study for prevention of cardiovascular disease. ( 0,529886670944572 )
J Am Med Inform Assoc - Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters. ( 0,529246562272901 )
Comput Math Methods Med - Biomarker identification using text mining. ( 0,529092542617173 )
J Chem Inf Model - Critical analysis of CCSD data quality. ( 0,526073049568606 )
J Am Med Inform Assoc - An integrative review of information systems and terminologies used in local health departments. ( 0,525389792441536 )
Int J Med Inform - A systematic review of predictive modeling for bronchiolitis. ( 0,525274480805286 )
J Integr Bioinform - Data integration using scanners with SQL output--the bioscanners project at sourceforge. ( 0,525157232704403 )
Methods Inf Med - Learning the preferences of physicians for the organization of result lists of medical evidence articles. ( 0,524649648345188 )
J Telemed Telecare - How to improve your PubMed/MEDLINE searches: 1. background and basic searching. ( 0,523986665375739 )
J Chem Inf Model - Efficient substructure searching of large chemical libraries: the ABCD chemical cartridge. ( 0,523924604764964 )
Int J Med Inform - An analysis of clinical queries in an electronic health record search utility. ( 0,522701527856886 )
BMC Med Inform Decis Mak - Combining classifiers for robust PICO element detection. ( 0,520880605511519 )
J Biomed Inform - Automatic generation of investigator bibliographies for institutional research networking systems. ( 0,518355772548444 )
J Biomed Inform - Semantic Space models for classification of consumer webpages on metadata attributes. ( 0,517800322303445 )
J Am Med Inform Assoc - PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. ( 0,516534914358859 )
J Am Med Inform Assoc - Completeness, accuracy, and computability of National Quality Forum-specified eMeasures. ( 0,516226054809396 )
Comput Math Methods Med - BioTCM-SE: a semantic search engine for the information retrieval of modern biology and traditional Chinese medicine. ( 0,515868426172143 )
J Biomed Inform - vSPARQL: a view definition language for the semantic web. ( 0,514902681833157 )