AMIA Annu Symp Proc - Determining word sequence variation patterns in clinical documents using multiple sequence alignment.

Topics

{ sequenc(1873) structur(1644) protein(1328) }
{ state(1844) use(1261) util(961) }
{ data(1737) use(1416) pattern(1282) }
{ clinic(1479) use(1117) guidelin(835) }
{ method(984) reconstruct(947) comput(926) }
{ import(1318) role(1303) understand(862) }
{ research(1218) medic(880) student(794) }
{ sampl(1606) size(1419) use(1276) }
{ high(1669) rate(1365) level(1280) }
{ bind(1733) structur(1185) ligand(1036) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2675) segment(2577) method(1081) }
{ concept(1167) ontolog(924) domain(897) }
{ featur(1941) imag(1645) propos(1176) }
{ group(2977) signific(1463) compar(1072) }
{ structur(1116) can(940) graph(676) }
{ activ(1452) weight(1219) physic(1104) }
{ can(774) often(719) complex(702) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ extract(1171) text(1153) clinic(932) }
{ general(901) number(790) one(736) }
{ studi(1410) differ(1259) use(1210) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ spatial(1525) area(1432) region(1030) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ process(1125) use(805) approach(778) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Sentences and phrases that represent a certain meaning often exhibit patterns of variation in which they differ from a basic structural form by one or two words. We present an algorithm that utilizes multiple sequence alignments (MSAs) to generate a representation of groups of phrases that possess the same semantic meaning and also share a common basic word sequence structure. The MSA enables the determination not only of the words that compose the basic word sequence, but also of the locations within the structure that exhibit variation. The algorithm can be utilized to generate patterns of text sequences that can be used as the basis for a pattern-based classifier or as a starting point to bootstrap the pattern-building process for a regular expression-based classifier; it can also serve to reveal the variation characteristics of sentences and phrases within a particular domain.
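
The idea of aligning phrases word by word and reading off the fixed words and the variable positions can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a simple progressive alignment in which each phrase is aligned to the growing alignment with Needleman-Wunsch scoring, and the scoring values, the function names (`align_to_profile`, `msa`, `pattern`), and the example phrases are all hypothetical choices made for illustration. The output is a regex-like summary string, not a ready-to-use regular expression.

```python
GAP = "-"  # placeholder for "no word at this position"

def align_to_profile(columns, tokens, match=2, mismatch=-1, gap=-1):
    """Align a token list to an existing alignment using Needleman-Wunsch.

    `columns` is a list of columns; each column holds one token (or GAP)
    for every phrase aligned so far. Scoring values are illustrative.
    """
    n, m = len(columns), len(tokens)
    depth = len(columns[0]) if columns else 0

    def sub(i, j):
        # A token "matches" a column if any aligned phrase has it there.
        return match if tokens[j - 1] in columns[i - 1] else mismatch

    # score[i][j]: best score for the first i columns vs. the first j tokens
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to rebuild the alignment.
    out, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out.append(columns[i - 1] + [tokens[j - 1]])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out.append(columns[i - 1] + [GAP])       # new phrase lacks this word
            i -= 1
        else:
            out.append([GAP] * depth + [tokens[j - 1]])  # new phrase adds a word
            j -= 1
    out.reverse()
    return out

def msa(phrases):
    """Build the alignment by adding phrases one at a time (order-dependent)."""
    columns = []
    for phrase in phrases:
        columns = align_to_profile(columns, phrase.lower().split())
    return columns

def pattern(columns):
    """Summarize the alignment: fixed words stay as-is, variable columns
    become alternation slots, and columns with gaps are marked optional."""
    parts = []
    for col in columns:
        variants = sorted(set(col) - {GAP})
        optional = GAP in col
        if len(variants) == 1 and not optional:
            parts.append(variants[0])
        else:
            parts.append("(" + "|".join(variants) + ")" + ("?" if optional else ""))
    return " ".join(parts)

# Hypothetical phrases, invented for this example.
phrases = [
    "no evidence of pneumonia",
    "no evidence of acute pneumonia",
    "no sign of pneumonia",
]
print(pattern(msa(phrases)))  # no (evidence|sign) of (acute)? pneumonia
```

Because phrases are added one at a time, the result can depend on input order; a full MSA tool would typically use a guide tree or iterative refinement instead.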

Cleaned Abstract

sentenc phrase repres certain mean often exhibit pattern variat differ basic structur form one two word present algorithm util multipl sequenc align msas generat represent group phrase possess semant mean also share common basic word sequenc structur msa enabl determin word compos basic word sequenc also locat within structur exhibit variat algorithm can util generat pattern text sequenc can use basi patternbas classifi start point bootstrap pattern build process regular expressionbas classifi serv reveal variat characterist sentenc phrase within particular domain

Similar Abstracts

Comput Biol Chem - Genome-wide analysis and evolutionary study of sucrose non-fermenting 1-related protein kinase 2 (SnRK2) gene family members in Arabidopsis and Oryza. ( 0,701570119102681 )
Comput. Biol. Med. - An insight into the molecular basis for convergent evolution in fish antifreeze Proteins. ( 0,693976358844869 )
Comput Biol Chem - The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. ( 0,668596308662808 )
Curr Protoc Bioinformatics - Clustal omega. ( 0,649775959733213 )
Sci Data - Comprehensive analysis of the venom gland transcriptome of the spider Dolomedes fimbriatus. ( 0,634001185932984 )
Comput Math Methods Med - DV-curve representation of protein sequences and its application. ( 0,63210543207215 )
Brief. Bioinformatics - Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. ( 0,616646264959861 )
Comput Biol Chem - ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. ( 0,607029963830177 )
Comput Math Methods Med - Quad-PRE: a hybrid method to predict protein quaternary structure attributes. ( 0,597869907352051 )
Brief. Bioinformatics - Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. ( 0,597584338515841 )
J. Comput. Biol. - Combinatorics of γ-structures. ( 0,597402383599932 )
J. Comput. Biol. - Statistical significance of optical map alignments. ( 0,5951852476303 )
Comput Biol Chem - Human-chimpanzee alignment: ortholog exponentials and paralog power laws. ( 0,595146604335574 )
Comput Biol Chem - Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction. ( 0,592821443847545 )
J. Comput. Biol. - Nonparametric combinatorial sequence models. ( 0,592028271589235 )
Comput. Biol. Med. - Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets. ( 0,591185576191722 )
Comput Biol Chem - ProCoCoA: A quantitative approach for analyzing protein core composition. ( 0,589957569083932 )
J Chem Inf Model - Structural determinants for the membrane insertion of the transmembrane peptide of hemagglutinin from influenza virus. ( 0,588621320340134 )
J. Comput. Biol. - IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. ( 0,584210110657159 )
Med Biol Eng Comput - Ataxin active site determination using spectral distribution of electron ion interaction potentials of amino acids. ( 0,584021387766536 )
Comput. Biol. Med. - Improving protein complex classification accuracy using amino acid composition profile. ( 0,583961315969809 )
J Integr Bioinform - Nutrilyzer: a tool for deciphering atomic stoichiometry of differentially expressed paralogous proteins. ( 0,583041571506257 )
Comput Biol Chem - A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction. ( 0,58197401220471 )
J Chem Inf Model - ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures. ( 0,581791768782156 )
Comput Biol Chem - Analysis of sequence repeats of proteins in the PDB. ( 0,581316409004884 )
Comput Biol Chem - Computational determination of the orientation of a heat repeat-like domain of DNA-PKcs. ( 0,581145262517972 )
Brief. Bioinformatics - De novo assembly of short sequence reads. ( 0,578529478477043 )
Comput Biol Chem - Identification of putative and potential cross-reactive chickpea (Cicer arietinum) allergens through an in silico approach. ( 0,577899986768482 )
J Integr Bioinform - A hierarchical approach to protein fold prediction. ( 0,577124121724191 )
J Chem Inf Model - Improved helix and kink characterization in membrane proteins allows evaluation of kink sequence predictors. ( 0,576884374650974 )
Comput Biol Chem - How do the protonation states of E296 and D312 in OmpF and D299 and D315 in homologous OmpC affect protein structure and dynamics? Simulation studies. ( 0,575526194273683 )
Brief. Bioinformatics - Alpha shape and Delaunay triangulation in studies of protein-related interactions. ( 0,575491021063211 )
Comput Biol Chem - Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. ( 0,575290598435833 )
Curr Protoc Bioinformatics - Using the structure-function linkage database to characterize functional domains in enzymes. ( 0,575106374877351 )
J. Comput. Biol. - Fast matching of transcription factor motifs using generalized position weight matrix models. ( 0,571771337188544 )
Comput. Biol. Med. - Gene comparison based on the repetition of single-nucleotide structure patterns. ( 0,571350428986679 )
Comput. Biol. Med. - New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. ( 0,57097942841418 )
Comput Biol Chem - Computational insight into nitration of human myoglobin. ( 0,56975229292986 )
Comput Biol Chem - A local average connectivity-based method for identifying essential proteins from the network level. ( 0,568672123308162 )
Comput Biol Chem - Bacterial protein structures reveal phylum dependent divergence. ( 0,568672123308162 )
Comput Math Methods Med - Fundamental dynamical modes underlying human brain synchronization. ( 0,568522886674321 )
Med Decis Making - Comparison of general population, patient, and carer utility values for dementia health states. ( 0,567497745745976 )
Comput. Biol. Med. - A content and structural assessment of oxidative motifs across a diverse set of life forms. ( 0,56695955206478 )
J Integr Bioinform - Exceptional single strand DNA word symmetry: analysis of evolutionary potentialities. ( 0,56489981339018 )
Comput Biol Chem - The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. ( 0,564605455143998 )
J Biomed Inform - Evolution of the Sequence Ontology terms and relationships. ( 0,563650638911521 )
J. Comput. Biol. - A novel technique for detecting putative horizontal gene transfer in the sequence space. ( 0,559051964628923 )
Brief. Bioinformatics - Systematic identification of Class I HDAC substrates. ( 0,558990803207875 )
J. Comput. Biol. - Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. ( 0,558907446149168 )
J. Comput. Biol. - LB3D: a protein three-dimensional substructure search program based on the lower bound of a root mean square deviation value. ( 0,558830424584231 )
Comput Biol Chem - Exploring and characterizing the folding processes of Lys- and Arg-containing Ala-based peptides: a molecular dynamics study. ( 0,558729655885455 )
J. Comput. Biol. - Statistical significance of threading scores. ( 0,558458362033148 )
Med Decis Making - It's all in the name, or is it? The impact of labeling on health state values. ( 0,558457607144135 )
Brief. Bioinformatics - Identify drug repurposing candidates by mining the protein data bank. ( 0,557921824024793 )
Sci Data - Long-read, whole-genome shotgun sequence data for five model organisms. ( 0,557781852563692 )
J. Comput. Biol. - Emergent protein folding modeled with evolved neural cellular automata using the 3D HP model. ( 0,556284911528296 )
Brief. Bioinformatics - New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. ( 0,55533346657784 )
BMC Med Inform Decis Mak - Efficient protein structure search using indexing methods. ( 0,55530570380218 )
J Chem Inf Model - Comparative analysis of threshold and tessellation methods for determining protein contacts. ( 0,554937579455256 )
J Chem Inf Model - PocketAlign a novel algorithm for aligning binding sites in protein structures. ( 0,554844927443127 )
Med Biol Eng Comput - Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines. ( 0,554657870918127 )
J. Comput. Biol. - Evaluating, comparing, and interpreting protein domain hierarchies. ( 0,553825094140644 )
J Chem Inf Model - Parallel and antiparallel β-strands differ in amino acid composition and availability of short constituent sequences. ( 0,552922745727399 )
J Chem Inf Model - Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method. ( 0,552603462938054 )
Sci Data - Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. ( 0,552458484551394 )
J. Comput. Biol. - ComB: SNP calling and mapping analysis for color and nucleotide space platforms. ( 0,552433083899973 )
J. Comput. Biol. - Efficient traversal of beta-sheet protein folding pathways using ensemble models. ( 0,552303271004011 )
Curr Protoc Bioinformatics - Using the RNAstructure Software Package to Predict Conserved RNA Structures. ( 0,549956545249526 )
Comput Biol Chem - The complex task of choosing a de novo assembly: lessons from fungal genomes. ( 0,549675288630027 )
Comput Biol Chem - Identification and characterization of lysine-methylated sites on histones and non-histone proteins. ( 0,548880551891516 )
J. Comput. Biol. - Simultaneous alignment and folding of protein sequences. ( 0,548236590202387 )
J. Comput. Biol. - Statistical significance of normalized global alignment. ( 0,546089414292593 )
Comput. Biol. Med. - Intron identification approaches based on weighted features and fuzzy decision trees. ( 0,544553702289769 )
J. Comput. Biol. - AREM: aligning short reads from ChIP-sequencing by expectation maximization. ( 0,543721549261315 )
Comput Biol Chem - Statistical analysis and exposure status classification of transmembrane beta barrel residues. ( 0,543535207828341 )
J Chem Inf Model - Modules identification in protein structures: the topological and geometrical solutions. ( 0,54345719869566 )
J Chem Inf Model - Protein structural statistics with PSS. ( 0,541852267940916 )
Comput. Biol. Med. - The possible role of HSPs on Behçet's disease: a bioinformatic approach. ( 0,541519774280511 )
J Chem Inf Model - Protein secondary structure classification revisited: processing DSSP information with PSSC. ( 0,541392994237841 )
Med Biol Eng Comput - Characterization and prediction of mRNA polyadenylation sites in human genes. ( 0,541275415744748 )
J. Comput. Biol. - Sequence alignment of viral channel proteins with cellular ion channels. ( 0,540812863752582 )
J. Comput. Biol. - Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. ( 0,539826305091294 )
Comput. Biol. Med. - miRClassify: an advanced web server for miRNA family classification and annotation. ( 0,538892029055891 )
Comput Biol Chem - Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. ( 0,538481490732594 )
Curr Protoc Bioinformatics - Using the MEROPS Database for Proteolytic Enzymes and Their Inhibitors and Substrates. ( 0,53753992273507 )
Comput. Biol. Med. - Cutaneous amyloidoses: a minimum common denominator in their amino acid sequence. ( 0,536618212884255 )
Comput Methods Programs Biomed - Can computational biology improve the phylogenetic analysis of insulin? ( 0,535514867643113 )
Comput Biol Chem - Multi-nucleation and vectorial folding pathways of large helix protein. ( 0,535108703024602 )
IEEE Trans Pattern Anal Mach Intell - Temporal Localization of Actions with Actoms. ( 0,534944541111453 )
Comput. Biol. Med. - Improving protein secondary structure prediction using a multi-modal BP method. ( 0,534299510623438 )
Brief. Bioinformatics - Automated glycopeptide analysis--review of current state and future directions. ( 0,533714193540068 )
J Chem Inf Model - Dihedral-based segment identification and classification of biopolymers II: polynucleotides. ( 0,53329798195583 )
J Biomed Inform - A similarity network approach for the analysis and comparison of protein sequence/structure sets. ( 0,533056444664873 )
Comput Biol Chem - Protein fold recognition based on functional domain composition. ( 0,532510554791936 )
J. Comput. Biol. - Smoothing 3D protein structure motifs through graph mining and amino acid similarities. ( 0,53245710182619 )
J Chem Inf Model - Discovery of novel promising targets for anti-AIDS drug developments by computer modeling: application to the HIV-1 gp120 V3 loop. ( 0,532004892622665 )
Comput Biol Chem - Identical sequence patterns in the ends of exons and introns of human protein-coding genes. ( 0,531632564503886 )
Comput Math Methods Med - ADLD: a novel graphical representation of protein sequences and its application. ( 0,530510615780591 )
J Chem Inf Model - Protein secondary structure prediction with SPARROW. ( 0,530118757436257 )
Comput Biol Chem - Molecular dynamics simulations of lectin domain of FimH and immunoinformatics for the design of potential vaccine candidates. ( 0,52925347985307 )