J Chem Inf Model - Development of Ecom50 and retention index models for nontargeted metabolomics: identification of 1,3-dicyclohexylurea in human serum by HPLC/mass spectrometry.


The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sampled or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental "features" such as retention time, exact mass, and collision induced dissociation spectra. Thus, without knowing structure, biological significance remains obscure. In this study, we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom50 (the energy in electronvolts required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a data set of 52 compounds, Ecom50 models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back-propagation artificial neural network and 33 Molconn structure descriptors. External validation gave v² = 0.87 and a standard error of 38 retention index units.
As a test of the validity of the filtering approach, the Ecom50 and retention index models, along with exact mass and collision induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids, and it shared its elemental formula with 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom50 and retention index predictive models can improve nontargeted metabolite structure identification using HPLC/MS derived structural features.
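
The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` class, the `filter_candidates` function, and the tolerance parameters are hypothetical names introduced here, and the abstract does not state what cutoff counts as "substantially different". The sketch assumes each candidate already carries predicted Ecom50 and retention index values from some model, and keeps only those within a caller-chosen tolerance of the values measured for the unknown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A database hit sharing the unknown's exact mass (hypothetical structure)."""
    name: str
    predicted_ecom50: float  # eV, assumed to come from an Ecom50 model
    predicted_ri: float      # retention index units, assumed to come from an RI model


def filter_candidates(candidates, measured_ecom50, measured_ri, ecom50_tol, ri_tol):
    """Keep candidates whose predictions lie within tolerance of the measurements.

    The tolerances are assumptions chosen by the caller; the abstract does not
    specify the cutoffs used in the study.
    """
    kept = []
    for c in candidates:
        ecom50_ok = abs(c.predicted_ecom50 - measured_ecom50) <= ecom50_tol
        ri_ok = abs(c.predicted_ri - measured_ri) <= ri_tol
        if ecom50_ok and ri_ok:
            kept.append(c)
    return kept
```

One plausible design choice, again an assumption rather than something stated in the abstract, is to tie each tolerance to the model's reported error, for example a multiple of the 38-unit standard error for the retention index. A wider window removes fewer candidates but lowers the risk of discarding the true compound.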

Cleaned Abstract

goal mani metabolom studi identifi molecular structur endogen molecul differenti express among sampl treatment group identifi compound can use gain understand diseas mechan unfortun despit recent advanc varieti analyt techniqu small molecul da identif remain difficult rare can chemic structur determin experiment featur retent time exact mass collis induc dissoci spectra thus without know structur biolog signific remain obscur studi explor identif method measur exact mass unknown use queri avail chemic databas compil list candid compound predict made candid use model experiment featur measur unknown predict valu use filter candid list elimin compound predict valu substanti differ unknown intent reduc list candid reason number can obtain measur confirm facilit explor measur data creat model two experiment featur ms ecom energi electronvolt requir fragment select precursor ion hplc retent index use data set compound ecom model develop base molconn codessa structur descriptor model gave r valu depend number input model algorithm chosen whether neutral proton structur use retent index model develop compound use backpropag artifici neural network molconn structur descriptor extern valid gave v standard error retent index unit test valid filter approach ecom retent index model along exact mass collis induc dissoci spectra match use identifi dicyclohexylurea human plasma compound previous known exist human biofluid element formula ident candid compound download pubchem result suggest use ecom retent index predict model can improv nontarget metabolit structur identif use hplcms deriv structur featur

