J Chem Inf Model - Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions.


The availability of diverse methodologies for exploring chemical data sets has benefited predictive modeling, particularly Quantitative Structure-Activity Relationship (QSAR) modeling in the chemical sciences. This study discusses the use of contemporary protocols and QSAR modeling methods to model two biomolecular systems that have historically been modeled poorly by traditional and three-dimensional QSAR methodologies. Herein, we explore, analyze, and discuss the creation of a classification model for the human Ether-a-go-go Related Gene (hERG) potassium channel and a continuous model for Tetrahymena pyriformis (T. pyriformis) using Support Vector Machine (SVM) and Support Vector Regression (SVR), respectively. The models are constructed with three types of molecular descriptors that capture the gross physicochemical features of the compounds: (i) 2D, 2½D, and 3D physical features, (ii) VolSurf-like molecular interaction fields, and (iii) 4D-Fingerprints. The best hERG SVM model achieved 89% accuracy, and the three best SVM models were able to screen a PubChem data set with an accuracy of 97%. The best T. pyriformis model had an R² value of 0.924 for the training set and predicted the continuous end points for two test sets with R² values of 0.832 and 0.620, respectively. These studies demonstrate the predictive ability, for both classification and continuous end points, of QSAR models constructed from curated data sets, biologically relevant molecular descriptors, and SVM and SVR. Because these protocols and methodologies accommodate large data sets (several thousand compounds) that are chemically diverse and, in the case of classification modeling, unbalanced (one experimental outcome dominates the data set), they allow scientists to further explore a remarkable amount of biological and chemical information.
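
The two kinds of figures quoted above (classification accuracy and R² for continuous end points) can be illustrated with a minimal sketch. It is not the authors' code. It assumes that R² means the standard coefficient of determination, 1 - SS_res/SS_tot, which the abstract does not state (some QSAR studies report the squared Pearson correlation instead). The function names and the toy labels are hypothetical. Balanced accuracy is included only to show why plain accuracy can mislead on an unbalanced data set; the abstract does not say that the study reported it.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def accuracy(labels, predictions):
    """Fraction of compounds whose predicted class matches the label."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def balanced_accuracy(labels, predictions):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for cls in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == cls]
        hits = sum(predictions[i] == cls for i in members)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


# Hypothetical unbalanced toy set: nine inactive (0) compounds, one active (1).
toy_labels = [0] * 9 + [1]
# A trivial "model" that always predicts the majority class.
toy_majority = [0] * 10

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_majority))
    print("balanced accuracy:", balanced_accuracy(toy_labels, toy_majority))
```

In this toy case the majority-class predictor scores high on plain accuracy while recovering none of the minority class, which is the situation the phrase "one experimental outcome dominates the data set" describes.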

