The use of Quantitative Structure-Activity Relationship (QSAR) models to address problems in drug discovery has a mixed history, generally resulting from the misapplication of models that were either poorly constructed or used outside their domains of applicability. This situation has motivated the development of a variety of model performance metrics (r², PRESS r², F-tests, etc.) designed to increase user confidence in the validity of QSAR predictions. In a typical workflow, QSAR models are created and validated on training sets of molecules using metrics such as leave-one-out or many-fold cross-validation, which attempt to assess internal consistency. However, few current validation methods directly address the stability of QSAR predictions in response to changes in the information content of the training set. Since the main purpose of QSAR is to quickly and accurately estimate a property of interest for an untested set of molecules, it makes sense to have a means at hand to correctly set user expectations of model performance. In fact, the numerical value of a prediction is often less important to the end user than knowing the rank order of a set of molecules according to their predicted end point values. Consequently, a means of characterizing the stability of predicted rank order is an important component of predictive QSAR. Unfortunately, none of the many validation metrics currently available directly measures the stability of rank order prediction, making the development of a metric that can quantify model stability a high priority. To address this need, this work examines the stability of QSAR rank order models created from representative data sets, descriptor sets, and modeling methods. Rank order performance was assessed with the Kendall Tau metric, and the Shannon entropy of the resulting Tau values was evaluated as a means of quantifying rank-order stability.
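To make the two ingredients concrete, the sketch below computes a Kendall Tau rank correlation between two prediction vectors and the Shannon entropy of a collection of Tau values. This is an illustrative sketch only: the tau-a variant (no tie correction) and the 10-bin histogram over [-1, 1] are assumptions for exposition, not implementation details taken from this work.

```python
import math

def kendall_tau(x, y):
    # Kendall tau-a: (concordant - discordant) pairs over all pairs of
    # molecules, given two prediction vectors over the same molecules.
    n = len(x)
    num = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            num += (s > 0) - (s < 0)
    return num / (n * (n - 1) / 2)

def shannon_entropy(taus, bins=10):
    # Shannon entropy (bits) of a histogram of tau values on [-1, 1]:
    # 0 when every replicate reproduces the same rank order, larger
    # when the rank order wanders as the training set changes.
    counts = [0] * bins
    for t in taus:
        counts[min(int((t + 1) / 2 * bins), bins - 1)] += 1
    n = len(taus)
    return -sum(c / n * math.log2(c / n) for c in counts if c)
```

Identical rankings give tau = 1 and, if repeated across replicates, an entropy of 0; rankings scattered across many histogram bins give a high entropy.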
Random removal of data from the training set, termed Data Truncation Analysis (DTA), was used to systematically reduce the information content of each training set while examining both rank order performance and rank order stability in the face of training set data loss. The premise of this process, termed a "rank order entropy" (ROE) evaluation, is that a model's response to incremental loss of training information is indicative of the quality and sufficiency of its training set, learning method, and descriptor types for covering a particular domain of applicability. By analogy with information theory, an unstable rank order model displays a high level of implicit entropy, while a QSAR rank order model that remains nearly unchanged under training set reductions shows low entropy. In this work, the ROE metric was applied to 71 data sets of different sizes and was found to reveal more about the behavior of the models than traditional metrics alone. Stable, consistently performing models did not necessarily predict rank order well, and models that predicted rank order well did not necessarily perform well by traditional metrics. Indeed, ROE evaluation suggested that some QSAR models in routine use should be discarded. ROE evaluation helps discern which combinations of data set, descriptor set, and modeling method lead to usable models in prioritization schemes, and it provides confidence in the use of a particular model within a specific domain of applicability.
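The DTA loop described above can be sketched as follows. Everything here is a hedged stand-in: a toy k-nearest-neighbor regressor on a one-dimensional descriptor replaces the paper's modeling methods, the function names and parameters (`dta_taus`, `fraction`, `replicates`) are hypothetical, and the synthetic data exists only to make the sketch runnable.

```python
import random

def knn_predict(train, x, k=3):
    # Toy QSAR model: average endpoint of the k nearest training
    # molecules; train is a list of (descriptor, endpoint) pairs.
    neigh = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    return sum(y for _, y in neigh) / len(neigh)

def kendall_tau(a, b):
    # Kendall tau-a between two prediction vectors over the same molecules.
    n = len(a)
    num = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            num += (s > 0) - (s < 0)
    return num / (n * (n - 1) / 2)

def dta_taus(train, test_x, fraction, replicates=50, k=3):
    # Randomly truncate the training set `replicates` times at the given
    # fraction, refit the toy model, and record the tau of each truncated
    # ranking against the full-data ranking of the test molecules.
    base = [knn_predict(train, x, k) for x in test_x]
    keep = max(k, int(len(train) * (1 - fraction)))
    taus = []
    for _ in range(replicates):
        subset = random.sample(train, keep)
        pred = [knn_predict(subset, x, k) for x in test_x]
        taus.append(kendall_tau(base, pred))
    return taus

# Hypothetical data: a noisy linear endpoint over a 1-D descriptor.
random.seed(0)
train = [(x, 0.5 * x + random.gauss(0, 0.3)) for x in range(40)]
test_x = [1.5, 7.2, 13.8, 22.1, 30.4, 37.9]
taus = dta_taus(train, test_x, fraction=0.5)
```

Repeating this at several truncation fractions and taking the Shannon entropy of each resulting tau distribution yields an ROE-style stability profile: a model whose taus stay tightly clustered under truncation is stable (low entropy), while a model whose taus scatter is not.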
