The ABC transporter P-glycoprotein (P-gp) actively transports a wide range of drugs and toxins out of cells and is therefore linked to multidrug resistance and to the ADME profile of therapeutics. The development of predictive in silico models for identifying P-gp inhibitors is thus of great interest in drug discovery and development. Because high-quality structural information about P-gp has been lacking, in silico P-gp inhibitor prediction has so far been dominated by ligand-based approaches. The present study compares the P-gp inhibitor/noninhibitor classification performance obtained by docking into a homology model of P-gp with that of supervised machine learning methods, namely k-nearest neighbor, support vector machine (SVM), random forest, and binary QSAR, using a large, structurally diverse data set. In addition, the applicability domain of the models was assessed using an algorithm based on Euclidean distance. Random forest and SVM performed best at classifying P-gp inhibitors and noninhibitors, correctly predicting 73% to 75% of the external test set compounds. Classification based on docking experiments with the scoring function ChemScore correctly predicted 61% of the external test set. This demonstrates that ligand-based models currently remain the methods of choice for accurately predicting P-gp inhibitors. However, structure-based classification provides information about possible drug/protein interactions, which helps in understanding the molecular basis of ligand-transporter interaction and could therefore also support lead optimization.
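The abstract states only that the applicability domain was assessed with an algorithm based on Euclidean distance. As an illustration, the sketch below assumes one common distance-to-training-set scheme, not necessarily the one used in the study. For each training compound, it takes the mean Euclidean distance to its k nearest training neighbors and sets a threshold at the mean of those values plus z standard deviations. A query compound is then treated as inside the domain if its own mean distance to its k nearest training compounds does not exceed that threshold. The function names, the parameters `k` and `z`, and the toy two-dimensional descriptor vectors are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(point, reference, k, exclude_self=False):
    """Mean distance from `point` to its k nearest vectors in `reference`."""
    dists = sorted(euclidean(point, r) for r in reference)
    if exclude_self:
        dists = dists[1:]  # drop the zero distance of a training point to itself
    return sum(dists[:k]) / k


def applicability_domain(train, queries, k=3, z=0.5):
    """Return True for each query inside the (assumed) Euclidean-distance domain.

    Hypothetical scheme: threshold = mean + z * std of the training set's
    own mean k-nearest-neighbor distances. Requires len(train) > k.
    """
    train_d = [mean_knn_distance(x, train, k, exclude_self=True) for x in train]
    threshold = statistics.mean(train_d) + z * statistics.pstdev(train_d)
    return [mean_knn_distance(q, train, k) <= threshold for q in queries]


# Toy descriptor vectors (hypothetical, for illustration only)
train = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
print(applicability_domain(train, [[0.5, 0.4], [10, 10]]))
```

With these toy vectors, the query near the training cluster falls inside the domain and the distant one falls outside, so predictions for the second compound would be flagged as extrapolations. Increasing `z` widens the domain at the cost of admitting less similar compounds.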
