4D-QSAR: New Perspectives in Drug Design
Satyajit Dutta1*, Sagar Banik1, Sovan Sutradhar1, Sangya Dubey1 and Ira Sharma2
1IIMT College of Medical Sciences, ‘O’ Pocket, Mawana Road, Ganga Nagar, Meerut-250001, Uttar Pradesh, India
2KIET School of Pharmacy, Muradnagar, Ghaziabad-201206, Uttar Pradesh, India
*Corresponding Author E-mail: sanku6@gmail.com
ABSTRACT:
Quantitative structure-activity relationships (QSAR) are helpful in understanding and explaining the mechanism of drug action at the molecular level and allow the design and development of new compounds with desirable biological properties. 3D-QSAR formalisms, such as comparative molecular field analysis (CoMFA), use a set of compounds to generate 3D descriptors for building partial least squares (PLS) models and provide relevant information for ligand-based drug design. Classical QSAR methods use as descriptors experimentally derived molecular parameters and parameters calculated from the molecular connection table. The models obtained with the 4D-QSAR approach were validated by y-randomization and leave-N-out (LNO) cross-validation in order to evaluate their reliability and robustness. Good QSAR models should have an average LNO q2 (q2LNO) close to the leave-one-out q2 (q2LOO), and the standard deviation of q2LNO for each N should not exceed 0.1. In a satisfactory LNO test, N should represent a significant fraction of the samples (e.g., leave-30%-out). A new formalism that takes advantage of GROMACS molecular dynamics (MD) frames to build interaction energy models was presented in this study. The LQTA-QSAR formalism can be adapted to meet the user's needs in building 4D-QSAR models, using a recent variable selection algorithm, ordered predictors selection (OPS), which has proved to be fast and capable of providing suitable variables for PLS multivariate analysis. The best OPS-PLS models demonstrated robustness and good predictability for both investigated sets, using unbound ligands in a solvent medium.
KEYWORDS: Quantitative structure-activity relationship, Descriptors, Pharmacophore.
INTRODUCTION:
Quantitative structure-activity relationships (QSAR) play a vital role in modern drug design, since they represent a much cheaper and faster alternative to the medium-throughput in vitro and low-throughput in vivo assays, which are generally restricted to later stages of the discovery cascade. QSAR models are helpful in understanding and explaining the mechanism of drug action at the molecular level and allow the design and development of new compounds with desirable biological properties1. Advances in medicinal chemistry at the interface of chemistry and biology have created an important foundation in the search for new drug candidates possessing a combination of optimized pharmacodynamic and pharmacokinetic properties.
Drug discovery is currently driven by innovation and knowledge, and employs a combination of experimental and computational methods.
An understanding of the structure and function of the target, as well as the mechanism by which it interacts with potential drugs is crucial to this approach.
3D-QSAR formalisms, such as comparative molecular field analysis (CoMFA), use a set of compounds to generate 3D descriptors for building partial least squares (PLS) models and provide relevant information for ligand-based drug design. Hopfinger and co-workers reported a receptor-independent (RI) methodology in which multiple conformations of each ligand, obtained from molecular dynamics (MD) simulations, are considered in the construction of RI 3D-QSAR models. Aiming to combine the advantages of both methods, CoMFA and RI 4D-QSAR, an open-source package of programs named LQTAgrid was developed2. It could be argued that nowadays no drug is developed without prior QSAR analyses. Figure 1 shows a flowchart of the process from hit identification to lead optimization, highlighting the important role of QSAR in drug design.
4D-QSAR analysis was originally proposed by Hopfinger and co-workers in 1997. In this approach, the descriptors are the occupancy frequencies of the different atom types in the cubic grid cells during the molecular dynamics simulation (MDS) time, according to each trial alignment, corresponding to an ensemble average of conformational behavior. In a 4D-QSAR analysis, each compound of the investigated set can be partitioned into classes of interaction pharmacophore elements (IPEs), which are chosen according to possible interactions with a common receptor. The QSAR methodology is based on the concept that the differences observed in the biological activity of a series of compounds can be quantitatively correlated with differences in their molecular structure. Therefore, the biological activity of congeneric compounds is related to specific molecular features (descriptors) by regression techniques, which estimate the relative importance of the features contributing to the biological effect. The idea underlying a 4D-QSAR analysis is that variations in biological response are related to differences in the Boltzmann-averaged spatial distribution of molecular shape with respect to the IPEs3. On the assumption that QSAR models which take into account the spatial structure of molecules and their conformational variety should give more reliable results, a novel 4D-QSAR approach based on the simplex representation of molecular structure has also been used.
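The grid cell occupancy descriptors described above can be sketched in a few lines of Python. This is a simplified illustration, not Hopfinger's implementation: it assumes the conformers have already been aligned and sampled from an MD trajectory, that each atom carries a single IPE label, and that occupancy is normalized by the number of frames. The function name `gcod_occupancy` and all inputs are hypothetical.

```python
from collections import Counter

import numpy as np


def gcod_occupancy(frames, ipe_labels, origin, spacing=1.0):
    """Simplified grid cell occupancy descriptors (GCODs).

    frames     : (n_frames, n_atoms, 3) aligned coordinates in angstroms
    ipe_labels : one IPE label per atom, e.g. "NP", "HA", "Ar"
    origin     : (3,) lower corner of the cubic grid
    spacing    : cell edge length in angstroms
    Returns {(i, j, k, ipe): mean number of atoms of that IPE in the cell per frame}.
    """
    frames = np.asarray(frames, dtype=float)
    origin = np.asarray(origin, dtype=float)
    counts = Counter()
    for frame in frames:
        # Integer index of the grid cell that contains each atom in this frame.
        cells = np.floor((frame - origin) / spacing).astype(int)
        for (i, j, k), ipe in zip(cells, ipe_labels):
            cell = (int(i), int(j), int(k))
            counts[cell + (ipe,)] += 1
            counts[cell + ("Any",)] += 1  # the "any type" IPE counts every atom
    n_frames = len(frames)
    return {key: c / n_frames for key, c in counts.items()}


# Two MD frames of a two-atom fragment: atom 0 stays put, atom 1 moves between cells.
frames = [
    [[0.2, 0.3, 0.1], [1.5, 0.2, 0.4]],
    [[0.4, 0.1, 0.2], [2.1, 0.3, 0.5]],
]
occ = gcod_occupancy(frames, ["NP", "HA"], origin=(0.0, 0.0, 0.0))
print(occ[(0, 0, 0, "NP")], occ[(1, 0, 0, "HA")], occ[(2, 0, 0, "HA")])  # 1.0 0.5 0.5
```

In a full 4D-QSAR analysis, this computation would be repeated for every trial alignment, and the cell/IPE combinations that vary across the training set would become the columns of the data matrix.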
The classical QSAR methods use as descriptors experimentally derived molecular parameters (e.g., physicochemical data) and parameters calculated from the molecular connection table (2D structure). Experimental properties are clearly a consequence of the entire three-dimensional (3D) structure; however, they cannot be measured for compounds that have not yet been synthesized. On the other hand, 2D descriptors, which can be calculated for virtual compounds, do not capture all of the information in the 3D structure. Thus, when the study of 3D molecular structure became routine practice with the parallel development of several computational molecular modeling techniques in the 1980s, a new era of the drug design process, named Computer-Aided/Assisted Drug Design (CADD) or Computer-Aided/Assisted Molecular Design (CAMD), came into being, and QSAR methodology became a broad subfield of CADD4. Since then, several QSAR methodologies have been proposed, each characterized by particular approaches for calculating and selecting the molecular descriptors and by specific statistical algorithms for constructing the resulting models.
In the 4D-QSAR methodology, a conformational ensemble profile of each compound, rather than just one starting conformation, is used to generate the independent variables, the grid cell occupancy descriptors (GCODs), and variable selection is performed with the genetic function approximation (GFA), a genetic algorithm. A new 4D-QSAR approach introduced in the present work, named LQTA-QSAR (LQTA, Laboratório de Quimiometria Teórica e Aplicada), is based on the generation of a conformational ensemble profile (CEP) for each compound instead of only one conformation, followed by the calculation of 3D descriptors for the set of compounds5.
MATERIALS AND METHODS USED FOR ANALYSIS:
Various investigations were carried out by G. L. Kamalov, R. N. Lozytska and D. N. Kryzhanovsky from the A. V. Bogatsky Physico-Chemical Institute of the NAS of Ukraine and by I. V. Alekseeva and L. I. Palchikovskaya from the Institute of Molecular Biology and Genetics of the NAS of Ukraine. They investigated the anticancer activity of various compounds, which were tested in vitro at the National Cancer Institute (Bethesda, U.S.A.). The NCI panel includes 60 cell lines from the following 9 human malignant tumor types: leukemia, CNS cancer, prostate cancer, breast cancer, melanoma, non-small cell lung cancer, colon cancer, ovarian cancer and renal cancer.
The compounds were tested at 5 concentrations obtained by a series of 10-fold dilutions (10⁻⁸ to 10⁻⁴ M). Dimethyl sulfoxide (DMSO) was used to dissolve the compounds initially. A 48 h continuous drug exposure protocol and a sulforhodamine B (SRB) protein assay were used to estimate cell growth or cell death.
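The dilution scheme and the growth read-out can be made concrete with a short sketch. The percentage-growth formula below follows the convention published for the NCI-60 screen, as we understand it: growth is expressed relative to the time-zero and untreated-control SRB absorbances. The function names and the example absorbances are hypothetical.

```python
def dilution_series(top_molar=1e-4, n=5, factor=10.0):
    """Concentrations (M) of a serial dilution, highest first."""
    return [top_molar / factor**i for i in range(n)]


def percent_growth(t, t0, c):
    """Percentage growth from SRB absorbances (NCI-60 style convention).

    t  : absorbance of drug-treated wells after the 48 h exposure
    t0 : absorbance at time zero, when the drug is added
    c  : absorbance of untreated control wells after 48 h
    0 to 100 -> growth inhibition; below 0 -> net cell death.
    """
    if t >= t0:
        return 100.0 * (t - t0) / (c - t0)
    return 100.0 * (t - t0) / t0


print(dilution_series())  # five concentrations from 1e-4 M down to 1e-8 M
print(percent_growth(0.8, 0.4, 1.2))  # about 50: growth halved relative to control
print(percent_growth(0.2, 0.4, 1.2))  # about -50: half of the initial cells lost
```

The two branches are what allow a single assay to distinguish cytostatic effects (positive values) from cytotoxic effects (negative values).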
Figure 1. Schematic representation of the processes leading from hit identification to lead optimization. QSAR methods are essential to reach this goal.
Prior to the 4D-QSAR analysis, molecular dynamics simulations are carried out with the GROMACS software; LQTA-QSAR makes use of this free package to run the molecular dynamics. In the LQTAgrid program (the module that generates the 3D interaction energy descriptors), the user can define the initial coordinates and the size of the 3D virtual lattice, with a defined grid spacing, based on the coordinates in the “gro” files. It is recommended to use a grid large enough to contain all conformers of the investigated set. A grid spacing of 1 Å is selected, generating several thousand points at the intersections of a regular 3D lattice6.
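A minimal sketch of the lattice set-up and of probe interaction energies evaluated on it is given below. It is not the LQTAgrid code. It assumes a single point-charge probe with made-up Lennard-Jones parameters, Lorentz-Berthelot mixing, a distance floor to avoid singularities, and simple averaging of the energies over the sampled conformers. All names and parameter values are hypothetical.

```python
import numpy as np

COULOMB_K = 1389.35458  # kJ mol^-1 angstrom e^-2 (138.935458 in nm units, as in GROMACS)


def make_grid(conformers, spacing=1.0, margin=2.0):
    """Regular 3D lattice enclosing every atom of every conformer (angstroms)."""
    xyz = np.vstack([np.asarray(c, dtype=float).reshape(-1, 3) for c in conformers])
    lo = xyz.min(axis=0) - margin
    hi = xyz.max(axis=0) + margin
    axes = [np.arange(a, b + 0.5 * spacing, spacing) for a, b in zip(lo, hi)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])


def probe_energies(grid, coords, charges, sigma, epsilon,
                   probe_q=1.0, probe_sigma=3.0, probe_eps=0.5, r_min=0.5):
    """Lennard-Jones and Coulomb energies (kJ/mol) of a probe at each grid point."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    d = np.maximum(d, r_min)                     # floor avoids division by ~0
    s = 0.5 * (np.asarray(sigma) + probe_sigma)  # Lorentz-Berthelot mixing
    e = np.sqrt(np.asarray(epsilon) * probe_eps)
    sr6 = (s / d) ** 6
    lj = (4.0 * e * (sr6 ** 2 - sr6)).sum(axis=1)
    coul = (COULOMB_K * probe_q * np.asarray(charges) / d).sum(axis=1)
    return lj, coul


def ensemble_descriptors(grid, conformers, charges, sigma, epsilon):
    """Energies averaged over conformers: one row of the QSAR matrix (LJ then Coulomb)."""
    per_frame = [probe_energies(grid, xyz, charges, sigma, epsilon) for xyz in conformers]
    lj = np.mean([f[0] for f in per_frame], axis=0)
    coul = np.mean([f[1] for f in per_frame], axis=0)
    return np.concatenate([lj, coul])


# Hypothetical two-frame trajectory of a two-atom fragment (coordinates in angstroms).
frames = [np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]),
          np.array([[0.0, 0.0, 0.0], [1.1, 0.3, 0.0]])]
grid = make_grid(frames)  # in practice: built once from all conformers of all compounds
row = ensemble_descriptors(grid, frames, charges=[-0.4, 0.4],
                           sigma=[3.4, 2.5], epsilon=[0.36, 0.12])
print(grid.shape, row.shape)
```

Building the grid once from all conformers of the whole set is what keeps the descriptor columns comparable between compounds.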
Variable Selection and Model Validation:
The descriptor matrices generated by the LQTAgrid module (21,120 variables for data set 1 and 59,584 for data set 2) were first autoscaled before variable selection and model building. The absolute values of the correlation coefficients between each descriptor and the biological activity were calculated, and descriptors with coefficients lower than 0.2 were eliminated from the analysis. At this point, 2,449 independent variables remained for set 1 and 19,924 for set 2. In addition, descriptors whose plots against the dependent variable showed non-uniform distribution or dispersion were also eliminated. The initial sets of descriptors used for variable selection with the ordered predictors selection (OPS) algorithm thus comprised 1,570 descriptors (set 1) and 8,265 descriptors (set 2). The basic idea of this algorithm is to assign an importance to each descriptor based on an informative vector7. The steps involved in the complete LQTA-QSAR analysis are shown in Scheme 1.
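The autoscaling, correlation filter and ordering step described above can be sketched as follows. This covers only the first half of OPS and rests on simplifying assumptions. The correlation vector is used as the sole informative vector; as we understand it, the published algorithm can also use other vectors, such as the PLS regression vector, or combinations of them. The subsequent step, in which PLS models are built on growing blocks of the ordered variables and compared by cross-validation, is omitted. Function names are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Mean-center every column and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant columns become all zeros instead of NaN
    return (X - X.mean(axis=0)) / sd


def correlation_filter(X, y, threshold=0.2):
    """Return (indices with |r| >= threshold, full correlation vector r)."""
    y = np.asarray(y, dtype=float)
    ys = (y - y.mean()) / y.std(ddof=1)
    r = autoscale(X).T @ ys / (len(y) - 1)  # Pearson r of every descriptor with y
    return np.flatnonzero(np.abs(r) >= threshold), r


def ops_order(keep, r):
    """Kept descriptors sorted by |informative vector| (here: |r|), largest first."""
    return keep[np.argsort(-np.abs(r[keep]), kind="stable")]


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([
    [1.2, 1.9, 3.1, 4.2, 4.8],  # strongly correlated with y
    [3.0, 3.0, 3.0, 3.0, 3.0],  # constant: removed
    [1.0, 2.0, 1.0, 2.0, 1.0],  # uncorrelated (r = 0): removed
    [5.0, 3.0, 4.0, 2.0, 1.0],  # anti-correlated (r = -0.9)
])
keep, r = correlation_filter(X, y)
print(ops_order(keep, r))  # [0 3]
```

Ranking by the absolute value keeps strongly anti-correlated descriptors, which can be just as informative as positively correlated ones.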
Regression models were validated by leave-N-out (LNO) cross-validation and y-randomization. In the LNO cross-validation procedure, N compounds (N = 1, 2, ..., 10) were left out of the training set. For each N, the data were randomized 20 times, and the average and standard deviation of q2 were used. In the y-randomization, the dependent-variable vector was randomly shuffled 50 times for each of the two investigated sets.
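The two validation procedures can be sketched with a small PLS1 (NIPALS) implementation. This sketch rests on several assumptions. q2 is computed as 1 − PRESS/SS, with SS taken about the mean of the whole set. After each random reordering, the compounds are split into consecutive blocks of N. The data are synthetic. The function names are hypothetical and are not those of any QSAR package.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        q = (f @ t) / tt
        E = E - np.outer(t, p)  # deflate X
        f = f - q * t           # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def q2_lno(X, y, n_out, n_comp=2, repeats=20, seed=0):
    """Mean and standard deviation of q2 over `repeats` random orderings."""
    rng = np.random.default_rng(seed)
    ss = ((y - y.mean()) ** 2).sum()
    q2s = []
    for _ in range(repeats):
        order = rng.permutation(len(y))
        press = 0.0
        for start in range(0, len(y), n_out):
            test = order[start:start + n_out]  # N compounds left out
            train = np.setdiff1d(order, test)
            model = pls1_fit(X[train], y[train], n_comp)
            press += ((pls1_predict(model, X[test]) - y[test]) ** 2).sum()
        q2s.append(1.0 - press / ss)
    return float(np.mean(q2s)), float(np.std(q2s))


def y_randomization(X, y, n_comp=2, n_shuffles=50, seed=0):
    """q2_LOO of models refitted after shuffling y; these should be low."""
    rng = np.random.default_rng(seed)
    return [q2_lno(X, rng.permutation(y), 1, n_comp, repeats=1)[0]
            for _ in range(n_shuffles)]


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=30)
for n in (1, 5, 10):
    print(n, q2_lno(X, y, n))
print(np.mean(y_randomization(X, y)))
```

A robust model keeps a high, stable q2 as N grows, while the shuffled-y models collapse to low or negative q2.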
4D-QSAR:
In the present work, a new 4D-QSAR approach has been considered. For all investigated molecules, 3D structural models were created and sets of conformers (the fourth dimension) were used. Each conformer is represented as a system of different simplexes (tetratomic fragments of fixed structure, chirality and symmetry). An example is presented in Scheme 2.
Scheme 2
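The idea of simplex descriptors can be sketched as follows. This is a deliberately simplified illustration, not the published simplex representation. Every set of four atoms whose pairwise distances lie within a hypothetical cutoff is treated as a simplex and labeled by its sorted atom types. A chirality sign, taken from the signed volume of the tetrahedron, is assigned only when all four labels differ. Counts are averaged over conformers to bring in the fourth dimension. Names and the cutoff are assumptions.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def simplex_key(labels, xyz):
    """Sorted atom labels plus a chirality sign (+1/-1), or 0 when labels repeat."""
    order = sorted(range(4), key=lambda i: labels[i])
    key = tuple(labels[i] for i in order)
    if len(set(key)) < 4:
        return key + (0,)  # ordering ambiguous, so no chirality assigned here
    a, b, c, d = (np.asarray(xyz[i], dtype=float) for i in order)
    signed_volume = np.dot(b - a, np.cross(c - a, d - a))
    return key + (int(np.sign(signed_volume)),)


def simplex_descriptors(conformers, labels, cutoff=4.0):
    """Average count of each simplex type per conformer."""
    total = Counter()
    for xyz in conformers:
        xyz = np.asarray(xyz, dtype=float)
        for quad in combinations(range(len(labels)), 4):
            pts = xyz[list(quad)]
            longest = max(np.linalg.norm(pts[i] - pts[j])
                          for i, j in combinations(range(4), 2))
            if longest <= cutoff:
                total[simplex_key([labels[i] for i in quad], pts)] += 1
    return {k: v / len(conformers) for k, v in total.items()}


tetra = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
mirror = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, -1]]  # enantiomeric arrangement
labels = ["C", "Cl", "F", "H"]
print(simplex_descriptors([tetra], labels))   # {('C', 'Cl', 'F', 'H', 1): 1.0}
print(simplex_descriptors([mirror], labels))  # {('C', 'Cl', 'F', 'H', -1): 1.0}
```

The mirror-image pair yields different descriptors, which illustrates how a simplex description can retain stereochemical information that a 2D connection table loses.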
The simplex representation of molecular structure for biologically active substances allows a unified description of the spatial structure of compounds while preserving the complete stereochemical information. It makes it easy to identify common fragments of spatial structure that either promote a given biological activity or interfere with it. Thus, the molecular structure descriptors in such a model are simplexes (of fixed topology and stereochemical configuration) and their various combinations. Compounds with a given level of activity can be designed by generating allowed combinations of the simplex types that determine the investigated property. As an evolution of Molecular Shape Analysis (MSA), Hopfinger and co-workers proposed the 4D-QSAR formalism, which adds conformational flexibility and freedom of alignment, through ensemble averaging, to the conventional three-dimensional descriptors found in traditional 3D-QSAR methods. Thus, the “fourth dimension” of the method is the ensemble sampling of the spatial features of the members of a training set8. The grid cell occupancy descriptors (GCODs) are generated for a number of different atom types, called interaction pharmacophore elements (IPEs). These IPEs (i.e., atom types), defined as “any type” (A or Any), “nonpolar” (NP), “polar-positive charge” (P+), “polar-negative charge” (P-), “hydrogen bond acceptor” (HA), “hydrogen bond donor” (HB), and “aromatic” (Ar), correspond to the interactions that may occur in the active site and are related to the pharmacophore groups. Thus, the IPEs define the nature of the descriptors in 4D-QSAR analysis, while the GCODs are related to the coordinates of the IPEs mapped onto a common grid. The sampling process, in turn, allows the construction of optimized dynamic spatial QSAR models in the form of 3D pharmacophores, which depend on conformation, alignment and pharmacophore grouping.
The use of IPEs allows each compound in a training set to be partitioned into sets of structure types and/or classes with respect to possible interactions with a common receptor. Sets of GCODs, defined by the IPEs, are simultaneously mapped into a common grid cell space. In the 4D-QSAR methodology, a conformational ensemble profile of each compound, rather than just one starting conformation, is used to generate the independent variables (GCODs). Variable selection is performed with the genetic function approximation (GFA), a genetic algorithm9. One factor driving the development of 4D-QSAR analysis is the need to take into account multiple a) conformations, b) alignments and c) substructure groups when constructing QSAR models. These “QSAR degrees of freedom” are normally held fixed in other 3D-QSAR analyses. In the CoMFA (Comparative Molecular Field Analysis) and GRID formalisms, the descriptors are calculated as grid-point interactions between a probe atom and the target molecules, and only one conformation of each compound is considered, not a conformational ensemble profile (as in the 4D-QSAR method). The two formalisms use different force fields and different types of probe atoms, and they calculate the interaction energies differently. The interactions accounted for in the GRID force field are steric (Lennard-Jones), electrostatic and hydrogen-bonding interactions, and the total energy is the sum of all interactions. In contrast to CoMFA, where the interaction energies (Lennard-Jones and electrostatic potentials) are considered separately, GRID calculates the sum of all the different interaction energies at each grid point. In GRID studies, variable selection is performed with the GOLPE (generating optimal linear PLS estimations) program, which is also used to perform the multivariate statistical analysis.
The CoMSIA (Comparative Molecular Similarity Indices Analysis) approach uses similarity measures between a probe atom (placed at each lattice position) and the molecules, rather than CoMFA fields. Steric, electrostatic and hydrophobic similarity indices are calculated with the functional form used by the SEAL program for molecular superposition. Insofar as 4D-QSAR analysis can meaningfully predict “active” conformations and the preferred alignment for a training set, it may serve as a “preprocessor” for a subsequent CoMFA and/or CoMSIA10. Furthermore, the 4D-QSAR method has proved both useful and reliable for the construction of quantitative 3D pharmacophore models for ligand-receptor data sets.
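The Gaussian-type similarity index used by CoMSIA can be sketched as follows. The sketch assumes the commonly cited form A_k(q) = −Σ_i w_probe,k · w_ik · exp(−α r_iq²), with an attenuation factor α = 0.3. The function name and inputs are hypothetical, and the per-atom property values (charges, hydrophobicities, steric weights) would have to come from elsewhere.

```python
import numpy as np


def comsia_index(grid, coords, atom_props, probe_prop=1.0, alpha=0.3):
    """Similarity index for one physicochemical property at each grid point.

    grid       : (n_points, 3) lattice coordinates in angstroms
    coords     : (n_atoms, 3) atom coordinates in angstroms
    atom_props : (n_atoms,) property of each atom (e.g. partial charge)
    """
    grid = np.asarray(grid, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d2 = ((grid[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)  # squared distances
    weights = probe_prop * np.asarray(atom_props, dtype=float)[None, :]
    return -(weights * np.exp(-alpha * d2)).sum(axis=1)


grid = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
print(comsia_index(grid, [[0.0, 0.0, 0.0]], [1.0]))  # finite even on top of the atom
```

Because the Gaussian stays finite at zero distance, no energy cutoffs or distance floors are needed, unlike Lennard-Jones or Coulomb probe fields.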
Successful Applications of 4D-QSAR
The reported benefits of the receptor-independent (RI) 4D-QSAR analysis include11:
a) Providing a reliable and predictive 3D-pharmacophore model for the limited range of substituent sites and substituent chemistry;
b) Developing a rational basis for where substituents can, and cannot, be placed on the scaffold structures of the analogs;
c) The use of the 3D-pharmacophore model as a docking alignment for general ligand-receptor modeling, including future receptor-dependent (RD) 4D-QSAR studies; and
d) Employing the 4D-QSAR model as an initial virtual screen for future studies that can be structure-based.
In addition, the 4D-fingerprint formulation of the 4D-QSAR paradigm permits alternative model generation, which is particularly useful in virtual screening. The 4D-fingerprint models can also be compared directly with the RI-4D-QSAR models, so as to extract additional information from the data set and to evaluate self-consistency across all the models constructed12. One example of a successful application of the RD-4D-QSAR approach was to a set of 4-hydroxy-5,6-dihydropyrone inhibitors of HIV-1 protease. The receptor model used in this QSAR analysis was derived from the HIV-1 protease crystal structure13. The ligand bound in the active site of the enzyme, also a 4-hydroxy-5,6-dihydropyrone analogue, was used as the reference ligand for docking the data set compounds. Although the 4D-QSAR method has traditionally been used to develop models with internal and external consistency, as well as predictive power, the same approach can be applied to estimate the activities of compound libraries, i.e., virtual screening (VS), for the identification of new hits14. The increasing demand for the analysis of large data sets, such as those generated by combinatorial chemistry and high-throughput screening (HTS), has demonstrated once again the versatility and range of applications of 4D-QSAR.
4D-Formalism: ‘Practical’ Application in Drug Design:
The discovery of novel drug targets has increased exponentially in recent years owing to advances in genomic and molecular biology techniques. Experimental and computational methods are effectively applied to accelerate lead identification and optimization. HTS identifies lead molecules by performing individual biochemical assays on millions of compounds, but it is costly and time-consuming. These disadvantages have been addressed by integrating cheaper and effective computational methodologies such as virtual high-throughput screening (VHTS)15. Pharmacophore fingerprints are often used in the design and evaluation of compound libraries. A pharmacophore is generally a spatial pattern of chemical groups that defines how ligands bind to a common receptor and is responsible for the biological response to those ligands. Additionally, the availability of the ligand at the site of action is related to its transport and metabolic behavior in the body. QSAR methods therefore try to capture, or extract, information about both the pharmacophore and availability components from a training set of compounds. The extent of pharmacophore and availability information that can be built into a QSAR model depends not only on the training set but also on the descriptors used to represent it16. The 4D-fingerprints are descriptors derived from the 4D-MS methodology, which generates sets of molecular fingerprints that retain the conformational information of a compound as well as capture its size and chemical structure. Each molecular “finger” of the fingerprint is specific to a particular atom/pharmacophore type present in a compound17.
A unique set of molecular fingerprints can be constructed for each specific alignment assigned to the compounds of a training set or library. Such alignment-dependent molecular fingerprints allow molecular similarity measures to be developed as a function of the binding mode to a receptor site18.
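Why alignment matters for fingerprint-based similarity can be shown with a deliberately crude fingerprint: counts of each IPE type per grid cell, compared by a continuous Tanimoto coefficient. This is not the 4D-fingerprint construction of the cited work, which is not reproduced here. It only illustrates that the same molecule under two alignments yields different fingerprints. All names are hypothetical.

```python
from collections import Counter

import numpy as np


def cell_fingerprint(coords, ipe_labels, spacing=1.0):
    """Counts of each IPE type per grid cell for one aligned conformation."""
    fp = Counter()
    cells = np.floor(np.asarray(coords, dtype=float) / spacing).astype(int)
    for cell, ipe in zip(cells, ipe_labels):
        fp[(ipe,) + tuple(int(v) for v in cell)] += 1.0
    return fp


def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two sparse fingerprints."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sum(v * v for v in a.values())
    nb = sum(v * v for v in b.values())
    denom = na + nb - dot
    return dot / denom if denom else 1.0


mol = [[0.5, 0.5, 0.5], [1.5, 0.5, 0.5]]
labels = ["NP", "HA"]
shifted = [[x + 2.0, y, z] for x, y, z in mol]  # same molecule, different alignment
ref = cell_fingerprint(mol, labels)
print(tanimoto(ref, cell_fingerprint(mol, labels)))      # 1.0
print(tanimoto(ref, cell_fingerprint(shifted, labels)))  # 0.0
```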
DISCUSSION:
In the simplex-based 4D-QSAR approach considered here, 3D structural models were created for all investigated molecules, sets of conformers (the fourth dimension) were used, and each conformer was represented as a system of different simplexes (tetratomic fragments of fixed structure, chirality and symmetry). This approach does not have the limitations of CoMFA (comparative molecular field analysis). In CoMFA, only a single fixed conformer is considered for each compound, and the choice of that conformer is often arbitrary19. Moreover, CoMFA faces the problem of finding the optimal alignment of the set of molecules considered.
The models obtained with the 4D-QSAR approach were also validated by y-randomization and LNO cross-validation in order to evaluate their reliability and robustness. Good QSAR models should have an average q2LNO close to q2LOO, and the standard deviation of q2LNO for each N should not exceed 0.1. In a satisfactory LNO test, N should represent a significant fraction of the samples (e.g., leave-30%-out)20.
Unfortunately, the literature models were not thoroughly validated. Such procedures are nowadays highly recommended, particularly for a literature model in which the difference between r2 and q2 (0.36) is higher than 0.2, suggesting that the model was overfitted.
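The acceptance criteria quoted above can be collected into a small checking function. The text supplies the 0.1 limit on the standard deviation, the 0.2 limit on r2 − q2 and the leave-30%-out guideline. The text does not specify a tolerance for "close to q2LOO", so the 0.1 value used below is our assumption, as are the function name and inputs.

```python
def check_validation(q2_loo, r2, lno_stats, n_samples=None,
                     closeness_tol=0.1, max_sd=0.1, max_gap=0.2):
    """List the validation criteria a model fails; an empty list means it passes.

    lno_stats : {N: (mean_q2_lno, sd_q2_lno)} from an LNO run
    n_samples : size of the training set, used for the leave-30%-out guideline
    """
    problems = []
    if r2 - q2_loo > max_gap:
        problems.append(f"r2 - q2 = {r2 - q2_loo:.2f} exceeds {max_gap}: possible overfitting")
    for n, (mean_q2, sd_q2) in sorted(lno_stats.items()):
        if abs(mean_q2 - q2_loo) > closeness_tol:
            problems.append(f"N={n}: mean q2LNO {mean_q2:.2f} is far from q2LOO {q2_loo:.2f}")
        if sd_q2 > max_sd:
            problems.append(f"N={n}: sd of q2LNO {sd_q2:.2f} exceeds {max_sd}")
    if n_samples and lno_stats and max(lno_stats) < 0.3 * n_samples:
        problems.append(f"largest N = {max(lno_stats)} is below 30% of {n_samples} samples")
    return problems


print(check_validation(0.75, 0.85, {1: (0.75, 0.0), 5: (0.72, 0.04)}))  # []
print(check_validation(0.50, 0.86, {}))  # overfitting warning (gap 0.36)
```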
Descriptors Interpretation:
The descriptors selected by OPS are visualized in Figures 2 and 3 on solvent-accessible surfaces. Light blue regions denote steric descriptors with positive PLS regression coefficients, while pink regions represent steric descriptors with negative regression coefficients. Likewise, dark blue and red regions denote electrostatic descriptors with positive and negative regression coefficients, respectively21. A conformation of the most active compound of each investigated set and its relation to the binding-site interactions are also shown22.
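The color convention used for Figures 2 and 3 amounts to a lookup on descriptor type and coefficient sign, as sketched below. The type labels "steric"/"electrostatic" and the function name are hypothetical, and a zero coefficient (which a selected descriptor should not have) would fall into the negative class.

```python
COLORS = {
    ("steric", True): "light blue",
    ("steric", False): "pink",
    ("electrostatic", True): "dark blue",
    ("electrostatic", False): "red",
}


def descriptor_colors(kinds, coefficients):
    """Map each selected descriptor to its display color by type and coefficient sign."""
    return [COLORS[(kind, coef > 0)] for kind, coef in zip(kinds, coefficients)]


print(descriptor_colors(["steric", "electrostatic", "steric"], [0.12, -0.30, -0.05]))
# ['light blue', 'red', 'pink']
```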
Figure 2. Visualization of the LQTAgrid descriptors found for the most active molecule of set 1 (ViewerLite 5.0, Accelrys, Inc., 2002).
Figure 3. Visualization of the LQTAgrid descriptors found for the most active molecule of set 2 (ViewerLite 5.0, Accelrys, Inc., 2002).
CONCLUSIONS:
Computational methods play a crucial role in modern medicinal chemistry and have unique potential to transform the early phases of drug research, particularly in terms of time and cost savings. Most of the techniques used in structure-based drug design have improved significantly in the past few years, resulting in a remarkable enhancement of the speed and efficacy of this approach. A new formalism that takes advantage of GROMACS MD frames to build interaction energy models was presented in this study. The LQTA-QSAR formalism can be adapted to meet the user's needs in building 4D-QSAR models, using a recent variable selection algorithm, OPS, which has proved to be fast and capable of providing suitable variables for PLS multivariate analysis. The LQTA-QSAR models were thoroughly validated with LNO internal cross-validation and y-randomization. Thus, the best OPS-PLS models demonstrated robustness and good predictability for both investigated sets, using unbound ligands in a solvent medium.
4D-QSAR analysis can also be applied to problems outside medicinal chemistry and biology. One example from materials science is predicting how chelators will bind metal ions both in solution and on surfaces. The practical application is to design chelators that selectively remove specific ions from solutions and surfaces. Real-world examples include keeping the walls of hot water heater tanks clean, keeping swimming pool liners clean, and making 'hard' water 'softer' by removing divalent ions such as Ca²⁺.
REFERENCES:
1. Bleicher KH, Böhm HJ, Müller K and Alanine AI. Hit and lead generation: beyond high-throughput screening. Nat. Rev. Drug Discov. 2; 2003: 369–378.
2. Lozitsky VP, Puzis LE and Polyak RYa. Resistance of mice to reinfection after E-aminocaproic acid treatment of primary influenza virus infection. Acta Virol. 32; 1988: 117–122.
3. Zhao H. Scaffold selection and scaffold hopping in lead generation: A medicinal chemistry perspective. Drug Discov. Today. 12; 2007: 149–155.
4. Fedtchouk AS, Veveritsa PG, Lozitsky VP and Girlya YuI. Medical cure of recidiving herpes simplex virus infections by means of proteolysis inhibitors. Antiviral Res. 41; 1999: A67.
5. Salum LB and Andricopulo AD. Fragment-based QSAR: Perspectives in drug design. Mol. Divers. 13; 2009: 277–285.
6. Guido RVC, Oliva G and Andricopulo AD. Virtual screening and its integration with modern drug design technologies. Curr. Med. Chem. 15; 2008: 37–46.
7. Ooms F. Molecular modeling and computer aided drug design-Examples of their applications in medicinal chemistry. Curr. Med. Chem. 7; 2000: 141–158.
8. Hopfinger AJ. QSAR investigation of dihydrofolate reductase inhibition by Baker triazines based upon molecular shape analysis. J. Am. Chem. Soc. 102; 1980: 7196–7206.
9. Hopfinger AJ. Inhibition of dihydrofolate reductase: structure-activity correlations of 2,4-diamino-5-benzylpyrimidines based upon molecular shape analysis. J. Med. Chem. 24; 1981: 818–22.
10. Hopfinger AJ, Wang S, Tokarski J, Jin B, Albuquerque M, Madhav P and Duraiswami C. Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J. Am. Chem. Soc. 119; 1997: 10509–10524.
11. Albuquerque M, Brito M, Cunha E, Alencastro R, Antunes O, Castro H and Rodrigues C. Multidimensional-QSAR: Beyond the third-dimension in drug design. Curr. Methods Med. Chem. Biol. Phys. 1; 2007: 91–100.
12. Albuquerque MG, Hopfinger AJ, Barreiro EJ and de Alencastro RB. Four-dimensional quantitative structure-activity relationship analysis of a series of interphenylene 7-oxabicycloheptane oxazole thromboxane A2 receptor antagonists. J. Chem. Inf. Comput. Sci. 38; 1998: 925–938.
13. Rogers DG and Hopfinger AJ. Applications of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci. 34; 1994: 854–866.
14. Cramer III RD, Patterson DE and Bunce JD. Comparative Molecular Field Analyses (CoMFA). Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 110; 1988: 5959–5967.
15. Parrinello M and Rahman A. Crystal structure and pair potentials: A molecular dynamics study. Phys. Rev. Lett. 45 (14); 1980: 1196.
16. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A and Haak JR. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81 (8); 1984: 3684–3690.
17. Bratchell N. Chemometric methods in molecular design. J. Chemom. 11 (1); 1997: 93–94.
18. Tropsha A, Gramatica P and Gombar VK. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 22 (1); 2003: 69–77.
19. Golbraikh A and Tropsha A. Beware of q2! J. Mol. Graphics Modell. 20 (4); 2002: 269–276.
20. Ortiz AR, Pastor M, Palomer A, Cruciani G, Gago F and Wade RC. Reliability of Comparative Molecular Field Analysis Models: Effects of Data Scaling and Variable Selection Using a Set of Human Synovial Fluid Phospholipase A2 Inhibitors. J. Med. Chem. 40; 1997: 1136–1148.
21. Martins JPA, Barbosa EG, Pasqualoto KFM and Ferreira MMC. LQTA-QSAR: A New 4D-QSAR Methodology. J. Chem. Inf. Model. 49; 2009: 1428–1436.
22. Andrade CH, Pasqualoto KFM, Ferreira EI and Hopfinger AJ. 4D-QSAR: Perspectives in Drug Design. Molecules. 15; 2010: 3281–3294. doi:10.3390/molecules15053281
Received on 04.02.2011 Modified on 20.03.2011
Accepted on 04.04.2011 © AJRC All rights reserved
Asian J. Research Chem. 4(6): June, 2011; Page 857-862