INTRODUCTION:

Planar benzenoid hydrocarbons (polycyclic aromatic hydrocarbons, PAHs) are organic molecules produced by the incomplete combustion of carbonaceous materials¹, arising to a minor extent from natural processes^2-4 and mainly from anthropogenic processes⁵. PAHs are made up of carbon and hydrogen atoms forming at least two condensed aromatic rings^6,7. They are released into all compartments of the environment^8,9. About 130 PAHs have been identified to date¹⁰. Some cause major environmental problems because of their toxicity. The carcinogenic and mutagenic properties of many PAHs have long been studied and established, while those of several others are still being investigated^11-22. Because of the pollution generated by the increasing emission of PAHs into the atmosphere, methods that allow both reliable identification and precise quantification of these compounds are essential.

The aim of this work is to predict the retention indices of 59 PAHs from general molecular descriptors selected by the genetic algorithm of MobyDigs.

MATERIALS AND METHODS:

Dataset:

The retention indices of the 59 PAHs were taken from Kang et al.²³. The names of the compounds and their corresponding retention indices are shown in Table 1. The data set was divided into two subsets with the Kennard and Stone algorithm²⁴: 38 molecules for the training set (model construction) and 21 compounds for testing the robustness of the model.

Descriptor Generation:

The structure of each compound was sketched and its geometry optimized with the Spartan²⁵ software using the PM6 semi-empirical method. The resulting files were transferred into Dragon version 5.3²⁶ to calculate the descriptors; for each pair of correlated descriptors (correlation coefficient r ≥ 0.95), one descriptor was eliminated. The genetic algorithm of MobyDigs²⁷ was used to select the best models by maximizing the leave-one-out cross-validation coefficient Q²_LOO²⁸.
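The pairwise correlation filter can be illustrated with a short Python sketch. It is not the procedure implemented in Dragon: the function names, the toy descriptor values, and the rule of keeping the first descriptor of each correlated pair are assumptions made here for illustration only.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / sqrt(va * vb)

def drop_correlated(descriptors, threshold=0.95):
    """Keep a descriptor only if |r| < threshold with every descriptor already kept."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical descriptor values for five compounds (not taken from the paper)
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 4.0, 6.2, 8.1, 9.9],  # almost proportional to d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(toy)
print(selected)
```

In this toy case d2 is discarded because it is almost perfectly correlated with d1, while d3 is retained.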

Kennard and Stone Algorithm:

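The Kennard and Stone (CADEX) algorithm is a sequential technique that maximizes the Euclidean distances between newly selected samples and those already selected. It begins by locating the two samples furthest from each other, which are removed from the original database and assigned to the calibration set. Each subsequent sample is the one whose smallest distance to the already selected samples is the largest, and the procedure stops when the desired number of calibration samples is reached^29,30.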
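The selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `kennard_stone`, the toy one-dimensional points, and the use of the standard-library `math.dist` for the Euclidean distance are choices made here for clarity.

```python
from math import dist

def kennard_stone(points, n_select):
    """Return the indices of n_select points (n_select >= 2) chosen by the
    Kennard and Stone max-min distance rule."""
    n = len(points)
    # Step 1: the two mutually most distant points start the selection.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Step 2: repeatedly add the point whose nearest selected neighbour is farthest.
    while len(selected) < n_select and remaining:
        nxt = max(
            remaining,
            key=lambda k: min(dist(points[k], points[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical one-descriptor data set; each point is a tuple of descriptor values
points = [(0.0,), (1.0,), (2.0,), (5.0,), (10.0,)]
train_idx = kennard_stone(points, 3)
test_idx = [k for k in range(len(points)) if k not in train_idx]
print(train_idx, test_idx)
```

The points that are not selected form the prediction set, so the calibration set spans the descriptor space as uniformly as possible.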

Model development and validation:

Simple linear regression analysis was performed with MobyDigs software¹¹, using the ordinary least squares (OLS) method.

Evaluation of the quality of the fit:

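We used the following statistical parameters to assess the goodness of fit:

· The coefficient of determination R²:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (1)$$

where $\bar{y}$ is the mean value of the observed values.

· The mean square prediction deviation (SDEP):

$$SDEP = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_{i/i})^2}{n}} \qquad (2)$$

· The mean square deviation calculated on the calibration set (SDEC):

$$SDEC = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (3)$$

· The mean square deviation calculated on the external validation set (SDEP_ext):

$$SDEP_{ext} = \sqrt{\frac{\sum_{i=1}^{n_{ext}} (y_i - \hat{y}_{i/i})^2}{n_{ext}}} \qquad (4)$$

· The prediction coefficient:

$$Q^2 = 1 - \frac{PRESS}{SCT} \qquad (5)$$

where:

SCT: the sum of the squares of the total deviations, $\sum_{i=1}^{n} (y_i - \bar{y})^2$.

PRESS: the sum of the squares of the prediction errors, $\sum_{i=1}^{n} (y_i - \hat{y}_{i/i})^2$.

· The coefficient of external prediction, calculated by the following formula:

$$Q^2_{ext} = 1 - \frac{\sum_{i=1}^{n_{ext}} (y_i - \hat{y}_{i/i})^2}{\sum_{i=1}^{n_{ext}} (y_i - \bar{y}_{tr})^2} \qquad (6)$$

where $y_i$ and $\hat{y}_{i/i}$ are respectively the measured and predicted values of the dependent variable (y) on the prediction set, $\hat{y}_{i/i}$ being the value predicted for compound i by a model built without that compound, and $\bar{y}_{tr}$ is the mean value of the dependent variable in the training set. The index ext relates to the objects of the validation set, and the index tr to those of the calibration set (training set).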
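The statistics listed above can be computed with a short Python sketch. It assumes the usual definitions (R² and SDEC from the fitted residuals, Q² and SDEP from leave-one-out predictions, Q²_ext referred to the training-set mean) and a one-descriptor model; the function names are hypothetical, and the data are a small subset of Table 1, so the numbers will not match those of the full model.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_statistics(x, y):
    """R2, SDEC, Q2_LOO and SDEP for a one-descriptor OLS model."""
    n = len(y)
    a, b = fit_line(x, y)
    my = sum(y) / n
    sct = sum((yi - my) ** 2 for yi in y)  # total sum of squares
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(n):  # leave-one-out: refit without compound i, then predict it
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2
    return {"R2": 1 - rss / sct, "SDEC": sqrt(rss / n),
            "Q2_LOO": 1 - press / sct, "SDEP": sqrt(press / n)}

def external_statistics(x_tr, y_tr, x_ext, y_ext):
    """Q2_ext and SDEP_ext for compounds not used to fit the model."""
    a, b = fit_line(x_tr, y_tr)
    my_tr = sum(y_tr) / len(y_tr)
    sq_err = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x_ext, y_ext))
    sq_dev = sum((yi - my_tr) ** 2 for yi in y_ext)
    return {"Q2_ext": 1 - sq_err / sq_dev, "SDEP_ext": sqrt(sq_err / len(y_ext))}

# X3 and Ri values of a few naphthalenes from Table 1
x_tr = [3.466, 3.802, 3.933, 4.226, 4.137, 4.178, 4.327, 4.723]
y_tr = [200.0, 218.14, 221.04, 236.08, 237.58, 240.25, 249.52, 263.31]
x_ext = [4.248, 4.387, 4.534]
y_ext = [236.56, 243.55, 246.49]

internal = fit_statistics(x_tr, y_tr)
external = external_statistics(x_tr, y_tr, x_ext, y_ext)
print(internal)
print(external)
```

Because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, Q²_LOO never exceeds R² and SDEP is never smaller than SDEC for an OLS model.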

Williams Diagram:

The applicability domain was assessed using the Williams diagram (treated in detail in refs.^29,31), which plots the standardized prediction residuals against the leverage values. Equation (7) defines the leverage of a compound in the original space of the independent variables.

$$h_i = x_i (X^T X)^{-1} x_i^T \quad (i = 1, \ldots, n) \qquad (7)$$

where $x_i$ is the row vector of the descriptors of compound i and X (n × p) is the model matrix built from the descriptor values of the calibration set; the superscript T denotes the transpose of a vector or matrix.

The critical value of the leverage (h*) is set at 3(p + 1)/n, where p is the number of descriptors in the model and n the number of calibration compounds. If h_i < h*, the probability of agreement between the measured and predicted values of compound i is as high as that for the calibration compounds. Compounds with h_i > h* stabilize the model when they belong to the calibration set; otherwise their predictions are questionable, although they are not necessarily outliers, since their residuals may be small.
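For a one-descriptor model with an intercept, equation (7) reduces to the closed form h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², which the following Python sketch uses. The function names are hypothetical, the assumption that X contains a column of ones is ours, and the descriptor values are a small subset of Table 1 chosen only to show the mechanics.

```python
def leverages(x_train, x_query=None):
    """Leverage of each query compound for a model y = a + b*x fitted on x_train.

    Closed form of h = x (X^T X)^-1 x^T when X has a column of ones and one descriptor.
    """
    n = len(x_train)
    mean = sum(x_train) / n
    sxx = sum((v - mean) ** 2 for v in x_train)
    query = x_train if x_query is None else x_query
    return [1.0 / n + (v - mean) ** 2 / sxx for v in query]

def critical_leverage(n, p):
    """Warning leverage h* = 3(p + 1)/n for p descriptors and n calibration compounds."""
    return 3.0 * (p + 1) / n

# X3 values of a few calibration and test compounds from Table 1
x_train = [3.466, 3.802, 3.933, 4.226, 4.137, 4.178, 4.327, 4.723]
x_test = [4.248, 4.387, 4.534]

h_train = leverages(x_train)
h_test = leverages(x_train, x_test)
h_star = critical_leverage(len(x_train), p=1)
outside = [x for x, h in zip(x_test, h_test) if h > h_star]
print(h_star, h_test, outside)

# With the 38 calibration compounds and one descriptor used in this work:
print(round(critical_leverage(38, 1), 3))  # 3 * 2 / 38 = 0.158
```

The leverages of the calibration compounds always sum to p + 1, which gives a quick consistency check on any implementation.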

Randomization Test:

This test makes it possible to detect correlations due to chance. It consists of generating a “property” vector by randomly permuting the components of the real vector. A QSAR model is then calculated on the permuted vector (treated as if it were a real experimental vector), following the usual method. This process is repeated several times (100 in our case).
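The permutation procedure can be sketched as follows in Python. This is an illustration under stated assumptions rather than the MobyDigs routine: the function names and the fixed random seed are hypothetical, R² of the one-descriptor model is computed as the squared Pearson correlation, and only ten compounds from Table 1 are used.

```python
import random

def r_squared(x, y):
    """R2 of a one-descriptor OLS model, i.e. the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_rounds=100, seed=0):
    """Return R2 of the real model and the R2 values of n_rounds permuted models."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    scores = []
    for _ in range(n_rounds):
        shuffled = y[:]          # copy, so the real response vector is untouched
        rng.shuffle(shuffled)    # random permutation of the property values
        scores.append(r_squared(x, shuffled))
    return r_squared(x, y), scores

# X3 and Ri values of ten calibration compounds from Table 1
x3 = [3.466, 3.802, 3.933, 4.226, 4.137, 4.178, 4.327, 4.723, 4.861, 5.344]
ri = [200.0, 218.14, 221.04, 236.08, 237.58, 240.25, 249.52, 263.31, 265.9, 301.69]

real_r2, random_r2 = y_randomization(x3, ri)
print(real_r2, sum(random_r2) / len(random_r2))
```

A real model whose R² lies far above the values obtained with the permuted vectors is unlikely to result from a chance correlation.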

RESULTS AND DISCUSSION:

Simple linear regression (SLR) model:

Our model is built with a single descriptor that is related to the retention indices, which makes it suitable for modeling by SLR.

The optimal model equation can be written as follows:

$$\mathrm{Ri} = 27.6\,(\pm 3.686) + 50.4\,(\pm 0.5716)\,\mathrm{X3} \qquad (8)$$

Here X3 was calculated with the Dragon program; it belongs to the class of connectivity indices.
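As a worked example, equation (8) can be applied to a few compounds of Table 1 with the short Python sketch below. The function name `predict_ri` is hypothetical; the coefficients, X3 values, and measured Ri values are those reported in this paper.

```python
def predict_ri(x3, intercept=27.6, slope=50.4):
    """Retention index predicted by equation (8) from the X3 descriptor."""
    return intercept + slope * x3

# (X3, measured Ri) pairs taken from Table 1
examples = {
    "Naphthalene": (3.466, 200.0),
    "Anthracene": (5.344, 301.69),
    "Benzo(a)anthracene": (7.278, 398.5),
}

for name, (x3, ri_obs) in examples.items():
    ri_pred = predict_ri(x3)
    print(f"{name}: predicted {ri_pred:.1f}, measured {ri_obs:.2f}, "
          f"residual {ri_obs - ri_pred:+.1f}")
```

For naphthalene, for instance, the prediction is 27.6 + 50.4 × 3.466 ≈ 202.3, compared with the measured value of 200.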

All statistical parameters are shown in Table 2.

The high value of R² reflects the quality of the fit, while the very small difference between R² and Q² indicates the robustness of the model. In addition, the similarity of SDEC and SDEP means that the internal predictive ability of the model is close to its fitting ability.

Table 1. Values of Ri and X3 for the set of 59 PAHs. The last 21 chemicals form the test set.

Chemical	Ri	X3
Naphthalene	200	3.466
2-Methylnaphthalene	218.14	3.802
1-Methylnaphthalene	221.04	3.933
2-Ethylnaphthalene	236.08	4.226
2,6-Dimethylnaphthalene	237.58	4.137
1,3-Dimethylnaphthalene	240.25	4.178
1,8-Dimethylnaphthalene	249.52	4.327
2,3,6-Trimethylnaphthalene	263.31	4.723
2,3,5-Trimethylnaphthalene	265.9	4.861
Anthracene	301.69	5.344
1-Phenylnaphthalene	315.19	5.886
2-Methylanthracene	321.57	5.68
2-Methylphenanthrene	321.57	5.729
4-Methylphenanthrene	323.17	5.806
9-Methylanthracene	329.13	5.892
2-Phenylnaphthalene	332.59	5.897
9-Ethylphenanthrene	337.05	6.128
2-Ethylphenanthrene	337.5	6.153
2,7-Dimethylphenanthrene	339.23	6.065
9-Isopropylphenanthrene	345.78	6.374
1,8-Dimethylphenanthrene	346.26	6.339
9-n-Propylphenanthrene	350.3	6.263
9,10-Dimethylanthracene	355.49	6.451
9-Methyl-10-Ethylphenanthrene	359.91	6.672
1-Methyl-7-isopropylphenanthrene	368.67	6.923
9,10-Dimethyl-3-ethylphenanthrene	381.85	7.246
Benzo(c)phenanthrene	391.39	7.285
Benzo(a)anthracene	398.5	7.278
9-Phenylphenanthrene	406.9	7.773
6-Methylbenzo(a)anthracene	417.57	7.69
1-Phenylphenanthrene	421.66	7.819
1,12-Dimethylbenzo(a)anthracene	436.82	8.189
7,12-Dimethylbenzo(a)anthracene	443.38	8.335
Dibenzo(a,c)anthracene	495.01	9.166
Dibenzo(a,h)anthracene	495.45	9.213
9-Methylbenzo(a)anthracene	416.5	7.614
12-Methylbenzo(a)anthracene	419.39	7.771
4-Methylbenzo(a)anthracene	419.67	7.751
1-Ethylnaphthalene	236.56	4.248
2,3-Dimethylnaphthalene	243.55	4.387
1,2-Dimethylnaphthalene	246.49	4.534
3,6-Dimethylphenanthrene	337.83	6.079
1-Methylbenzo(a)anthracene	414.37	7.691
3-Methylbenzo(a)anthracene	416.63	7.614
7-Methylbenzo(a)anthracene	423.14	7.832
2,7-Dimethylnaphthalene	237.71	4.137
1,7-Dimethylnaphthalene	240.66	4.276
1,6-Dimethylnaphthalene	240.72	4.269
1,4-Dimethylnaphthalene	243.57	4.414
1,5-Dimethylnaphthalene	244.98	4.406
3-Methylphenanthrene	319.46	5.736
9-Methylphenanthrene	323.06	5.798
1-Methylanthracene	323.33	5.818
1-Methylphenanthrene	323.9	5.866
9,10-Dimethylphenanthrene	367.97	6.48
11-Methylbenzo(a)anthracene	412.72	7.752
2-Methylbenzo(a)anthracene	413.78	7.621
8-Methylbenzo(a)anthracene	417.56	7.752
5-Methylbenzo(a)anthracene	418.72	7.683

The external statistical validation (Q²_ext, SDEP_ext) attests to the good predictive capacity of the model for the compounds that did not participate in its calculation. SDEP differs little from SDEP_ext (the difference between SDEP and SDEP_ext is very small, 0.042%).

The values of Ri and X3 for the set of PAHs are reported in Table 1.

The statistics obtained for the randomized retention index vectors are clearly lower than those of the real QSPR model, which confirms that the proposed model is not due to a chance correlation.

All standardized residuals are smaller than 3 standard deviation units (3s) in absolute value. The leverage h_i is the ith diagonal term h_ii of the projection matrix H = X (X^T X)^-1 X^T, where X is the matrix of the observed values of the explanatory variables and X^T its transpose. The critical value for identifying leverage points is h* = 3 × 2/38 ≈ 0.158. All h_i values are below this critical value, which means that the model has good external predictivity.

Mechanistic Interpretation:

The descriptor, its class, and its meaning are given in Table 3.

Table 3. Selected descriptor with its meaning and class for the best GA/SLR model

Descriptor	Definition	Class
X3	Connectivity index chi-3	Connectivity indices

CONCLUSION:

In this study, we developed a useful QSAR equation that relates a theoretical chemical descriptor to the retention indices of 59 PAHs. For each compound, 1664 descriptors (belonging to 20 classes) were calculated with the Dragon software. The data set was divided into calibration and prediction sets using the Kennard and Stone algorithm. The best descriptors were then selected by the genetic algorithm of MobyDigs. The model obtained has high statistical quality and low prediction errors. In general, it can be concluded that, for this data set, the combination of modeling techniques improves the linear models. The results indicate that the chosen descriptor plays an important role in the retention indices of planar benzenoid hydrocarbons.

CONFLICT OF INTEREST:

The authors declare no conflict of interest in this reported work.

REFERENCES:

1. Samanta SK, Singh OV, Jain RK. Polycyclic aromatic hydrocarbons: environmental pollution and bioremediation. Trends in Biotechnology. 2002; 20(6):243-8.

2. Hylton JH. Aboriginal Self-Government in Canada: Current Trends and Issues. Purich's Aboriginal Issues Series: ERIC. 1994.

3. Juhasz AL, Naidu R. Bioremediation of high molecular weight polycyclic aromatic hydrocarbons: a review of the microbial degradation of benzo[a]pyrene. International Biodeterioration and Biodegradation. 2000; 45(1-2):57-88.

4. Wilcke W. Synopsis polycyclic aromatic hydrocarbons (PAHs) in soil—a review. Journal of Plant Nutrition and Soil Science. 2000; 163(3): 229-48.

5. Hill AJ, Ghoshal S. Micellar solubilization of naphthalene and phenanthrene from nonaqueous-phase liquids. Environmental Science & Technology. 2002; 36(18):3901-7.

6. Li JL, Chen BH. Solubilization of model polycyclic aromatic hydrocarbons by nonionic surfactants. Chemical Engineering Science. 2002; 57(14):2825-35.

7. Menzie CA, Potocki BB, Santodonato J. Exposure to carcinogenic PAHs in the environment. Environmental Science & Technology. 1992; 26(7):1278-84.

8. Rababah A, Matsuzawa S. Treatment system for solid matrix contaminated with fluoranthene. II––Recirculating photodegradation technique. Chemosphere. 2002; 46(1):49-57.

9. Gabet S. Remobilisation d'Hydrocarbures Aromatiques Polycycliques (HAP) présents dans les sols contaminés à l'aide d'un tensioactif d'origine biologique. 2004.

10. Bernal-Martinez A. Elimination des hydrocarbures aromatiques polycycliques présents dans les boues d'épuration par couplage ozonation-digestion anaérobie. 2005.

11. dos Santos Duarte C, dos Santos Marques A, Takahata Y. QSAR study of quinoline metanols with antimalarial activity against Plasmodium falciparum.

12. Verma MP, Gupta S. Software Development for Nursing: Role of Nursing Informatics. Int J Nur Edu and Research. 2017; 5(2):203-7.

13. Galande AK, Rohane SH. Insilico Molecular docking analysis in Maestro Software. Asian Journal of Research in Chemistry. 2021; 14(1):97-100.

14. Ganatra SH, Patle MR, Bhagat GK. Studies of Quantitative Structure-Activity Relationship (QSAR) of Hydantoin Based Active Anti-Cancer Drugs. Tc. 2011; 1(2.124):8-9351.

15. Swathi N, Subrahmanyam CVS, Satyanarayana K. Synthesis and Quantitative Structure-Antioxidant Activity Relationship Analysis of Thiazolidine-2, 4-dione Analogues. Asian J Research Chem. 2015; 8(1):21-6.

16. Otuokere IE, Amaku FJ. Computer-aided drug design of an anti-angiogenic and immunomodulatory agent,-(2, 4-dioxocyclohexyl)-1H-isoindole-1, 3 (2H)-dione (thalidomide). Asian Journal of Research in Chemistry. 2015; 8(9):601-5.

17. Mathew B, Mathew GE, Shafeer VP, Musthafa CM, Femina P. A Green Route Approach of α, β-Unsaturated Ketone Having a Benzimidazole Tail and Their Virtual Screening on the Molecular Descriptors for Predicting the CNS-Drug likeness. Asian Journal of Research in Chemistry. 2012; 5(1):65-8.

18. Otuokere IE, Alisa CO. Computational Study on Molecular Orbital’s, Excited State Properties and Geometry Optimization of Anti-benign Prostatic Hyperplasia Drug, N-(1, 1-dimethylethyl)-3-oxo-(5α, 17β)-4-azaandrost-1-ene-17-carboxamide (Finasteride). Asian Journal of Research in Pharmaceutical Science. 2014; 4(4): 169-73.

19. Otuokere IE, Amaku FJ. Conformation Analysis and Self-Consistent Field Energy of Immune Response Modifier, 1-(2-methylpropyl)-1H-imidazo [4, 5] quinolin-4-amine (Imiquimod). Asian Journal of Research in Pharmaceutical Science. 2015; 5(3):1-6.

20. Otuokere IE, Amaku FJ, Alisa CO. In silico geometry optimization, excited–state properties of (2E)-N-Hydroxy-3-[3-(Phenylsulfamoyl) Phenyl] prop-2-enamide (Belinostat) and its molecular docking studies with Ebola Virus glycoprotein. Asian Journal of Pharmaceutical Research. 2015; 5(3):131-7.

21. Yadav M, Yadav VK. A Study of Challenges and Practices Related to HRM in Software Industry. Asian Journal of Management. 2017; 8(4):1233-6.

22. Gujjar PJ, Manjunatha T. Testing of Dupont model for software and Training services Companies in India. Asian Journal of Management. 2021; 12(2):169-80.

23. Kang J, Cao C, Li Z. Quantitative structure–retention relationship studies for predicting the gas chromatography retention indices of polycyclic aromatic hydrocarbons: Quasi-length of carbon chain and pseudo-conjugated system surface. Journal of Chromatography A. 1998; 799(1-2):361-7.

24. Kennard RW, Stone LA. Computer aided design of experiments. Technometrics. 1969; 11(1):137-48.

25. Zakharian TY, Coon SR. Evaluation of Spartan semi-empirical molecular modeling software for calculations of molecules on surfaces: CO adsorption on Ni (111). Computers & Chemistry. 2001; 25(2):135-44.

26. Todeschini R, Consonni V, Mauri A, Pavan M. DRAGON-Software for the calculation of molecular descriptors. Web Version. 2004; 3.

27. Leardi R, Boggia R, Terrile M. Genetic algorithms as a strategy for feature selection. Journal of Chemometrics. 1992; 6(5):267-81.

28. Todeschini R, Ballabio D, Consonni V, Mauri A, Pavan M. MOBYDIGS, Software for Multilinear Regression Analysis and Variable Subset Selection by Genetic Algorithm. Release 1.1 for windows. Milano; 2009.

29. Tropsha A, Gramatica P, Gombar VK. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR & Combinatorial Science. 2003; 22(1):69-77.

30. Wu W, Walczak B, Massart DL, Heuerding S, Erni F, Last IR, et al. Artificial neural networks in classification of NIR spectral data: design of the training set. Chemometrics and Intelligent Laboratory Systems. 1996; 33(1):35-46.

31. Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification-and regression-based QSARs. Environmental Health Perspectives. 2003; 111(10):1361-75.

Received on 30.11.2022 Modified on 25.05.2023

Asian J. Research Chem. 2023; 16(5):358-362.

DOI: 10.52711/0974-4150.2023.00057

Table 2. Statistical parameters of the SLR model (R² and Q² values in %).

n_tr	n_ext	Q²_LOO	R²	Q²_LMO/50	SDEC	SDEP	SDEP_ext	S	F	Q²_BOOT	R²_adj	Q²_ext
38	21	99.49	99.54	99.47	5.113	5.36	5.036	5.2527	7773.96	99.45	99.53	99.6