Quasi-SMILES based Nano-QSAR model to predict the cytotoxicity of



Article pubs.acs.org/crt

Cite This: Chem. Res. Toxicol. XXXX, XXX, XXX−XXX

Quasi-SMILES-Based Nano-Quantitative Structure−Activity Relationship Model to Predict the Cytotoxicity of Multiwalled Carbon Nanotubes to Human Lung Cells

Tung Xuan Trinh,† Jang-Sik Choi,‡ Hyunpyo Jeon,§ Hyung-Gi Byun,‡ Tae-Hyun Yoon,*,† and Jongwoon Kim*,§,⊥

† Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
‡ Division of Electronics, Information and Communication Engineering, Kangwon National University, Samcheok, Kangwon-do 24341, Republic of Korea
§ Environmental Safety Group, Korea Institute of Science and Technology (KIST) Europe, Campus E 7.1, D-66123 Saarbruecken, Germany

S Supporting Information *

ABSTRACT: Quantitative structure−activity relationship (QSAR) models for nanomaterials (nano-QSAR) were developed to predict the cytotoxicity of 20 different types of multiwalled carbon nanotubes (MWCNTs) to human lung cells by using quasi-SMILES. The optimal descriptors, recorded as quasi-SMILES, were encoded to represent the physicochemical properties and experimental conditions for the MWCNTs from 276 data records collected from previously published studies. The parameters encoded in the quasi-SMILES to build the optimal descriptors were (i) diameter, (ii) length, (iii) surface area, (iv) in vitro toxicity assay, (v) cell line, (vi) exposure time, and (vii) dose. The model calculations were performed with the Monte Carlo method as implemented in CORAL software (www.insilico.eu/coral). The quasi-SMILES-based nano-QSAR model provided satisfactory statistical results (R² for internal validation data sets: 0.60−0.80; R²pred for external validation data sets: 0.81−0.88). The model showed potential for use in the estimation of human lung cell viability after exposure to MWCNTs with the following properties: diameter, 12−74 nm; length, 0.19−20.25 μm; surface area, 11.3−380.0 m²/g; and dose, 0−200 ppm.



© XXXX American Chemical Society. Received: November 9, 2017. Published: February 14, 2018. DOI: 10.1021/acs.chemrestox.7b00303

INTRODUCTION

Nanomaterials have been developed for use in many industrial and consumer applications. Owing to their increased use, the toxicity of nanoparticles to the environment and organisms has been extensively investigated. The assessment of nanomaterial toxicity has become important to governments and to industry because chemical regulations, such as Registration, Evaluation, Authorization and Restriction of Chemicals (REACH), require manufacturers to evaluate the safety of nanomaterials before their products enter the European market.1

Many studies on nanomaterial toxicity have used in silico approaches (i.e., literature data mining, machine learning, and modeling).2−13 Nanotoxicity prediction models that use physicochemical properties of nanoparticles as descriptors are traditional and widespread. Oh et al.2 conducted a meta-analysis of in vitro toxicity data for quantum dots and used random forest regression to predict cell viability and IC50 (the exposure concentration at which there is 50% (or more) inhibition of cell growth or another toxicity metric), reporting R² values of 0.68−0.88. They found that toxicity was closely correlated with the surface properties of the quantum dots. On the basis of experimental data, Puzyn et al.7 suggested a linear model for predicting the effective concentration of a compound that caused a 50% reduction (EC50) in bacterial viability by using the descriptor ΔHMe+ (the enthalpy of formation of a gaseous cation with the same oxidation state as in the metal oxide structure) for 17 metal oxide nanoparticles. The model provided satisfactory statistical quality (R² = 0.85). Liu et al.8 developed highly accurate (>95%) classification models (logistic regression) for predicting the toxicity of nine metal oxides to human bronchial epithelial cells (BEAS-2B) by using 14 physicochemical parameters of the nanoparticles.

Research on toxicity prediction models for nanomaterials9−21 has demonstrated the feasibility of predicting nanoparticle toxicity from physicochemical properties. However, the prediction models in these studies had two major limitations: the number of descriptors, and the extent to which the effect of each descriptor on toxicity could be extracted directly from the model. For example, the prediction model suggested by Puzyn et al.7 used only one descriptor (ΔHMe+) to predict the effective concentration (EC50); therefore, the model could not explain how the other physicochemical properties of metal oxide nanoparticles contributed to their toxicity. Toxicity classification models reported by Liu et al.3,8,12 and regression models by Oh et al.2 used more than ten descriptors for toxicity prediction and provided suggestions about descriptors that produced higher-performance models. On the basis of these suggestions, they identified the descriptors that were closely related to toxicity only indirectly.

Instead of using traditional descriptors (i.e., physicochemical properties and molecular information), a new approach using eclectic information as descriptors to predict the toxicity of organic and nanoscale materials was developed by Toropova and Toropov over many publications.18−36 In this approach, the physicochemical properties and exposure conditions of nanoparticles are represented by so-called “quasi-SMILES”, which are character-based representations derived from traditional SMILES.38 A data sample that consists of the physicochemical properties and exposure conditions of a nanoparticle can be represented by a series of characters. Through the application of Monte Carlo optimization, the optimal descriptors, which are the sums of the correlation weights of the quasi-SMILES codes, are calculated and used to predict the toxicity of the nanomaterials. Models built by this approach contain all the physicochemical properties and exposure conditions of the nanoparticles and can show the direct effect of each descriptor on toxicity. Thus, a main advantage of quasi-SMILES is that eclectic data can be used, which increases the number of data sets available for developing nano-QSARs.

Nevertheless, the QSAR models for nanomaterials employed in these publications23,24,26,30,33−36,38,39 were limited in terms of data quantity and external validation. The number of data samples was generally small (several tens of data rows); the largest number of data rows was 109,39 and other studies20,21,27,28,36,37 had only a few tens of data rows. Furthermore, in those papers, the external validation of the prediction models was conducted by using a random fraction of the data set, so the models could only predict the toxicity of nanomaterials with the same physicochemical properties as the materials in the training data set. Random division to obtain the external validation data set is not representative of the “real world” prediction of the toxicity of a new nanomaterial that was not included in the training data set. In addition, prediction models for the viability response of human lung cells exposed to MWCNTs have not yet been developed.

The objective of this study was to develop a nano-QSAR model, based on quasi-SMILES, to predict the cytotoxicity of different MWCNTs to human lung cells. We developed and applied a range of quasi-SMILES codes by using a hierarchical clustering technique to improve the applicability domain of the nano-QSAR model so that it covered 20 types of MWCNTs. Additionally, external validation of the prediction models was conducted strictly: new data absent from the training data sets were used to test the models.

Table 1. MWCNT Cell Viability Data

| article PubMed ID | mean diameter (nm) | mean length (nm) | surface area (m²/g) | toxic assay method | cell line | exposure time (h) | dose (ppm) | cell viability (%) | no. rows |
|---|---|---|---|---|---|---|---|---|---|
| 19121628 | 30.00 | 17500 | 380.00 | WST | BEAS-2B | 72 | 0−200 | 60−100 | 6 |
| 20954078 | 12.5 | 192, 5498 | 195.29, 177.57 | LDH, WST, Trypan Blue | WI-38 | 24, 48, 72 | 0−200 | 2.3−100 | 108 |
| 23648666 | 60.00 | 10000 | 27.50 | Alamar Blue | HBE, BEAS-2B | 24 | 0−50 | 22−100 | 6 |
| 24405247 | 28.26−58.66 | 700−1950 | 11.3−243.4 | CellTiter-Glo | 16HBE14o- | 48 | 0−125 | 11.7−104 | 56 |
| 24438343 | 12−74 | 400−5700 | 18−254 | Trypan Blue | BEAS-2B | 4, 24, 48 | 0−190 | 26−120 | 60 |
| 24915862 | 20.00 | 20250 | 109.29 | LDH, Trypan Blue | HBE | 24 | 5−24 | 72−100 | 8 |
| 25147797 | 32.00 | 3935 | 106.70 | WST | BEAS-2B | 24 | 5−40 | 87.6−103.8 | 4 |
| 25343289 | 15.50 | 8000 | 233.00 | Trypan Blue, WST | BEAS-2B | 24 | 5−200 | 16.7−100 | 13 |
| 26090445 | 60.00 | 10000 | 27.50 | Alamar Blue | HBE | 24 | 0−100 | 69.2−100 | 3 |
| 26370214 | 32.00 | 3935 | 106.70 | WST | BEAS-2B | 24 | 5−100 | 40.7−103.1 | 12 |
| | | | | | | | | | total: 276 |

Table 2. Rules for Quasi-SMILES Coding of Physicochemical Properties and Doses of MWCNTs Based on the Hierarchical Clustering Method

| mean diameter (nm) | code | mean length (nm) | code | surface area (m²/g) | code | dose (ppm) | code |
|---|---|---|---|---|---|---|---|
| 12.0−15.0 | A0 | 190−500 | B0 | 11.0−25.0 | C0 | 0−5.0 | G0 |
| 15.1−19.0 | A1 | 501−900 | B1 | 25.1−40.0 | C1 | 5.1−16.0 | G1 |
| 19.1−25.0 | A2 | 901−1500 | B2 | 40.1−50.0 | C2 | 16.1−25.0 | G2 |
| 25.1−35.0 | A3 | 1501−2000 | B3 | 50.1−80.0 | C3 | 25.1−40.0 | G3 |
| 35.1−40.0 | A4 | 2001−5000 | B4 | 80.1−110.0 | C4 | 40.1−50.0 | G4 |
| 40.1−45.0 | A5 | 5001−7000 | B5 | 110.1−150.0 | C5 | 50.1−65.0 | G5 |
| 45.1−50.0 | A6 | 7001−9000 | B6 | 151.1−200.0 | C6 | 65.1−100.0 | G6 |
| 50.1−60.0 | A7 | 9001−10000 | B7 | 200.1−250.0 | C7 | 100.0−125.0 | G7 |
| 60.1−65.0 | A8 | 10001−18000 | B8 | 250.1−300.0 | C8 | 125.1−150.0 | G8 |
| 65.1−74.0 | A9 | 18001−20250 | B9 | 300.1−380.0 | C9 | 150.1−200.0 | G9 |



MATERIALS AND METHODS

Data Collection. The cell viability data for normal human lung cells were extracted from the Safe & Sustainable Nanotechnology (S2NANO) database (www.s2nano.org). The data originated from 10 research articles40−49 on the in vitro toxicity of MWCNTs. A summary of the data set is presented in Table 1. Four types of normal human lung cells (BEAS-2B, 16HBE14o-, WI-38, and HBE) were exposed to various types of nonfunctionalized MWCNTs with different diameters, lengths, and specific surface areas. The exposure times were 4, 24, 48, and 72 h; the MWCNT concentrations ranged from 0 to 200 ppm. Six different toxicity assay methods were used to measure the cell viability of the human lung cells, with the results expressed as a percentage relative to control samples. In total, the data set consisted of 276 data rows. The cell viability was used as the end point, and seven parameters (diameter, length, specific surface area, cell line, toxic assay method, exposure time, and dose) were used to construct “optimal descriptors” by Monte Carlo optimization50 as implemented in CORAL software (http://www.insilico.eu/coral).

Data Preprocessing. The Pearson correlation coefficients between each pair of numeric parameters (diameter, length, specific surface area, dose, and cell viability) were calculated to check for strong correlations between those parameters. The correlation coefficients are shown in Figure S1. The strongest correlation was between specific surface area and diameter (−0.71), followed by dose and cell viability (−0.5). These correlations were expected because smaller nanomaterials have a larger specific surface area, and a higher dose of nanomaterials may lead to a higher number of dead cells. To construct the “optimal descriptors”, the seven parameters were converted to quasi-SMILES form. The quasi-SMILES consisted of seven components: four were numerical data (diameter, length, surface area, and dose) for the MWCNTs, as shown in Table 2, and the remaining three were categorical data (cell line, toxic assay method, and exposure time), as shown in Table 3.
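As an illustration of the conversion to quasi-SMILES form, the following minimal Python sketch encodes one data row by using the range boundaries of Table 2 and the categorical codes of Table 3. The function names (`range_code`, `encode_row`), the concatenation order A to G, the handling of values that fall between two tabulated ranges (assigned to the next range up), and the example row are assumptions made for illustration; the input format actually used by CORAL may differ.

```python
from bisect import bisect_left

# Upper bound of each of the ten ranges in Table 2. The digit in the code is
# the index of the first upper bound that is >= the value.
UPPER_BOUNDS = {
    "A": [15.0, 19.0, 25.0, 35.0, 40.0, 45.0, 50.0, 60.0, 65.0, 74.0],         # diameter (nm)
    "B": [500, 900, 1500, 2000, 5000, 7000, 9000, 10000, 18000, 20250],       # length (nm)
    "C": [25.0, 40.0, 50.0, 80.0, 110.0, 150.0, 200.0, 250.0, 300.0, 380.0],  # surface area (m2/g)
    "G": [5.0, 16.0, 25.0, 40.0, 50.0, 65.0, 100.0, 125.0, 150.0, 200.0],     # dose (ppm)
}
# Lower limit of the first range of each numerical parameter (Table 2).
LOWER_LIMITS = {"A": 12.0, "B": 190, "C": 11.0, "G": 0.0}

# Categorical codes from Table 3.
ASSAY_CODES = {"Alamar Blue": "D0", "CellTiter-Glo": "D1", "LDH": "D2",
               "MTT": "D3", "Trypan Blue": "D4", "WST": "D5"}
CELL_CODES = {"16HBE14o-": "E0", "BEAS-2B": "E1", "HBE": "E2", "WI-38": "E3"}
TIME_CODES = {4: "F0", 24: "F1", 48: "F2", 72: "F3"}


def range_code(letter, value):
    """Return the Table 2 code (e.g. 'A3') for a numerical value."""
    bounds = UPPER_BOUNDS[letter]
    if not LOWER_LIMITS[letter] <= value <= bounds[-1]:
        raise ValueError(f"{value} is outside the coded range for {letter}")
    return f"{letter}{bisect_left(bounds, value)}"


def encode_row(diameter_nm, length_nm, area_m2_g, assay, cell_line, time_h, dose_ppm):
    """Concatenate the seven codes into one quasi-SMILES string (assumed order A..G)."""
    return "".join([
        range_code("A", diameter_nm),
        range_code("B", length_nm),
        range_code("C", area_m2_g),
        ASSAY_CODES[assay],
        CELL_CODES[cell_line],
        TIME_CODES[time_h],
        range_code("G", dose_ppm),
    ])


# Hypothetical row: the first MWCNT of Table 1 with an assumed dose of 200 ppm.
print(encode_row(30.0, 17500, 380.0, "WST", "BEAS-2B", 72, 200.0))  # A3B8C9D5E1F3G9
```

A value outside the tabulated ranges raises an error instead of being forced into the nearest code, which mirrors the requirement that new data lie within the range of the data set.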

Table 3. Rules for Quasi-SMILES Coding Categorical Data of MWCNTs Based on the Hierarchical Clustering Method

| toxic assay method | code | cell line | code | exposure time | code |
|---|---|---|---|---|---|
| Alamar Blue | D0 | 16HBE14o- | E0 | 4 h | F0 |
| CellTiter-Glo | D1 | BEAS-2B | E1 | 24 h | F1 |
| LDH | D2 | HBE | E2 | 48 h | F2 |
| MTT | D3 | WI-38 | E3 | 72 h | F3 |
| Trypan Blue | D4 | | | | |
| WST | D5 | | | | |

A binary combination of a letter and a digit (i.e., A0, A1,···) was used to code the quasi-SMILES. For the categorical data, the number of unique values of each parameter was less than 10; therefore, each unique value could be coded by assigning a single digit between zero and nine. However, for the numerical data with more than 10 unique values (diameter, length, surface area, and dose), a hierarchical clustering technique51 was used to support the grouping of the values. The data points of each parameter were grouped into 10 groups based on the Euclidean distance between them, as shown in Figure S2. The characters A−G were assigned to diameter, length, surface area, toxic assay method, cell line, exposure time, and dose, respectively (Tables 2 and 3). Categories 0, 1,···, 5 were assigned to the unique values of the toxic assay method, cell line, and exposure time in alphabetical or ascending order. Categories 0, 1,···, 9 were assigned to the 10 ranges of numerical data obtained from the hierarchical clustering.

Toropova et al.36,37 assigned categories 0, 1,···, 9 to 10 ranges of numerical data by using a normalization method. However, the normalization approach is limited because some binary combinations of characters and numbers may be absent from the available data sets.36 If a new MWCNT has properties (diameter, length, and surface area) within the domain of the training data set but its quasi-SMILES code is absent from that data set, the prediction models cannot predict end points for the new MWCNT. The hierarchical clustering approach can be employed to overcome this limitation of the normalization method and to increase the applicability of the predictive models. Because the ranges are derived from the data themselves, all the binary combinations are present in the working data sets, so more unknown samples within the domain of the training data sets can be predicted by the models.

Model Development. Data Set Splitting. The data set was divided into training, internal validation, and external validation sets. The ratios of training/validation (including internal and external validation sets) were 8:2 and 6:4. For the external validation, a fixed set of 21 data rows from three MWCNTs with diameters of 46.76, 56.24, and 58.66 nm was used because those data were within the domain of the data set comprising the remaining data (the other 17 MWCNTs). The remaining data (255 rows) were randomly divided into training and internal validation sets such that the range of data in the two sets was similar. The division process was repeated six times to obtain six different splits: splits 1−3 had a training/validation ratio of 6:4, and splits 4−6 had a ratio of 8:2.

Building Cell Viability Prediction Models. Instead of using the physicochemical properties and exposure conditions directly as descriptors, optimal descriptors calculated from quasi-SMILES by means of Monte Carlo optimization were used to predict cell viability. The detailed steps of Monte Carlo optimization for quasi-SMILES were described by Toropov et al.39,50 The idea is summarized below:

a. The series of characters in the quasi-SMILES is replaced by a sequence of random numbers. These numbers are the so-called “correlation weights” (CW).

b. The optimal descriptor (DCW) is calculated as

$$\mathrm{DCW} = \sum_{k} \mathrm{CW}(S_k) \qquad (1)$$

where $\mathrm{CW}(S_k)$ is the correlation weight of the kth parameter constructing the quasi-SMILES (in this work, there are seven parameters, as mentioned above).

c. The correlation coefficient between the target end point and the optimal descriptor is then calculated. It is a mathematical function of the correlation weights and two parameters of the optimization.

d. The correlation weights are then modified to maximize the correlation coefficient.

e. Each cycle of modifying the correlation weights and calculating the correlation coefficient is called an epoch (N).

f. A coefficient called the threshold (T)31 is used to classify all features into two classes: noise, if the number of occurrences of $S_k$ in the training set is less than T, and active, if that number is greater than or equal to T. The noise features are blocked: their correlation weights are set to zero.

g. The Monte Carlo optimization process finds the values of the threshold T* and the number of epochs N* that allow the correlation coefficient to reach its maximum.

The prediction model for cell viability is described by the equation

$$\text{cell viability (\%)} = C_0 + C_1 \times \mathrm{DCW}(T^*, N^*) \qquad (2)$$

where $C_0$ and $C_1$ are the two parameters of the mathematical function mentioned above. The training data set was used to construct the prediction model, the internal validation data were used to check the model, and the external validation data were used to test the applicability of the model. All calculations were performed with CORAL software (http://www.insilico.eu/coral).
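Steps a−g and eqs 1 and 2 can be illustrated with a short Python sketch. It is a simplified stochastic hill-climbing stand-in, not the algorithm implemented in CORAL: the function names, the step size, the initial weight range, the acceptance rule, and the toy data (`toy_train`) are all assumptions made for illustration, and the viability values are invented.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def dcw(codes, cw):
    """Eq 1: sum of the correlation weights of the codes in one quasi-SMILES."""
    return sum(cw[c] for c in codes)


def optimize(train, threshold=1, epochs=20, step=0.1, seed=0):
    """Greedy Monte Carlo search for weights that maximize r(DCW, end point)."""
    rng = random.Random(seed)
    counts = {}
    for codes, _ in train:
        for c in codes:
            counts[c] = counts.get(c, 0) + 1
    # Step f: codes seen fewer than `threshold` times are noise (weight 0).
    active = sorted(c for c, n in counts.items() if n >= threshold)
    cw = {c: 0.0 for c in counts}
    for c in active:
        cw[c] = rng.uniform(0.5, 1.5)           # step a: random starting weights
    y = [value for _, value in train]

    def score():                                # step c
        return pearson_r([dcw(codes, cw) for codes, _ in train], y)

    best = score()
    for _ in range(epochs):                     # step e: one pass = one epoch
        for c in active:                        # step d: perturb one weight
            old = cw[c]
            cw[c] = old + rng.choice((-step, step))
            new = score()
            if new > best:
                best = new                      # keep the improvement
            else:
                cw[c] = old                     # otherwise revert
    return cw, best


def fit_line(x, y):
    """Least-squares estimates of C0 and C1 in eq 2."""
    mx, my = mean(x), mean(y)
    c1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - c1 * mx, c1


# Invented toy data: (quasi-SMILES codes, cell viability in %).
toy_train = [
    (("A3", "D5", "E1", "F3", "G0"), 100.0),
    (("A3", "D5", "E1", "F3", "G4"), 85.0),
    (("A3", "D5", "E1", "F3", "G9"), 60.0),
    (("A0", "D4", "E3", "F1", "G0"), 98.0),
    (("A0", "D4", "E3", "F1", "G4"), 70.0),
    (("A0", "D4", "E3", "F1", "G9"), 30.0),
]

weights, r = optimize(toy_train)
x = [dcw(codes, weights) for codes, _ in toy_train]
c0, c1 = fit_line(x, [value for _, value in toy_train])
print(f"r = {r:.3f}; CV = {c0:.2f} + {c1:.2f} x DCW")
```

In this sketch the threshold and the number of epochs are fixed arguments; selecting T* and N* as in step g would amount to repeating the search over a grid of both values and keeping the combination that performs best on the validation data.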



RESULTS

Optimal Descriptors. The calculated correlation weights (CW) of all quasi-SMILES codes for the six splits are shown in Tables S1−S6. Each quasi-SMILES code has its own correlation weight. The optimal descriptor for each data row was calculated from eq 1. For new MWCNT data (i.e., physicochemical properties, cell line, toxic assay method, exposure time, and dose) within the range of the data set in this study, the prediction models were able to predict the cell viability response. The average correlation weights of all attributes (diameter, length, surface area, dose, toxic assay method, cell line, and exposure time), which were calculated from the average correlation weights of all quasi-SMILES codes of each parameter, are shown in Figure 1. The dose attribute has the lowest average correlation weight, followed by exposure time, length, toxic assay method, surface area, diameter, and cell line, respectively.

Figure 1. Average correlation weights (CW) of all attributes, which are calculated from the average correlation weights of all quasi-SMILES codes for each parameter in all six splits. The parameters are in order of increasing CW values. Error bars represent the standard deviation.

Cell Viability Prediction Models for MWCNTs. The cell viability prediction models for the six splits are given below (CV: cell viability, %) (Figure 2):

Split 1:

$$\mathrm{CV} = -819.85(\pm 3.14) + 124.14(\pm 0.44) \times \mathrm{DCW}(1, 5) \qquad (3)$$

$R^2_{\mathrm{train}} = 0.7685$, $n_{\mathrm{train}} = 167$; $R^2_{\mathrm{internal}} = 0.6823$, $n_{\mathrm{internal}} = 88$; $R^2_{\mathrm{external}} = 0.8617$, $n_{\mathrm{external}} = 21$

Split 2:

$$\mathrm{CV} = -1206.09(\pm 3.88) + 183.35(\pm 0.56) \times \mathrm{DCW}(1, 11) \qquad (4)$$

$R^2_{\mathrm{train}} = 0.7758$, $n_{\mathrm{train}} = 167$; $R^2_{\mathrm{internal}} = 0.6791$, $n_{\mathrm{internal}} = 88$; $R^2_{\mathrm{external}} = 0.8174$, $n_{\mathrm{external}} = 21$

Split 3:

$$\mathrm{CV} = -753.27(\pm 2.62) + 115.70(\pm 0.37) \times \mathrm{DCW}(1, 6) \qquad (5)$$

$R^2_{\mathrm{train}} = 0.7919$, $n_{\mathrm{train}} = 166$; $R^2_{\mathrm{internal}} = 0.6038$, $n_{\mathrm{internal}} = 89$; $R^2_{\mathrm{external}} = 0.8554$, $n_{\mathrm{external}} = 21$

Split 4:

$$\mathrm{CV} = -2580.11(\pm 6.56) + 379.82(\pm 0.94) \times \mathrm{DCW}(2, 4) \qquad (6)$$

$R^2_{\mathrm{train}} = 0.7384$, $n_{\mathrm{train}} = 220$; $R^2_{\mathrm{internal}} = 0.7845$, $n_{\mathrm{internal}} = 35$; $R^2_{\mathrm{external}} = 0.8522$, $n_{\mathrm{external}} = 21$

Split 5:

$$\mathrm{CV} = -912.28(\pm 2.50) + 139.78(\pm 0.35) \times \mathrm{DCW}(1, 4) \qquad (7)$$

$R^2_{\mathrm{train}} = 0.7439$, $n_{\mathrm{train}} = 220$; $R^2_{\mathrm{internal}} = 0.7637$, $n_{\mathrm{internal}} = 35$; $R^2_{\mathrm{external}} = 0.8797$, $n_{\mathrm{external}} = 21$

Split 6:

$$\mathrm{CV} = -647.33(\pm 1.81) + 101.66(\pm 0.25) \times \mathrm{DCW}(1, 4) \qquad (8)$$

$R^2_{\mathrm{train}} = 0.7342$, $n_{\mathrm{train}} = 219$; $R^2_{\mathrm{internal}} = 0.8033$, $n_{\mathrm{internal}} = 36$; $R^2_{\mathrm{external}} = 0.8513$, $n_{\mathrm{external}} = 21$

Figure 2. Cell viability prediction results for six splits.

The statistical parameters of these prediction models are presented in Table 4.

Table 4. Details of the Six Cell Viability Prediction Models (a)

| | split 1 | split 2 | split 3 | split 4 | split 5 | split 6 |
|---|---|---|---|---|---|---|
| R²train | 0.7685 | 0.7758 | 0.7919 | 0.7384 | 0.7439 | 0.7342 |
| ntrain | 167 | 167 | 166 | 220 | 220 | 219 |
| strain | 13.7 | 13.7 | 13.8 | 15.2 | 15.0 | 14.8 |
| R²internal | 0.6823 | 0.6791 | 0.6038 | 0.7854 | 0.7637 | 0.8033 |
| ninternal | 88 | 88 | 89 | 35 | 35 | 36 |
| sinternal | 18.2 | 19.6 | 18.2 | 14.1 | 15.1 | 16.0 |
| R²external (R²pred) | 0.8617 | 0.8174 | 0.8554 | 0.8522 | 0.8797 | 0.8513 |
| nexternal | 21 | 21 | 21 | 21 | 21 | 21 |
| sexternal | 12.8 | 11.0 | 16.4 | 8.8 | 8.0 | 13.4 |
| Q²LOO | 0.6693 | 0.6659 | 0.5867 | 0.7597 | 0.7316 | 0.7795 |
| Q²F2 | 0.8347 | 0.7801 | 0.8316 | 0.8275 | 0.8559 | 0.8155 |
| Rm² | 0.5376 | 0.5698 | 0.5183 | 0.6524 | 0.6737 | 0.6330 |

(a) R is the correlation coefficient; n is the number of compounds in the set; s is the standard error of estimation; Q²LOO is the leave-one-out cross-validated correlation coefficient; Q²F2 is the cross-validated correlation coefficient for the external validation data set; Rm² is the average of Rm²′ (observed response values on the y-axis and predicted values on the x-axis) and Rm²″ (observed response values on the x-axis and predicted values on the y-axis).

Y-randomization was performed for all six splits, and the results are shown in Tables 5 and 6.

Table 5. Y-Randomization Results for Training Data Sets of Six Splits (a)

| iteration | split 1 | split 2 | split 3 | split 4 | split 5 | split 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0007 | 0.0029 | 0.0097 | 0.004 | 0.0048 | 0.0031 |
| 2 | 0.0094 | 0.0052 | 0.0031 | 0.0009 | 0.0137 | 0.0063 |
| 3 | 0.0151 | 0.0013 | 0.0015 | 0.0001 | 0.0000 | 0.0016 |
| 4 | 0.0005 | 0.009 | 0.0006 | 0.0032 | 0.0015 | 0.0057 |
| 5 | 0.0261 | 0.000 | 0.0028 | 0.0063 | 0.0014 | 0.0124 |
| 6 | 0.0011 | 0.000 | 0.0000 | 0.0001 | 0.0004 | 0.0001 |
| 7 | 0.0023 | 0.0023 | 0.0075 | 0.0004 | 0.0042 | 0.0058 |
| 8 | 0.0170 | 0.0012 | 0.0035 | 0.0000 | 0.0001 | 0.0001 |
| 9 | 0.0126 | 0.0124 | 0.0032 | 0.0271 | 0.0004 | 0.0004 |
| 10 | 0.0167 | 0.0031 | 0.0013 | 0.0072 | 0.0024 | 0.0045 |

(a) Values are randomized R² obtained after randomly shuffling Y values (cell viability) and fixing X values (DCW).

Table 6. Y-Randomization Results for Internal Validation Data Sets of Six Splits (a)

| iteration | split 1 | split 2 | split 3 | split 4 | split 5 | split 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0082 | 0.0553 | 0.0052 | 0.0002 | 0.0005 | 0.0031 |
| 2 | 0.0022 | 0.0078 | 0.0095 | 0.0282 | 0.0014 | 0.0037 |
| 3 | 0.0000 | 0.0051 | 0.0033 | 0.156 | 0.0198 | 0.0001 |
| 4 | 0.0115 | 0.0162 | 0.0101 | 0.0059 | 0.0035 | 0.0421 |
| 5 | 0.0096 | 0.0061 | 0.0032 | 0.0146 | 0.0556 | 0.1089 |
| 6 | 0.0013 | 0.0364 | 0.0033 | 0.0309 | 0.0443 | 0.021 |
| 7 | 0.0001 | 0.0029 | 0.0496 | 0.0001 | 0.0218 | 0.0015 |
| 8 | 0.0005 | 0.0017 | 0.0000 | 0.1445 | 0.0608 | 0.0014 |
| 9 | 0.0449 | 0.0094 | 0.0110 | 0.0000 | 0.0057 | 0.0046 |
| 10 | 0.0015 | 0.0360 | 0.0003 | 0.1109 | 0.0000 | 0.0011 |

(a) Values are randomized R² obtained after randomly shuffling Y values (cell viability) and fixing X values (DCW).

DISCUSSION

Applicability Domain. The concept of an applicability domain was suggested by Netzeva et al.:52 “Applicability domain of QSAR models is defined as their scope and limitation, i.e. the range of chemical structures for which the model is considered to be applicable”. In this study, the range of physicochemical properties and the conditions of exposure to the MWCNTs were considered. New input data for MWCNTs are applicable to the prediction models in this work if the following two conditions are satisfied:

a. First, the data are within the data range of this data set.

b. Second, the data have a defect32 value that is smaller than twice the average defect of the data set used in this study.

Toropova et al.32 suggested the concept of “defect” to define the applicability domain of quasi-QSAR models. The defect of a quasi-SMILES is defined as the sum of the defects of its components

$$\mathrm{defect}(\text{quasi-SMILES}) = \sum_{k} \mathrm{defect}(S_k) \qquad (9)$$

where $S_k$ is a quasi-SMILES component (i.e., A0, B1,···). The defect of each quasi-SMILES component is calculated from

$$\mathrm{defect}(S_k) = \frac{|P_{\mathrm{train}}(S_k) - P_{\mathrm{internal}}(S_k)|}{N_{\mathrm{train}}(S_k) + N_{\mathrm{internal}}(S_k)} \qquad (10)$$

where $P_{\mathrm{train}}(S_k)$ is the probability of the presence of $S_k$ in the training data set ($P_{\mathrm{train}}(S_k) = N_{\mathrm{train}}(S_k)/N_{\mathrm{train}}$) and $P_{\mathrm{internal}}(S_k)$ is the probability of the presence of $S_k$ in the internal validation data set ($P_{\mathrm{internal}}(S_k) = N_{\mathrm{internal}}(S_k)/N_{\mathrm{internal}}$). $N_{\mathrm{train}}(S_k)$ and $N_{\mathrm{internal}}(S_k)$ are the frequencies of $S_k$ in the training and internal validation data sets, respectively, and $N_{\mathrm{train}}$ and $N_{\mathrm{internal}}$ are the sizes of the two sets. The ideal situation is that the quasi-SMILES defect calculated from eq 9 is equal to zero; however, this rarely occurs. As suggested by Toropova et al.,32 a limit of twice the average defect was used to test the applicability of the models to new data

$$\mathrm{defect}(\text{new quasi-SMILES}) < 2 \times \overline{\mathrm{defect}}(\text{quasi-SMILES}) \qquad (11)$$

If the new data satisfy inequality 11, then the prediction models are applicable. As shown in Tables S7 and S8, some quasi-SMILES codes of the external validation data set were not applicable to the prediction models built from the six splits. The percentages of external validation data that were applicable to the prediction models were 100, 100, 100, 86, 100, and 100% for splits 1−6, respectively. This indicated that the six prediction models were able to predict more than 4/5 of the new data, which was satisfactory.

Effects of Attributes on Cell Viability. The six models for the estimation of the cell viability of MWCNTs, as described in the Results, showed that a lower value of the optimal descriptor (DCW) led to a lower estimated cell viability (i.e., higher toxicity). The optimal descriptor was the sum of all the correlation weights (CW) of the quasi-SMILES codes. Thus, we expected that higher toxicity would be derived from quasi-SMILES codes with lower correlation weights. Among the seven parameters (diameter, length, surface area, dose, toxic assay method, cell line, and exposure time), dose had the lowest average correlation weight (CW), as shown in Figure 1. On the basis of eqs 1−5, one can say that a lower CW results in lower cell viability and thus higher toxicity because a positive CW promotes an increase in cell viability. This implied that dose is the most influential factor for increasing the toxicity of MWCNTs. This agreed with the Pearson correlation analysis, which indicated that dose has the highest correlation with cell viability, as shown in Figure S1. On the basis of the correlation weight tables (Tables S1−S6), the correlation weights of the high-dose group G4−G9 (less than 1.0) are smaller than the weights of the lower-dose group G0−G3 (larger than 1.0). This indicated that the higher doses (G4−G9) exerted more toxic effects than the lower doses (G0−G3). Exposure time is the parameter with the second smallest average CW, implying that it also has an important effect on cell viability. This agrees with the expectation that a longer exposure time causes lower cell viability (higher toxicity). Among the physicochemical parameters (diameter, length, and surface area), length has the lowest average correlation weight (Figure 1). This may suggest that the length of MWCNTs was an important factor in the determination of their toxicity.

Advantages and Drawbacks of Prediction Models. The end point in this study is the “cell viability” of normal human lung cell lines exposed to various nonfunctionalized MWCNTs. The “cell viability” is the percentage of live cells after exposure to MWCNTs under certain conditions. In the 10 articles we collected, the authors conducted experiments in which four types of normal human lung cells were exposed to 20 different types of MWCNTs (each type had specific physicochemical properties of diameter, length, and surface area) under different conditions (time, dose, and assay). The six toxicity assays used in these 10 articles (MTT, WST, LDH, CellTiter-Glo, Alamar Blue, and Trypan Blue) are widely used and recognized in in vitro toxicity research for cell viability measurements. The “cell viability” was the experimental observation resulting from those properties and conditions. Thus, the end point used in this study maintained transparency for model validation, and mixing data from the 10 articles did not introduce diversity in the end point.

The six prediction models built in this study were validated with an external validation data set that contained data completely absent from the training and internal validation data sets. The external validation results showed good predictive ability, with R²pred between 0.81 and 0.88. Previous studies19−21,24,25,27,29,33,35−37,39,50,53−60 that used the quasi-SMILES approach for toxicity prediction only used external validation data obtained by random division of the total data set. In contrast, our models could simulate the real-world situation in which completely new data are supplied and the models are used to predict the end points.

In this study, we used ranges to code the quasi-SMILES, i.e., each quasi-SMILES code represented a range of data. This is an advantage for the external application of the prediction models: for new data within the range of the training data (i.e., the applicability domain), the models can be used to predict cytotoxicity. Previous research on quasi-SMILES used specific codes for fixed values of data; therefore, the models could not predict an end point for new input data if those data were absent from the training data set. The external validation data set we used in this study contained data that were absent from the training data set, but our models were able to predict end points for those data with satisfactory accuracy.

Although the hierarchical clustering approach helped to compensate for the limitation of the normalization method, as mentioned in the Materials and Methods, the results of hierarchical clustering might vary depending on the data set. Thus, the range codes of the quasi-SMILES shown in Table 2 could change with the data set, and the weights of the quasi-SMILES codes would differ between data sets. Consequently, if more experimental data are generated and combined with the data of this work, new quasi-SMILES coding tables and new prediction models will be required. Considering the current scarcity of in vitro toxicity data, hierarchical clustering is preferable to the normalization approach for the coding of quasi-SMILES.
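The applicability-domain check of eqs 9−11 can be sketched in Python as follows. The helper names, the toy quasi-SMILES, the infinite defect given to codes seen in neither set, the normalization of the internal-set probability by the size of the internal validation set, and the use of the training set to compute the average defect are all assumptions made for illustration; they are not taken from the CORAL implementation.

```python
from collections import Counter


def component_defects(train, internal):
    """Eq 10 for every code that occurs in the training or internal set."""
    n_train = Counter(code for q in train for code in q)
    n_int = Counter(code for q in internal for code in q)
    defects = {}
    for code in set(n_train) | set(n_int):
        p_train = n_train[code] / len(train)      # P_train(S_k)
        p_int = n_int[code] / len(internal)       # P_internal(S_k)
        defects[code] = abs(p_train - p_int) / (n_train[code] + n_int[code])
    return defects


def quasi_smiles_defect(codes, defects):
    """Eq 9: sum of component defects; unseen codes count as infinite (assumption)."""
    return sum(defects.get(code, float("inf")) for code in codes)


def is_applicable(codes, train, defects):
    """Inequality 11, with the average defect taken over the training set (assumption)."""
    average = sum(quasi_smiles_defect(q, defects) for q in train) / len(train)
    return quasi_smiles_defect(codes, defects) < 2 * average


# Toy quasi-SMILES with two components each (invented for illustration).
toy_train = [("A0", "G0"), ("A0", "G1"), ("A0", "G0"), ("A1", "G1")]
toy_internal = [("A0", "G0"), ("A1", "G1")]

toy_defects = component_defects(toy_train, toy_internal)
print(is_applicable(("A1", "G0"), toy_train, toy_defects))
print(is_applicable(("A2", "G0"), toy_train, toy_defects))
```

Because inequality 11 is strict, a data set whose average defect is exactly zero would reject every new quasi-SMILES in this sketch; a practical implementation would need to decide how to treat that edge case.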



CONCLUSIONS

In this study, quasi-SMILES-based prediction models for cell viability were built with satisfactory performance. The models could handle the real-world situation in which a prediction is required for new MWCNT input data that are absent from the training data. The hierarchical clustering method was successfully applied to compensate for a disadvantage of previous quasi-SMILES-based prediction models, namely, that some quasi-SMILES codes were absent and the models were therefore unable to predict the end point for completely new data. The prediction models in this work provided transparent interpretation through the model equations and the correlation weights calculated by Monte Carlo optimization.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.chemrestox.7b00303.

Data set used in this study (XLSX)
Correlation weights of each quasi-SMILES code in splits 1−6, calculated defects for data in the external validation data sets of splits 1−6, Pearson correlation check for numerical data of the MWCNT data set, and hierarchical clustering for four numeric parameters (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. Phone: +82-2-2220-4593. *E-mail: [email protected]. Phone: +49-(0)681-9382322. ORCID

Tung Xuan Trinh: 0000-0002-8961-2876 Tae-Hyun Yoon: 0000-0002-2743-6360 Present Address ⊥

J.K.: Center for Chemical Safety and Security, Korea Research Institute of Chemical Technology (KRICT), Gajeong-ro, Yuseong-gu, Daejeon 34114, Republic of Korea

Author Contributions

T.X.T. and J.S.C. prepared and cross-checked the data sets for model development. T.X.T., J.S.C., H.G.B., J.K., and T.H.Y. contributed to the model development and interpretation. T.X.T., J.K., and T.H.Y. wrote the manuscript with input from all authors.

Funding

This work was supported by the Industrial Strategic Technology Development Program (10043929, Development of "User-friendly Nanosafety Prediction System") funded by the Ministry of Trade, Industry & Energy (MOTIE) of Korea.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors appreciate NCEC members (Hanyang University, Seoul), staff of KIT (Korea Institute of Toxicology, Daejeon; Ms. Soojin Kim, Dr. Junghwa Oh, and Dr. Seokjoo Yoon), and other collaborators in the S2NANO community (Safe and Sustainable Nanotechnology, www.s2nano.org) for their contributions to the nanosafety data collection.

ABBREVIATIONS

QSAR, quantitative structure−activity relationship; SMILES, simplified molecular-input line-entry system; MWCNT, multiwalled carbon nanotubes; REACH, Registration, Evaluation, Authorisation and Restriction of Chemicals; S2NANO, Safe and Sustainable Nano

REFERENCES

(1) European Commission. Regulations (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 Concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), Establishing a European Chemical Agency, Amending Directive 1999/4. Off. J. Eur. Communities, 2006.
(2) Oh, E., Liu, R., Nel, A., Gemill, K. B., Bilal, M., Cohen, Y., and Medintz, I. L. (2016) Meta-Analysis of Cellular Toxicity for Cadmium-Containing Quantum Dots. Nat. Nanotechnol. 11, 479.
(3) Liu, R., Zhang, H. Y., Ji, Z. X., Rallo, R., Xia, T., Chang, C. H., Nel, A., and Cohen, Y. (2013) Development of Structure-Activity Relationship for Metal Oxide Nanoparticles. Nanoscale 5, 5644−5653.
(4) Mu, Y., Wu, F., Zhao, Q., Ji, R., Qie, Y., Zhou, Y., Hu, Y., Pang, C., Hristozov, D., Giesy, J. P., and Xing, B. (2016) Predicting Toxic Potencies of Metal Oxide Nanoparticles by Means of Nano-QSARs. Nanotoxicology 10, 1207−1214.
(5) Ghorbanzadeh, M., Fatemi, M. H., and Karimpour, M. (2012) Modeling the Cellular Uptake of Magnetofluorescent Nanoparticles in Pancreatic Cancer Cells: A Quantitative Structure Activity Relationship Study. Ind. Eng. Chem. Res. 51, 10712−10718.
(6) Winkler, D. A., Burden, F. R., Yan, B., Weissleder, R., Tassa, C., Shaw, S., and Epa, V. C. (2014) Modelling and Predicting the Biological Effects of Nanomaterials. SAR QSAR Environ. Res. 25, 161−172.
(7) Puzyn, T., Rasulev, B., Gajewicz, A., Hu, X., Dasari, T. P., Michalkova, A., Hwang, H.-M., Toropov, A., Leszczynska, D., and Leszczynski, J. (2011) Using Nano-QSAR to Predict the Cytotoxicity of Metal Oxide Nanoparticles. Nat. Nanotechnol. 6, 175−178.
(8) Liu, R., Rallo, R., George, S., Ji, Z., Nair, S., Nel, A. E., and Cohen, Y. (2011) Classification NanoSAR Development for Cytotoxicity of Metal Oxide Nanoparticles. Small 7, 1118−1126.
(9) Pan, Y., Li, T., Cheng, J., Telesca, D., Zink, J. I., and Jiang, J. (2016) Nano-QSAR Modeling for Predicting the Cytotoxicity of Metal Oxide Nanoparticles Using Novel Descriptors. RSC Adv. 6, 25766−25775.
(10) Chen, G., Peijnenburg, W. J. G. M., Kovalishyn, V., and Vijver, M. G. (2016) Development of Nanostructure-Activity Relationships Assisting the Nanomaterial Hazard Categorization for Risk Assessment and Regulatory Decision-Making. RSC Adv. 6, 52227−52235.
(11) Gernand, J. M., and Casman, E. A. (2014) A Meta-Analysis of Carbon Nanotube Pulmonary Toxicity Studies-How Physical Dimensions and Impurities Affect the Toxicity of Carbon Nanotubes. Risk Anal. 34, 583−597.
(12) Liu, R., Rallo, R., Weissleder, R., Tassa, C., Shaw, S., and Cohen, Y. (2013) Nano-SAR Development for Bioactivity of Nanoparticles with Considerations of Decision Boundaries. Small 9, 1842−1852.
(13) Fourches, D., Pu, D., Tassa, C., Weissleder, R., Shaw, S. Y., Mumper, R. J., and Tropsha, A. (2010) Quantitative Nanostructure Activity Relationship Modeling. ACS Nano 4, 5703−5712.
(14) Chau, Y. T., and Yap, C. W. (2012) Quantitative Nanostructure-Activity Relationship Modelling of Nanoparticles. RSC Adv. 2, 8489−8496.
(15) Apul, O. G., Wang, Q., Shao, T., Rieck, J. R., and Karanfil, T. (2013) Predictive Model Development for Adsorption of Aromatic Contaminants by Multi-Walled Carbon Nanotubes. Environ. Sci. Technol. 47, 2295−2303.
(16) Oksel, C., Ma, C. Y., and Wang, X. Z. (2015) Structure-Activity Relationship Models for Hazard Assessment and Risk Management of Engineered Nanomaterials. Procedia Eng. 102, 1500−1510.
(17) Singh, K. P., and Gupta, S. (2014) Nano-QSAR Modeling for Predicting Biological Activity of Diverse Nanomaterials Academy of Scientific and Innovative Research, India Environmental Chemistry Division, CSIR-Indian Institute of Toxicology, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001.
(18) Kleandrova, V. V., Luan, F., González-Díaz, H., Ruso, J. M., Melo, A., Speck-Planche, A., and Cordeiro, M. N. D. S. (2014) Computational Ecotoxicology: Simultaneous Prediction of Ecotoxic Effects of Nanoparticles under Different Experimental Conditions. Environ. Int. 73, 288−294.
(19) Toropova, A. P., Toropov, A. A., Veselinović, A. M., Veselinović, J. B., Leszczynska, D., and Leszczynski, J. (2016) Monte Carlo−based Quantitative Structure−activity Relationship Models for Toxicity of Organic Chemicals to Daphnia Magna. Environ. Toxicol. Chem. 35, 2691−2697.
(20) Manganelli, S., Leone, C., Toropov, A. A., Toropova, A. P., and Benfenati, E. (2016) QSAR Model for Predicting Cell Viability of Human Embryonic Kidney Cells Exposed to SiO2 Nanoparticles. Chemosphere 144, 995−1001.
(21) Toropov, A. A., and Toropova, A. P. (2015) Quasi-QSAR for Mutagenic Potential of Multi-Walled Carbon-Nanotubes. Chemosphere 124, 40−46.
(22) Veselinović, J. B., Veselinović, A. M., Toropova, A. P., and Toropov, A. A. (2016) The Monte Carlo Technique as a Tool to Predict LOAEL. Eur. J. Med. Chem. 116, 71−75.
(23) Toropov, A. A., Achary, P. G. R., and Toropova, A. P. (2016) Quasi-SMILES and Nano-QFPR: The Predictive Model for Zeta Potentials of Metal Oxide Nanoparticles. Chem. Phys. Lett. 660, 107−110.
(24) Bragazzi, N., Toropov, A., Toropova, A., Pechkova, E., and Nicolini, C. (2016) Quasi-QSPR to Predict Proteins Behavior Under Various Concentrations of Drug Using Nanoconductometric Assay. NanoWorld J. 2, 1.
(25) Toropova, A. P., Toropov, A. A., Veselinović, J. B., and Veselinović, A. M. (2015) QSAR as a Random Event: A Case of NOAEL. Environ. Sci. Pollut. Res. 22, 8264−8271.
(26) Achary, P. G. R., Begum, S., Toropova, A. P., and Toropov, A. A. (2016) A Quasi-SMILES Based QSPR Approach towards the Prediction of Adsorption Energy of Ziegler−Natta Catalysts for Propylene Polymerization. Mater. Discovery 5, 22−28.
(27) Manganelli, S., Leone, C., Toropov, A. A., Toropova, A. P., and Benfenati, E. (2016) QSAR Model for Cytotoxicity of Silica Nanoparticles on Human Embryonic Kidney Cells. Mater. Today Proc. 3, 847−854.
(28) Toropova, A. P., Toropov, A. A., Leszczynska, D., and Leszczynski, J. (2017) CORAL and Nano-QFAR: Quantitative Feature−Activity Relationships (QFAR) for Bioavailability of Nanoparticles (ZnO, CuO, Co3O4, and TiO2). Ecotoxicol. Environ. Saf. 139, 404−407.
(29) Toropov, A. A., Toropova, A. P., Begum, S., and Achary, P. G. R. (2016) Towards Predicting the Solubility of CO2 and N2 in Different Polymers Using a Quasi-SMILES Based QSPR Approach. SAR QSAR Environ. Res. 27, 293−301.
(30) Toropova, A. P., Toropov, A. A., Veselinovic, A. M., Veselinovic, J. B., Benfenati, E., Leszczynska, D., and Leszczynski, J. (2016) Nano-QSAR: Model of Mutagenicity of Fullerene as a Mathematical Function of Different Conditions. Ecotoxicol. Environ. Saf. 124, 32−36.
(31) Toropova, A. P., and Toropov, A. A. (2017) Nano-QSAR in Cell Biology: Model of Cell Viability as a Mathematical Function of Available Eclectic Data. J. Theor. Biol. 416, 113−118.
(32) Toropova, A. P., Toropov, A. A., Rallo, R., Leszczynska, D., and Leszczynski, J. (2015) Optimal Descriptor as a Translator of Eclectic Data into Prediction of Cytotoxicity for Metal Oxide Nanoparticles under Different Conditions. Ecotoxicol. Environ. Saf. 112, 39−45.
(33) Toropova, A. P., Toropov, A. A., Manganelli, S., Leone, C., Baderna, D., Benfenati, E., and Fanelli, R. (2016) Quasi-SMILES as a Tool to Utilize Eclectic Data for Predicting the Behavior of Nanomaterials. NanoImpact 1, 60−64.
(34) Toropova, A. P., Toropov, A. A., Benfenati, E., Korenstein, R., Leszczynska, D., and Leszczynski, J. (2015) Optimal Nano-Descriptors as Translators of Eclectic Data into Prediction of the Cell Membrane Damage by Means of Nano Metal-Oxides. Environ. Sci. Pollut. Res. 22, 745−757.
(35) Toropova, A. P., Toropov, A. A., Rallo, R., Leszczynska, D., and Leszczynski, J. (2015) Optimal Descriptor as a Translator of Eclectic Data into Prediction of Cytotoxicity for Metal Oxide Nanoparticles under Different Conditions. Ecotoxicol. Environ. Saf. 112, 39−45.
(36) Toropova, A. P., Toropov, A. A., Benfenati, E., Puzyn, T., Leszczynska, D., and Leszczynski, J. (2014) Optimal Descriptor as a Translator of Eclectic Information into the Prediction of Membrane Damage: The Case of a Group of ZnO and TiO2 Nanoparticles. Ecotoxicol. Environ. Saf. 108, 203−209.
(37) Toropova, A. P., and Toropov, A. A. (2013) Optimal Descriptor as a Translator of Eclectic Information into the Prediction of Membrane Damage by Means of Various TiO2 Nanoparticles. Chemosphere 93, 2650−2655.
(38) Weininger, D. (1988) SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Model. 28, 31−36.
(39) Toropov, A. A., Toropova, A. P., Puzyn, T., Benfenati, E., Gini, G., Leszczynska, D., and Leszczynski, J. (2013) QSAR as a Random Event: Modeling of Nanoparticles Uptake in PaCa2 Cancer Cells. Chemosphere 92, 31−37.
(40) Nymark, P., Alstrup, J., Suhonen, S., Kembouche, Y., Vippola, M., Kleinjans, J., Catalán, J., Norppa, H., Van Delft, J., and Briedé, J. J. (2014) Free Radical Scavenging and Formation by Multi-Walled Carbon Nanotubes in Cell Free Conditions and in Human Bronchial Epithelial Cells. Part. Fibre Toxicol. 11, 1−18.
(41) Ye, S. F., Wu, Y. H., Hou, Z. Q., and Zhang, Q. Q. (2009) ROS and NF-κB Are Involved in Upregulation of IL-8 in A549 Cells Exposed to Multi-Walled Carbon Nanotubes. Biochem. Biophys. Res. Commun. 379, 643−648.
(42) Kim, J. S., Song, K. S., Joo, H. J., Lee, J. H., and Yu, I. J. (2010) Determination of Cytotoxicity Attributed to Multiwall Carbon Nanotubes (MWCNT) in Normal Human Embryonic Lung Cell (WI-38) Line. J. Toxicol. Environ. Health, Part A 73, 1521−1529.
(43) Haniu, H., Saito, N., Matsuda, Y., Tsukahara, T., Maruyama, K., Usui, Y., Aoki, K., Takanashi, S., Kobayashi, S., Nomura, H., et al. (2013) Culture Medium Type Affects Endocytosis of Multi-Walled Carbon Nanotubes in BEAS-2B Cells and Subsequent Biological Response. Toxicol. In Vitro 27, 1679−1685.
(44) Kim, J.-E., Kang, S.-H., Moon, Y., Chae, J.-J., Lee, A. Y., Lee, J.-H., Yu, K.-N., Jeong, D. H., Choi, M., and Cho, M.-H. (2014) Physicochemical Determinants of Multiwalled Carbon Nanotubes on Cellular Toxicity: Influence of a Synthetic Method and Post-Treatment. Chem. Res. Toxicol. 27, 290−303.
(45) Hussain, S., Sangtian, S., Anderson, S. M., Snyder, R. J., Marshburn, J. D., Rice, A. B., Bonner, J. C., and Garantziotis, S. (2014) Inflammasome Activation in Airway Epithelial Cells after Multi-Walled Carbon Nanotube Exposure Mediates a Profibrotic Response in Lung Fibroblasts. Part. Fibre Toxicol. 11, 28.
(46) Ursini, C. L., Cavallo, D., Fresegna, A. M., Ciervo, A., Maiello, R., Buresti, G., Casciardi, S., Bellucci, S., and Iavicoli, S. (2014) Differences in Cytotoxic, Genotoxic, and Inflammatory Response of Bronchial and Alveolar Human Lung Epithelial Cells to Pristine and COOH-Functionalized Multiwalled Carbon Nanotubes. BioMed Res. Int. 2014, 1−14.
(47) Chatterjee, N., Yang, J., Kim, H. M., Jo, E., Kim, P. J., Choi, K., and Choi, J. (2014) Potential Toxicity of Differential Functionalized Multiwalled Carbon Nanotubes (MWCNT) in Human Cell Line (BEAS2B) and Caenorhabditis Elegans. J. Toxicol. Environ. Health, Part A 77, 1399−1408.
(48) Maruyama, K., Haniu, H., Saito, N., Matsuda, Y., Tsukahara, T., Kobayashi, S., Tanaka, M., Aoki, K., Takanashi, S., Okamoto, M., and Kato, H. (2015) Endocytosis of Multiwalled Carbon Nanotubes in Bronchial Epithelial and Mesothelial Cells. BioMed Res. Int. 2015, 1.
(49) Ursini, C. L., Maiello, R., Ciervo, A., Fresegna, A. M., Buresti, G., Superti, F., Marchetti, M., Iavicoli, S., and Cavallo, D. (2016) Evaluation of Uptake, Cytotoxicity and Inflammatory Effects in Respiratory Cells Exposed to Pristine and -OH and -COOH Functionalized Multi-Wall Carbon Nanotubes. J. Appl. Toxicol. 36, 394−403.
(50) Toropov, A. A., Toropova, A. P., and Benfenati, E. (2009) QSAR Modelling for Mutagenic Potency of Heteroaromatic Amines by Optimal SMILES-Based Descriptors. Chem. Biol. Drug Des. 73, 301−312.
(51) Johnson, S. C. (1967) Hierarchical Clustering Schemes. Psychometrika 32, 241−254.
(52) Netzeva, T. I., Worth, A. P., Aldenberg, T., Benigni, R., Cronin, M. T. D., Gramatica, P., Jaworska, J. S., Kahn, S., Klopman, G., Marchant, C. A., et al. (2005) Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure-Activity Relationships. ATLA Alternatives to Laboratory Animals 33, 155−173.
(53) Veselinović, A. M., Milosavljević, J. B., Toropov, A. A., and Nikolić, G. M. (2013) SMILES-Based QSAR Model for Arylpiperazines as High-Affinity 5-HT1A Receptor Ligands Using CORAL. Eur. J. Pharm. Sci. 48, 532−541.
(54) Veselinović, A. M., Milosavljević, J. B., Toropov, A. A., and Nikolić, G. M. (2013) SMILES-Based QSAR Models for the Calcium Channel-Antagonistic Effect of 1,4-Dihydropyridines. Arch. Pharm. (Weinheim, Ger.) 346, 134−139.
(55) Veselinović, J. B., Toropov, A. A., Toropova, A. P., Nikolić, G. M., and Veselinović, A. M. (2015) Monte Carlo Method-Based QSAR Modeling of Penicillins Binding to Human Serum Proteins. Arch. Pharm. (Weinheim, Ger.) 348, 62−67.
(56) Toropov, A. A., Toropova, A. P., Benfenati, E., and Manganaro, A. (2009) QSAR Modelling of Carcinogenicity by Balance of Correlations. Mol. Diversity 13, 367−373.
(57) Toropov, A. A., Toropova, A. P., Como, F., and Benfenati, E. (2016) Quantitative Structure−activity Relationship Models for Bee Toxicity. Toxicol. Environ. Chem. 2248, 1.
(58) Toropova, A. P., Toropov, A. A., Veselinovic, J. B., Veselinovic, A. M., Benfenati, E., Leszczynska, D., and Leszczynski, J. (2015) Application of the Monte Carlo Method to Prediction of Dispersibility of Graphene in Various Solvents. Int. J. Environ. Res. 9, 1211−1216.
(59) Toropova, A. P., Toropov, A. A., Benfenati, E., Leszczynska, D., and Leszczynski, J. (2015) QSAR Model as a Random Event: A Case of Rat Toxicity. Bioorg. Med. Chem. 23, 1223−1230.
(60) Toropov, A. A., Toropova, A. P., and Benfenati, E. (2010) SMILES-Based Optimal Descriptors: QSAR Modeling of Carcinogenicity by Balance of Correlations with Ideal Slopes. Eur. J. Med. Chem. 45, 3581−3587.