Development of a Comprehensive Flavonoid Analysis Computational Tool for Ultra High-Performance Liquid Chromatography-Diode Array Detection-High Resolution Accurate Mass-Mass Spectrometry Data

Mengliang Zhang, Jianghao Sun, and Pei Chen

Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b00771 • Publication Date (Web): 25 Jun 2017


Food Composition and Methods Development Lab, Beltsville Human Nutrition Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland 20705-2350, USA

† Mengliang Zhang and Jianghao Sun contributed equally to this manuscript.

∗ Corresponding author (Pei Chen): Tel.: +1 301 504 8144; fax: +1 301 504 8314. E-mail address: [email protected]

ABSTRACT

Liquid chromatography and mass spectrometry methods, especially ultra-high performance liquid chromatography coupled with diode array detection and high resolution accurate-mass multi-stage mass spectrometry (UHPLC-DAD-HRAM/MSn), have become the tool-of-the-trade for profiling flavonoids in foods. However, manually processing acquired UHPLC-DAD-HRAM/MSn data for flavonoid analysis is very challenging and highly expertise-dependent because of the complexity of the chemical structures of the flavonoids and of the food matrices. A computational expert data analysis program, FlavonQ-2.0v, has been developed to facilitate this process. The program first uses UV-Vis spectra for an initial step-wise classification of flavonoids into classes and then identifies individual flavonoids in each class based on their mass spectra. Step-wise identification of flavonoid classes is based on a UV-Vis spectral library compiled from 146 flavonoid reference standards and a novel chemometric model that uses a step-wise strategy and the projected difference resolution (PDR) method. Further identification of the flavonoids in each class is based on an in-house database that contains 5686 flavonoids analyzed in-house or previously reported in the literature. Quantitation is based on the UV-Vis spectra. The step-wise classification strategy to identify classes significantly improved the performance of the program and resulted in more accurate and reliable classification results. The program was validated by analyzing data from a variety of samples, including mixed flavonoid standards, blueberry, mizuna, purple mustard, red cabbage, and red mustard green. Accuracies of identification for all samples were above 88%. FlavonQ-2.0v greatly facilitates the identification and quantitation of flavonoids from UHPLC-HRAM-MSn data. It saves time and resources and allows less experienced people to analyze the data.

INTRODUCTION

Flavonoids are a group of phenolic compounds with various bioactivities and are widely distributed in plants. In various in vitro and in vivo models, they have exhibited diverse biological activities including anti-inflammatory, anti-atherosclerotic, antitumor, anti-thrombogenic, anti-osteoporotic, and anti-viral effects.1 Although dietary flavonoids may play an important role in human health, making recommendations on daily flavonoid intake is very difficult. One of the important issues limiting progress in dietary flavonoid recommendations for consumers is the lack of appropriate analytical methods for the determination of flavonoids in foods and dietary intake levels.2

Profiling flavonoids in foods is challenging because their structures are complex, their distribution and concentrations in plants vary greatly, and commercially available reference standards are limited.3 Liquid chromatography mass spectrometry (LC/MS) has become the most commonly employed method in flavonoid identification and quantification.2,4 While technical advances such as ultra-high-performance liquid chromatography-diode array detection-high resolution accurate-mass multi-stage mass spectrometry (UHPLC-DAD-HRAM-MSn) can provide much more detailed information for a sample, they also bring a new challenge: the tremendous amount of data to be analyzed. In recent years, the emergence of a few “omics” tools such as XCMS,5 MZmine,6,7 MetSign,8 and MET-COFEA9 has greatly facilitated data analysis using automated peak picking, peak alignment, peak integration, and database searching. However, they are designed for non-targeted metabolomics or metabolite profiling. They are inadequate for the analysis of a specific class of targeted plant secondary metabolites, such as flavonoids, because they lack specificity. Herein, FlavonQ-2.0v, a software program specifically designed for the analysis of flavonoids, has been developed.

FlavonQ-2.0v has made several important advances compared with its predecessor, FlavonQ. Like FlavonQ,10 FlavonQ-2.0v features all the functions necessary to detect chromatographic peaks, integrate peak areas, interpret MS spectra, and produce qualitative and quantitative results. The important advances of FlavonQ-2.0v are: 1) it is capable of analyzing all the major classes of flavonoids, including flavone/flavonol, flavan/flavanol, flavanone/flavanonol, isoflavone, and anthocyanidins, as well as hydroxycinnamic acids (non-flavonoids) (Figure 1); 2) the program uses a chemometric pattern recognition method to assign the class of each flavonoid by comparing the UV-Vis spectrum of a chromatographic peak to a UV-Vis spectral library of 146 flavonoid and hydroxycinnamic acid standards; 3) the result obtained from the above-mentioned step is correlated with the HRAM/MSn spectra of that peak and searched against an in-house flavonoid database for tentative identification.

In this study, the step-wise approach of FlavonQ-2.0v is explained and illustrated. The advantages of the step-wise strategy with the projected difference resolution (PDR) method over the conventional classification strategy are demonstrated. The program is validated with the analysis of samples spiked with flavonoids, mixed standards, and plant extracts. The improved approach used in FlavonQ-2.0v is innovative, efficient, and highly effective.

MATERIALS AND METHODS

Chemicals and Plant Materials. Formic acid, HPLC grade methanol, and acetonitrile were purchased from Fisher Scientific (Pittsburgh, PA). HPLC grade water was prepared from distilled water using a Milli-Q system (Millipore Laboratory, Bedford, MA). The reference standards for flavonoids and hydroxycinnamic acid derivatives were obtained from Sigma-Aldrich (St. Louis, MO), Chromadex, Inc. (Irvine, CA), Indofine Chemical Co. (Somerville, NJ), and Extrasynthese (Genay Cedex, France). A list of the 146 reference standards can be found in the Supporting Information.

Blueberry (Vaccinium corymbosum L.), mizuna (Brassica juncea), purple mustard (Chorispora tenella), red cabbage (Brassica oleracea L.), and red mustard green (Brassica juncea) were purchased from local grocery stores, lyophilized immediately upon arrival, and then ground into powder.

UHPLC-DAD-MS Instrument. A UHPLC system coupled with a diode array detector and an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific, San Jose, CA) was used. The chromatographic separation was achieved using a UHPLC column (200 mm × 2.1 mm i.d., 1.9 µm, Hypersil Gold AQ RP-C18) (Thermo Fisher Scientific, Inc., Waltham, MA) with an HPLC/UHPLC pre-column filter (UltraShield Analytical Scientific Instruments, Richmond, CA) at a flow rate of 0.3 mL/min. UHPLC gradient and MS parameter settings were adapted from a previous study,10 and the details can be found in the Supporting Information.

Sample Preparation. Each powdered sample (250 mg) was extracted with 5.00 mL of methanol/water (60:40, v/v) using sonication for 60 min at room temperature, and the slurry mixture was centrifuged at 5,000 g for 15 min (IEC Clinical Centrifuge, Damon/IEC Division, Needham, MA). The supernatant was filtered through a 17 mm (0.45 µm) PVDF syringe filter (VWR Scientific, Seattle, WA), and 2 µL of the extract was used for each injection.

Data Format. MATLAB R2012b (MathWorks Inc., Natick, MA) was used to develop the program. All the calculations were performed on a personal computer with an Intel Core i7-4770 CPU at 3.4 GHz and 16 GB RAM running the Microsoft Windows 7 Professional x64 operating system (Microsoft Corp., Redmond, WA).

The UHPLC-DAD-HRAM-MS data sets were acquired as RAW files. The DAD data were converted from RAW files to text files by an Xcalibur plug-in tool, MSGet.11 The text files were read into MATLAB with an in-house algorithm. For the MS data, the RAW files were first converted to mzXML by an open-source software package, ProteoWizard,12 and then read into MATLAB by the built-in ‘mzxmlread’ function in the MATLAB Bioinformatics Toolbox.

RESULTS AND DISCUSSION

UV-Vis Spectral Library of 146 Flavonoid Standards. First, 146 flavonoid and hydroxycinnamic acid derivative standards were analyzed using the UHPLC-DAD method, and their UV-Vis spectra were compiled into a UV-Vis spectral library after they were normalized to unit vector length.13 The 146 UV-Vis spectra are shown in Figure 2A. As discussed in the previous paper,10 flavonoid identification cannot rely solely on MS spectra and often requires the combination of multiple techniques such as chromatographic behavior, UV-Vis spectrum, and HRAM-MS and MS fragmentation information. Flavonoids have characteristic UV-Vis absorbance profiles, which arise from the different conjugated systems in their structures and can be used to distinguish isomers. For example, pelargonidin 3-O-glucoside (an anthocyanin), genistein 4'-O-glucoside (an isoflavone), and apigenin 7-O-glucoside (a flavone glycoside) have exactly the same protonated or deprotonated ions in full scan MS spectra, and their fragmentation mass spectra, which are dominated by one or only a few fragments, simply do not contain enough information to distinguish between them. The representative MS/MS spectra for the three flavonoids mentioned above are shown in Figure S1. However, they can be differentiated by their UV-Vis spectra: the cinnamoyl structure in flavone, flavonol, and hydroxycinnamic acid derivatives has a strong UV absorbance band between 305 and 390 nm, whereas anthocyanins are cations with a strong visible absorbance band at 450-550 nm (Figure 2A).14,15
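The unit-vector-length normalization mentioned at the start of this section can be illustrated with a minimal Python sketch. The program itself was developed in MATLAB; the function name `normalize_unit_length` and the two-point toy spectra are hypothetical and serve only to show the operation, which divides each spectrum by its Euclidean norm so that library entries are compared by shape rather than by absolute absorbance.

```python
import numpy as np

def normalize_unit_length(spectra):
    """Scale each row (one UV-Vis spectrum) to Euclidean norm 1."""
    spectra = np.asarray(spectra, dtype=float)
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cannot normalize an all-zero spectrum")
    return spectra / norms

# Toy library of two "spectra" with two wavelength points each (illustrative only).
library = normalize_unit_length([[3.0, 4.0], [0.5, 0.0]])
```

After normalization, two spectra of the same compound recorded at different concentrations map to the same vector, which is what makes a concentration-independent spectral library possible.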

The assignment of classes for flavonoids based on their UV-Vis spectra is a crucial step, especially for the identification of flavonoid isomers that belong to different flavonoid classes. In our previous study, UV-Vis spectrum similarity analysis was used to assign the class of flavonoids for each chromatographic peak.10 The UV-Vis spectrum of a reference peak, either a spiked standard or an endogenous flavonoid peak, was selected and compared with those of all other chromatographic peaks. A threshold was set by a trial-and-error procedure to filter out undesired peaks. That method required particular care: 1) the reference peak had to be representative of the class of flavonoid, and selecting a reference peak can be difficult, especially for classes of flavonoids that contain a great variety of substitution groups; 2) plant samples usually contain different classes of flavonoids, so multiple reference peaks needed to be selected to represent the different classes, and multiple calculations were required because only one class of flavonoids could be classified by each calculation; and 3) the threshold for UV-Vis spectral similarity analysis varied case by case. For example, the thresholds ranged from 50% to 90% for leek, curry leaf, chive, giant green onion, and red mustard green samples.10 Thus, although the similarity analysis of FlavonQ worked well for the class of flavonols and their glycosides,10 the approach is inconvenient for the identification of multiple classes.
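The threshold-based filtering described above can be sketched as follows. The text does not state which similarity measure the earlier FlavonQ used, so cosine similarity expressed as a percentage is an assumption made here purely for illustration, and the names `similarity_percent` and `filter_peaks` are hypothetical.

```python
import numpy as np

def similarity_percent(reference, spectrum):
    """Cosine similarity between two spectra, in percent (assumed metric)."""
    r = np.asarray(reference, dtype=float)
    s = np.asarray(spectrum, dtype=float)
    return 100.0 * float(r @ s) / float(np.linalg.norm(r) * np.linalg.norm(s))

def filter_peaks(reference, peak_spectra, threshold):
    """Return indices of peaks whose similarity to the reference meets the threshold."""
    return [i for i, s in enumerate(peak_spectra)
            if similarity_percent(reference, s) >= threshold]
```

The sketch makes the three limitations concrete: the outcome depends entirely on the chosen `reference`, one call handles only one class, and the same peak list passes or fails depending on whether `threshold` is set to 50 or 90.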

Grouping 146 Reference Standards into Four Classes. A new strategy was developed in FlavonQ-2.0v to improve the similarity approach by using chemometric modeling and a UV-Vis spectral library. The UV-Vis spectral library was compiled from the UV-Vis data of 134 flavonoid and 12 hydroxycinnamic acid derivative (HAD) standards. Although HADs do not belong to the flavonoid family, they were included because their structures are similar to those of flavonoids and they are ubiquitous in plants with various bioactivities.16 The standards were divided into four classes on the basis of the structural similarities of the aglycones (Figure 1): flavone/flavonol/HAD (class A), flavan/flavanol/flavanone/flavanonol (class B), anthocyanin (class C), and isoflavone (class D). Chemometric methods were employed to construct models for classification of the different flavonoid classes based on the UV-Vis spectral library, and the classifiers were used to predict the class of the flavonoid in unknown chromatographic peaks.

Options for Chemometric Models in FlavonQ-2.0v. Two methods, soft independent modeling of class analogy (SIMCA)17 and fuzzy optimal associative memory (FOAM),18 were evaluated. Classification methods such as partial least-squares discriminant analysis (PLS-DA)19 and the fuzzy rule-building expert system (FuRES)20 were not used because they cannot be applied when only one class is known or present.21 FlavonQ-2.0v was designed to be a versatile program that can classify not only a single class of flavonoids but also multiple classes; therefore, classifiers such as PLS-DA were not adopted in this study. In this program, the user decides which group(s) of flavonoids will be used to build the classification models. There are several advantages to making the flavonoid type selection adjustable. Flavonoids are usually synthesized through the phenylpropanoid metabolic pathway, and several enzymes are involved in the biosynthesis. It is rare that a single plant sample contains all the enzymes for the synthesis of all the classes of flavonoids. Limiting the flavonoid types considered for a sample can reduce the complexity of the chemometric models and improve model accuracy and reliability. For example, purple broccoli contains only flavonols and anthocyanins.22 If the chemometric model is built using only these two classes for the analysis of flavonoids in broccoli, it will simplify the data analysis and reduce the possibility of misclassifying them into other classes of flavonoids. Moreover, in some cases, only one class of flavonoids is the research focus (e.g., the isoflavones in soybean samples), and a chemometric model targeting the class of interest can be very efficient.

SIMCA and FOAM are commonly used as modeling methods, but they can be used in classification mode as well. Modeling methods exploit the similarities of the features within each independent class; therefore, a test sample could belong to none of the existing classes in the training sets. In classification mode, however, the test sample must be assigned to one of the classes in the training sets. When an unknown UV-Vis spectrum is classified by SIMCA/FOAM models constructed with more than one flavonoid class, three situations could be encountered: (a) it belongs to only one class; (b) it belongs to none of the classes; (c) it belongs to more than one class. The UV-Vis spectra from real samples could differ from the UV-Vis spectra in the library because of the influence of the environment (e.g., temperature, solvent) and possible coeluting compounds. In addition, the accuracy of classification relies heavily on the quality of the training set, that is, the number and representativeness of the flavonoids in the library (the in-house UV-Vis library may not be able to represent all flavonoids in the tested plant materials). If modeling mode were used, some flavonoids that were not included in the library, or whose UV-Vis spectra were distorted by other background influences, could be misclassified as non-flavonoids, resulting in false negatives. Therefore, the winner-takes-all mode (classification mode) was used in both SIMCA and FOAM models to avoid situation (b). Statistic values (i.e., the combination of X-residuals and Hotelling’s T² value for SIMCA and the F-value for the FOAM model) were calculated between the variance of an unknown UV-Vis spectrum in a chromatographic peak and each flavonoid class, and the unknown UV-Vis spectrum was assigned to the best-fit class of flavonoids (in other words, the most ‘similar’ class of flavonoids, with the smaller X-residuals and Hotelling’s T² value for the SIMCA model or F-value for the FOAM model). Although the winner-takes-all mode may result in false positives, the result will be refined by using MS spectra. Similarly, for situation (c), only the most ‘similar’ class instead of multiple classes was assigned to an unknown UV-Vis spectrum.

When only one class of flavonoids was selected to construct chemometric models, the statistical criteria (X-residuals and Hotelling’s T² with 95% confidence intervals for SIMCA and F0.05 for the FOAM model) were used to define the limit of the class and reject non-flavonoids.
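The difference between modeling mode, winner-takes-all mode, and the single-class case can be made concrete with a small Python sketch. It assumes that each class model has already produced one lack-of-fit statistic per class (smaller meaning a better fit); how SIMCA or FOAM computes that statistic is outside the sketch, and the function names and numeric values are hypothetical.

```python
def modeling_mode(scores, limits):
    """Return every class whose statistic is within its critical limit.

    The result may be empty (situation b) or hold several classes (situation c).
    """
    return [cls for cls, score in scores.items() if score <= limits[cls]]

def winner_takes_all(scores):
    """Classification mode: always return the single best-fitting class."""
    return min(scores, key=scores.get)

def single_class_decision(score, critical_value):
    """One-class case: accept the peak only if it falls inside the class limit."""
    return score <= critical_value

# Illustrative statistics for one unknown spectrum against three class models.
scores = {"A": 2.5, "B": 1.2, "C": 4.0}
```

With tight limits of 1.0 for every class, `modeling_mode` returns an empty list and the peak would be discarded as a non-flavonoid, whereas `winner_takes_all` still assigns it to class B and leaves the final decision to the MS-based refinement.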

Projected Difference Resolution Method to Optimize Wavelength Range of UV-Vis Data. UV-Vis spectra contain characteristic regions and non-informative regions. Chemometric models built directly using the UV-Vis spectral data over the full scan range (200-600 nm) were not effective, as shown in Figure 2B, where overlaps of the four classes were observed. It can be advantageous to identify and remove the non-informative regions because doing so improves the predictive ability and reduces the complexity of chemometric models.23 For example, dropping the wavelength range between 200 and 220 nm in the UV region is a common practice to avoid interferences caused by the mobile phase while retaining the most obvious features for flavonoids between 220 and 600 nm.24

Selection of the wavelength range used in chemometric models can affect the classification and is a challenging task because the distinctive features of the spectra may be imperceptible. Therefore, the wavelength range of the UV-Vis spectrum needed to be optimized in this study. One straightforward way to achieve this is to build chemometric models for different wavelength ranges, evaluate the models by cross-validation methods such as the leave-one-sample-out method25,26 and the bootstrapped Latin partition method,21 calculate the classification rates for the different wavelength ranges, and select the optimum range that gives the best classification rate. However, this calculation required hours to execute, depending on which chemometric models and validation method were chosen, and is not practical to use in the data processing program.
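The cross-validation route described above can be sketched in Python to show why it is expensive: every candidate starting wavelength requires refitting a model once per left-out sample. A nearest-class-mean classifier is used here only as a lightweight stand-in for SIMCA or FOAM, and the function name `loo_classification_rate` is hypothetical.

```python
import numpy as np

def loo_classification_rate(spectra, labels, start):
    """Leave-one-sample-out classification rate using columns from `start` onward.

    A nearest-class-mean rule stands in for the chemometric model (assumption).
    """
    x = np.asarray(spectra, dtype=float)[:, start:]
    labels = np.asarray(labels)
    correct = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i          # hold out sample i
        classes = np.unique(labels[keep])
        means = [x[keep & (labels == c)].mean(axis=0) for c in classes]
        distances = [np.linalg.norm(x[i] - m) for m in means]
        predicted = classes[int(np.argmin(distances))]
        correct += int(predicted == labels[i])
    return correct / len(x)
```

Scanning 301 starting wavelengths with this loop means 301 × N model fits for N library spectra, and a real SIMCA or FOAM fit costs far more than a class mean, which is consistent with the hours of run time reported above.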

In this study, the projected difference resolution (PDR) method27 was applied as an alternative to determine the optimum wavelength range. The PDR method measures the separation of two classes in multivariate data space and has been used successfully for selecting the optimal parameters for baseline correction, wavelet filters, and data transformation.13,27,28 The larger the PDR value, the better the separation between two classes in the multivariate data space. For the assessment of multiple classes, the minimum PDR value of all the pairwise combinations was used to optimize the wavelength range.13 The two most similar classes among multiple classes were considered the most critical pair for classification, so their PDR values were calculated under different wavelength ranges. For example, with four classes, the PDR values of all six pairs were measured for a specific wavelength range, and the minimum PDR value of the six pairs was used to indicate the separation of the two most similar classes among the four classes. Since the UV range is easily influenced by conjugated bonds and the higher range of UV-Vis spectra (wavelength ≥ 250 nm) usually represents characteristic information for the structure of each flavonoid class, only the starting wavelengths (WLs) of the UV-Vis spectra were optimized in our study. Therefore, a series of test UV-Vis spectral data sets was constructed with different starting WLs: Test-Set-1 (200-600 nm), Test-Set-2 (201-600 nm), Test-Set-3 (202-600 nm) … Test-Set-301 (500-600 nm). For each wavelength range, the PDR values were calculated for the different classes of flavonoids. The wavelength range with the maximum PDR value represented the optimum wavelength range for the classification of the flavonoid classes. Compared with the optimization of the wavelength range with chemometric models, which required hours to execute, the PDR method took only seconds, which saved considerable time.
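The starting-wavelength search can be sketched in Python under one explicit assumption: the PDR of two classes is taken to be a chromatographic-resolution-style ratio, namely the gap between the class means after projecting the spectra onto the difference of the class means, divided by twice the sum of the projected standard deviations. The text above does not give the formula, so this form, along with the names `pdr`, `min_pairwise_pdr`, and `best_starting_wavelength`, should be read as illustrative rather than as the published definition.

```python
import itertools
import numpy as np

def pdr(class_a, class_b):
    """Assumed PDR: projected mean gap divided by 2 * (s_a + s_b).

    Each class needs at least two spectra for the standard deviations.
    """
    diff = class_a.mean(axis=0) - class_b.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        return 0.0  # identical class means: no separation
    direction = diff / norm
    proj_a, proj_b = class_a @ direction, class_b @ direction
    spread = 2.0 * (proj_a.std(ddof=1) + proj_b.std(ddof=1))
    if spread == 0.0:
        return float("inf")
    return float(abs(proj_a.mean() - proj_b.mean()) / spread)

def min_pairwise_pdr(spectra, labels):
    """Smallest PDR over all class pairs (six pairs for four classes)."""
    classes = np.unique(labels)
    return min(pdr(spectra[labels == a], spectra[labels == b])
               for a, b in itertools.combinations(classes, 2))

def best_starting_wavelength(spectra, labels, wavelengths, last_start=500):
    """Scan starting wavelengths and keep the one with the largest minimum PDR."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    scores = {}
    for k, wl in enumerate(wavelengths):
        if wl > last_start:
            break
        scores[wl] = min_pairwise_pdr(spectra[:, k:], labels)
    best = max(scores, key=scores.get)
    return best, scores
```

Each candidate range needs only class means, one projection, and two standard deviations per pair, with no model fitting or cross-validation, which is in line with the seconds-versus-hours difference reported above.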

Classification of Flavonoids by Step-wise Strategy. Step-wise classification was devised in this study to classify the UV-Vis spectra of flavonoids in a novel way, and the starting WL of each step was optimized separately. A flowchart for the two strategies of classification of the four classes of flavonoids and the HADs is shown in Figure 3. The conventional classification strategy optimized universal parameters in data preprocessing and constructed one chemometric model by using all the data for the different classes. The step-wise classification strategy optimized the data representation for each pair of classes and constructed multiple chemometric models. In each step, only two classes were defined, and one group of flavonoids (class 1) was differentiated from the other flavonoids (class 2). It is worth noting that either the SIMCA or the FOAM model can be selected for the classification, and the same model is used throughout the steps of the step-wise classification.

Figure 4 shows the dendrograms based on Euclidean distances between spectra for the two strategies outlined in Figure 3. Figure 4A shows that for the conventional classification strategy, even after the starting WL was optimized, the four classes were mixed with each other and none of the classes was completely separated from the others. However, the three dendrograms for the step-wise classification strategy (Figure 4B) demonstrated classification of each group of flavonoids into well-defined clusters. The benefit of the step-wise classification strategy was confirmed by the classification rates of the SIMCA/FOAM models through leave-one-sample-out cross-validation. With the conventional strategy, the best classification rates for the SIMCA and FOAM models were 99.3% and 95.6%, respectively. The classification rates were 100% for both the SIMCA and FOAM models using the step-wise classification strategy.

The order of the flavonoid classes in the step-wise classification process has a great impact on the classification. For the four classes of flavonoids in Figure 2, twelve sequences were evaluated and their PDR values in each step were calculated based on the method in the ‘Projected Difference Resolution Method to Optimize Wavelength Range of UV-Vis Data’ section. The results shown in Table S1 demonstrate that the flavonoid classification sequence used in Figures 3 and 4 is the optimal order in the step-wise classification process: a relatively larger PDR value was achieved for the two most similar classes, which indicates better separation of the two classes in multivariate data space. Therefore, FlavonQ-2.0v separates anthocyanidins from the remaining classes in the first step, then flavan/flavanone, and finally flavone/HAD and isoflavone in the step-wise classification process.
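The step-wise sequence described above (anthocyanin first, then flavan/flavanone, then flavone/HAD versus isoflavone) can be sketched as a cascade of two-class decisions, each with its own starting wavelength index. In this Python sketch a nearest-mean rule is a stand-in for the SIMCA or FOAM model, and the function name, class letters, and start indices are hypothetical.

```python
import numpy as np

def stepwise_classify(query, spectra, labels, order, starts):
    """Cascade of binary 'target class vs. remaining classes' decisions.

    order  : class labels in the order they are split off, e.g. ["C", "B", "A", "D"]
    starts : starting column index for each of the len(order) - 1 steps
    """
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    query = np.asarray(query, dtype=float)
    remaining = np.ones(len(labels), dtype=bool)
    for target, start in zip(order[:-1], starts):
        is_target = remaining & (labels == target)
        is_rest = remaining & (labels != target)
        target_mean = spectra[is_target, start:].mean(axis=0)
        rest_mean = spectra[is_rest, start:].mean(axis=0)
        q = query[start:]
        # Nearest-mean rule as a placeholder for the chemometric model.
        if np.linalg.norm(q - target_mean) <= np.linalg.norm(q - rest_mean):
            return target
        remaining = is_rest  # drop the target class before the next step
    return order[-1]
```

Because each step sees only two groups and uses its own wavelength window, a feature such as the visible band of anthocyanins can dominate the first decision without being diluted by the regions that matter for the later splits.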

The application of the step-wise strategy eliminates some misclassifications of flavonoids in real samples. For example, peak #12 in the blueberry sample (Figure S2) was manually identified as petunidin-3-O-arabinoside by studying its mass spectra in both the positive and negative ionization modes.29 When conventional classification was used, it was misclassified into the flavone/HAD group (Figure 5A). The absorption band at 525 nm indicates that it is an anthocyanin instead of a flavone (Figure 5C). Peak #12 was successfully classified as an anthocyanin when the step-wise classification strategy was applied (Figure 5B). Higher weight was given to the characteristic UV-Vis band (525 nm) for anthocyanins by this strategy (Figure 5D). The step-wise strategy was more effective for classifying flavonoids based on their UV-Vis spectra and, therefore, was adopted in this program. It is worth noting that isoflavones were not included in the chemometric model used to study the flavonoids in this example because isoflavones are usually not found in blueberry.

Identification of Flavonoids Using In-house Database. After the chromatographic peaks were categorized into different classes of flavonoids, HRAM/MSn data were used for putative identification of flavonoids and HAD. An in-house database established in our lab contained 5686 flavonoids and related compounds categorized into the four classes. The information for each compound included the chemical name, formula, accurate monoisotopic mass, protonated mass, deprotonated mass, and major product ions. Major product ions, assigned to 4283 compounds, were obtained from fragmentation mass spectra of flavonoids observed in our lab, the METLIN mass spectrum database,30 and the mass spectral library from Sumner’s group,31 or were predicted from expert experience and in silico fragmentation patterns using the commercial software package HighChem Mass Frontier (Thermo Fisher Scientific Inc., San Jose, CA).
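As a minimal sketch of what one database record of this kind might look like, the following derives the monoisotopic, protonated, and deprotonated masses from a molecular formula. The record layout, field names, and helper functions are assumptions for illustration and do not reflect the actual FlavonQ-2.0v database format; the ion masses here include the electron mass correction, so they may differ slightly from values computed with a different convention.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}
PROTON = 1.007276467  # mass of H+ (hydrogen atom minus one electron)


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C15H14O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def make_record(name, formula, product_ions=()):
    """Build a hypothetical database record with the fields listed in the text."""
    mass = monoisotopic_mass(formula)
    return {
        "name": name,
        "formula": formula,
        "monoisotopic_mass": mass,
        "protonated_mz": mass + PROTON,    # [M+H]+
        "deprotonated_mz": mass - PROTON,  # [M-H]-
        "product_ions": list(product_ions),
    }


# Catechin (C15H14O6); product ions are left empty rather than guessed.
record = make_record("catechin", "C15H14O6")
```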

A user-defined number of the most intense ions from the MS full scan spectrum of each unknown chromatographic peak were screened and matched against positive- or negative-mode ions in the in-house database. If MSn spectra were available in the data for the ions in the full scan spectrum, the major product ions were searched in those MSn spectra for matches. This search could return multiple hits, and all candidate compounds were ranked in the result table by the following priorities: candidates with both precursor and product ions matched were ranked higher than the others, and candidates were then ranked by mass error in ascending order.
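The ranking rule described above can be sketched as a two-part sort key. This is an illustrative reading, not the program's code: it assumes that mass error is compared by absolute value in ppm and acts as the tie-breaker within both groups, and the candidate names and error values are hypothetical.

```python
def rank_candidates(candidates):
    """Product-ion matches first, then ascending absolute mass error (ppm)."""
    return sorted(
        candidates,
        key=lambda c: (not c["product_ion_matched"], abs(c["ppm_error"])),
    )


# Hypothetical hits for one chromatographic peak.
hits = [
    {"name": "candidate A", "ppm_error": 0.5, "product_ion_matched": False},
    {"name": "candidate B", "ppm_error": -3.2, "product_ion_matched": True},
    {"name": "candidate C", "ppm_error": 1.1, "product_ion_matched": True},
    {"name": "candidate D", "ppm_error": -2.0, "product_ion_matched": False},
]
ranked_names = [c["name"] for c in rank_candidates(hits)]
```

Candidate A has the smallest mass error but ranks below B and C because it has no product-ion match.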

The program may provide multiple flavonoid candidates for a chromatographic peak, and expertise in flavonoid research is needed for definitive identification (see examples in Tables S2 and S3). In the previous version of FlavonQ, the identification of flavonoids was based on a virtual mass spectrum database constructed by theoretically combining common aglycones and substitution groups.10 For the single class of flavone/flavonol glycosides, it contained over 1.5 million possible combinations, most of which have never been observed in nature. In this study, all 5686 flavonoids and related compounds in the in-house database have been reported previously. With this database, the program ran faster and the identification results were more accurate. Flavonoids not included in this database could not be identified. However, potential flavonoid peaks (based on UV data) are flagged and can be manually identified and added to the database if needed.

Comparison with METLIN and MetFrag Using Flavonoid UHPLC-DAD-MS Dataset. Two sets of mixed reference standards (25 flavonoids and hydroxycinnamic acids) were analyzed by the UHPLC-DAD-MS method as described previously. Their precursor ions and MS/MS spectra were manually input into the METLIN database (http://metlin.scripps.edu) and the MetFrag Web tool (https://msbi.ipb-halle.de/MetFragBeta). For METLIN, the precursor ions were searched with the ‘Simple Search’ function at 20 ppm tolerance; fragment searches were performed with the ‘Fragment Search’ function with ‘Precursor M/Z’ selected, and up to 3 fragment ions were input for each search. In the MetFrag Web tool, the KEGG database was selected and a 20 ppm tolerance was set for ‘Parent Ion’ search and ‘Fragmentation Processing’. In addition to the proposed FlavonQ-2.0v data processing pipeline, the dataset was also processed in FlavonQ-2.0v by searching only precursor ions and characteristic product ions in the MS spectra against the in-house database, without flavonoid classification based on UV-Vis spectra. The results, given in Table 1, show that FlavonQ-2.0v generally performed better, as indicated by a higher number of correct first-ranked candidates, a lower number of ‘none of correct candidates available’, and a lower number of output candidates for each search. Taking puerarin (an isoflavone) as an example, a search for [M-H]- (m/z 415.1029) returned a list of 39 candidate compounds from METLIN (puerarin ranked 28th) and 7 candidates from MetFrag (puerarin ranked 5th). Of the 39 candidates from METLIN, 11 are non-flavonoids, 22 are flavone glucosides, 1 is an anthocyanidin, and 5 are isoflavones. When ‘Fragment Search’ was applied, puerarin was not found because it has not been analyzed in METLIN. The 7 candidates from MetFrag include 1 flavone, 2 isoflavones, and 4 non-flavonoids, and the MS/MS spectrum improved the ranking of puerarin from 5th to 3rd out of 7. For FlavonQ-2.0v, 19 candidates were found without the use of UV-Vis data (12 flavones, 5 isoflavones, and 2 anthocyanidins); only 5 candidates were found when UV-Vis spectra were used for flavonoid classification, and they were all isoflavones. These results were not unexpected given the specificity of the FlavonQ program.32 For example, as of May 2017, METLIN included 961,829 molecules, of which about 14,000 metabolites had been individually analyzed and another 200,000 had in silico MS/MS data. For the ‘Fragment Search’ in METLIN, about half of the queries (13 of 25) in Table 1 returned ‘0 candidate’ because of the lack of MS/MS data in the database. Table 1 shows that flavonoid classification based on UV-Vis spectra can effectively narrow down the list of candidate compounds. This matters because compound identification that relies solely on MS/MS spectral comparison has limitations: for example, mass spectra are sometimes dominated by one or only a few fragments (e.g., loss of a glycoside group, Figure S1) that can be explained by several candidates. Further examples and limitations of MS spectral library searching are discussed extensively by Stein.33
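To make the 20 ppm precursor tolerance used in these searches concrete, the sketch below filters a small list of theoretical m/z values against an observed ion. The function names and the three database entries are hypothetical; only the tolerance and the observed m/z of 415.1029 come from the text.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def search_precursor(observed_mz, database, tol_ppm=20.0):
    """Return entry names within the ppm tolerance, smallest error first."""
    hits = [
        (abs(ppm_error(observed_mz, mz)), name)
        for name, mz in database.items()
        if abs(ppm_error(observed_mz, mz)) <= tol_ppm
    ]
    return [name for _, name in sorted(hits)]


# Hypothetical theoretical [M-H]- values; not taken from any real database.
toy_database = {"entry 1": 415.1035, "entry 2": 415.1100, "entry 3": 415.1200}
matches = search_precursor(415.1029, toy_database)

# Half-width of a 20 ppm window at m/z 415.1029, in m/z units (about 0.0083).
half_width = 415.1029 * 20e-6
```

At this m/z a 20 ppm window spans roughly ±0.008, which is wide enough to admit many isobaric compounds from a large general-purpose database; this is one reason precursor-only searches return long candidate lists.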

Expansion of UV-Vis Library and In-house Database. As discussed in the previous sections, the accuracy of flavonoid identification based on UV-Vis and MS spectra can be improved by expanding the UV-Vis library and the in-house database. In our lab, the number of flavonoid UV-Vis spectra continues to increase from several sources: acquisition of more flavonoid reference standards, isolated chromatographic peaks from plant materials, and spectra reported in peer-reviewed journals. The isolated chromatographic peaks should be pure and should be identified and confirmed by mass spectrometric (HRMS, MSn) and/or NMR methods,34 and the reported UV-Vis spectra should be validated by other independent labs. In this study, the UV-Vis spectra in the library were collected exclusively from flavonoid reference standards; UV-Vis data from other sources will be added in a future release.

The product ion information in the in-house database can effectively target the correct flavonoid hit, especially when multiple isomers exist in the database for a particular precursor ion. Ideally, the mass spectral library should contain MSn spectra in both positive and negative modes and at different collision energies. Such a library, for example Sumner’s plant natural product MS library31 or Compound Discoverer from Thermo Fisher Scientific, can provide the most accurate identification of a compound. However, constructing such a library for over 5,000 flavonoids is not feasible for any single laboratory. We are therefore enhancing our in-house database gradually by adding more characteristic product ions based on experiments and the literature. Expansions of the UV-Vis library and in-house database take effect automatically because FlavonQ-2.0v recalculates its chemometric models and search results from the updated library and database every time it executes.

Quantitation. The quantitation of flavonoids was performed using an external calibration curve with flavonoid reference standards and molar response factors as previously reported.14,35 Ideally, separate calibration curves should be used to quantify the flavonoids of each class: for example, quercetin 3-O-rutinoside (rutin) for HADs and flavone/flavonol glycosides, catechin for flavan-3-ols and proanthocyanidins, hesperetin for flavanones, cyanidin 3-O-glucoside for anthocyanins, and genistein for isoflavones. The peak area integration method is demonstrated in the sample chromatogram (Figure S2), in which different classes of flavonoids are represented by different colors. A brief identification, including major ion and formula, is provided for each flavonoid candidate chromatographic peak. The identification, peak areas, and tentative quantitation information are output automatically into a spreadsheet, which allows the user to further analyze the results.
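A minimal sketch of external-calibration quantitation with a response-factor correction is given below. It assumes a linear calibration curve for the class standard and assumes that the molar relative response factor (MRRF) is defined as the analyte response divided by the standard response at equal molar concentration; the calibration points, peak area, MRRF value, and function names are all hypothetical and are not taken from the cited methods.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(peak_area, slope, intercept, mrrf=1.0):
    """Concentration read from the standard's curve, divided by the analyte's
    assumed MRRF (1.0 means the analyte is the standard itself)."""
    return (peak_area - intercept) / slope / mrrf


# Hypothetical calibration data for a class standard (arbitrary units).
concentrations = [1.0, 2.0, 4.0]
areas = [10.0, 20.0, 40.0]
slope, intercept = fit_line(concentrations, areas)

as_standard = quantify(30.0, slope, intercept)           # standard equivalents
corrected = quantify(30.0, slope, intercept, mrrf=1.5)   # MRRF-corrected
```

With these made-up numbers, a peak area of 30 corresponds to 3.0 concentration units as standard equivalents and 2.0 units after dividing by an MRRF of 1.5.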

Performance of FlavonQ-2.0v. The performance of FlavonQ-2.0v was validated on samples spiked with flavonoid mixed standards and on samples of plant extracts. FlavonQ-2.0v successfully identified all the flavonoid peaks in the samples spiked with mixed flavonoid standards. The results, shown in Tables S2 and S3, demonstrate the effectiveness of flavonoid identification by UV-Vis and MS spectra. For example, apigenin was first classified by the chemometric models in FlavonQ-2.0v as flavone/flavonol/HAD based on its UV-Vis spectrum, so isoflavone and anthocyanin isomers were excluded after this step. It was then identified as ‘Apigenin’ in the flavonoid candidate list based on its precursor ion (m/z 269.0450 with an error of -1.59 ppm) and characteristic product ion (m/z 151, 1,3A-), which distinguished it from baicalein, whose characteristic product ion is m/z 169 (1,3A-). In some cases, multiple candidate flavonoids were listed for a single peak because some flavonoid isomers with common product ions cannot be differentiated by the program. For example, quercetin, morin, and hieracin (Figure 6) are all flavonols with exactly the same precursor ion (m/z 301.0348) and a common characteristic product ion (m/z 151, 1,3A-), so they were all reported in the result table.

FlavonQ-2.0v was also applied to the analysis of flavonoids in blueberry, mizuna, purple mustard, red cabbage, and red mustard green. The data were also analyzed manually, and the FlavonQ-2.0v identification results were compared with the manual identifications (Table S4). Among the 39 flavonoid candidate peaks in blueberry, two anthocyanins, petunidin-3-O-arabinoside and petunidin-3-O-glucoside, were misidentified by FlavonQ-2.0v as a flavonol glucoside and a flavanone glycoside, respectively. Both were small shoulder peaks (Peaks #9 and #10 in Figure S2), and their UV-Vis spectra were distorted by the adjacent major peaks, which led to the misclassification. This indicates that chromatographic separation is critical for the correct identification of flavonoids. Another two peaks were labeled as “uncertain peaks” because the spectral data could not provide enough information for identification (Peaks #21 and #24 in Figure S2 and Table S4). The identification accuracy of FlavonQ-2.0v for plant materials is shown in Table 2. Overall, positive identification was achieved for more than 88% of the flavonoid peaks using FlavonQ-2.0v.

The execution time of FlavonQ-2.0v was about 1 min per sample after data format conversion. Construction of the chemometric models using all 146 UV-Vis spectra took about 30 seconds, and the time was significantly reduced when fewer classes of flavonoids were selected and fewer steps were conducted in the step-wise classification strategy. FlavonQ-2.0v was developed in MATLAB 2012b, but the end user does not need to install MATLAB to use it. The MATLAB Compiler Runtime (MCR), which is required to run the FlavonQ-2.0v standalone application, is freely available at https://www.mathworks.com/products/compiler/mcr.html. The graphical user interface is shown in Figure S3. The UV-Vis spectra of 146 flavonoid and HAD reference standards and the in-house flavonoid database were compiled into FlavonQ-2.0v. This database will be continuously expanded, and the chemometric models will become more reliable. Other common food constituents, such as simple phenolic compounds, phenyl alcohols, stilbenes, and lignans, will also be included in the future. The in-house flavonoid database will be updated regularly as new flavonoids are found and reported.

CONCLUSIONS

A data processing tool for flavonoid analysis, FlavonQ-2.0v, was developed in this study. The program classifies flavonoids using a chemometric model based on the UV-Vis reference spectral library. The chemometric model used a novel step-wise classification strategy, and the data representation in each step was optimized by the projected distance resolution (PDR) method. The step-wise classification strategy significantly improved the performance of the classifiers, which resulted in more accurate and reliable classification of flavonoids. An in-house flavonoid database was implemented in the program for identification of flavonoids. FlavonQ-2.0v was validated by analyzing data from samples spiked with flavonoid mixed standards and from blueberry, mizuna, purple mustard, red cabbage, and red mustard green extract samples. Identification accuracies for all samples were above 88%. FlavonQ-2.0v greatly facilitates the identification and quantitation of flavonoids from UHPLC-HRAM-MS data. The automated computational tool was developed to assist, rather than replace, human experts. The results show that it not only saves human experts tremendous effort but also allows less-experienced chemists to perform data analysis on flavonoids with reasonable results.

ASSOCIATED CONTENT

Supporting Information. Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author. ∗ E-mail address: [email protected]. Tel.: +1 301 504 8144; fax: +1 301 504 8314.

Notes. The authors declare no competing financial interest.

ACKNOWLEDGMENT

This research is supported by the Agricultural Research Service of the U.S. Department of Agriculture and an Interagency Agreement (Number AOD12026-001-01004) with the Office of Dietary Supplements at the National Institutes of Health. The John A. Milner Fellowship program of the USDA Beltsville Human Nutrition Research Center and the NIH Office of Dietary Supplements is acknowledged for its support of Dr. Mengliang Zhang. We thank Dr. Peter de B. Harrington from the Department of Chemistry and Biochemistry at Ohio University for providing MATLAB routines for the PDR, PCA, SIMCA, and FOAM functions. We thank Dr. Joseph M. Betz from the NIH Office of Dietary Supplements and Dr. James M. Harnly from the USDA for the careful revision of this article.

References

(1) Nijveldt, R. J.; van Nood, E.; van Hoorn, D. E. C.; Boelens, P. G.; van Norren, K.; van Leeuwen, P. A. M. Am. J. Clin. Nutr. 2001, 74, 418-425.
(2) Balentine, D. A.; Dwyer, J. T.; Erdman, J. W., Jr.; Ferruzzi, M. G.; Gaine, P. C.; Harnly, J. M.; Kwik-Uribe, C. L. Am. J. Clin. Nutr. 2015, 101, 1113-1125.
(3) Satterfield, M.; Brodbelt, J. S. Anal. Chem. 2000, 72, 5898-5906.
(4) Johnson, A. R.; Carlson, E. E. Anal. Chem. 2015, 87, 10668-10678.
(5) Smith, C. A.; Want, E. J.; O'Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779-787.
(6) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. BMC Bioinformatics 2010, 11, 395.
(7) Katajamaa, M.; Miettinen, J.; Oresic, M. Bioinformatics 2006, 22, 634-636.
(8) Wei, X.; Sun, W.; Shi, X.; Koo, I.; Wang, B.; Zhang, J.; Yin, X.; Tang, Y.; Bogdanov, B.; Kim, S.; Zhou, Z.; McClain, C.; Zhang, X. Anal. Chem. 2011, 83, 7668-7675.
(9) Zhang, W. C.; Chang, J.; Lei, Z. T.; Huhman, D.; Sumner, L. W.; Zhao, P. X. Anal. Chem. 2014, 86, 6245-6253.
(10) Zhang, M.; Sun, J.; Chen, P. Anal. Chem. 2015, 87, 9974-9981.
(11) Kazusa DNA Research Institute, C., Japan. Komics Wiki, 2008.
(12) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. Bioinformatics 2008, 24, 2534-2536.
(13) Zhang, M.; Harrington, P. d. B. Talanta 2013, 117, 483-491.
(14) Lin, L. Z.; Harnly, J.; Zhang, R. W.; Fan, X. E.; Chen, H. J. J. Agric. Food. Chem. 2012, 60, 544-553.
(15) Sun, J. H.; Lin, L. Z.; Chen, P. Curr. Anal. Chem. 2013, 9, 397-416.
(16) Chen, J. H.; Ho, C. T. J. Agric. Food. Chem. 1997, 45, 2374-2378.
(17) Frank, I. E.; Lanteri, S. Chemometr. Intell. Lab. 1989, 5, 247-256.
(18) Wabuyele, B. W.; Harrington, P. D. Appl. Spectrosc. 1996, 50, 35-42.
(19) Bylesjo, M.; Rantalainen, M.; Cloarec, O.; Nicholson, J. K.; Holmes, E.; Trygg, J. J. Chemom. 2006, 20, 341-351.
(20) Harrington, P. B. J. Chemom. 1991, 5, 467-486.
(21) Wang, Z.; Zhang, M.; Harrington, P. d. B. Anal. Chem. 2014, 86, 9050-9057.
(22) Harnly, J. M.; Doherty, R. F.; Beecher, G. R.; Holden, J. M.; Haytowitz, D. B.; Bhagwat, S.; Gebhardt, S. J. Agric. Food. Chem. 2006, 54, 9966-9977.
(23) Anderssen, E.; Dyrstad, K.; Westad, F.; Martens, H. Chemometr. Intell. Lab. 2006, 84, 69-74.
(24) Bohm, B. A. In Introduction to Flavonoids; Harwood Academic Publishers: Amsterdam, The Netherlands, 1999, p 200.
(25) Zhang, M.; Harrington, P. d. B.; Chen, P. Curr. Chromatogr. 2015, 2, 145-151.
(26) Zhang, M.; Zhao, Y.; Harrington, P. d. B.; Chen, P. Anal. Lett. 2016, 49, 711-722.
(27) Xu, Z. F.; Sun, X. B.; Harrington, P. D. Anal. Chem. 2011, 83, 7464-7471.
(28) Chen, P.; Lu, Y.; Harrington, P. B. Anal. Chem. 2008, 80, 7218-7225.
(29) Sun, J.; Lin, L. Z.; Chen, P. Rapid Commun. Mass Spectrom. 2012, 26, 1123-1133.
(30) Smith, C. A.; O'Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. Ther. Drug Monit. 2005, 27, 747-751.
(31) Lei, Z. T.; Jing, L.; Qiu, F.; Zhang, H.; Huhman, D.; Zhou, Z. Q.; Sumner, L. W. Anal. Chem. 2015, 87, 7373-7381.
(32) Nishioka, T.; Kasama, T.; Kinumi, T.; Makabe, H.; Matsuda, F.; Miura, D.; Miyashita, M.; Nakamura, T.; Tanaka, K.; Yamamoto, A. Mass Spectrometry 2014, 3, S0039-S0039.
(33) Stein, S. Anal. Chem. 2012, 84, 7274-7282.
(34) Qiu, F.; Fine, D. D.; Wherritt, D. J.; Lei, Z.; Sumner, L. W. Anal. Chem. 2016, 88, 11373-11383.
(35) Lin, L. Z.; Harnly, J. M. J. Agric. Food. Chem. 2012, 60, 5832-5840.

Table 1. Comparison of search results for 25 flavonoid and hydroxycinnamic acid derivatives UHPLC-DAD-MS data

Row | METLIN¹ Simple Search | METLIN¹ Fragment Search | MetFrag (KEGG)² Parent Ion Search | MetFrag (KEGG)² Fragment Search | FlavonQ-2.0v³ MS Search | FlavonQ-2.0v³ UV-Vis & MS Match
Top 1 ranks⁴ | 2 | 3 | 5 | 6 | 8 | 18
Top 5 ranks | 10 | 12 | 15 | 18 | 24 | 25
# of NA⁵ | 0 | 13 | 2 | 2 | 0 | 0
# of candidate compounds⁶ | 663 | 54 | 221 | 213 | 162 | 113
Average # of candidate compounds | 26.5 | 2.2 | 8.8 | 8.5 | 6.5 | 4.5

1. ‘Simple Search’ and ‘Fragment Search’ are two functions for METLIN database: ‘Simple Search’ matches up precursor ions (20 ppm tolerance); ‘Fragment Search’ matches up both precursor ions and selected fragment ions (up to 5 fragment ions) (http://metlin.scripps.edu).
2. KEGG database was selected for MetFrag search. ‘Parent Ion Search’ and ‘Fragment Search’ are functions for MetFrag Web tool: ‘Parent Ion Search’ matches up precursor ions (20 ppm tolerance); ‘Fragment Search’ matches up both precursor ions and MS/MS spectra (https://msbi.ipb-halle.de/MetFragBeta/).
3. ‘MS Search’ matches up precursor ions and characteristic product ions (up to 1 for each precursor ion) in the MS spectra with in-house database; ‘UV-Vis & MS Match’ uses chemometric methods to determine the type of flavonoids before ‘MS Search’.
4. Number of correct first ranked candidates.
5. Number of ‘none of correct candidates available’.
6. Number of total candidate compounds for all queries.
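As a quick arithmetic check on Table 1, the averages in the last row follow from dividing each total candidate count by the 25 queries. The short script below reproduces them; the dictionary keys are simply descriptive labels for the six table columns.

```python
N_QUERIES = 25  # number of reference standards searched

# Total candidate compounds per search mode, as listed in Table 1.
total_candidates = {
    "METLIN Simple Search": 663,
    "METLIN Fragment Search": 54,
    "MetFrag Parent Ion Search": 221,
    "MetFrag Fragment Search": 213,
    "FlavonQ-2.0v MS Search": 162,
    "FlavonQ-2.0v UV-Vis & MS Match": 113,
}

# Average number of candidates per query, rounded to one decimal place.
averages = {
    mode: round(total / N_QUERIES, 1) for mode, total in total_candidates.items()
}
```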

Table 2. Flavonoid identification accuracy in different plants by FlavonQ-2.0v

Plant name | # of flavonoids identified (a) | # of misidentification (b) | # of uncertain peaks (c) | Accuracy (%)
Blueberry | 39 | 2 | 2 | 89.7
Mizuna | 47 | 1 | 0 | 97.9
Purple mustard | 45 | 1 | 4 | 88.9
Red cabbage | 44 | 0 | 0 | 100.0
Red mustard green | 88 | 1 | 6 | 92.0

(a) Flavonoid peaks were identified by FlavonQ with s/n setting at 10. (b) Nonflavonoid peaks were identified as flavonoids. (c) Identity of peaks cannot be verified based on the data given.
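The accuracy column of Table 2 appears consistent with counting both misidentified and uncertain peaks against the total. This formula is inferred from the tabulated numbers rather than stated explicitly in the text, so the sketch below should be read as a consistency check under that assumption.

```python
def accuracy_percent(identified, misidentified, uncertain):
    """Share of peaks that were neither misidentified nor uncertain (inferred formula)."""
    return round((identified - misidentified - uncertain) / identified * 100, 1)


# (identified, misidentified, uncertain) per plant, from Table 2.
table2 = {
    "Blueberry": (39, 2, 2),
    "Mizuna": (47, 1, 0),
    "Purple mustard": (45, 1, 4),
    "Red cabbage": (44, 0, 0),
    "Red mustard green": (88, 1, 6),
}

accuracies = {plant: accuracy_percent(*counts) for plant, counts in table2.items()}
```

The lowest value, for purple mustard, is the source of the "more than 88%" figure quoted in the text.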

Figure 1. Core structures of the main flavonoid classes and hydroxycinnamic acid derivatives. Group A: Flavone, flavonol, and hydroxycinnamic acid derivatives; Group B: Flavan, flavanol, flavanone, and flavanonol; Group C: Anthocyanidin; Group D: Isoflavone.

Figure 2. One hundred and forty-six UV-Vis spectra of flavonoids and HAD (A) and principal component analysis score plot for UV-Vis spectra data of four classes (B). Panel A plots normalized response against wavelength (200-600 nm) for flavone/HAD, flavan/flavanone, anthocyanidin, and isoflavone. Panel B plots PC #2 (18%, 0.0927) against PC #1 (40%, 0.2); A: Flavone/HAD, B: Flavan/flavanone, C: Anthocyanidin, D: Isoflavone.

Figure 3. Flowchart for step-wise classification strategy and conventional classification strategy.

Figure 4. UV-Vis spectra after wavelength range optimization (left) and dendrogrammatic representations (right) of differentiation for four classes of flavonoids by conventional classification strategy (A) and step-wise classification strategy (B).

Figure 5. Principal component analysis score plot for UV-Vis spectra data of three classes by conventional classification strategy (A) and by step-wise classification strategy-step 1 (B). Average UV-Vis spectra of flavone/HAD and anthocyanidin and UV-Vis spectrum of Peak #12 in blueberry sample after starting WL optimization in conventional classification strategy (C) and in step-wise classification strategy-step 1 (D). Panel A annotation: Peak #12 in blueberry sample is an anthocyanidin peak, but misclassified as A (flavone/HAD). Panel B annotation: Peak #12 in blueberry sample is anthocyanidin, and classified correctly. Panels C and D plot normalized intensity against wavelength (nm).

Figure 6. Chemical structures for quercetin (A), morin (B), hieracin (C), and pentahydroxyflavone (D). For (D), n1+n2+n3 = 5 and 0 ≤ n1, n2, n3 ≤ 5.