Spectral Mining for Discriminating Blood Origins in ... - ACS Publications



Article pubs.acs.org/ac

Spectral Mining for Discriminating Blood Origins in the Presence of Substrate Interference via Attenuated Total Reflection Fourier Transform Infrared Spectroscopy: Postmortem or Antemortem Blood?

Ayari Takamura,*,†,‡ Ken Watanabe,† Tomoko Akutsu,† Hiroshi Ikegaya,§ and Takeaki Ozawa*,‡

† First Department of Forensic Science, National Research Institute of Police Science, 6-3-1, Kashiwanoha, Kashiwa, Chiba 277-0882, Japan
‡ Department of Chemistry, Graduate School of Science, The University of Tokyo, 7-3-1, Hongo, Bunkyo, Tokyo 113-0033, Japan
§ Department of Forensic Medicine, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, 465 Kajii-cho, Hirokoji Agaru, Kawaramachi-dori, Kamigyo, Kyoto 602-8566, Japan

Supporting Information

ABSTRACT: Often in criminal investigations, discrimination of types of body fluid evidence is crucially important to ascertain how a crime was committed. Compared to current methods using biochemical techniques, vibrational spectroscopic approaches can provide versatile applicability to identify various body fluid types without sample invasion. However, their applicability is limited to pure body fluid samples because important signals from body fluids incorporated in a substrate are affected strongly by interference from substrate signals. Herein, we describe a novel approach to recover body fluid signals that are embedded in strong substrate interference using attenuated total reflection Fourier transform infrared (ATR FT-IR) spectroscopy and an innovative multivariate spectral processing. This technique supported detection of covert features of body fluid signals, and then identified origins of body fluid stains on substrates. We discriminated between ATR FT-IR spectra of postmortem blood (PB) and those of antemortem blood (AB) by creating a multivariate statistics model. From ATR FT-IR spectra of PB and AB stains on interfering substrates (polyester, cotton, and denim), blood-originated signals were extracted by a weighted linear regression approach we developed originally using principal components of both blood and substrate spectra. The blood-originated signals were finally classified by the discriminant model, demonstrating high discriminant accuracy. The present method can identify body fluid evidence independently of the substrate type, which is expected to promote the application of vibrational spectroscopic techniques in forensic body fluid analysis.

Body fluid evidence such as that of blood, urine, saliva, semen, and vaginal secretion is frequently collected during forensic investigation of violent crimes such as sexual assault and murder. Discrimination and identification of the origin or type of the body fluid is quite effective to elucidate how an alleged crime was committed. Furthermore, identification of body fluid types can increase the value of subsequent individual identification analysis of DNA.1 To date, various presumptive and confirmatory methods have been conducted mainly using enzymatic or immunological techniques specific for each body fluid type.2 However, most methods involve some practical issues: they can be destructive, time-consuming, expensive, and highly dependent on expertise. With the growing need to preserve samples for subsequent DNA analysis and growing social interest in the objectivity of analytical results, strong demand persists to establish an alternative analytical method for processing body fluid samples nondestructively, with greater versatility and objectivity.

© 2017 American Chemical Society

Recently, forensic application of vibrational spectroscopic techniques to body fluid samples has garnered wide attention.3,4 Vibrational spectroscopic techniques, such as infrared (IR) absorption and Raman scattering, have desirable properties for use in forensic analysis; they can analyze a sample without invasion and can identify it selectively based on its spectral pattern. Additionally, they are highly versatile, as demonstrated in various forensic samples and interests.3 Typical patterns of IR and Raman spectra of various body fluids have been reported.5−7 Moreover, their characteristic peaks have been assigned carefully. Such previous reports suggest that particular body fluids exhibit distinctive spectral patterns that are visually distinguishable.

Received: May 10, 2017
Accepted: August 15, 2017
Published: August 15, 2017

DOI: 10.1021/acs.analchem.7b01756 Anal. Chem. 2017, 89, 9797−9804

Although respective body fluids have their own characteristic spectral patterns, they also exhibit subtle changes or variances in spectra depending on the spatial heterogeneity of body fluid traces and constituent differences among donors.8 Moreover, when aiming to discriminate among quite similar body fluid spectra, such as in human versus animal discrimination and in donor age prediction, visual judgment is difficult and not sufficiently reliable.4 Chemometric techniques are useful in such cases. Using the alternating least-squares (ALS) technique, a spectrum decomposition technique, body fluid spectra are reconstituted as a linear combination of some spectroscopic signatures. Then, the body fluid spectra with their variances are differentiated from other body fluids based on the regression quality.8−11 Furthermore, some multivariate discriminant analysis methods, such as soft independent modeling of class analogy, partial least-squares discriminant analysis (PLSDA), and support vector machine discriminant analysis, have been demonstrated to recognize small spectral differences and to achieve accurate discrimination among similar spectra.8,12−14

The earlier reports described above demonstrate the great potential of a combination of vibrational spectroscopy and chemometrics for discriminating body fluid samples.8−14 However, most previous studies have targeted discrimination of pure body fluids dried on noninterfering substrates, such as glass or aluminum foil. In such cases, the approach is applicable only when pure body fluids are obtained in a liquid form or deposited on noninterfering substrates; this is quite a rare case in criminal investigations. Considering broader application to criminal evidence, more advanced and versatile methods are necessary for handling spectra from body fluid stains incorporated in various substrates such as clothes, carpeting, and wallpaper.
Discrimination of body fluid stains on interfering substrates was reported previously. However, earlier approaches required visual selection of spectra for analysis and manual subtraction of substrate signals.15,16 Otherwise, the approaches were limited to one specific substrate type or required visual comparisons of specific peaks.17−19 Consequently, no automatic and versatile analytical method for discriminating body fluid stains has been established yet. Discrimination of body fluid stain spectra involves some difficult issues. Compositions of both body fluids and substrates are distributed heterogeneously on body fluid stains. This situation induces complicated fluctuations in the spectral patterns of body fluid stains, depending on the measuring points. In addition, body fluid compositions are diffused extensively into substrates. The diffusion makes body fluid signals weaker against strong substrate interference and more vulnerable to instrumental noise. Furthermore, the discrimination method requires applicability to unknown substrate types because various forensic evidences cannot be anticipated. To overcome these issues, a novel method considering experimental and data-treatment approaches is needed. This study examines a method to identify body fluid signals precisely in the presence of strong substrate interference using attenuated total reflection Fourier transform infrared spectroscopy (ATR FT-IR) and the multivariate spectral processing we have developed. This approach enables determination of the origins of body fluid stains on interfering substrates using a statistical model constructed with spectral data of pure body fluids. To demonstrate the efficacy of our approach, postmortem blood (PB) and antemortem blood (AB) stains were set as targets for their discrimination. The discrimination is useful in criminal investigation to assess whether the incident

involves criminality and to suggest a sequence of crime-related events. A PLSDA model constructed for ATR FT-IR spectra of pure PB and AB differentiated the two pure blood groups correctly. For the spectra of PB and AB stains prepared on three interfering substrates (polyester, cotton, and denim),5,15,16 spectral signals originating from blood were extracted automatically using novel spectral processing: weighted linear regression with principal components of both blood and substrate spectra. The extracted blood signals were classified using the PLSDA model, achieving high discrimination accuracy for all substrate types. The applicability of the method to discrimination of body fluid types was demonstrated in the presence of various substrate-interfering factors.



MATERIALS AND METHODS

Collection and Preparation of Blood Samples. AB samples were collected from 10 healthy Japanese volunteers aged from their 20s to 60s, without addition of anticoagulant. PB samples were collected from the hearts of 12 deceased persons aged from their 20s to 90s during postmortem examination. The time after death ranged from half a day to 16 days. The collected PB samples were stored at −80 °C until use. For preparation of the pure blood samples, an aliquot of 20 μL of each PB and AB sample was deposited on a glass slide and dried overnight at room temperature. To prepare blood stain samples, we used substrates of three types: black-dyed 100% polyester cloth, undyed cotton cloth, and denim from blue jeans. An aliquot of 5−10 μL of each blood sample was deposited on the substrate for one spot. Ten spots in all were made for each blood sample on each substrate type. The blood stains were dried overnight at room temperature. All procedures involving human participants were approved by the Institutional Ethics Committee of the National Research Institute of Police Science (Kashiwa, Japan).

Instrumentation and Spectra Collection. ATR FT-IR spectra of blood samples were recorded using an FT-IR spectrometer (Spectrum One; PerkinElmer Inc., Massachusetts) with an ATR accessory equipped with a ZnSe crystal. The range of wavenumbers was 600−3700 cm−1; the spectral resolution was 4 cm−1. Each recorded spectrum was the average of four scans. Each time before placing a blood sample, the ATR crystal was cleaned with 70% ethanol and dried completely; then the background was acquired. Ten spectra were collected for each pure blood sample. One spectrum was recorded from each blood stain spot; thus, 120 spectra from 12 PB stain samples and 100 spectra from 10 AB stain samples were collected in total for each substrate type. In addition, 100 spectra were collected from different points of untreated regions on each substrate. The ATR FT-IR spectra of L-lactic acid (Wako Pure Chemical Industries Ltd., Japan), D-glucose (Nacalai Tesque Inc., Japan), and indigo (Wako Pure Chemical Industries Ltd.) were measured in the same way as the blood samples. Instruments were operated using software (Spectrum ver. 5.0.1; PerkinElmer Inc.).

Data Treatment and PLSDA Model Construction. Preprocessing of spectral data was executed using IGOR Pro software (WaveMetrics Inc., Oregon). All spectra were transformed into absorption spectra using the log(1/T) function. Then, the region of 1711−2669 cm−1 was deleted (resulting in 2142 points per spectrum) to avoid ATR crystal interference, as described in an earlier report (Figure S-1).14 Subsequently, the spectra were normalized by the total area. Multivariate analyses


Figure 1. Scheme for development of a discriminant weight factor. Weight vectors (w) derived in the PLSDA model were multiplied by corresponding discriminant powers (J), which indicated relative contributions for the discrimination process. Multiplied weight vectors were converted to their absolute values and summed. Finally, the square roots of each value were calculated, resulting in a discriminant weight factor (z).
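The scheme in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R implementation: the function names are hypothetical, the scores and weight vectors are toy values, and the use of the sample variance (rather than the population variance) for the within-group term is an assumption, since the text does not specify it.

```python
import math
from statistics import mean, variance


def discriminant_power(scores_pb, scores_ab):
    """J_l for one latent variable: between-group difference of the mean
    t scores relative to the summed within-group variances, square-rooted.
    Assumption: sample variance (n - 1 denominator)."""
    diff = mean(scores_pb) - mean(scores_ab)
    return math.sqrt(diff ** 2 / (variance(scores_pb) + variance(scores_ab)))


def discriminant_weight_factor(weight_vectors, powers):
    """z_i = sqrt(sum over LVs of J_l * |w_l,i|), following the caption:
    multiply each weight vector by its J, take absolute values, sum over
    the latent variables, then take the square root point by point."""
    n_points = len(weight_vectors[0])
    return [
        math.sqrt(sum(j * abs(w[i]) for w, j in zip(weight_vectors, powers)))
        for i in range(n_points)
    ]


# Toy example: two latent variables, two spectral points.
j1 = discriminant_power([1.0, 2.0, 3.0], [-1.0, 0.0, 1.0])
z = discriminant_weight_factor([[0.5, -0.5], [0.0, 1.0]], [2.0, 1.0])
```

In the study itself, each weight vector would have 2142 elements and four latent variables would contribute to z.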

of spectral data were executed using R software with R packages containing specific functions. To construct a discriminant model between pure PB and AB, the PLSDA algorithm was implemented using the plsda function in R package “caret”. To assess the quality of regression and to identify the best number of latent variables (LVs) adopted in the PLSDA model, the predicted residual error sum of squares (PRESS) was calculated using 10-fold cross validation (CV) with the following formula:

$$\mathrm{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Response parameter y represents a numerical label of each spectrum to its group membership. $y_i$ is the predicted value and $\hat{y}_i$ is the ideal value for spectrum $i$. In addition, the discrimination accuracy was evaluated using Kappa coefficients (Κ) defined as shown below.

$$\text{Κ} = \frac{P_o - P_e}{1 - P_e}$$

Therein, $P_o$ denotes the probability of observed agreement, which is identical to the accuracy rate, and $P_e$ is the probability of expected agreement by chance.20 Κ coefficients are useful to compare the discrimination accuracy between groups of unequal size. To reveal the informative spectral regions for discriminating PB and AB, genetic algorithm (GA) analysis was conducted using the rbga.bin function in R package “genalg”. The population size was set to 50. The number of generations was set to 100. The mutation probability was 0.05. In all, 10 runs were conducted for each GA analysis.

Reconstitution of Blood Stain Spectra. The preprocessed spectral data sets of each of the pure blood samples and substrates were decomposed into their spectral constituents using principal component analysis (PCA) or the multivariate curve resolution alternating least-squares (MCR-ALS) algorithm, without mean-centering. In MCR-ALS, the als function in R package “ALS” was used, setting dominant regions of positive or negative in corresponding principal components (PCs) as initial spectra. A spectrum of the blood stain (a) was reconstituted as a linear combination of PCs (k) of pure blood samples and the substrate, as expressed in the following formula:

$$a = \sum_{i \in \mathrm{blood}}^{i_{\max}} c_{i,\mathrm{blood}} \times k_{i,\mathrm{blood}} + \sum_{j \in \mathrm{substrate}}^{j_{\max}} c_{j,\mathrm{substrate}} \times k_{j,\mathrm{substrate}} + r$$

In that equation, i and j indicate the ordinal numbers of PCs of the pure blood spectra and the substrate spectra, respectively; c denotes a coefficient of a PC; r is a residual vector. To calculate the coefficients for each PC in blood stain spectra, linear least-squares regression analysis was applied as shown below.

$$A = CK$$

$$AK^{T} = CKK^{T}$$

$$AK^{T}(KK^{T})^{-1} = C$$

In those equations, A is an N × 2142 matrix, where N represents the number of spectra and each row holds one spectrum (2142 points); C is an N × ($i_{\max} + j_{\max}$) matrix, which represents coefficients for the PCs in each blood stain spectrum; and K is an ($i_{\max} + j_{\max}$) × 2142 matrix, in which the PCs are arranged. To ascertain the best number of the PCs adopted, we tried combinations of $i_{\max}$, $j_{\max}$ = 1−5. The blood-originated signal was obtained by subtracting the substrate signals, $\sum_{j \in \mathrm{substrate}}^{j_{\max}} c_{j,\mathrm{substrate}} \times k_{j,\mathrm{substrate}}$, from a blood stain spectrum a. After normalization by the total area, the extracted blood-originated signals were classified using the PLSDA model.

Discriminant Weight Factor and Weighted Least-Squares Regression for Blood Stain Reconstitution. To improve the discrimination accuracy, we conducted weighted least-squares regression of blood stain spectra by incorporating “a discriminant weight factor (z)” built with the LVs and corresponding scores for pure PB and AB spectra in the PLSDA model (Figure 1). In normal linear least-squares regression, every point in a spectrum is fitted with uniform error, based on the principle of least squares. However, the points do not contribute uniformly to the discrimination by the PLSDA model. A discriminant weight factor (z) was introduced to weight the regression for blood stain spectra reconstitution so that it would be more efficient in a subsequent PLSDA prediction. A prediction process by a PLSDA model comprises repetitive sequences of (1) projection of the spectra on a weight vector ($w_l$), (2) calculation of the scores ($t_l$), and (3) subtraction of the LV’s contribution ($t_l \times p_l$; $p_l$ is a loading vector) from the spectra.21,22 The number of adopted LVs corresponds to the number of repetitions of the sequence. The t scores are also linearly related to response parameter y, a final answer of the discrimination. To evaluate the contribution of each LV in the PLSDA model, we defined “a discriminant power of an LV ($J_l$)” as shown below.

$$\text{discriminant power of the } l\text{th LV } (J_l) = \sqrt{\frac{(\bar{t}_{l,\mathrm{purePB}} - \bar{t}_{l,\mathrm{pureAB}})^2}{V_{l,\mathrm{purePB}} + V_{l,\mathrm{pureAB}}}}$$

Therein, $\bar{t}_l$ and $V_l$, respectively, represent the average and variance of t scores for the lth LV in spectra of pure PB or AB. This formula shows that the greater the difference of t scores between groups or the less the variance of the t scores within groups, the more the LV contributes to the discrimination. The inside of the square root in the formula is coincident with the ratio of difference between classes to variance within classes, as described in the method of Fisher’s discriminant analysis.23 A discriminant weight factor (z) was established using weight vectors ($w_l$) and the discriminant power $J_l$ of LVs as in the following formula.

$$z_i = \sqrt{\sum_{l} J_l \times \mathrm{abs}(w_{l,i})} \qquad (z = z_1, z_2, \ldots, z_{2142})$$

Considering the linearity of LVs’ contribution to PLSDA prediction, the discriminant weight factor z was designed as a linear combination of weight vectors with the coefficients of $J_l$. Weighted least-squares regression was performed on the blood stain spectra using the Z matrix, in which the elements of z are arranged diagonally. The blood stain spectra A and the PCs matrix K were transformed, respectively, into the weighted forms $A_z$ and $K_z$ by Z. Then, the equation was solved in the same way as in normal linear least-squares regression.

$$AZ = A_z$$

$$KZ = K_z$$

$$A_z = CK_z$$

$$A_z K_z^{T}(K_z K_z^{T})^{-1} = C$$

Using the calculated coefficient c, substrate signals were subtracted; then blood-originated signals were obtained.

RESULTS AND DISCUSSION

ATR FT-IR Spectra of Postmortem and Antemortem Blood. This study was conducted to establish a method for discriminating PB and AB stains incorporated in substrates. Because the FT-IR spectrum of blood is possibly influenced by physical conditions of blood donors,24−26 AB samples were collected only from healthy Japanese volunteers. In addition, considering a possibility of spectral variances depending on donors’ ages,27 PB and AB samples were collected from 12 and 10 donors with wide ranges of ages, 20s to 90s and 20s to 60s, respectively. For applicability of the method to various unknown substrates, it is necessary to build a discrimination model based on spectral data of the pure blood samples without interference of the substrate. Blood samples in forensic investigation are collected and analyzed typically in a dried state. Therefore, pure PB and AB samples were prepared as dried samples on glass slides before analysis. As the training data set, 10 ATR FT-IR spectra for each blood sample were recorded (Figure S-2, parts A and B). Human blood cells, such as red blood cells, white blood cells, and platelets, have diameters of several micrometers in the fresh liquid state. The blood cells burst and break into fragments during the drying process.28 Then, the molecular components of blood cells are randomly distributed in the dried blood samples. Additionally, the ATR crystal of ca. 1.5 mm diameter has an adequate region of interest to average the molecular distribution. Therefore, ATR FT-IR sufficiently offers spectral information from the whole blood samples. Moreover, the spectral variances depending on measuring points were reduced significantly in the present method, compared to other vibrational spectroscopic techniques with higher spatial resolutions.11,13 The mean ATR FT-IR spectra of 12 pure PB samples or 10 pure AB samples are shown in Figure 2.

Figure 2. Discrimination of ATR FT-IR spectra of pure postmortem and antemortem blood. Average ATR FT-IR spectra of pure PB (camel) and AB (red) (left), and a region representing a distinctive difference between them (right). The 1711−2669 cm−1 region was excluded to avoid interference from the ATR crystal.

Distinctive peaks in the blood spectra were found, which have been assigned to proteins [1241 cm−1 (amide III), 1532 cm−1 (amide II), 1643 cm−1 (amide I, C=O stretching), and 3285 cm−1 (amide A)], glucose [1082 cm−1 (symmetric C−O stretching)], and lipid [1165 cm−1 (C−O stretching of lipid ester), 2873 cm−1 (symmetric methylene stretching), and 2959 cm−1 (asymmetric methyl stretching)].5,6,14,29 Peaks located at 1392 and 1454 cm−1 were derived from symmetric CH3 and asymmetric CH3 bending, respectively, of amino acid side chains, lipids, and proteins. The two mean-spectral patterns of PB and AB were so similar that it was almost impossible to distinguish the spectra by visual comparison. However, a slight difference was found in relative absorption around 1127 cm−1 (Figure 2, right).

Discrimination of Postmortem and Antemortem Blood. To differentiate ATR FT-IR spectra between PB and AB samples, we used a multivariate analysis, PLSDA, by which the spectral data were replotted in alternative axes (LVs) so that the two blood groups were separated efficiently. A training data set of 120 pure PB spectra and 100 pure AB spectra was used to


Figure 3. Differentiation of ATR FT-IR spectra of pure postmortem and antemortem blood by multivariate analysis. (A) Three-dimensional dot plot of t scores of PB (camel) and AB (red) for the first three latent variables in the PLSDA model. (B) PRESS evaluated in 10-fold CV of the PLSDA model with 1−10 latent variables. (C) An increase of discrimination efficacy with an increase of the number of latent variables (LVs) used in the PLSDA model. The Κ coefficients were evaluated in 10-fold cross validation (CV). With four LVs, all spectra of pure PB and AB were classified correctly (Κ = 1.0).
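The two quantities plotted in Figure 3B and 3C, PRESS and the Κ coefficient, can be computed as in the short Python sketch below. It is only an illustration with hypothetical function names and toy labels; the expected agreement P_e is computed from the marginal class frequencies, which is the usual definition of Cohen's kappa and is assumed here to match the definition cited in the Methods.

```python
def press(predicted, ideal):
    """Predicted residual error sum of squares over all spectra."""
    return sum((p - t) ** 2 for p, t in zip(predicted, ideal))


def kappa(true_labels, predicted_labels):
    """Kappa = (P_o - P_e) / (1 - P_e).
    P_o: observed agreement (the accuracy rate).
    P_e: agreement expected by chance, from the marginal frequencies
    (assumption: standard Cohen's kappa)."""
    n = len(true_labels)
    p_o = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    classes = set(true_labels) | set(predicted_labels)
    p_e = sum(
        (true_labels.count(c) / n) * (predicted_labels.count(c) / n)
        for c in classes
    )
    return (p_o - p_e) / (1 - p_e)


# Toy example with unequal group sizes (6 PB, 4 AB), two misclassified.
truth = ["PB"] * 6 + ["AB"] * 4
guess = ["PB"] * 5 + ["AB"] + ["AB"] * 3 + ["PB"]
k = kappa(truth, guess)
```

Because P_e accounts for the group sizes, Κ stays comparable between data sets with different PB-to-AB ratios, which a raw accuracy rate does not.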

Figure 4. Measurements of ATR FT-IR spectra of blood stains on interfering substrates. Average ATR FT-IR spectra of PB (camel) and AB (red) stains on polyester (A), cotton (B), and denim (C). A mean spectrum of the intact substrate (black) is presented at the bottom of each figure.

construct the PLSDA model. The t scores projected on the first three LVs showed that the spectral data of the two blood groups were distributed separately (Figure 3A). When a PLSDA model is constructed, setting of the number of LVs is important to optimize the discrimination efficiency and to avoid over-regression of the model. To ascertain the best number of LVs of the PLSDA model, PRESS values were calculated using 10-fold CV using 1−10 LVs (Figure 3B). After incorporation of the fourth LV, the rate of decrease of PRESS became considerably lower, indicating that over-regression proceeded. Additionally, Κ coefficients for classifying pure PB and AB spectra reached 1.00 (100% accuracy) with four LVs (Figure 3C). Consequently, the PLSDA model with four LVs was used for additional analyses.

Interpretation of Differences between Postmortem and Antemortem Blood. To deepen understanding of the origin of differentiation between PB and AB spectra, GA analysis was applied to the PLSDA model. GA analysis allows for the selection of more informative variables, or regions of wavenumber, excluding other regions that are less significant and which have higher variations within the group. Primary GA analysis was conducted on whole spectra of pure PB and AB (except the region of 1711−2669 cm−1), setting the window width to 18 cm−1. The results of 10 runs are shown in Figure S-3A. The regions of asymmetric CH3, amide I, and amide II (1428−1710 cm−1) were avoided explicitly, whereas the other regions were selected randomly or concentratedly. The region of lowest wavenumber (600−635 cm−1) was also not selected. On the basis of the results, we performed the secondary GA analysis of the spectra, excluding the regions of 1428−1710 and 600−635 cm−1 as well, with a window width of 10 cm−1. Results of the subsequent 10 runs showed a tendency of variable selection stronger than the results of the first runs (Figure S-3B). The regions of 746−975 and 1096−1135 cm−1 were selected concentratedly in almost all runs, assigned, respectively, to symmetric C−O stretching and low-frequency vibrations related to molecular scaffold such as C−C−H or C−O groups. The region of 1096−1135 cm−1 has a peak at 1127 cm−1, as described above (Figure 2, right), which corresponds to the most characteristic band of lactic acid (Figure S-4A).30,31 Results of earlier biochemical analyses suggest that PB has a higher concentration of lactic acid than AB because anaerobic glycolysis, which converts glucose into lactic acid, continues even after death.32−35 Our results demonstrated that the difference of lactic acid concentration between PB and AB was recognized as significant differences in ATR FT-IR spectral patterns. We found that glucose, which has characteristic absorption at 1082 or 1033 cm−1, is not useful for discriminating between PB and AB in GA analysis.29,36 Additionally, the standard deviation spectra of PB and AB


Figure 5. Extraction of blood-originated signals from blood stain spectra. (A) Average blood-originated signals extracted from PB (camel) and AB (red) stains on polyester; the blood stains were reconstituted with two blood principal components (PCs) and two polyester PCs. (B) Average blood-originated signals extracted from PB (camel) and AB (red) stains on cotton; the blood stains were reconstituted with two blood PCs and two cotton PCs. (C) Average blood-originated signals extracted from PB (camel) and AB (red) stains on denim; the blood stains were reconstituted with two blood-PCs and three denim PCs.
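The extraction behind Figure 5 can be sketched for a single stain spectrum as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the principal components are toy vectors, and the optional weights argument stands in for the diagonal of the Z matrix (all ones reproduces normal least squares). The substrate contribution is subtracted from the unweighted spectrum, which is how the Methods are read here.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def extract_blood_signal(stain, blood_pcs, substrate_pcs, weights=None):
    """Fit stain = sum(c * pc) + r over blood and substrate PCs, then
    subtract the fitted substrate part from the stain spectrum."""
    pcs = blood_pcs + substrate_pcs                      # rows of K
    w = [1.0] * len(stain) if weights is None else weights
    kz = [[v * wi for v, wi in zip(pc, w)] for pc in pcs]    # K Z
    az = [v * wi for v, wi in zip(stain, w)]                 # a Z
    gram = [[sum(x * y for x, y in zip(p, q)) for q in kz] for p in kz]
    proj = [sum(x * y for x, y in zip(p, az)) for p in kz]
    coef = solve_linear(gram, proj)                      # c = a_z K_z^T (K_z K_z^T)^-1
    sub_coef = coef[len(blood_pcs):]
    substrate = [
        sum(c * pc[i] for c, pc in zip(sub_coef, substrate_pcs))
        for i in range(len(stain))
    ]
    return coef, [s - t for s, t in zip(stain, substrate)]


# Toy stain built as 2 x blood PC + 3 x substrate PC (no residual).
coef, blood = extract_blood_signal(
    [2.0, 3.0, 7.0, 11.0], [[1.0, 0.0, 2.0, 1.0]], [[0.0, 1.0, 1.0, 3.0]]
)
```

For this noise-free toy input the fitted coefficients come out at approximately 2 and 3, and the returned signal is the blood part alone. In the study, the extracted signal is then normalized by its total area before PLSDA classification.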

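Table 1. Κ Coefficients for Discrimination of Blood-Originated Signals Extracted from 120 PB and 100 AB Blood Stains on Polyester, Cotton, and Denim

Blood stains on polyester (columns: no. of PCs of polyester)

| no. of PCs of blood | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.217 | 0.679 | 0.217 | 0.138 | 0.084 |
| 2 | 0.304 | 0.627 | 0.169 | 0.099 | 0.084 |
| 3 | 0.337 | 0.618 | 0.169 | 0.099 | 0.076 |
| 4 | 0.329 | 0.610 | 0.138 | 0.061 | 0.053 |
| 5 | 0.296 | 0.559 | 0.177 | 0.146 | 0.069 |

Blood stains on cotton (columns: no. of PCs of cotton)

| no. of PCs of blood | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.397 | 0.458 | 0.243 | 0.244 | 0.038 |
| 2 | 0.388 | 0.467 | 0.252 | 0.258 | 0.075 |
| 3 | 0.354 | 0.436 | 0.137 | 0.291 | 0.084 |
| 4 | 0.370 | 0.459 | 0.231 | 0.237 | 0.084 |
| 5 | 0.351 | 0.454 | 0.289 | 0.260 | 0.030 |

Blood stains on denim (columns: no. of PCs of denim)

| no. of PCs of blood | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.203 | 0.369 | 0.377 | 0.254 | 0.212 |
| 2 | 0.193 | 0.487 | 0.486 | 0.424 | 0.519 |
| 3 | 0.152 | 0.420 | 0.429 | 0.387 | 0.328 |
| 4 | 0.193 | 0.412 | 0.456 | 0.445 | 0.240 |
| 5 | 0.193 | 0.431 | 0.439 | 0.440 | 0.240 |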

were large values around 981−1092 cm−1 (Figure S-4, parts B and C). Results suggest that the variance of glucose concentration within the groups was so large that detection of the difference between the groups was insufficient. Consequently, we concluded that the discrimination of PB and AB by the PLSDA model was attributable to the higher concentration of lactic acid in PB.

Extraction of Blood-Originated Signals from ATR FT-IR Spectra of Blood Stains on Substrates. To examine the differentiation of blood stains on interfering substrates, we prepared substrates of three types: polyester, cotton, and denim. Ten aliquots each of 12 PB and 10 AB samples were deposited and dried on each substrate. Subsequently, an ATR FT-IR spectrum was recorded for each spot. Then, 120 PB stain spectra and 100 AB stain spectra were collected in all for each substrate type. The mean spectra of PB and AB stains on each substrate are depicted in Figure 4A−C. Results showed that substrate-originated signals were dominant in all the spectra. Blood-originated signals were recognized only at amide I and amide II. Blood-originated signals around 1127 cm−1 were overlapped thoroughly and buried in strong substrate-originated signals because this region is also the most informative for most organic compounds. Consequently, it was impossible to discriminate the spectra between PB and AB stains on substrates by visual comparison or to apply the spectra, with no processing, to the established PLSDA model. Blood stain spectra are approximated as linear combinations of blood-originated signals and substrate-originated signals. However, subtle fluctuations in each signal are caused by variations of chemical compositions dependent on measuring points and blood donors, or by other experimental factors. Consequently, it is necessary that the blood stain spectra be separated into respective signals including such small variations to detect small spectral differences between PB and AB. We first performed PCA on the spectral data sets of the three substrates (100 spectra for each substrate) and pure blood samples (220 spectra of 120 PB and 100 AB spectra), respectively (Figures S-5 and S-6). The first PC was almost identical to the mean spectra. The lower-ordered PCs represented small variations in the data set. In Figure S-6B, the left singular vectors obtained by singular value decomposition (SVD) of the blood data set are shown, indicating the relative values of each PC contained in each spectrum. For the second PC of the blood data set, PB spectra showed positive values, whereas AB spectra had negative values. These results suggest that the second PC of the blood data set included important information to discriminate between PB and AB spectra. The blood stain spectra were regressed by linear combination of the PCs of blood and the substrates, varying the number of PCs used from one to five, respectively. Subsequently, the components attributed to the substrate were subtracted. Finally, blood-originated signals were obtained (Figure 5A−C).

Discrimination of Blood Stain Spectra. After normalization with the total area, the extracted blood-originated signals were finally classified using the PLSDA model. Table 1 presents results of the discrimination of the extracted blood signals. Results show that the number of substrate PCs was more influential in the discrimination accuracy than the number of blood PCs. For blood stains on polyester and cotton, the use of the first two PCs of the substrates showed better results than the use of the other quantities of substrate PCs. For blood stains on denim, the Κ coefficients obtained with the first two or three PCs of the substrate were comparable. To elucidate this tendency, we performed decomposition of the spectral data set of the substrates and


Table 2. Discrimination Results for Individual and Sample-Average Blood-Originated Signals Obtained Using Normal or Weighted Least-Squares Regression of Blood Stain Spectra with the Determined Number of PCs (a)

                                              substrate
regression method   blood-originated signal   polyester        cotton           denim
normal              individual                0.627 (80.9%)    0.467 (73.6%)    0.486 (74.5%)
normal              sample average            0.56 (77%)       0.73 (86%)       0.82 (91%)
weighted            individual                0.748 (87.3%)    0.486 (74.5%)    0.499 (75.5%)
weighted            sample average            0.91 (95%)       0.73 (86%)       0.82 (91%)

(a) Κ coefficients and discrimination accuracy rates (in parentheses) are shown for blood stains on each substrate type.
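As a rough illustration of how the accuracy rates and Κ coefficients in Table 2 relate to each other, the following Python sketch computes both from a two-class confusion matrix. It assumes that Κ denotes Cohen's kappa (ref 20); the function names and the counts are hypothetical and are not taken from the paper.

```python
import numpy as np

def accuracy(confusion):
    """Fraction of correctly classified samples (rows: true class, columns: predicted)."""
    c = np.asarray(confusion, dtype=float)
    return np.trace(c) / c.sum()

def cohen_kappa(confusion):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_observed = np.trace(c) / n
    # Chance agreement from the row (true) and column (predicted) marginals.
    p_expected = (c.sum(axis=1) * c.sum(axis=0)).sum() / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical counts for a balanced PB/AB test set (illustration only).
example = [[45, 5],   # true PB: 45 predicted PB, 5 predicted AB
           [5, 45]]   # true AB: 5 predicted PB, 45 predicted AB

example_accuracy = accuracy(example)
example_kappa = cohen_kappa(example)
print(example_accuracy, example_kappa)
```

For two balanced classes the chance agreement is 0.5, so Κ is approximately twice the accuracy minus one, which is close to the pairs of values reported in Table 2.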

The PB and AB stains on polyester were discriminated accurately, probably because polyester absorbs less fluid than cotton and denim. This property retained the blood near the polyester surface, giving a large ratio of blood signals to substrate signals. Another factor that can hamper precise extraction of blood-originated signals is the presence of nonlinear relations attributable to chemical interactions between the blood and substrates. Such nonlinear effects can disturb the original spectral profiles or generate new spectral contributions from interacting substances. The degree of such nonlinear effects is also related to how much blood is physically incorporated into the substrate. For cotton and denim, a slight disturbance of the extracted blood-originated signals was observed, particularly in regions with strong substrate signals (Figure 4, parts B and C). This result implies the presence of nonlinear interactions or a lack of spectral components. Despite these difficulties, the present approach recovered blood signals in the presence of substrate interferences, from which we were able to differentiate the origins of the blood stains. Furthermore, the weighted least-squares method improved the discrimination accuracy by increasing the regression efficiency for reconstitution of the blood stain spectra.

blood using the MCR-ALS algorithm, setting the number of components to two or three (Figure S-7A−D). Although the first components were similar to the mean spectra of each substrate, the second components exhibited a characteristic profile in which signals appeared predominantly in the low-frequency region. We presumed that the second components represent a factor that adjusts for the variation of sample attachment to the ATR crystal in each measurement. Because of a property of the evanescent field, sample attachment is more influential in the low-frequency region than in the high-frequency region. Consequently, incorporation of the second PC of the substrates was necessary to express the contribution of the substrate signals precisely. In addition, the MCR-ALS components of the denim spectra represented distinctive signals assigned to indigo molecules, which were unmixed as the third component (Figure S-7, parts D and E). Denim cloth indeed has subtle color inhomogeneity or spatial variation in the concentration of indigo. Therefore, the best number of substrate components was determined to be three for denim and two for polyester and cotton. The best number of blood PCs was identified as two because of the higher Κ coefficients (Table 1) and the result of SVD for the blood data set (Figure S-6B). Ten spectra were recorded for each PB or AB stain sample. Therefore, sample-average blood-originated signals were also obtained and classified using the PLSDA model. When reconstituted with the determined numbers of PCs of blood and the substrates, the blood stains were discriminated with an accuracy of at least 73.6% (Κ ≥ 0.467) for individual blood-originated signals. By averaging the 10 spectra from each sample, the discrimination accuracy increased to 86% (Κ 0.73) and 91% (Κ 0.82) for blood stains on cotton and denim, respectively (Table 2). This result indicates that random noise superimposed on individual blood-originated signals was reduced by averaging.
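The extraction step described in this section (regressing a stain spectrum on the PCs of blood and of the substrate, subtracting the substrate part, and averaging replicate signals) can be sketched in Python with NumPy. This is a minimal sketch on synthetic Gaussian bands, not the authors' implementation: the function names, band positions, noise level, and the choice of two PCs per data set are all assumptions made for illustration.

```python
import numpy as np

def principal_components(spectra, n_pcs):
    """First n_pcs right singular vectors (as rows) of a spectra matrix.

    No mean-centering is applied, so the first PC resembles the mean spectrum.
    """
    _, _, vt = np.linalg.svd(spectra, full_matrices=False)
    return vt[:n_pcs]

def extract_blood_signal(stain, blood_pcs, substrate_pcs):
    """Fit stain as a linear combination of all PCs, then subtract the substrate part."""
    basis = np.vstack([blood_pcs, substrate_pcs]).T        # (n_points, n_components)
    coeffs, *_ = np.linalg.lstsq(basis, stain, rcond=None)  # ordinary least squares
    substrate_part = substrate_pcs.T @ coeffs[len(blood_pcs):]
    return stain - substrate_part

# Synthetic stand-in data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

def band(center, width=0.05):
    return np.exp(-((x - center) / width) ** 2)

blood_bands = np.vstack([band(0.3), band(0.5)])
substrate_bands = np.vstack([band(0.7), band(0.9)])
blood_set = rng.uniform(0.5, 1.5, size=(20, 2)) @ blood_bands
substrate_set = rng.uniform(0.5, 1.5, size=(20, 2)) @ substrate_bands

blood_pcs = principal_components(blood_set, 2)
substrate_pcs = principal_components(substrate_set, 2)

# A weak blood signal buried in a strong substrate signal.
true_blood = 0.2 * blood_bands[0] + 0.1 * blood_bands[1]
clean_stain = true_blood + 1.0 * substrate_bands[0] + 0.8 * substrate_bands[1]

# Ten noisy replicate measurements of the same stain.
replicates = clean_stain + rng.normal(scale=0.01, size=(10, x.size))
extracted = np.array(
    [extract_blood_signal(s, blood_pcs, substrate_pcs) for s in replicates]
)

individual_errors = np.linalg.norm(extracted - true_blood, axis=1)
average_error = np.linalg.norm(extracted.mean(axis=0) - true_blood)
print(individual_errors.mean(), average_error)
```

Because the extraction is linear in the stain spectrum, averaging the extracted signals is equivalent to extracting from the averaged spectrum, and the error of the sample average cannot exceed the mean error of the individual signals.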
Advanced Discrimination of Blood Stain Spectra Using Weighted Least-Squares Regression. To regress the blood stain spectra more efficiently for the PLSDA prediction, a discriminant weight factor, which reflects the relative contribution of each LV in the PLSDA model (Figure 1), was incorporated into the calculations. The blood stain spectra were reconstituted using weighted least-squares regression, and the extracted blood-originated signals were then classified using the PLSDA model (Table 2). For blood stains on polyester, the Κ coefficients increased markedly for both individual and sample-average blood-originated signals when weighted least-squares regression was used: the discrimination accuracy reached 87.3% (Κ 0.748) for individual signals and 95% (Κ 0.91) for sample-average signals. For blood stains on cotton and denim, the Κ coefficients increased slightly for individual blood-originated signals, whereas those for the sample-average signals did not change.
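The general idea of weighted least-squares regression can be illustrated with a short NumPy sketch. The weights used here are a hypothetical per-point vector chosen for the demonstration; they are not the discriminant weight factor derived from the PLSDA model, whose exact definition is given elsewhere in the paper. The function name and the toy data are likewise assumptions.

```python
import numpy as np

def weighted_lstsq(basis, y, weights):
    """Minimize sum_i weights[i] * (y[i] - (basis @ c)[i])**2 over c.

    Scaling each row by sqrt(weight) turns the problem into ordinary least squares.
    """
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    coeffs, *_ = np.linalg.lstsq(basis * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coeffs

def weighted_sse(basis, y, coeffs, weights):
    """Weighted sum of squared residuals for a given coefficient vector."""
    residual = y - basis @ coeffs
    return float(np.sum(weights * residual**2))

# Toy data (hypothetical): a linear trend plus an unmodeled disturbance at x > 0.8.
x = np.linspace(0.0, 1.0, 100)
basis = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.5 * (x > 0.8)

# Hypothetical weights: emphasize the region that matters for the later prediction.
weights = np.where(x < 0.5, 1.0, 0.01)

ols_coeffs = weighted_lstsq(basis, y, np.ones_like(x))  # uniform weights = normal LS
wls_coeffs = weighted_lstsq(basis, y, weights)

print(weighted_sse(basis, y, ols_coeffs, weights))
print(weighted_sse(basis, y, wls_coeffs, weights))
```

With uniform weights the function reduces to normal least squares. With nonuniform weights the fit is pulled toward the highly weighted points, so residuals in the down-weighted region have less influence on the coefficients.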



CONCLUSION

We demonstrated that the developed method, which combines ATR FT-IR spectroscopy with a novel multivariate spectral processing, can discriminate the origins of blood stains with high accuracy by precisely identifying spectral signals of blood in the presence of substrate interferences. Salient advantages of this method are that it requires no manual treatment of spectral data and that it is independent of substrate type. The method might be useful for analyzing various body fluid stains using spectral components that are specific to each body fluid. The applicability of the present method is expected to extend to broader subjects involving spectral contamination and unmixing of interference, such as analyses of biomolecules in the intercellular matrix,37,38 characterization of soil parameters (e.g., organic matter, moisture, and heavy metal contents),39−41 and quality evaluation of foods.42,43 Consequently, the developed method is expected to contribute widely to studies of complex mixtures using multivariate spectroscopic analysis.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.7b01756. Wavenumbers vs alternative variables (x points), spatial variations in ATR FT-IR spectra of pure PM and AM




(26) Shaw, R. A.; Mantsch, H. H. In Encyclopedia of Analytical Chemistry; Meyers, R. A., Ed.; John Wiley & Sons, Ltd.: Hoboken, NJ, 2006.
(27) Makhnii, T.; Ilchenko, O.; Reynt, A.; Pilgun, Y.; Kutsyk, A.; Krasnenkov, D.; Ivasyuk, M.; Kukharskyy, V. Ukr. J. Phys. 2016, 61, 853−862.
(28) Skelton, R. A Survey of Forensic Science; Lulu.com: Raleigh, NC, 2011; p 217.
(29) Kanagathara, N.; Thirunavukkarasu, M.; Jeyanthi, C.; Shenbagarajan, P. Int. J. Pharma Bio Sci. 2011, 1, 74−81.
(30) Petibois, C.; Melin, A.-M.; Perromat, A.; Cazorla, G.; Déléris, G. J. Lab. Clin. Med. 2000, 135, 210−215.
(31) Petibois, C.; Gionnet, K.; Goncalves, M.; Perromat, A.; Moenner, M.; Deleris, G. Analyst 2006, 131, 640−647.
(32) Donaldson, A. E.; Lamont, I. L. PLoS One 2013, 8, e82011.
(33) Keltanen, T.; Nenonen, T.; Ketola, R. A.; Ojanperä, I.; Sajantila, A.; Lindroos, K. Int. J. Legal Med. 2015, 129, 1225−1231.
(34) Zilg, B.; Alkass, K.; Berg, S.; Druid, H. Forensic Sci. Int. 2009, 185, 89−95.
(35) Belsey, S. L.; Flanagan, R. J. J. Forensic Leg. Med. 2016, 41, 49−57.
(36) Petibois, C.; Rigalleau, V.; Melin, A. M.; Perromat, A.; Cazorla, G.; Gin, H.; Deleris, G. Clin. Chem. 1999, 45, 1530−1535.
(37) Chen, P. H.; Shimada, R.; Yabumoto, S.; Okajima, H.; Ando, M.; Chang, C. T.; Lee, L. T.; Wong, Y. K.; Chiou, A.; Hamaguchi, H. O. Sci. Rep. 2016, 6, 20097.
(38) Chiu, L. D.; Ichimura, T.; Sekiya, T.; Machiyama, H.; Watanabe, T.; Fujita, H.; Ozawa, T.; Fujita, K. Sci. Rep. 2017, 7, 43569.
(39) Soriano-Disla, J. M.; Janik, L. J.; Viscarra Rossel, R. A.; Macdonald, L. M.; McLaughlin, M. J. Appl. Spectrosc. Rev. 2014, 49, 139−186.
(40) Gredilla, A.; Fdez-Ortiz de Vallejuelo, S.; Elejoste, N.; de Diego, A.; Madariaga, J. M. TrAC, Trends Anal. Chem. 2016, 76, 30−39.
(41) Garrigues, S.; de la Guardia, M. TrAC, Trends Anal. Chem. 2013, 43, 161−173.
(42) Efenberger-Szmechtyk, M.; Nowak, A.; Kregiel, D. Crit. Rev. Food Sci. Nutr. [Online early access]. DOI: 10.1080/10408398.2016.1276883. Published Online: Jan 27, 2017. http://www.tandfonline.com/doi/abs/10.1080/10408398.2016.1276883?journalCode=bfsn20.
(43) Fernández Pierna, J. A.; Vermeulen, P.; Amand, O.; Tossens, A.; Dardenne, P.; Baeten, V. Chemom. Intell. Lab. Syst. 2012, 117, 233−239.

blood, understanding spectral differences between PB and AB via the genetic algorithm, reference spectra for interpreting origins of spectral differences between PB and AB, principal component analysis for ATR FT-IR spectra of substrates and PB and AB, and decomposition of blood stain spectra using the MCR-ALS algorithm (PDF)

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. Phone: +81-4-7135-8001. Fax: +81-4-7133-9159.
*E-mail: [email protected]. Phone: +81-3-5841-4351. Fax: +81-3-5802-2989.

ORCID

Takeaki Ozawa: 0000-0002-3198-4853

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the Japan Society for the Promotion of Science [Grants-in-Aid for Young Scientist (B) 17K18380 to A.T. and Grants-in-Aid for Scientific Research (S) 26220805 to T.O.].



REFERENCES

(1) Virkler, K.; Lednev, I. K. Forensic Sci. Int. 2009, 188, 1−17.
(2) Li, R. Forensic Biology, 2nd ed.; CRC Press: Boca Raton, FL, 2015.
(3) Muro, C. K.; Doty, K. C.; Bueno, J.; Halamkova, L.; Lednev, I. K. Anal. Chem. 2015, 87, 306−327.
(4) Doty, K. C.; Muro, C. K.; Bueno, J.; Halámková, L.; Lednev, I. K. J. Raman Spectrosc. 2016, 47, 39−50.
(5) Elkins, K. M. J. Forensic Sci. 2011, 56, 1580−1587.
(6) Orphanou, C. M. Forensic Sci. Int. 2015, 252, e10−e16.
(7) Virkler, K.; Lednev, I. K. Forensic Sci. Int. 2008, 181, e1−e5.
(8) Sikirzhytski, V.; Virkler, K.; Lednev, I. K. Sensors 2010, 10, 2869−2884.
(9) Virkler, K.; Lednev, I. K. Forensic Sci. Int. 2009, 193, 56−62.
(10) Virkler, K.; Lednev, I. K. Analyst 2010, 135, 512−517.
(11) Virkler, K.; Lednev, I. K. Anal. Bioanal. Chem. 2010, 396, 525−534.
(12) McLaughlin, G.; Doty, K. C.; Lednev, I. K. Anal. Chem. 2014, 86, 11628−11633.
(13) Sikirzhytskaya, A.; Sikirzhytski, V.; Lednev, I. K. J. Biophotonics 2014, 7, 59−67.
(14) Mistek, E.; Lednev, I. K. Anal. Bioanal. Chem. 2015, 407, 7435−7442.
(15) McLaughlin, G.; Sikirzhytski, V.; Lednev, I. K. Forensic Sci. Int. 2013, 231, 157−166.
(16) McLaughlin, G.; Lednev, I. K. J. Forensic Sci. 2015, 60, 595−604.
(17) Zapata, F.; de la Ossa, M. A.; Garcia-Ruiz, C. Appl. Spectrosc. 2016, 70, 654−665.
(18) Quinn, A. A.; Elkins, K. M. J. Forensic Sci. 2017, 62, 197−204.
(19) Feine, I.; Gafny, R.; Pinkas, I. Forensic Sci. Int. 2017, 270, 241−247.
(20) Cohen, J. Educ. Psychol. Meas. 1960, 20, 37−46.
(21) Brereton, R. G.; Lloyd, G. R. J. Chemom. 2014, 28, 213−225.
(22) Barker, M.; Rayens, W. J. Chemom. 2003, 17, 166−173.
(23) Fisher, R. A. Annals of Eugenics 1936, 7, 179−188.
(24) Thumanu, K.; Sangrajrang, S.; Khuhaprema, T.; Kalalak, A.; Tanthanuch, W.; Pongpiachan, S.; Heraud, P. J. Biophotonics 2014, 7, 222−231.
(25) Mordechai, S.; Shufan, E.; Porat Katz, B. S.; Salman, A. Analyst 2017, 142, 1276−1284.
