
Article pubs.acs.org/jpr

Clinical Proteome Informatics Workbench Detects Pathogenic Mutations in Hereditary Amyloidoses

Surendra Dasari,*,† Jason D. Theis,‡ Julie A. Vrana,‡ Roman M. Zenka,§ Michael T. Zimmermann,† Jean-Pierre A. Kocher,† W. Edward Highsmith, Jr.,∥ Paul J. Kurtin,‡ and Ahmet Dogan‡,⊥

†Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Mayo Proteomics Core, and ∥Department of Molecular Genetics, Mayo Clinic, Rochester, Minnesota 55905, United States

Supporting Information

ABSTRACT: Shotgun proteomics of hereditary amyloid deposits generates all the information necessary to identify pathogenic mutant peptides and proteins. However, these mutant peptides are invisible to traditional database search strategies. We developed a two-pronged informatics workflow for detecting both known and novel amyloidogenic mutations from clinical proteomics data sets. We implemented the workflow in a CAP/CLIA certified clinical laboratory dedicated to proteomic subtyping of amyloid deposits extracted from formalin-fixed paraffin-embedded specimens. Performance of the workflow was characterized on a validation cohort of 49 hereditary amyloid samples with confirmed mutations and 85 controls. The sensitivity, specificity, positive predictive value, and negative predictive value of the known mutation detection workflow were 92%, 100%, 100%, and 96%, respectively. For the novel mutation detection workflow, these performance parameters were 82%, 99%, 98%, and 90%, respectively. The validated workflow was applied to detect amyloidogenic mutations in a clinical cohort of 150 amyloid samples. The known mutation detection workflow detected rare frame shift mutations in apolipoprotein A1 and fibrinogen alpha amyloid deposits. The novel mutation detection workflow uncovered unanticipated mutations (W22G and C71Y) of the serum amyloid A4 protein in patient amyloid deposits. In summary, clinical amyloid proteomics data sets contain mutant peptides of clinical significance that are recoverable with improved bioinformatics.

KEYWORDS: bioinformatics, amyloidosis, mutations, proteomics, clinical specimens

Journal of Proteome Research
Received: November 25, 2013. Published: March 20, 2014.
© 2014 American Chemical Society
dx.doi.org/10.1021/pr4011475 | J. Proteome Res. 2014, 13, 2352−2358

INTRODUCTION

Amyloidosis refers to a complex spectrum of hereditary and acquired diseases characterized by abnormal extracellular deposition of misfolded proteins in various organs. A single amyloidogenic protein1,2 present in the deposit determines its subtype and the associated disease phenotype. Traditionally, amyloid diagnosis and subtyping are performed in two steps. For diagnosis, the tissue specimen is stained with Congo red (CR), a chemical dye that is taken up by the unique physical structure of amyloid plaques (β-pleated sheet). This makes the amyloid deposits appear reddish-brown and produce apple-green birefringence under polarized light. Next, the subtype is inferred via immunohistochemistry (IHC). This resembles a guided search: the clinical presentation is used to infer potential subtypes, and clinical surrogates and IHC results are used to finalize the subtype. However, IHC often produces ambiguous results because of background serum contamination, epitope loss during formalin fixation, and the lack of specific antibodies for all of the subtypes. This creates diagnostic gray zones that hamper patient care. To remedy this, laboratory scientists have developed several shotgun proteomics methods for subtyping amyloid deposits obtained from formalin-fixed paraffin-embedded (FFPE) tissues and fat aspirates.3−6

When technical idiosyncrasies are ignored, all of these methods operate within a single framework and share a common bioinformatics problem. They start by isolating proteins from the amyloid deposits; the extracted proteins are digested with trypsin, and the resulting peptides are analyzed via liquid chromatography tandem mass spectrometry (LC−MS/MS). Next, bioinformatics pipelines match the MS/MS spectra against a canonical protein sequence database using database search engines such as Sequest.7 The resulting peptide identifications are filtered and assembled into protein identifications using postprocessing software such as Scaffold.8 This type of informatics approach works well for identifying known amyloidogenic proteins2 but fails to detect the causative amino acid sequence variations in hereditary amyloidoses, which are important for optimal patient management and appropriate genetic counseling.

In this study, we describe a two-pronged proteome informatics workflow for detecting known and novel amino acid mutations from clinical amyloid shotgun proteomics data sets. Known mutations are detected by matching the MS/MS spectra against a custom protein sequence database, augmented with amyloidogenic mutations, using a traditional database search strategy. This part of the workflow was implemented in a CAP/CLIA certified clinical testing laboratory at the Mayo Clinic. Novel mutations are detected by matching the MS/MS spectra against wild type protein sequences using a sequence tagging search strategy configured to look for unanticipated mutations. The workbench is integrated into the SWiFT9 data processing environment and can take advantage of multinode computer clusters. Application of the workbench to clinical amyloid samples revealed 39 different amyloidogenic mutations in six different genes of patients with various types of hereditary amyloidosis.

MATERIALS AND METHODS

Study Subjects

The study was approved by the Mayo Clinic Institutional Review Board. Table 1 presents the demographics and characteristics of the study subjects. Overall, the study has a validation cohort (N = 134) and a clinical cohort (N = 150). The validation cohort contains a total of 49 subjects with four different types of hereditary amyloidosis (Table 1). The pathogenic mutation in each subject's amyloid protein was confirmed by Sanger sequencing of the corresponding gene. We also included 85 negative control subjects in the validation cohort whose TTR genes were wild type by Sanger sequencing. The performance characteristics of the mutation detection workbench were evaluated using the validation cohort. The pipeline was later applied to identify amyloidogenic mutations in subjects of the clinical cohort, which contains a total of 150 subjects with six different types of hereditary amyloidosis (Table 1).

Table 1. Demographics of Patients Who Participated in This Study

amyloid type[a]    no. of cases (M/F/U)[b]    age (years)
Validation Cohort[c]
ATTR               41 (33/8/0)                63.5 ± 11.0
AApoA1             6 (2/1/3)                  62.0 ± 8.1
AApoA4             1 (1/0/0)                  50.0 ± 0.0
AGel               1 (1/0/0)                  68.0 ± 0.0
controls           85 (54/31/0)               69.5 ± 10.8
Clinical Cohort
ATTR               114 (88/25/1)              63.7 ± 14.2
AApoA1             14 (5/8/1)                 59.9 ± 13.2
AApoA4             4 (2/2/0)                  72.0 ± 18.0
AGel               3 (3/0/0)                  63.0 ± 8.8
SAA4               2 (2/0/0)                  76.0 ± 7.0
AFib               13 (9/2/2)                 62.8 ± 10.2

[a] All patients have hereditary amyloidosis, except controls. ATTR stands for transthyretin amyloidosis, AApoA1 for apolipoprotein A1 amyloidosis, AApoA4 for apolipoprotein A4 amyloidosis, AGel for gelsolin amyloidosis, AFib for fibrinogen amyloidosis, and SAA4 for serum amyloid A4 amyloidosis.
[b] M stands for male, F for female, and U for unknown.
[c] Amyloidogenic mutations were validated by Sanger sequencing the corresponding gene. These mutations are summarized in Table 2.

Shotgun Proteomics of Amyloid Deposits

We used a previously published method to isolate the amyloid deposits from the FFPE tissue biopsies and analyze them by tandem mass spectrometry (MS/MS).6 In brief, 10-μm-thick sections of FFPE tissue were deparaffinized and stained with CR. CR-positive areas were identified using fluorescent light and dissected by laser microdissection. Multiple (2−4) independent microdissections, each encompassing an area of 60,000 μm², were performed for each case. FFPE fragments from each microdissection were collected in a cap containing cell lysis buffer and analyzed individually. Proteins were extracted from the fragments using heat and denatured via sonication. Extracted proteins were digested with trypsin, and 5 μL of the resulting peptide mixture was analyzed on an LTQ Orbitrap XL mass spectrometer (Thermo Fisher, Waltham, MA) connected to an Eksigent (AB Sciex, Dublin, CA) liquid chromatography (LC) system. Approximately 6.6 million MS/MS spectra were collected across all LC−MS/MS analyses. Binary spectral data in the raw files were transcoded to either MGF format using the extract_msn software or mzML format using the msConvert tool of the ProteoWizard library.10

Bioinformatics

Figure 1 illustrates the two-pronged informatics workflow we developed for detecting known and novel mutations from amyloid proteomics data sets. Known amyloidogenic mutations are incorporated into a sequence database and detected with a traditional database search strategy. Novel mutations are identified with error-tolerant sequence tag-based searches configured to use wild type protein sequences as the reference. Detected mutations are validated using Sanger sequencing. Validated known amyloidogenic mutations are reported to the clinician. Validated novel mutations in amyloidogenic proteins are not clinically reported; instead, they are held until sufficient clinical evidence accumulates, after which they are fed back into the known amyloidogenic mutation detection workflow. A complete list of the search engine settings and protein assembly parameters used in this study is presented in Supplemental File 1.

Figure 1. Informatics workflow for detecting amyloidogenic mutations. The right prong of the workflow detects known amyloidogenic mutations using augmented protein sequence databases. The left prong detects novel mutations using wild type protein sequences. Sequest, Mascot, and X!Tandem are database search engines. DirecTag derives sequence tags three amino acids in length from the tandem mass spectra (MS/MS). TagRecon reconciles the inferred tags against protein sequences while making allowances for unanticipated mutations. Scaffold and IDPicker filter the peptide identification results.
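The routing rules of the two-pronged workflow can be condensed into a small decision function. In this workflow, the known-mutation search runs first, cases without a reportable known mutation are reflexed to the sequence tag search, and nothing is reported or stored without Sanger confirmation. This is a hypothetical sketch for illustration, not the laboratory's software. The `Candidate` record, the `triage` function, the returned strings, and the handling of unconfirmed calls are assumptions. Only the routing rules and the five-spectral-count threshold come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Spectral-count threshold for Sanger follow-up stated in the Methods.
MIN_SPECTRAL_COUNT = 5


@dataclass
class Candidate:
    """Most abundant mutation seen in the diagnostic amyloid protein (hypothetical record)."""
    mutation: str
    spectral_count: int


def triage(
    known_hit: Optional[Candidate],
    novel_hit: Optional[Candidate],
    sanger_confirms: Callable[[str], bool],
) -> str:
    """Route one case through the two prongs and return the resulting action."""
    # Right prong: search against the database augmented with known mutations.
    if known_hit is not None and known_hit.spectral_count >= MIN_SPECTRAL_COUNT:
        if sanger_confirms(known_hit.mutation):
            return f"report {known_hit.mutation} to clinician"
        # Assumption: the text does not describe what happens to unconfirmed calls.
        return f"{known_hit.mutation} not confirmed; not reported"
    # Left prong (reflex): error-tolerant sequence tag search against wild type.
    if novel_hit is not None and novel_hit.spectral_count >= MIN_SPECTRAL_COUNT:
        if sanger_confirms(novel_hit.mutation):
            # Held until enough clinical evidence accumulates; not reported.
            return f"hold {novel_hit.mutation} in knowledgebase"
        return f"{novel_hit.mutation} not confirmed; discarded"
    return "no reportable mutation detected"


if __name__ == "__main__":
    confirmed = {"V30M", "C71Y"}  # stand-in for Sanger sequencing results
    print(triage(Candidate("V30M", 12), None, confirmed.__contains__))
    print(triage(None, Candidate("C71Y", 7), confirmed.__contains__))
```

Passing Sanger confirmation in as a callable keeps the proteomic decision logic separate from the orthogonal genetic test that gates every clinical report.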

Amyloid Variant Database Preparation

Traditional protein sequence databases such as SwissProt contain wild type sequences that are sufficient to identify an amyloidogenic protein. However, these databases do not contain representative entries for amyloidogenic mutations. To remedy this, we compiled a list of known amyloidogenic mutations (including frame shifts) from a variety of sources such as SwissVar,11 MSV3D,12 and the published literature. This knowledgebase was manually curated to remove low-penetrance variants, resulting in the retention of 456 mutations in 24 different proteins. We also included polymorphisms and other pathogenic variants found in the 24 amyloidogenic proteins in the final knowledgebase. This prevents the MS/MS of a non-amyloidogenic mutant peptide from being forcibly misinterpreted as that of an amyloidogenic mutant peptide. Mutant peptides were generated and appended to a composite database containing the sequences of common contaminant proteins and SwissProt's complete human proteome. Reversed sequence entries were appended to the final database for estimating false discovery rates (FDRs) of the peptide identifications. Supplemental File 2 lists all of the variants and their corresponding peptide sequences. These additional mutant peptide sequences serve as hooks that allow the protein identification software to pull known amyloidogenic mutations from patient amyloid deposits.

Detecting Known Mutants

MS/MS spectra in each microdissection's raw file were identified with three different database search engines: Sequest,7 X!Tandem,13 and Mascot.14 All search engines were configured to derive fully tryptic peptides from the augmented protein sequence database and to look for oxidation of methionine (+15.996 Da) as a variable modification. X!Tandem automatically searches for the following variable modifications: formation of N-terminal pyroglutamic acid (−17.023 Da) and water loss from glutamates (−18.01 Da). All peptide identifications from a patient's sample were combined and filtered using Scaffold software.8 Proteins with at least one peptide identification (peptide probability >0.9) were considered for clinical interpretation and validation. For every case, we created a clinical proteomics profile that lists all of the confident protein identifications present in each microdissection along with their respective spectral counts. A pathologist called the amyloid subtype by correlating the clinical factors with the most abundant amyloidogenic protein detected across all microdissections. The most abundant known pathogenic mutation detected in the diagnostic amyloidogenic protein with at least five spectral counts was validated using Sanger sequencing and clinically reported. We emphasize that the clinical reporting guidelines require Sanger validation of pathogenic mutations. We took advantage of this orthogonal validation rule to increase the sensitivity of proteomic mutation detection by requiring only one highly confident peptide identification instead of two. This approach works well for detecting amyloidogenic mutations because their corresponding proteins are often embedded in the amyloid fibrils in a natively degraded form. The FFPE fixation process degrades the proteins further, making it harder to recover multiple peptide identifications matching an amyloidogenic mutation locus.

Novel Mutant Detection

We use an error-tolerant search paradigm to detect novel mutations. This method derives short sequence tags from the MS/MS spectra and matches them to wild type protein sequences while making allowances for unanticipated amino acid substitutions. In this workflow, DirecTag15 inferred partial sequence tags from all MS/MS spectra in a raw file. The software was configured to retain the best 50 tags, each three amino acids in length, for each spectrum. TagRecon16 reconciled the inferred sequence tags against a composite protein sequence database containing SwissProt's complete human proteome and common contaminants. Decoy protein sequences were also searched to estimate peptide identification FDRs. The software was configured to derive semitryptic peptides from the sequence database and to look for single point mutations as well as the following variable modifications: oxidation of methionine (+15.996 Da) and formation of N-terminal pyroglutamic acid (−17.023 Da). IDPicker17,18 filtered the peptide identifications at a stringent 2% FDR using an optimal combination of MVH, mzFidelity, and XCorr scores. Peptides were assembled into proteins following parsimony rules. Protein identifications with at least two independent peptide identifications were retained for clinical interpretation. For every case, the diagnostic amyloid protein was identified following the protocol described above. Detected mutant peptides were attested following strict criteria described elsewhere.16 The most abundant novel mutation detected in the diagnostic amyloidogenic protein with at least five spectral counts was validated using Sanger sequencing and retained for inclusion in the amyloidogenic mutation knowledgebase. The software used in this workflow is available for download, free of charge, from http://fenchurch.mc.vanderbilt.edu/software.php.

Sanger Sequencing of Amyloidogenic Genes

Genomic DNA was isolated from the subject's blood, and candidate genes were subjected to chain-termination sequencing. All exons of a gene were amplified using hybrid primers containing 20−22 bases of gene-specific sequence and a universal sequencing primer (UPS) sequence (19 or 23 bases for the forward and reverse primers) at the 5′ end. Amplified products were sequenced using UPS primers, ABI BigDye terminators (Applied Biosystems, Foster City, CA), and capillary electrophoresis on an ABI 3730 sequencer. Data were analyzed using Mutation Surveyor (SoftGenetics, State College, PA) configured to use the corresponding reference sequences obtained from GenBank. Detected known amyloidogenic mutations were reported to the clinician, whereas novel mutations were held in a knowledgebase of potentially pathogenic mutations. Novel mutations with strong clinical and/or biophysical evidence are fed back into our known amyloidogenic mutation detection workflow for future use.

Molecular Dynamics (MD) Simulation of Novel Mutations in Amyloidogenic Proteins

We detected two novel mutations (W22G and C71Y) in serum amyloid A4 (SAA4) amyloidosis cases. The effect of these mutations on SAA4's structure was assessed via MD simulation. Homology models were constructed for SAA4 using the I-TASSER server.19 The top-scoring model has a four-helix bundle. We performed implicit solvent MD simulations of this model in three different sequence contexts (wild type, W22G, and C71Y). Each system was minimized, slowly heated to 270 K, and equilibrated for 1.5 million time steps. The next five million time steps were analyzed. We report "time steps" rather than a unit of time because implicit solvent simulations speed up the kinetics. Simulations were performed in NAMD software,20 and visualization and trajectory analysis were performed in Visual Molecular Dynamics (VMD) software.21
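The error-tolerant search in Novel Mutant Detection must ultimately explain an unexpected peptide mass shift as a single amino acid substitution. The sketch below illustrates only that final bookkeeping step. It is not DirecTag's or TagRecon's algorithm, which work from spectra and sequence tags. It assumes that the wild type peptide and the observed mass shift are already known and that no other modification is present. The function name `single_substitutions`, the peptide `SAEWLK`, and the 0.02 Da tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def single_substitutions(
    wild_type: str, mass_shift: float, tol: float = 0.02
) -> list[tuple[int, str, str]]:
    """Return (1-based position, wild type residue, substitute) triples whose
    residue mass difference explains `mass_shift` within `tol` Da.

    Isobaric residues (I and L) are both reported, because mass alone
    cannot distinguish them.
    """
    hits = []
    for pos, wt in enumerate(wild_type, start=1):
        for aa, mass in sorted(RESIDUE_MASS.items()):
            if aa != wt and abs((mass - RESIDUE_MASS[wt]) - mass_shift) <= tol:
                hits.append((pos, wt, aa))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide observed about 129.058 Da lighter than its wild type mass.
    for pos, wt, aa in single_substitutions("SAEWLK", -129.05785):
        print(f"{wt}{pos}{aa}")  # position is relative to the peptide
```

A W-to-G substitution lowers the residue mass by about 129.058 Da, and no other single substitution in this peptide produces that shift, so the example reports only W4G. In practice, the tolerance would follow the instrument's mass accuracy, and the larger space of candidate substitutions is one reason the tag-based search is less specific than the augmented-database search.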


RESULTS AND DISCUSSION

Hereditary amyloid deposits are rich in mutant peptides of clinical significance. The corresponding shotgun proteomics data sets contain all the information necessary to identify these peptides. However, these mutant peptides are invisible to traditional database searches because canonical protein sequence databases do not contain representative sequence entries for them. As a result, most of the consequential amyloidogenic mutants present in patient samples go undetected. To remedy this, we developed a two-pronged informatics approach for detecting both known and novel amyloidogenic mutations (Figure 1). We implemented the known mutation detection workflow in a CAP/CLIA clinical testing laboratory at the Mayo Clinic, making it the first shotgun proteomics-based mutation detection workflow to be routinely used for patient care. The novel mutation detection workflow is used for clinical research.

Detecting Known Amyloidogenic Mutations with High Sensitivity and Specificity

We assessed the reliability of the known amyloidogenic mutation detection workflow in a clinical setting. For this, we employed a validation cohort containing 49 known hereditary amyloidosis patients and 85 controls (Table 1). Amyloidogenic mutations in the patient population were validated by Sanger sequencing the corresponding gene. We also Sanger sequenced the TTR gene of the control subjects to rule out the presence of any mutations. These negative controls were critical for assessing whether the workflow avoids reporting peptide sequence variants in the absence of mutations in the corresponding genes. Amyloid FFPE tissues from both patients and controls were subjected to shotgun proteomics. The resulting MS/MS spectra were matched against a custom protein sequence database augmented with known amyloidogenic mutant peptides, using three different database search engines. Scaffold software processed the peptide identifications and assembled them into protein identifications. The most abundant mutation detected in the patient's amyloidogenic protein was cross-referenced with the corresponding genetic information. Patient cases with matching amino acid and gene mutations were considered true positives (TPs). Patient cases with mismatching amino acid and gene mutations were considered false positives (FPs). Patient cases with no detectable amino acid mutation in the corresponding amyloid protein were considered false negatives (FNs). Control cases had no mutations in their TTR gene (confirmed by Sanger sequencing). Hence, controls with no detectable TTR amino acid mutation were considered true negatives (TNs), and controls with a detectable TTR amino acid mutation were considered FPs. Table 2 summarizes the mutations detected in the validation cohort by the enhanced database search-based known mutation detection workflow. The workflow classified the 134 validation subjects as 45 TPs, 85 TNs, 4 FNs, and zero FPs.
We detected the correct mutant peptide sequence for one FN case (G47V in TTR), but the number of spectral matches supporting the mutation was below the threshold for clinical reporting. Two FN mutations (S77Y in TTR and P27R in ApoA1) are confined to short tryptic peptides that are not amenable to detection. Overall, the sensitivity, specificity, positive predictive value, and negative predictive value of the known mutation detection workflow are 92%, 100%, 100%, and 96%, respectively.

Table 2. Validation Cohort's Amyloidogenic Mutation Detection Summary[a]

protein, mutation (no. of cases)    enhanced database search    sequence tag search
Hereditary Amyloidosis
TTR, T60A (10)                      10/10                       9/10
TTR, V30M (10)                      10/10                       10/10
TTR, V122L (5)                      5/5                         4/5
TTR, S50R (3)                       3/3                         2/3
TTR, A102S (2)                      2/2                         2/2
TTR, E54G (2)                       2/2                         2/2
TTR, T59K (2)                       2/2                         1/2
TTR, Val122Del[b] (1)               1/1                         0/1
TTR, A36P (2)                       1/2                         1/2
TTR, G47V (1)                       0/1                         1/1
TTR, I84S (1)                       1/1                         1/1
TTR, L107V (1)                      1/1                         1/1
TTR, S77Y (1)                       0/1                         0/1
ApoA1, P27R (2)                     1/2                         1/2
ApoA1, L99P (2)                     2/2                         2/2
ApoA1, G50R (1)                     1/1                         1/1
ApoA1, H179Fs[b] (1)                1/1                         0/1
ApoA4, N147S (1)                    1/1                         1/1
Gel, N231D (1)                      1/1                         1/1
Controls (Senile Amyloidosis of Various Types)
TTR, none (85)                      0/85                        1/85

[a] Mutations in the patient cases were validated with Sanger sequencing. TTR genes of the control subjects were also Sanger sequenced to rule out the presence of mutations. None of the mutations were detected by a traditional database search configured to use wild type sequences. Enhanced database search uses mutant peptide sequences to detect mutations; in contrast, sequence tag search infers mutations by matching MS/MS against wild type protein sequences.
[b] Del stands for deletion; Fs stands for frame shift.
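The four performance figures reported for each workflow follow directly from the confusion-matrix counts given in the text: 45 TP, 0 FP, 85 TN, and 4 FN for the known mutation workflow, and 40 TP, 1 FP, 84 TN, and 9 FN for the novel mutation workflow. The minimal sketch below recomputes them. The function name `performance` is illustrative, and the values are rounded to whole percentages to match the text.

```python
def performance(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Diagnostic performance, as percentages, from confusion-matrix counts."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # detected among true mutation carriers
        "specificity": 100 * tn / (tn + fp),  # clean calls among mutation-free cases
        "ppv": 100 * tp / (tp + fp),          # correct calls among reported mutations
        "npv": 100 * tn / (tn + fn),          # true negatives among "no mutation" calls
    }


# Counts reported for the validation cohort (49 patients + 85 controls).
known = performance(tp=45, fp=0, tn=85, fn=4)
novel = performance(tp=40, fp=1, tn=84, fn=9)

for name, metrics in (("known", known), ("novel", novel)):
    print(name, {k: round(v) for k, v in metrics.items()})
# known {'sensitivity': 92, 'specificity': 100, 'ppv': 100, 'npv': 96}
# novel {'sensitivity': 82, 'specificity': 99, 'ppv': 98, 'npv': 90}
```

For example, the novel workflow's single false-positive control gives a PPV of 40/41, or about 98%, and its nine missed cases give an NPV of 84/93, or about 90%.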

We applied the known amyloidogenic mutation detection workflow to a clinical cohort containing a total of 150 patients with six different types of hereditary amyloidosis (see Table 1 for patient demographics). Table 3 summarizes the mutations detected by this workflow in the clinical cohort. All of these mutations were independently confirmed by Sanger sequencing the respective genes in the corresponding patients. Curiously, two serum amyloid A4 (SAA4) amyloidosis cases failed to reveal any mutations when processed with this workflow.

Table 3. Clinical Cohort's Mutation Summary[a]

amyloid type    mutation (no. of cases)
ATTR            V122I (46), T60A (24), V30M (19), P24S (4), S50R (4), T59K (3), G47V (2), A120S (1), D38A (1), E54G (1), E89K (1), F33L (1), F64L (1), I84S (1), I84T (1), L107V (1), L58H (1), R54S (1), S52P (1)
AApoA1          L99P (11), E58K (2), P27R (1)
AApoA4          N147S (4)
AGel            N211K (2), A578P (1)
AFib            E545V (10), R573L (2), F521Fs[b] (1)

[a] Mutations were detected using the known mutation detection workflow. These mutations were independently confirmed by Sanger sequencing the respective genes of the corresponding patients.
[b] Fs stands for frame shift.
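As a worked consistency check, which is not part of the authors' pipeline, summing the case counts in Table 3 should reproduce the per-type clinical cohort sizes in Table 1. It should also leave exactly the two SAA4 cases in which no known mutation was found. The sketch below transcribes Table 3 into a dictionary. The names `TABLE3` and `cases_per_type` are illustrative.

```python
import re

# Per-type entries transcribed from Table 3 (footnote marker on F521Fs dropped).
TABLE3 = {
    "ATTR": "V122I (46), T60A (24), V30M (19), P24S (4), S50R (4), T59K (3), "
            "G47V (2), A120S (1), D38A (1), E54G (1), E89K (1), F33L (1), "
            "F64L (1), I84S (1), I84T (1), L107V (1), L58H (1), R54S (1), S52P (1)",
    "AApoA1": "L99P (11), E58K (2), P27R (1)",
    "AApoA4": "N147S (4)",
    "AGel": "N211K (2), A578P (1)",
    "AFib": "E545V (10), R573L (2), F521Fs (1)",
}
CLINICAL_COHORT_SIZE = 150  # from Table 1


def cases_per_type(table: dict[str, str]) -> dict[str, int]:
    """Sum the parenthesized case counts in each row of the table."""
    return {
        amyloid_type: sum(int(n) for n in re.findall(r"\((\d+)\)", row))
        for amyloid_type, row in table.items()
    }


totals = cases_per_type(TABLE3)
print(totals)
# {'ATTR': 114, 'AApoA1': 14, 'AApoA4': 4, 'AGel': 3, 'AFib': 13}
print("cases without a known mutation:", CLINICAL_COHORT_SIZE - sum(totals.values()))
# cases without a known mutation: 2
```

The per-type totals match the clinical cohort rows of Table 1. The remaining two cases correspond to the two SAA4 patients who were reflexed to the novel mutation detection workflow.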


Sequence Tagging Detects Novel Point Mutations

We wanted to estimate the fraction of potential true positive mutations that are detected by the novel point mutation detection workflow. At the same time, we needed to assess the number of false positives reported by this workflow. For this, we configured the workflow to detect mutations in the validation cohort samples by matching the corresponding MS/MS scans against wild type protein sequences. In this scenario, a high-performance novel mutation workflow should maximize the recovery of true amyloidogenic mutations in the patient samples while minimizing the reporting of peptide sequence variants that have no corresponding gene mutation. For each sample in the validation cohort, the DirecTag-TagRecon sequence tag software identified both wild type and mutant peptides. IDPicker filtered the peptide identifications and assembled them into protein identifications. Detected mutations were attested, and low-confidence variants were removed from further analysis. For patient samples, the most abundant mutation discovered in the diagnostic amyloid protein was cross-referenced with the Sanger sequencing gold standard. For control samples, the most abundant mutation discovered in the TTR protein (if any) was compared to the corresponding genetic information. Cross-referenced mutations were classified as TPs, FPs, TNs, and FNs following the logic described above. Table 2 summarizes the mutations detected in the validation cohort by the sequence tag-based novel mutation detection workflow. This workflow classified the 134 validation subjects as 40 TPs, 1 FP, 9 FNs, and 84 TNs. On the basis of this information, we computed the sensitivity, specificity, positive predictive value, and negative predictive value of the novel mutation detection workflow as 82%, 99%, 98%, and 90%, respectively. In our clinical setting, we first processed the samples with the known amyloidogenic mutation detection workflow.
Samples that failed to reveal mutations at this stage were reflexed to the novel mutation detection workflow. Following this protocol, we applied the DirecTag-TagRecon workflow to two very rare cases of serum amyloid A4 (SAA4) amyloidosis and detected a W22G mutation and a C71Y polymorphism in the SAA4 protein. The C71Y polymorphism (dbSNP accession: rs2460827) was confirmed with Sanger sequencing. We could not obtain the patient sample required to confirm the W22G mutation. The impact of these mutations on the structure and function of the SAA4 protein was assessed using MD simulations. Because the 3D structure of the SAA4 protein is unknown, we constructed a homology model using I-TASSER software, and MD simulations were performed in three different sequence contexts (wild type, W22G, and C71Y). Figure 2 illustrates the effect of the mutations on the homology model. Both mutations destabilized the helix geometry of the SAA4 protein by significantly altering its helical content (Figure 2). We also computed the impact of these mutations on the thermodynamic stability of the protein using I-Mutant 2.0 software.22 This software employs a support vector machine to predict changes in a protein's thermodynamic stability (Gibbs free energy) upon single point mutations. Because no X-ray crystal structure is available, the software was configured to predict ΔΔG = ΔG(mutant) − ΔG(wild type) from the primary sequence of the protein at pH 7.0 and 25 °C. The ΔΔG values for the W22G and C71Y mutations in the SAA4 protein were −3.1 and −0.66 kcal/mol, respectively, indicating that both of these novel mutations can potentially destabilize the structure of SAA4. On the basis of these two pieces of evidence, we reason that the SAA4 W22G and C71Y mutations detected in SAA4 amyloidosis have potential clinical consequences. These two mutations are prime candidates for collecting the additional clinical and biophysical evidence needed to associate them with amyloidosis. We typically do not report such novel mutations to clinicians unless there is compelling clinical evidence buttressing their pathogenicity.

Figure 2. Structural implications of novel mutations detected in the SAA4 protein. MD simulations were performed for a homology molecular model. (A) Fraction of residues in an α-helix geometry across the three sequence contexts. C71Y significantly increases the helical content of the ensemble, while W22G leads to a minor decrease. (B−D) Snapshots after the same amount of simulation time for the (B) wild type, (C) C71Y, and (D) W22G structures using the highest scoring homology model. Structures are colored from the N- to the C-terminus, from red through white to blue; depth-cueing fog is also employed. The two mutated positions are shown as spheres and labeled. Overall, the mutations destabilize the native SAA4 fold. Either a particular loop region is stabilized in a helical position, further kinking the longer helices (C71Y), or this same region becomes unstructured and the longer helices become unkinked (W22G).

Database Augmentation Is Necessary To Detect Amyloidogenic Frame Shift Mutations

Frame-shifted genes such as fibrinogen alpha (FIBA) are known to produce novel forms of proteins, which misfold and accumulate in various organs, leading to amyloidosis.23 Amyloid shotgun proteomics generates all the information necessary to identify these frame-shifted proteins. However, traditional protein identification searches often fail to detect them because the sequence databases employed for the search do not contain representative entries for the frame-shifted proteins. To remedy this, we collected from the clinical literature a total of nine amyloidogenic frame shift mutations in the APOA1 and FIBA genes. Sequences were created for the frame-shifted portions of the corresponding genes, annotated appropriately, and appended to SwissProt's complete human proteome. Tandem mass spectra (MS/MS) from the amyloid deposits were matched against the augmented sequence database using Sequest, Mascot, and X!Tandem software. Scaffold software filtered the resulting peptide matches and assembled protein identifications. Proteins with more than one high-confidence peptide identification were considered to be present in the amyloid deposit. Frame-shifted proteins were accepted into the final results only if they produced at least one high-confidence peptide identification (probability >0.9) from the novel sequence portion of the protein. Figure 3A illustrates the protein sequence coverage map of the APOA1 His179 frame shift detected in the validation cohort. One might question the validity of this frame shift because we detected only one confident peptide identification matching the novel sequence portion of the protein. However, there are only two tryptic peptides in the frame-shifted portion of the sequence. Additionally, we confirmed this patient's frame shift mutation via Sanger sequencing. Figure 3B illustrates the sequence coverage map for the fibrinogen alpha (FIBA) F512 frame shift detected in the clinical cohort. In contrast to the APOA1 frame shift, we detected multiple peptides matching the frame-shifted portion of the FIBA protein, as well as a peptide bridging the native and frame-shifted portions of the protein, which further raises confidence in the validity of this frame shift. Technically, frame-shifted peptides detected by proteomics might not need orthogonal Sanger validation if their sequences are unique to the mutant protein. However, frame-shifted peptides that share sequence homology with other wild type (human or non-human contaminant) proteins must be validated via DNA sequencing. Regardless, our clinical guidelines require Sanger confirmation of all pathogenic mutations prior to clinical reporting.

Figure 3. Frame shift mutations in apolipoprotein A1 (APOA1) and fibrinogen alpha (FIBA) chain. Regions with peptide evidence are highlighted in bold letters on a yellow background. The red line highlights the novel sequence arising from the frame shift. The FIBA protein was truncated up to the frame shift to obtain a compact representation.
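The acceptance rule for frame-shifted proteins can be expressed as a check on where each confident peptide falls relative to the first frame-shifted residue. The following is a minimal sketch. It assumes that peptides are given as hypothetical (start, length, probability) tuples in 0-based protein coordinates. It also assumes that a peptide bridging the native and novel portions counts as evidence for the novel portion, because it contains novel residues. The text says such a peptide raises confidence but does not state whether it alone suffices. The names `region` and `accept_frame_shift` are illustrative.

```python
from typing import Iterable, Tuple

# "High confident" peptide probability cutoff quoted in the text.
MIN_PROBABILITY = 0.9


def region(start: int, length: int, shift_start: int) -> str:
    """Locate a peptide (0-based start, length) relative to the first frame-shifted residue."""
    end = start + length  # exclusive end
    if end <= shift_start:
        return "native"
    if start >= shift_start:
        return "novel"
    return "bridging"  # spans the native/frame-shifted junction


def accept_frame_shift(peptides: Iterable[Tuple[int, int, float]], shift_start: int) -> bool:
    """Accept if at least one high-confidence peptide covers frame-shifted residues.

    Assumption: bridging peptides count, because they contain novel residues.
    """
    return any(
        prob > MIN_PROBABILITY and region(start, length, shift_start) != "native"
        for start, length, prob in peptides
    )


if __name__ == "__main__":
    # Hypothetical evidence: one native peptide and one junction-spanning peptide.
    print(accept_frame_shift([(80, 10, 0.99), (95, 12, 0.95)], shift_start=100))  # True
```

Native-only evidence, however confident, never supports the frame shift, because the wild type protein explains those peptides equally well.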

Known Mutation vs Novel Mutation Detection Workflows

The two workflows described in this article employ completely orthogonal search strategies for recovering amyloidogenic mutations from patient samples. The known mutation detection workflow employs a database search strategy, which matches the experimental peptide tandem mass spectra (MS/MS) against an augmented protein sequence database containing peptide sequence entries for all known amyloidogenic mutations. This strategy fails when the patient's mutation has no sequence representation in the database. In contrast, the novel mutation detection workflow uses a sequence tag-based search strategy. This method derives short sequence tags from the MS/MS and reconciles them against the wild-type protein sequences while looking for unanticipated mass shifts in the experimental peptide caused by amino acid substitutions. This method can therefore recover mutations present in patient samples without prior knowledge of them. However, when both search strategies were configured to recover mutant peptides from the validation cohort, the database search-based known mutation detection workflow had higher sensitivity (92% vs 82%) and specificity (100% vs 99%) than the sequence tag search-based novel mutation detection workflow. This is expected because the sequence tag search probes a larger space of potential mutations, which slightly lowers its sensitivity and specificity. Hence, mutations detected by the novel mutation detection workflow need to be validated by independent Sanger sequencing before they are considered for clinical use. Another caveat is that frame shift mutations and amino acid deletion mutations cannot be detected with the novel mutation detection workflow. Frame-shifted peptide sequences are entirely new and cannot be matched to a wild-type sequence by allowing for single amino acid substitutions. Deletion mutations can shorten the peptide sequence by more than one amino acid, which traditional sequence tag-based mutation detection search engines do not account for.

Why Use Proteomics Instead of Genomics for Amyloid Subtyping?

Amyloid deposits are purely proteinaceous, and the amyloidogenic protein present in a deposit is often produced elsewhere in the body. The deposits may be caused by abnormal folding of wild-type or mutated proteins. The mutations can be germline (hereditary), as seen in hereditary transthyretin amyloidosis (ATTR), or somatic, as seen in immunoglobulin light chain amyloidosis (AL). The presence of a germline mutation in the TTR gene, or of somatic mutations in the immunoglobulin genes of clonal plasma cell neoplasms, does not necessarily indicate that the mutation is pathogenic or that the abnormal protein it produces is a major constituent of the amyloid deposits. In this context, genetic testing alone does not prove causality without phenotypic evidence from proteomics studies. In contrast, the proteomic method described in this study provides conclusive evidence that the abnormal protein is actually deposited in the amyloid plaques.
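The contrast between the known and novel mutation detection strategies can be made concrete with a deliberately simplified Python sketch. It is not the software used in this work, which runs database search engines and sequence tag tools on full MS/MS spectra; the sketch reasons only about precursor masses, uses unmodified monoisotopic residue masses, and the peptide LSWGEK, the helper names and the 0.01 Da tolerance are all hypothetical. A known-mutation lookup can only return peptides that were placed in the database, whereas scanning single substitutions can explain an unanticipated W-to-G mass shift but finds nothing for a two-residue deletion.

```python
# Simplified precursor-mass sketch; not a spectrum-level search engine.
# Monoisotopic residue masses in daltons (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def match_known(observed_mass, mutant_peptides, tol=0.01):
    """Known-mutation style lookup: only peptides already in the database can match."""
    return [p for p in mutant_peptides
            if abs(peptide_mass(p) - observed_mass) <= tol]

def explain_by_substitution(wild_type, observed_mass, tol=0.01):
    """Novel-mutation style reasoning: list single substitutions (1-based position,
    old residue, new residue) whose mass change accounts for the observed shift."""
    shift = observed_mass - peptide_mass(wild_type)
    hits = []
    for pos, old in enumerate(wild_type, start=1):
        for new, mass in RESIDUE_MASS.items():
            if new != old and abs((mass - RESIDUE_MASS[old]) - shift) <= tol:
                hits.append((pos, old, new))
    return hits

wild_type = "LSWGEK"                # hypothetical wild-type tryptic peptide
observed = peptide_mass("LSGGEK")   # pretend precursor from a W-to-G mutant
print(match_known(observed, ["LSWGEK"]))                      # []
print(explain_by_substitution(wild_type, observed))           # [(3, 'W', 'G')]
print(explain_by_substitution(wild_type, peptide_mass("LSWK")))  # []
```

Because a precursor mass shift alone can be ambiguous (leucine and isoleucine are isobaric, for example), fragment-level evidence such as sequence tags is what localizes a substitution in practice.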

dx.doi.org/10.1021/pr4011475 | J. Proteome Res. 2014, 13, 2352−2358


CONCLUSION

In 1994, Mann and Wilm described an error-tolerant method for detecting amino acid mutations from tandem mass spectra. In the two decades since, numerous approaches have been developed to improve the detection of mutant peptides in shotgun proteomics data sets. However, most of these methods have remained confined to research and have not been translated into patient care. In this work, we describe a two-pronged informatics workflow for detecting known and novel mutations in amyloidogenic proteins. The database search-based known mutation detection workflow was implemented in a CAP/CLIA-certified clinical testing laboratory for routine use in patient care. The sequence tag-based novel mutation detection workflow was implemented in a clinical research setting for detecting novel amyloidogenic mutations. Although much work remains before shotgun proteomics-based mutation detection becomes routine in clinical laboratories, we believe that our implementation of this workflow for detecting amyloidogenic mutations is a step in the right direction.

ASSOCIATED CONTENT

Supporting Information

Parameters used for all the search engines, peptide filtering software, and protein assembly software; peptide sequence entries of the amyloid mutation database. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Tel: 507-284-0513. Fax: 507-284-0360. E-mail: Dasari. [email protected].

Present Address

⊥Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

S. Dasari was supported by the Center for Individualized Medicine at the Mayo Clinic and the Department of Laboratory Medicine and Pathology (DLMP), Mayo Clinic. J.D.T., J.A.V., P.J.K., and A.D. were supported by the DLMP, Mayo Clinic.

REFERENCES

(1) Sipe, J. D.; Cohen, A. S. Review: history of the amyloid fibril. J. Struct. Biol. 2000, 130 (2−3), 88−98.
(2) Sipe, J. D.; Benson, M. D.; Buxbaum, J. N.; Ikeda, S.; Merlini, G.; Saraiva, M. J.; Westermark, P. Amyloid fibril protein nomenclature: 2010 recommendations from the nomenclature committee of the International Society of Amyloidosis. Amyloid 2010, 17 (3−4), 101−4.
(3) Brambilla, F.; Lavatelli, F.; Di Silvestre, D.; Valentini, V.; Rossi, R.; Palladini, G.; Obici, L.; Verga, L.; Mauri, P.; Merlini, G. Reliable typing of systemic amyloidoses through proteomic analysis of subcutaneous adipose tissue. Blood 2012, 119 (8), 1844−7.
(4) Liao, L.; Cheng, D.; Wang, J.; Duong, D. M.; Losik, T. G.; Gearing, M.; Rees, H. D.; Lah, J. J.; Levey, A. I.; Peng, J. Proteomic characterization of postmortem amyloid plaques isolated by laser capture microdissection. J. Biol. Chem. 2004, 279 (35), 37061−8.
(5) Murphy, C. L.; Wang, S.; Williams, T.; Weiss, D. T.; Solomon, A. Characterization of systemic amyloid deposits by mass spectrometry. Methods Enzymol. 2006, 412, 48−62.
(6) Vrana, J. A.; Gamez, J. D.; Madden, B. J.; Theis, J. D.; Bergen, H. R., 3rd; Dogan, A. Classification of amyloidosis by laser microdissection and mass spectrometry-based proteomic analysis in clinical biopsy specimens. Blood 2009, 114 (24), 4957−9.
(7) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976−989.
(8) Searle, B. C. Scaffold: a bioinformatic tool for validating MS/MS-based proteomic studies. Proteomics 2010, 10 (6), 1265−9.
(9) Zenka, R. M.; Johnson, K. L.; Bergen, H. R. Exploring Proteomics Metadata Using Spotfire and a Companion User Interface; American Society of Mass Spectrometry: Salt Lake City, 2011.
(10) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; Hoff, K.; Kessner, D.; Tasman, N.; Shulman, N.; Frewen, B.; Baker, T. A.; Brusniak, M. Y.; Paulse, C.; Creasy, D.; Flashner, L.; Kani, K.; Moulding, C.; Seymour, S. L.; Nuwaysir, L. M.; Lefebvre, B.; Kuhlmann, F.; Roark, J.; Rainer, P.; Detlev, S.; Hemenway, T.; Huhmer, A.; Langridge, J.; Connolly, B.; Chadick, T.; Holly, K.; Eckels, J.; Deutsch, E. W.; Moritz, R. L.; Katz, J. E.; Agus, D. B.; MacCoss, M.; Tabb, D. L.; Mallick, P. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012, 30 (10), 918−20.
(11) Mottaz, A.; David, F. P.; Veuthey, A. L.; Yip, Y. L. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar. Bioinformatics 2010, 26 (6), 851−2.
(12) Luu, T. D.; Rusu, A. M.; Walter, V.; Ripp, R.; Moulinier, L.; Muller, J.; Toursel, T.; Thompson, J. D.; Poch, O.; Nguyen, H. MSV3d: database of human MisSense Variants mapped to 3D protein structure. Database 2012, 2012, bas018.
(13) Fenyo, D.; Beavis, R. C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 2003, 75 (4), 768−74.
(14) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551−67.
(15) Tabb, D. L.; Ma, Z. Q.; Martin, D. B.; Ham, A. J.; Chambers, M. C. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J. Proteome Res. 2008, 7 (9), 3838−46.
(16) Dasari, S.; Chambers, M. C.; Slebos, R. J.; Zimmerman, L. J.; Ham, A. J.; Tabb, D. L. TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res. 2010, 9 (4), 1716−26.
(17) Ma, Z. Q.; Dasari, S.; Chambers, M. C.; Litton, M. D.; Sobecki, S. M.; Zimmerman, L. J.; Halvey, P. J.; Schilling, B.; Drake, P. M.; Gibson, B. W.; Tabb, D. L. IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. J. Proteome Res. 2009, 8 (8), 3872−81.
(18) Zhang, B.; Chambers, M. C.; Tabb, D. L. Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J. Proteome Res. 2007, 6 (9), 3549−57.
(19) Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinf. 2008, 9, 40.
(20) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26 (16), 1781−802.
(21) Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. J. Mol. Graphics 1996, 14 (1), 33−8, 27−8.
(22) Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005, 33 (Web Server issue), W306−10.
(23) Hamidi Asl, L.; Liepnieks, J. J.; Uemichi, T.; Rebibou, J. M.; Justrabo, E.; Droz, D.; Mousson, C.; Chalopin, J. M.; Benson, M. D.; Delpech, M.; Grateau, G. Renal amyloidosis with a frame shift mutation in fibrinogen Aα-chain gene producing a novel amyloid protein. Blood 1997, 90 (12), 4799−805.
