Chapter One
Mass Spectrometry: The Foundation of Proteomics
Timothy D. Veenstra, Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland
1.1 INTRODUCTION
Scientific direction can be driven by many factors. Science is still primarily hypothesis-driven; however, continuing technology developments have enabled a greater focus on discovery-driven science. Hypothesis-driven science formulates a question and then uses whatever technology is available to acquire the information necessary to answer that question. In contrast, discovery-driven science collects the information first and then determines the questions (or answers) that can be formulated from the available data. While it may seem to function through a "shot-in-the-dark" mentality, present technological developments make discovery approaches quite logical. Never before in the history of science has there been the capacity to acquire the wealth of data on biological molecules that exists today. A great example of this data-gathering capability is the Human Genome Project. Two decades ago it was inconceivable that the entire human genome could be sequenced; yet today the genomes of other organisms are sequenced as a routine procedure. Fortunately, science was not content with being able to sequence genomes, and the capability to obtain global measurements of the relative abundances of gene transcripts was established soon after. This capability has naturally progressed to the development of technologies to perform discovery-driven studies on entire proteomes. Even this stage does not represent the end of development, as significant progress is being made in metabolomics.
The term proteomics has evolved over the past few years to almost replace what was once referred to as protein chemistry. The original, and still classical, connotation of proteomics, however, is the characterization of the complete set of proteins encoded by the genome of a given organism (Wilkins et al., 1996). In the early history of proteomics, proteins were fractionated by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) followed by visualization using protein stains such as Coomassie or silver stain (O'Farrell, 1975). To identify differences in the protein abundances of two distinct samples, each of their proteomes is fractionated and visualized on a separate gel, and those spots that reveal differences in staining intensity are cored from the gel and identified, typically using mass spectrometry (MS). While MS has been around for decades, the ability to use it to characterize proteins has been the single largest force that has propelled proteomics. Many different facets of MS have led to its prominent position within the field. The sensitivity of MS allows for the routine identification of proteins in the femtomole (fmol, 10^-15 mol) to high attomole (amol, 10^-18 mol) range (Moyer et al., 2003). The ability to identify proteins with confidence is aided by the mass measurement accuracy available using current MS technology. The mass measurement error is typically less than 50 parts per million (ppm) and is often less than 5 ppm (Pasa-Tolic et al., 2004). The ability of tandem MS (MS/MS) to obtain partial sequence information, in combination with on-line fractionation, enables the confident identification of peptides in complex mixtures (Nesvizhskii and Aebersold, 2004). The throughput with which proteins can be identified by MS is unparalleled by any other biophysical technique, a critical parameter in the use of any technology to gather large datasets.
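To make the parts-per-million figures concrete, the following is a minimal sketch of how a mass measurement error is usually expressed in ppm. The function names and the example m/z values are illustrative assumptions, not taken from the chapter.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass measurement error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the measured value lies within +/- tol_ppm of the theoretical value."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical peak: a 0.005 m/z deviation at m/z 1000 is a 5 ppm error.
    print(round(ppm_error(1000.005, 1000.0), 3))
    print(within_tolerance(1000.005, 1000.0, tol_ppm=50.0))
```

At m/z 1000, a 50 ppm window corresponds to ±0.05 and a 5 ppm window to ±0.005, which is why tighter mass accuracy sharply reduces the number of candidate peptides that fit a measured peak.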
Although MS is used to characterize proteins, in reality it is peptides that MS is most adept at identifying. In the great majority of proteomics studies, the complex mixture of proteins is made even more complex by digesting the proteins into smaller peptides prior to MS analysis (Rappsilber and Mann, 2002). This digestion step is advantageous for two main reasons. First, the overall solubility of peptides in solution is much greater than that of intact proteins. Second, even though the mass measurement accuracy of MS is high, it is still not sufficient to confidently identify a protein de novo based solely on its molecular weight. Therefore, proteins are typically identified through peptides acting as surrogates for their parent protein. One of the most common ways of identifying a protein is based on the mass spectrum of the peptides produced by digestion with an enzyme such as trypsin. The spectrum obtained from such a sample is referred to as a "peptide map" or a "peptide fingerprint" (Blackstock and Weir, 1999). To identify the protein, the collection of measured masses is compared to in silico peptide maps derived from a protein or genomic database (Figure 1.1). For identifying a single protein within a simple mixture, peptide mapping works very well, and the necessary data are easy to acquire. Peptide mapping of proteins within complex mixtures such as cell lysates is not possible, since the peptide masses recorded in the mass spectrum arise from a large number of different species and do not provide a conclusive identification. Fortunately, the available instrumentation enables a greater depth of information to be obtained from the peptide masses observed by MS. Instead of relying only on the accurate mass of a specific peptide, individual peptide ions can be isolated and fragmented by collision-induced dissociation (CID).
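The peptide-mapping comparison described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: trypsin is modeled as cleaving after K or R except before P, with no missed cleavages and no modifications; neutral monoisotopic masses are compared; and the function names, the toy database, and the 50 ppm default are hypothetical choices rather than the behavior of any particular search engine.

```python
import re

# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """In silico trypsin digest: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(measured, protein_sequence, tol_ppm=50.0):
    """Number of measured masses explained by the protein's in silico peptide map."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein_sequence)]
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in measured
    )


def best_match(measured, database, tol_ppm=50.0):
    """Name of the database entry whose peptide map explains the most masses."""
    return max(database, key=lambda name: count_matches(measured, database[name], tol_ppm))


if __name__ == "__main__":
    # Toy database with made-up sequences, for illustration only.
    database = {"protA": "AKGGR", "protB": "MKWWK"}
    measured = [peptide_mass("AK"), peptide_mass("GGR")]
    print(tryptic_digest("AKRPGK"), best_match(measured, database))
```

Ranking by a raw match count is the simplest possible score. It also shows why the approach breaks down for complex mixtures: with thousands of peptides in one spectrum, many database entries collect chance matches.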
After fragmentation of the peptide, the masses of the fragment ions are recorded and used to obtain partial or complete sequence information, as shown in Figure 1.2. This process is more commonly referred to as tandem MS or MS/MS (Martin et al., 1987). When peptides are subjected to MS/MS, they are not completely obliterated into their constituent amino acids, but instead an ensemble of fragments containing various lengths of the peptide is obtained. This information provides "sequence ladders" that enable partial primary sequence information of the peptide to be deduced. The raw data is then analyzed using software programs that can compare the experimental data to in silico MS/MS mass spectra calculated from the protein sequences in the database (Chamrad et al., 2004).
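As a rough illustration of how such "sequence ladders" arise, the sketch below computes singly charged b- and y-type fragment masses for an unmodified peptide and shows how the gap between adjacent ladder peaks points to a residue. The b/y nomenclature, the function names, and the 0.01 Da tolerance are assumptions introduced here; the chapter itself does not specify ion types.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ladders(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i holds the first i residues (N-terminal fragment);
    y_i holds the last i residues plus water (C-terminal fragment).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


def residues_for_gap(delta, tol=0.01):
    """Residues whose mass matches the spacing between two adjacent ladder peaks."""
    return [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]


if __name__ == "__main__":
    b, y = fragment_ladders("AGK")
    print([round(x, 4) for x in b], [round(x, 4) for x in y])
    # The spacing between b1 and b2 corresponds to the second residue.
    print(residues_for_gap(b[1] - b[0]))
```

Note that leucine and isoleucine have identical masses, so a ladder gap of 113.084 Da cannot distinguish them; this is one reason the sequence information obtained is often only partial.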
Proteomics is conducted for many different purposes and at many different levels. Fortunately, several different types of spectrometers are available, depending on the focus of the research being conducted. If an investigation is focused on identifying simple protein mixtures, the instrument requirements are different than if entire cell or tissue lysates are the samples of interest. The following sections describe the various types of MS instrument platforms available, with a focus on their application and mode of operation.
1.2 IONIZATION METHODS
The mass spectrometer is made up of two major components: the ionization source and the mass analyzer. It is within the ionization source that the sample of interest is ionized and then desorbed into the gas phase. The mass analyzer acts to guide the gas-phase ions through the instrument to the detector. At the detector, the ions' mass-to-charge (m/z) ratios are measured. While sometimes overlooked, many of the developments that have led to MS having a major impact on proteomics have come from the invention of new ionization techniques.
The two most common methods to ionize biological molecules prior to their entrance into the analyzer region of the mass spectrometer are matrix-assisted laser desorption ionization (MALDI) (Karas et al., 1987) and electrospray ionization (ESI) (Fenn et al., 1989). While ESI and MALDI have enabled significantly larger proteins (i.e., greater than several hundred thousand daltons) to be analyzed, their greatest impact still remains in the analysis of peptides generated from proteolytic digests of larger species. One of the more significant advances enabled by ESI was the ability to interface separation methods such as liquid chromatography (LC) with MS. While separations are not discussed in this chapter, MS-based proteomics as it is practiced today would not be possible without the concurrent development of chromatographic and electrophoretic separation techniques.
1.2.1 Electrospray Ionization
ESI greatly enhanced the ability to characterize proteins and peptides by MS. Malcolm Dole, who conceived of using an electrospray process to produce intact high mass polymeric ions, provided the first description of ESI. He gained this insight from his knowledge of electrospraying automobile paint (Dole et al., 1968). These first experiments provided the basis of further studies by John Fenn (Fenn et al., 1989), who extended the use of ESI to measure biological molecules and was awarded the Nobel Prize in chemistry in 2002 for his discoveries.
The mechanism by which ESI works is relatively simple. ESI requires the sample of interest to be in solution so that it may flow into the ionization source region of the spectrometer (Figure 1.3A). Particulates or other insoluble entities in the sample will hamper ionization and can clog the capillary through which the sample flows. To ionize the sample, high voltage is applied to a stainless steel or other conductively coated needle through which the sample is flowing. The voltage results in charges being added to the sample, creating ions that can be guided through the analyzer region of the instrument. The applied voltage can result in the sample becoming positively or negatively charged; however, positive ionization is used primarily in the analysis of proteins and peptides. As it exits the spray tip, the solution forms submicrometer-sized droplets containing both solvent and analyte ions. The analyte ions are then stripped of solvent prior to entering the analyzer region of the instrument. This desolvation is achieved by evaporating the solvent as the sample passes through a heated capillary or a curtain of drying gas, typically nitrogen. Since the desolvation of the ions occurs at atmospheric pressure and the mass analyzer region of the spectrometer is maintained at a lower pressure, the ions are drawn into the spectrometer by this pressure differential.
What distinguishes ESI from other ionization methods is its ability to produce multiply charged ions from large biological molecules. The number of charges that can be accepted by a particular molecule depends on many factors, including its basicity and size. Depending on their size and the number of basic residues they contain, peptides typically exist as singly, doubly, or triply charged ions. Since trypsin is the most commonly used protease in proteomics today, peptides are typically observed in both the 1+ and 2+ charge states owing to the basic sites at the N terminus and at the C-terminal lysine or arginine residue.
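The effect of multiple charging on the observed m/z can be illustrated with a short calculation. This is a minimal sketch that assumes every charge comes from an added proton (positive-ion mode); the function names and the example masses are hypothetical.

```python
PROTON = 1.00728  # mass of a proton in daltons


def mz_of(neutral_mass, charge):
    """m/z of an ion carrying `charge` added protons."""
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass_of(mz, charge):
    """Neutral mass recovered from an observed m/z and a known charge."""
    return charge * (mz - PROTON)


def charge_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Charge of the higher-m/z peak, given two peaks of the same molecule
    whose charge states differ by exactly one."""
    return round((mz_high_charge - PROTON) / (mz_low_charge - mz_high_charge))


if __name__ == "__main__":
    # A hypothetical 2000 Da tryptic peptide observed as 1+, 2+, and 3+ ions.
    for z in (1, 2, 3):
        print(z, round(mz_of(2000.0, z), 4))
    # A hypothetical 16951.5 Da protein seen at two adjacent charge states.
    m10, m11 = mz_of(16951.5, 10), mz_of(16951.5, 11)
    z = charge_from_adjacent_peaks(m10, m11)
    print(z, round(neutral_mass_of(m10, z), 2))
```

Because each added charge lowers the observed m/z, multiply charged ions of large molecules fall within the range of analyzers that could not reach the m/z of the corresponding singly charged ion.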
1.2.2 Matrix-Assisted Laser Desorption Ionization
Matrix-assisted laser desorption ionization (MALDI) is another "soft" ionization process that generates ions by irradiating a solid mixture with a pulsed laser beam. The solid mixture is composed of the analyte of interest dissolved in an organic matrix compound. The laser pulse both indirectly ionizes and desorbs the analyte molecules from the solid mixture. A short-pulse (2-200 Hz) ultraviolet (UV) laser is typically used in MALDI; however, infrared irradiation has also been used (Tanaka et al., 1988; von Seggern et al., 2003). To prepare the solid mixture, an equal volume of the sample solution is combined with a saturated solution of matrix prepared in a solvent such as water, acetonitrile, acetone, or tetrahydrofuran. The matrix is a small, highly conjugated organic molecule (e.g., α-cyano-4-hydroxycinnamic acid (CHCA), 2,5-dihydroxybenzoic acid (DHB), or 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid)) that strongly absorbs energy in the UV region. A few microliters of this sample-matrix solution are placed onto a MALDI target plate and allowed to dry. This drying procedure results in the incorporation of the peptides into a crystal lattice. The MALDI target plate is then inserted into the source region of the mass spectrometer followed by laser irradiation, as shown in Figure 1.3B. The MALDI source region of most spectrometers is maintained at a relatively high pressure, causing the ions to be drawn into the mass analyzer region of the instrument, which is maintained at a lower pressure. A recent development has been the design of MALDI sources that operate at atmospheric pressure (Moyer and Cotter, 2002). The ability to operate at atmospheric pressure enables MALDI sources to be interfaced to analyzers such as ion traps and quadrupole time-of-flight analyzers, which have historically been interfaced with ESI sources.
Similar to ESI, MALDI can produce both positive and negative ions. Positive ions, which are typically the species of interest in peptide analysis, are formed by the acceptance of a proton as the analyte leaves the matrix. While the mechanism has yet to be definitively established, the prevailing theory is that analyte ionization occurs within the dense gas cloud that forms and expands supersonically into the vacuum region of the spectrometer. The analytes are protonated (or deprotonated) through collisions between analyte neutrals, excited matrix ions, and protons and cations. In MALDI, most analytes accept a single proton; therefore peptide and large biomolecular ions are singly charged. This singly charged character results in some molecules having large m/z values, and therefore MALDI is typically interfaced with mass analyzers with large m/z ranges, such as time-of-flight (TOF) spectrometers.
1.2.3 Desorption Electrospray Ionization
While not yet applied to proteomic technology, a new method of desorption ionization has recently been described that allows the direct analysis of surfaces by MS. This ionization technique, called desorption electrospray ionization (DESI), was developed in the laboratory of R. Graham Cooks (Takats et al., 2004) and is illustrated in Figure 1.4. In this technique, electrosprayed droplets are directed toward a surface to be analyzed. The droplets produce gaseous ions of the material on the surface and these ions are sampled with a mass analyzer. The mass analyzer is equipped with an atmospheric interface connected via a flexible and extended ion transfer line. This ionization technique, while extremely new, has shown the capability of analyzing a range of compounds from nonpolar small molecules to polar peptides and proteins.
While the most fruitful uses of this new ionization technology are not yet clear, some of the demonstrated applications suggest exciting new ways to monitor drug distribution and to perform surface analysis. In a novel experiment, 10 mg of loratadine, an over-the-counter antihistamine, was given to a patient, and 40 minutes later DESI was able to detect the molecule directly from the skin surface and saliva of the individual. While proteomic applications of this technology have not been clearly demonstrated, the potential exists for direct monitoring of proteins on the surface of cells in culture or from tissue sections.
Excerpted from Proteomics for Biological Discovery. Copyright © 2006 by John Wiley & Sons, Inc. Excerpted by permission.