Identification of proteins by means of mass spectrometry can be done in several ways. If the protein of interest is in a searchable
database, the technique of peptide mass fingerprinting (PMF) (1) may be used. In this method, the software tries to match the
accurately measured peptide masses from a proteolytic digest of the protein against the theoretically expected peptide masses calculated from the
known primary sequences of the proteins in its database. Most commonly, trypsin is used to digest the protein, in solution or after protein
separation by one- or two-dimensional gel electrophoresis.
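The matching step can be illustrated with a minimal Python sketch. It is not the algorithm of any particular search program. It assumes the common tryptic rule (cleavage after K or R, but not before P), no missed cleavages, no modifications, singly protonated peptides, and a hypothetical 50 ppm tolerance. The function names are illustrative only.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the summed residues
PROTON = 1.00728   # mass of the charge-carrying proton


def tryptic_peptides(sequence):
    """In silico digest: cleave after K or R unless the next residue is P.

    Assumption: complete digestion, no missed cleavages.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol_ppm=50.0):
    """Count observed masses that match any theoretical tryptic peptide mass."""
    theoretical = [peptide_mh(p) for p in tryptic_peptides(sequence)]
    hits = 0
    for m in observed:
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical):
            hits += 1
    return hits
```

A search program would repeat `count_matches` for every database entry and rank the proteins by a score derived from the matches. Real scoring schemes are considerably more elaborate than a simple count.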
If PMF is unsuccessful or unconvincing, one or more individual peptides can be picked out of a digest mixture and fragmented by means of
MS/MS using collision-induced dissociation (CID) (2) to produce fragment ions that are characteristic of the primary sequence. Proteins in
databases can usually be identified directly from the accurate mass of the peptide selected and the measured masses of its fragments, with
minimal or no manual intervention. The search programs that carry out this operation do so by attempting to match the experimentally determined
masses against the calculated possible fragments based on known rules (3, 4).
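As a rough illustration of how fragment masses are calculated from a candidate sequence, the Python sketch below generates only singly charged b and y ions for an unmodified peptide. This is a simplifying assumption: actual search programs consider further ion types, charge states, and neutral losses. The 0.02 Da tolerance and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return singly charged b and y ion masses for an unmodified peptide.

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return {"b": b, "y": y}


def count_fragment_matches(observed, peptide, tol_da=0.02):
    """Count observed fragment masses explained by a b or y ion of the peptide."""
    ions = fragment_ions(peptide)
    theoretical = ions["b"] + ions["y"]
    return sum(
        any(abs(m - t) <= tol_da for t in theoretical) for m in observed
    )
```

Candidate peptides whose intact mass agrees with the selected precursor would each be scored this way, and the best-scoring candidate reported.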
In cases where the protein primary sequence is not available, sequencing is usually required. Although close homology can, in some cases,
produce an identification by the methods described above, even small differences can cause problems. For example, if an unknown protein is 95%
identical to a known one, the probability that a 20-residue peptide from the unknown protein will have at least one substitution compared to the
known one is >60%. In this case, both the PMF and MS/MS methods are more likely to fail than to succeed, and therefore a direct sequencing
approach is greatly preferable.
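The figure quoted above can be checked with a short calculation. The sketch assumes that substitutions occur independently and with equal probability at every position, which is a simplification, since real sequence differences tend to cluster.

```python
def p_at_least_one_substitution(identity, length):
    """Probability that a peptide of `length` residues contains at least one
    substitution, given the fractional sequence identity.

    Assumption: each residue is conserved independently with probability
    `identity`, so the whole peptide is unchanged with probability
    identity ** length.
    """
    return 1.0 - identity ** length


# 95% identity, 20-residue peptide: 1 - 0.95**20
p = p_at_least_one_substitution(0.95, 20)
print(f"{p:.3f}")
```

Under this assumption the probability is about 0.64, consistent with the >60% stated in the text.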
The MS/MS instrument used in this laboratory produces so-called high-energy CID spectra (2). No satisfactory software currently exists that will produce a single,
high-confidence sequence from most high-energy CID spectra; the most effective method at present is manual interpretation (5). High-energy fragmentation has two principal
advantages: nearly all peptides give good CID spectra, and one can usually (60-70% accuracy) distinguish between leucine and isoleucine (6). High-energy CID with manual
interpretation is the method we use to obtain peptide sequences that are described as “de novo”.
References
Dainese, P. and James, P., “Protein identification by peptide mass fingerprinting,” in James, P., Ed., “Proteome Research: Mass
Spectrometry,” Springer-Verlag, Berlin, 2001, pp. 103-123.
Medzihradszky, K.F. and Burlingame, A.L., “The advantages and versatility of a high-energy collision-induced dissociation-based strategy for
the sequence and structural determination of proteins,” Methods: A Companion to Methods in Enzymology (1994) 6, 284-303.
Yates III, J.R., “Database searching using mass spectrometry data,” Electrophoresis (1998) 19, 893-900.
Papayannopoulos, I.A., “The interpretation of collision-induced dissociation tandem mass spectra of peptides,” Mass Spectrom. Rev. (1995) 14, 49-73.
Johnson, R.S., Martin, S.A., Biemann, K., Stults, J.T., and Watson, J.T., “Novel fragmentation process of peptides by collision-induced decomposition in a tandem
mass spectrometer: differentiation of leucine and isoleucine,” Anal. Chem. (1987) 59, 2621-2625.