Protein identification by mass spectrometry

Identification of proteins by means of mass spectrometry can be done in several ways. If the protein of interest is in a searchable database, the technique of peptide mass fingerprinting (PMF) (1) may be used. In this method, the software tries to match the accurately measured peptide masses from a proteolytic digest of the protein against the theoretically expected peptide masses calculated from the known primary sequences of the proteins in its database. Most commonly, trypsin is used to digest the protein, either in solution or after protein separation by one- or two-dimensional gel electrophoresis.
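The matching step described above can be illustrated with a minimal sketch. It is not the algorithm of any particular search program: it assumes complete tryptic cleavage (after K or R, but not before P), no missed cleavages or modifications, neutral monoisotopic masses, and a simple count of matched masses as the score. All function names and the default tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(measured, sequence, tol_ppm=50.0):
    """Count measured masses that fall within tol_ppm of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    hits = 0
    for m in measured:
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical):
            hits += 1
    return hits


def best_match(measured, database, tol_ppm=50.0):
    """Return the database entry name with the most matched peptide masses."""
    return max(database,
               key=lambda name: count_matches(measured, database[name], tol_ppm))
```

Here `database` is assumed to be a plain dictionary mapping protein names to sequences. Real PMF software scores matches statistically rather than by a raw count, and allows for missed cleavages and common modifications.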

If PMF is unsuccessful or unconvincing, one or more individual peptides can be picked out of a digest mixture and fragmented by means of MS/MS using collision-induced dissociation (CID) (2) to produce fragment ions that are characteristic of the primary sequence. Proteins in databases can usually be identified directly from the accurate mass of the peptide selected and the measured masses of its fragments, with minimal or no manual intervention. The search programs that carry out this operation do so by attempting to match the experimentally determined masses against the calculated possible fragments based on known rules (3, 4).
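As a rough illustration of how calculated fragments can be compared with measured ones, the sketch below computes singly charged b- and y-type fragment ions for a candidate peptide and counts how many observed masses they explain. This is a simplification: the fragmentation rules used by real search programs cover further ion types (high-energy CID in particular produces additional series), and the function names and the 0.5 Da tolerance are assumptions made for the example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b-ion m/z values (N-terminal fragments b1..b(n-1))."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y-ion m/z values (C-terminal fragments y1..y(n-1))."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + WATER + PROTON)
    return masses


def matched_fragments(observed, peptide, tol_da=0.5):
    """Count observed fragment m/z values explained by a b or y ion."""
    theoretical = b_ions(peptide) + y_ions(peptide)
    return sum(1 for m in observed
               if any(abs(m - t) <= tol_da for t in theoretical))
```

A search would first select candidate peptides whose calculated mass agrees with the measured precursor mass, then rank those candidates by a score such as `matched_fragments`.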

In cases where the protein primary sequence is not available, sequencing is usually required. Although close homology can, in some cases, produce an identification by the methods described above, even small differences can cause problems. For example, if an unknown protein is 95% identical to a known one, the probability that a 20-residue peptide from the unknown protein will have at least one substitution compared to the known one is >60%. In this case, both the PMF and MS/MS methods are more likely to fail than to succeed, and therefore a direct sequencing approach is greatly preferable.
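The >60% figure can be checked with a short calculation, assuming that substitutions occur independently at each position with a probability equal to one minus the overall identity. That independence assumption is ours, made for the sake of the example, and the function name is hypothetical.

```python
def prob_at_least_one_substitution(identity, length):
    """Probability that a peptide of `length` residues differs at one or more
    positions, assuming each residue is conserved independently with
    probability `identity`."""
    return 1.0 - identity ** length


p = prob_at_least_one_substitution(0.95, 20)
print(f"{p:.1%}")  # prints 64.2%
```

Under this assumption, only about 36% of 20-residue peptides would be identical between the two proteins.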

The MS/MS instrument used in this laboratory produces so-called high-energy CID spectra (2). No satisfactory software currently exists that will produce a single, high-confidence sequence from most high-energy CID spectra; the most effective method at present is manual interpretation (5). High-energy fragmentation has two principal advantages: nearly all peptides give good CID spectra, and one can usually (60-70% accuracy) distinguish between leucine and isoleucine (6). High-energy CID with manual interpretation is the method we use to obtain peptide sequences that are described as “de novo”.

References

  1. Dainese, P. and James, P., “Protein identification by peptide mass fingerprinting,” in James, P., Ed., “Proteome Research: Mass Spectrometry,” Springer-Verlag, Berlin, 2001, pp. 103-123.
  2. Medzihradszky, K.F. and Burlingame, A.L., “The advantages and versatility of a high-energy collision-induced dissociation-based strategy for the sequence and structural determination of proteins,” Methods: A companion to Methods in Enzymology (1994) 6, 284-303.
  3. Yates III, J.R., “Database searching using mass spectrometry data,” Electrophoresis (1998) 19, 893-900.
  4. Protein Prospector, http://prospector.ucsf.edu/
  5. Papayannopoulos, I.A., “The interpretation of collision-induced dissociation tandem mass spectra of peptides,” Mass Spectrom. Rev. (1995) 14, 49-73.
  6. Johnson, R.S., Martin, S.A., Biemann, K., Stults, J.T., and Watson, J.T., “Novel fragmentation process of peptides by collision-induced decomposition in a tandem mass spectrometer: differentiation of leucine and isoleucine,” Anal. Chem. (1987) 59, 2621-2625.