Peptide Mass Fingerprint

A mass spectrum of the peptide mixture resulting from the digestion of a protein by an enzyme provides a fingerprint of great specificity, so specific that it is often possible to identify the protein from this information alone.

This method of identification is much more reliable than using fingerprints based on PAGE migration patterns or HPLC retention times. However, peptide mass fingerprinting is limited to the identification of proteins for which sequences are already known; it is not a method of structural elucidation.

Search Parameter Notes

Enzyme

An enzyme of low specificity, which digests proteins to a mixture of very short peptides, is not a good choice, because almost any given 3- or 4-residue peptide will be found in many database entries. The longer the peptide, the greater the specificity. A further consideration for MALDI analysis is that the low mass region, below ~500 Da, is obscured by the presence of matrix peaks.

In general, it is best to use enzymes of specificity equal to or greater than trypsin.

Setting the number of allowed missed cleavage sites to zero simulates a limit digest. If you are confident that your digest is perfect, with no partial fragments present, this will give maximum discrimination and the highest score.

If experience shows that your digest mixtures usually include some partials, that is, peptides with missed cleavage sites, you should choose a setting of 1, or maybe 2 missed cleavage sites. Don't specify a higher number without good reason, because each additional level of missed cleavages increases the number of calculated peptide masses to be matched against the experimental data. If the actual digest does not contain extended partials, this simply increases the number of random matches, and so reduces discrimination.
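
To illustrate how the number of calculated peptides grows with each additional level of missed cleavages, here is a minimal Python sketch of an in-silico tryptic digest. It is not Mascot's implementation. The function names, the example sequence, and the simplified cleavage rule (cleave after K or R unless the next residue is P) are assumptions made for illustration.

```python
def cleavage_pieces(sequence):
    """Split a sequence into limit-digest pieces.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing piece with no C-terminal K or R
        pieces.append(sequence[start:])
    return pieces


def tryptic_digest(sequence, max_missed=0):
    """Return all peptides containing at most max_missed missed cleavage sites."""
    pieces = cleavage_pieces(sequence)
    peptides = []
    for i in range(len(pieces)):
        for missed in range(max_missed + 1):
            if i + missed < len(pieces):
                # Joining missed + 1 adjacent pieces gives a partial
                peptides.append("".join(pieces[i:i + missed + 1]))
    return peptides


# Hypothetical sequence, chosen only to exercise the rule
example = "AAKCCRDDKPEERFF"
for allowed in (0, 1, 2):
    print(allowed, len(tryptic_digest(example, allowed)))
# prints 0 4, then 1 7, then 2 9
```

Even for this short sequence, allowing two missed cleavages more than doubles the number of candidate peptides. On a full database, every one of these extra candidates is another chance of a random match.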

Mass Values

Select experimental mass values that are large enough to offer good discrimination, yet not so large as to be likely to be extended partials. A good mass range for trypsin is 1000 to 3500 Da.

If you have misgivings about an experimental mass value, then it is best to leave it out. An example would be a peak which is broader than the others, indicating that it may be an unresolved doublet.

Imagine a tryptic digest of a 20 kDa protein. We would expect something around 20 perfect cleavage peptides. If the digest was incomplete, or there was a non-quantitative modification, we might expect to double the number of peptides observed.

If 100 peaks are taken from the mass spectrum of this digest and submitted to Mascot, then either 60 to 80 peaks are noise, or there are extensive non-quantitative modifications. Either possibility is bad news for search specificity.
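
The arithmetic behind this example can be written as a quick sanity check on a peak list. The sketch below generalises the figures in the text to a rule of thumb of roughly one limit peptide per kDa, doubled for an incomplete digest. That rule of thumb and the function name are assumptions for illustration, not something Mascot applies.

```python
def excess_peak_range(protein_kda, n_peaks):
    """Estimate how many submitted peaks cannot be explained by the digest.

    Assumes about one perfect-cleavage peptide per kDa, and up to twice
    that number when partials or non-quantitative modifications are present.
    Returns (low, high) estimates of the unexplained peak count.
    """
    expected = round(protein_kda)   # limit digest
    upper = 2 * expected            # incomplete digest or modifications
    return max(n_peaks - upper, 0), max(n_peaks - expected, 0)


print(excess_peak_range(20, 100))   # (60, 80), as in the text
```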

Autolytic Peptide Masses

For low level digests, it can be useful to screen the experimental data for enzyme autolysis fragments.

Mass Tolerance

Be generous in setting the peptide mass tolerance. If an experimental mass falls just outside the allowed window, then it contributes nothing towards the score. However, remember that the number of spurious matches, and the search time, increase with the size of the error window.
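
The trade-off can be seen in a small Python sketch of window matching. This is an illustration only, not Mascot's matching or scoring code. The function name, the mass values, and the choice of Da and ppm as units are assumptions.

```python
def count_matches(experimental, calculated, tol, unit="Da"):
    """Count experimental masses with a calculated mass inside the window.

    tol is an absolute error in Da, or a relative error in ppm when
    unit == "ppm".
    """
    matched = 0
    for mass in experimental:
        window = tol if unit == "Da" else mass * tol * 1e-6
        if any(abs(mass - calc) <= window for calc in calculated):
            matched += 1
    return matched


# Hypothetical values: the first experimental mass is off by 0.12 Da
calculated = [1000.50, 1500.80, 2000.10]
experimental = [1000.62, 1500.81, 1750.00]

print(count_matches(experimental, calculated, 0.1))   # 1: the 0.12 Da error is just outside
print(count_matches(experimental, calculated, 0.2))   # 2: a wider window recovers it
```

With a 0.1 Da window, the peptide measured 0.12 Da away from its calculated mass contributes nothing. Widening the window recovers it, at the cost of giving every unrelated calculated mass in the database a larger target to hit by chance.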

With Mascot 2.2 and later, if intensity information is supplied, Mascot will attempt to use this to discriminate against noise peaks. However, this is not a substitute for having a high quality peak list.

Protein Molecular Weight

Supplying a protein molecular weight to some search engines can be risky, because many of the sequence database entries are for the least processed form of a protein. For example, the SwissProt entry for bovine insulin, INS_BOVIN, is actually the sequence of the precursor protein including signal and connecting peptides. This adds up to a molecular weight of 11,394 Da, so that a search based too tightly around an experimental measurement of the molecular weight of this protein (5734 Da) would fail to find a correct match.

This is not a problem with Mascot, because the protein molecular weight is applied as a sliding window. That is, for each database entry, Mascot looks for the highest scoring set of peptide matches which are within a contiguous stretch of sequence less than or equal to the specified protein molecular weight. This will often be less than the mass of the entire sequence entry (unless the data set happens to include both the N-terminal and C-terminal peptides). Consequently, if you specify a value for the protein molecular weight, this acts only as a ceiling. Not only will you see smaller proteins on the hit list, you will also see larger ones, but all of the reported matches will be within a stretch of sequence less than or equal to the specified mass.
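
The sliding window idea can be sketched in Python as follows. This is a simplified illustration, not Mascot's algorithm: it counts matches rather than scoring them, and the function name, the uniform 110 Da residue mass, and the match positions are all hypothetical.

```python
from itertools import accumulate

WATER = 18.011  # approximate mass of H2O, added once per stretch


def best_window(matches, residue_masses, max_mass):
    """Find the contiguous stretch, no heavier than max_mass, that
    contains the most peptide matches.

    matches: (start, end) residue index pairs, end exclusive.
    Returns (match_count, (start, end)), or (0, None) if nothing fits.
    """
    prefix = [0.0] + list(accumulate(residue_masses))
    best = (0, None)
    for w_start in sorted({s for s, _ in matches}):
        for w_end in sorted({e for _, e in matches}):
            if w_end <= w_start:
                continue
            stretch_mass = prefix[w_end] - prefix[w_start] + WATER
            if stretch_mass > max_mass:
                continue  # stretch exceeds the molecular weight ceiling
            count = sum(1 for s, e in matches if s >= w_start and e <= w_end)
            if count > best[0]:
                best = (count, (w_start, w_end))
    return best


# Hypothetical 100-residue entry (about 11 kDa) with four matched peptides
residue_masses = [110.0] * 100
matches = [(0, 10), (50, 60), (62, 75), (78, 90)]

print(best_window(matches, residue_masses, 5000))    # (3, (50, 90))
print(best_window(matches, residue_masses, 12000))   # (4, (0, 90))
```

With a 5 kDa ceiling, the 11 kDa entry is still reported, but only the three matches that fit within a single stretch of no more than 5 kDa are counted. This parallels the insulin example: a precursor entry can still be found when the specified mass is that of the processed protein.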

Looking at the Search Results

Confidence in a peptide mass fingerprint result may come from having independent supporting evidence. For example, if the analyte originated from a spot at approximately 40 kDa on a 2D gel separation of yeast proteins, then the anticipated result of a peptide mass fingerprint is a 40 kDa yeast protein. If the top scoring protein fits this expectation, the search is deemed "successful". If the top scoring match is a 200 kDa protein from a different species, the initial reaction is likely to be that the search has "failed".

While this is a reasonable approach, Mascot provides additional guidance in the form of a significance level. By default, the significance level is set at 5%. That is, if the score for a particular match exceeds the threshold corresponding to this level, there is less than a 1 in 20 chance that the observed match is a random event.

If the score is substantially above the significance threshold, look carefully before dismissing the result as spurious. Conversely, if the score is below the threshold, examine the match sceptically.
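
As a rough illustration of how a 5% significance level translates into a score threshold, the sketch below assumes the commonly described convention that the reported score is -10 log10(P), where P is the probability that the match is a random event, and that the 1 in 20 chance is spread across every entry searched. Both are assumptions not stated in this section, and the function name is hypothetical.

```python
import math


def score_threshold(n_entries, alpha=0.05):
    """Score above which a match is significant at level alpha.

    Assumes score = -10 * log10(P), and that a random match is allowed
    with overall probability alpha across n_entries database entries.
    """
    return -10.0 * math.log10(alpha / n_entries)


# For a database of 500,000 entries, P = 1 / (20 * 500,000),
# which gives a threshold of about 70
print(round(score_threshold(500_000)))
```

Under these assumptions the threshold rises with the size of the database searched, which is one reason a taxonomy restriction changes the score needed for significance.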

In most cases, there is prior knowledge of the origin of a sample, so it is only natural to look for matches to proteins from a particular species or kingdom. While a Mascot search can be restricted to a particular species, the taxonomy filter should be used with care:

Many sequence databases do not provide species information in a systematic and rigorous form

Contaminants can never be ruled out, and could come from any species, e.g. BSA or keratins

Unless the genome of the species of interest is completely sequenced, there is no guarantee that the true sequence of the analyte protein is actually present in the database. If it is missing, then high scoring matches from other species are of interest because they are likely to be homologous to the unknown.

It is the uncertainty in the mass of the intact protein which is the Achilles heel of a peptide mass fingerprint. This uncertainty is unavoidable, even when an accurate experimental mass for the intact protein is available, because it is unlikely that the mass of the expressed and processed protein will be exactly the same as that of the sequence entry in the protein database. A peptide mass fingerprint can only provide the statistically most probable identification. Even so, a statistical score is a great step forward over simply counting peptide mass matches, which can only work when a ceiling is placed on the intact protein mass. Otherwise, the mega-proteins always come out top of the list due to random matches. Unfortunately, even with an ideal scoring algorithm, there may be insufficient matching mass values for a confident identification without making assumptions about the intact protein mass or the species.

One method of improving the specificity of a peptide mass fingerprint was first proposed by Peter James [James, 1994]: simply perform additional digests using different proteases. Seeing the same protein with a high score in two independent digests provides a similar degree of confidence to seeing multiple peptide matches in an MS/MS ions search.