SAM - Sequence Analysis Methods for automatic annotation of unreviewed entries

Last modified November 11, 2016

UniProt’s automatic annotation pipeline enriches the unreviewed records in UniProtKB with automatic classification and annotation. As part of this pipeline, we use a suite of Sequence Analysis Methods (SAM) to add sequence-specific information to the unreviewed UniProtKB/TrEMBL records.


Predictions of sequence features such as signal peptides, transmembrane regions and coiled-coil regions are generated using the following software from external providers: SignalP, TMHMM, Phobius and Coils.

These methods are applied to UniProtKB sequences by InterPro to predict sequence features. More annotations (mainly keywords) are then added automatically to enrich the generated predictions. The new predictions are propagated to all the UniProtKB/TrEMBL records that do not already contain such feature predictions from the UniRule automatic annotation system.
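
The propagation rule above can be sketched as follows. This is only an illustrative sketch, not the pipeline's actual code: the `Record` class, its field names and the accessions used are hypothetical, and it assumes that the check against UniRule is made per feature type rather than per record.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Record:
    """Hypothetical, minimal stand-in for a UniProtKB entry."""
    accession: str
    reviewed: bool
    # Feature types already annotated by UniRule, e.g. {"SIGNAL"}
    unirule_features: Set[str] = field(default_factory=set)
    # Feature types added from the SAM predictions
    sam_features: Set[str] = field(default_factory=set)


def propagate(record: Record, predicted_types: Set[str]) -> Record:
    """Add SAM predictions to an unreviewed record.

    Assumption: a predicted feature type is skipped when UniRule has
    already annotated the same feature type on that record.
    """
    if record.reviewed:
        # Only unreviewed (TrEMBL) records receive SAM predictions.
        return record
    for feature_type in predicted_types:
        if feature_type not in record.unirule_features:
            record.sam_features.add(feature_type)
    return record
```

For example, a record whose signal peptide already comes from UniRule would, under this assumption, receive only the transmembrane prediction when both feature types are predicted.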

Overlaps and sanity checks

We use the overlap of different methods to confirm the presence of a predicted sequence feature:

  • Transmembrane region
    The TMHMM and Phobius predictors are used to infer transmembrane regions. If the TMHMM and Phobius results overlap by at least 10 amino acids, the transmembrane region is annotated using the sequence range predicted by Phobius. If there is no such overlap, no prediction is generated.

[Figure: Transmembrane region prediction]

  • Signal peptide
    The SignalP, Phobius and TMHMM predictors are used to infer signal peptides. If SignalP predicts a signal peptide and TMHMM predicts no transmembrane region in the same range, the signal peptide is annotated.
    If SignalP and Phobius both predict a signal peptide, it is annotated.
    If an N-terminal signal peptide predicted by SignalP overlaps a transmembrane region predicted by TMHMM, the Phobius prediction is used to discriminate between the two possibilities.
    In all of the above cases, the annotated sequence region is the one predicted by SignalP.
  • Coiled-coil region
    To predict coiled-coil regions, only the Coils predictor is used. No overlap resolution is performed.
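
The transmembrane and signal peptide rules above can be expressed as a short sketch. It is a reading of the rules under stated assumptions, not the pipeline's implementation: the function names are hypothetical, regions are taken to be 1-based inclusive `(start, end)` pairs, the 10-residue overlap is checked per Phobius region, and "in the same range" is interpreted as sharing at least one residue.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # assumed 1-based, inclusive (start, end)


def overlap_len(a: Region, b: Region) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def consensus_transmembrane(
    tmhmm: List[Region], phobius: List[Region], min_overlap: int = 10
) -> List[Region]:
    """Keep each Phobius region that overlaps a TMHMM region by at least
    min_overlap residues; the Phobius coordinates are the ones reported."""
    return [
        p for p in phobius
        if any(overlap_len(p, t) >= min_overlap for t in tmhmm)
    ]


def consensus_signal(
    signalp: Optional[Region],
    phobius_signal: Optional[Region],
    tmhmm: List[Region],
) -> Optional[Region]:
    """Return the SignalP range if the signal peptide is confirmed, else None."""
    if signalp is None:
        return None
    # Assumption: any shared residue counts as a clash with a TMHMM region.
    clashes_with_tm = any(overlap_len(signalp, t) > 0 for t in tmhmm)
    # Confirmed if Phobius agrees, or if TMHMM sees nothing in that range.
    if phobius_signal is not None or not clashes_with_tm:
        return signalp
    return None
```

With these assumptions, `consensus_signal((1, 22), (1, 24), [(5, 27)])` keeps the SignalP range `(1, 22)` because Phobius also predicts a signal peptide, whereas the same call with no Phobius signal prediction yields no annotation.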

