Publications section

Last modified June 15, 2017

The set of publications fully curated in UniProtKB/Swiss-Prot and publications imported in UniProtKB/TrEMBL is complemented by additional publications that have been computationally mapped from other resources to UniProtKB entries.

The publications annotated in UniProtKB and the computationally mapped publications are combined into a ‘Publications’ view, which can be accessed from a link under the ‘Display’ heading on the left-hand side of a UniProtKB page. In this view you can filter the publications list by source and by categories that are based on the type of data a publication contains about the protein (such as function, interaction, sequence, etc.) or the number of proteins it describes (“small scale” vs “large scale”).

References are numbered and contain several subsections allowing a precise description of a given citation.

Up to 7 distinct subsections can compose a reference block: the ‘Number’ of the reference in the list, the ‘Title’, the ‘Author(s) names’, the ‘Reference information’ (or exact citation), the ‘Cross-references’ to bibliographic databases (including a link to the abstract, if available), the ‘Citation content’ and the ‘Sequence origin’.
Example: Q9XTY6

These subsections are provided for the citations used to curate a UniProtKB/Swiss-Prot entry, and where possible, the same elements are also displayed for imported references in UniProtKB/TrEMBL as well as computationally mapped publications.

1. Number

The reference number gives a sequential number to each reference citation in an entry. This number is used to link to the appropriate reference in the different subsections.

2. Title

The title of the publication (or any other source) is cited as precisely as possible given the limitations of the computer character set.
Example: Q21507

The format of the title is not always identical to that found in the actual publication (submission, etc.):

  • Major title words are not capitalized;
  • The text of a title always ends with either a period ’.’, a question mark ’?’ or an exclamation mark ’!’;
  • Double quotation marks ’ ” ’ in the text of the title are replaced by single quotation marks;
  • Titles of articles published in a language other than English have been translated into English;
  • Greek letters are written in full (alpha, beta, etc.).
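
The title conventions above can be sketched in a few lines of Python. This is only an illustration, not the code used by UniProtKB: the function name `normalize_title` and the `GREEK` table are hypothetical, the table covers only a handful of letters, and the capitalization and translation rules are left out because they need human judgement.

```python
import re

# Hypothetical, deliberately incomplete mapping of Greek letters to their names.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalize_title(title: str) -> str:
    """Apply some of the title conventions listed above (illustrative only)."""
    text = title.strip()
    # Double quotation marks (straight or curly) become single quotation marks.
    text = re.sub(r'["\u201c\u201d]', "'", text)
    # Greek letters are written in full.
    for letter, name in GREEK.items():
        text = text.replace(letter, name)
    # A title always ends with a period, a question mark or an exclamation mark.
    if not text.endswith((".", "?", "!")):
        text += "."
    return text


print(normalize_title('The "TNF-α" receptor'))
```

A title that already ends with ‘?’ or ‘!’ is returned without an extra period.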

3. Author(s) names

We list the author(s) names in the order given in the original publication.
Example: P11071

Name initials can be followed by an abbreviation such as ‘Jr’ (Junior), ‘Sr’ (Senior), ‘II’, ‘III’ or ‘IV’ (2nd, 3rd and 4th).
Example: P00350

The author(s) names are cited as completely as possible: we keep all initials and hyphens between initials. A German umlaut is transcribed by adding an ‘e’ after the modified vowel (for example, ‘ü’ becomes ‘ue’). If an author does not have any initials, we add an ‘X.’ after the name.
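
As a rough sketch of the umlaut and ‘X.’ rules, the Python snippet below formats a single author name. The function `format_author`, the `UMLAUTS` table and the ‘surname followed by initials’ layout are assumptions made for this example, not a description of the curation pipeline; only precomposed umlaut characters are handled.

```python
# Hypothetical transcription table: the vowel is kept and followed by an 'e'.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def format_author(surname: str, initials: str = "") -> str:
    """Return 'Surname Initials', using 'X.' when no initials are known."""
    name = "".join(UMLAUTS.get(ch, ch) for ch in surname.strip())
    initials = initials.strip()
    # Initials and the hyphens between them are kept exactly as given.
    return f"{name} {initials if initials else 'X.'}"


print(format_author("Müller", "H.-J."))
print(format_author("Schröder"))
```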

We also try to be as consistent as possible with author(s) names: when an author name is misspelled, we correct it and homogenize it in the database.

In some cases, the name provided is actually the name of a consortium, and not of individual authors. This is mainly used for direct submissions to databases but can also be used in full references, when the consortium is cited as an author. Note that consortium and author names may coexist in a single reference.
Examples: Q7TQA9, O60260

4. Reference information

The reference information contains the conventional citation information for the reference.

a) Journal citations

The reference information for a journal citation includes the journal abbreviation, the volume number, the page range and the year of publication.

Journal names are abbreviated according to the conventions used by the National Library of Medicine (NLM) and are based on the existing ISO and ANSI standards. A list of the abbreviations currently in use is given in the document ‘Controlled vocabulary of journals’.
Example: P03024

When a reference is made to a publication which is ‘in press’ at the time the database is released, the page range, and possibly the volume number, are indicated as ‘0’ (zero).

b) Electronic publications

The reference information for an electronic publication includes an ‘(er)’ prefix.
Examples: O64948, Q09517

c) Book citations

The reference information for articles found in books or other types of publication includes the book name, the volume number, the page range, the publisher, the city and the year.
Examples: P00065, P04560, P02675

d) Unpublished observations

The reference information for unpublished observations includes the month and the year.

We use the expression ‘unpublished observations’ to cite communications directly submitted to UniProtKB/Swiss-Prot by scientists and concerning their unpublished biological information on various aspects of the entry.
Example: P24551

e) Thesis

The reference information for Ph.D. theses includes a ‘Thesis’ prefix, the year, the institution name, the city and the country.
Example: P60773

f) Patent applications

The reference information for patent applications includes the international publication number of the patent and the date.
Example: P29853

g) Submissions

The reference information for submissions includes the date and the database to which the data were submitted.

We report the data submitted to the following databases:

  • the EMBL/GenBank/DDBJ databases
  • UniProtKB
  • the PDB data bank
  • the PIR data bank

Examples: P50388, P83886, P38013

5. Cross-references

Cross-references from the ‘References’ section are optional. They indicate the identifier assigned to a specific reference in a bibliographic database and provide the link.

When present, they point to:

  • PubMed (through the PubMed Unique Identifier – PMID)
  • the abstract as supplied by the publishers
  • the article on the publisher’s site (through the Digital Object Identifier – DOI)

Examples: P02675, Q10670, Q9LFB2, Q3EDJ0

6. Cited for

The ‘Cited for’ subsection indicates the type of information that was retrieved from a given reference and used to annotate the entry (sequence, protein-protein interaction, variants or mutations, PTMs and 3D structure, etc.).

Sequence information retrieved from a reference is described in detail by its range and its source (translation of a nucleic acid sequence or amino acid sequence).
Example: P25719

The comment ‘NUCLEOTIDE SEQUENCE’ is usually tagged with a qualifier, indicating the source of the sequence data. There are 5 types of sources:

  • GENOMIC DNA when the individual gene has been sequenced from DNA;
  • GENOMIC RNA for RNA viruses, when the individual gene has been sequenced from RNA;
  • MRNA when the individual cDNA has been sequenced;
  • LARGE SCALE GENOMIC DNA when the gene has been sequenced as part of a large scale genome sequencing project;
  • LARGE SCALE MRNA when the cDNA has been sequenced as part of a large-scale cDNA sequencing project.

Example: Q9QY42
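
The five source qualifiers lend themselves to a simple check. The sketch below assumes that the ‘Cited for’ text is available as a plain string in which the qualifier follows ‘NUCLEOTIDE SEQUENCE’ in square brackets; that layout, the function name `sequence_source` and the constant `SEQUENCE_SOURCES` are assumptions for illustration only.

```python
import re

# The five source types listed above.
SEQUENCE_SOURCES = {
    "GENOMIC DNA",
    "GENOMIC RNA",
    "MRNA",
    "LARGE SCALE GENOMIC DNA",
    "LARGE SCALE MRNA",
}

# Assumed layout: NUCLEOTIDE SEQUENCE [QUALIFIER]
_PATTERN = re.compile(r"NUCLEOTIDE SEQUENCE\s*\[([^\]]+)\]")


def sequence_source(cited_for: str):
    """Return the sequence source qualifier, or None if absent or unknown."""
    match = _PATTERN.search(cited_for)
    if match is None:
        return None
    qualifier = match.group(1).strip()
    return qualifier if qualifier in SEQUENCE_SOURCES else None


print(sequence_source("NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]"))
```

Because the qualifier is only ‘usually’ present, the function returns `None` rather than raising an error when it is missing.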

When the sequence describes specific isoform(s), this is indicated in brackets, after the sequence source.
Example: Q8TCU6

The qualifier ‘[LARGE SCALE ANALYSIS]’ indicates that the reference reports large scale analyses and thus the individual results may not have been manually reviewed.
Example: Q9JIX8

Protein-protein interactions (using the official gene name when it exists), mutagenesis experiments, natural variants and post-translational modifications (PTMs) are also precisely indicated.
Examples: Q96BI3, Q96EP1, P62739

When 3D structure information has been retrieved from the reference, we indicate the method used and – for X-ray crystallography – the highest resolution, the range of the domain, and the structure that has been determined.
Examples: P17427, P00831

7. Sequence origin

The sequence origin is optional and indicates the strain(s), tissue(s), plasmid(s) and transposon(s) from which the sequence is derived.

The strains listed in the ‘Strains’ token are sorted alphabetically. All frequently occurring strains in UniProtKB are listed in the document ‘Controlled vocabulary of strains’.

The tissues listed in the ‘Tissue’ token are sorted alphabetically. All tissues indicated in this token in UniProtKB/Swiss-Prot are listed in the document ‘Controlled vocabulary of tissues’. Whenever possible, UniProtKB/TrEMBL also makes use of this controlled tissue list, and efforts are made to automatically match tissues in UniProtKB/TrEMBL entries to tissues from this list. However, due to the nature of the data in UniProtKB/TrEMBL, this is not always possible.

The ‘Plasmid’ token is only used if an entry describes an identical sequence encoded on more than one plasmid. The document ‘Controlled vocabulary of plasmids’ lists all the plasmids that are used in UniProtKB/Swiss-Prot in the context of the ‘plasmid’ token.
Examples: P18445, Q28125, P30867, P12121, P00810, Q9EVG8

Many bacterial or fungal strains have names composed of an acronym (ATCC, DSM, NRRL…) followed by a number. These strains are maintained in specific culture collections. The most frequently cited are the following:

Acronym  Culture collection
ATCC     American Type Culture Collection; Rockville, USA
CBS      Centraalbureau voor Schimmelcultures; Baarn and Delft, Netherlands
CECT     Colección Española de Cultivos Tipo; Valencia, Spain
CCAP     Culture Collection of Algae and Protozoa; U.K.
CCMP     Culture Collection of Marine Phytoplankton
DSM      Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH; Germany
IAM      Institute of Applied Microbiology; University of Tokyo, Japan
IFO      Institute for Fermentation; Osaka, Japan
KCC      Culture Collection of Actinomycetes, Kaken Chemical Co; Tokyo, Japan
NCDO     National Collection of Dairy Organisms; Reading, U.K.
NCIB     National Collection of Industrial Bacteria; Aberdeen, U.K.
NCPPB    National Collection of Plant Pathogenic Bacteria; U.K.
NCTC     National Collection of Type Cultures; London, U.K.
NRCC     National Research Council of Canada
NRRL     Agricultural Research Service Culture Collection, National Center for Agricultural Utilization Research
USDA     U.S. Department of Agriculture; USA
UTEX     Culture Collection of Algae at the University of Texas at Austin; USA
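
A strain name of the ‘acronym followed by a number’ kind can be split into its two parts as sketched below. The function `parse_strain` and the exact pattern are hypothetical: the sketch assumes a single name with the acronym and the number separated by whitespace, and it recognizes only the acronyms listed above.

```python
import re

# Acronyms taken from the list of culture collections above.
CULTURE_COLLECTIONS = {
    "ATCC", "CBS", "CECT", "CCAP", "CCMP", "DSM", "IAM", "IFO", "KCC",
    "NCDO", "NCIB", "NCPPB", "NCTC", "NRCC", "NRRL", "USDA", "UTEX",
}

# Assumed layout: ACRONYM, whitespace, then the collection number.
_STRAIN = re.compile(r"^([A-Z]+)\s+(\S+)$")


def parse_strain(name: str):
    """Return (acronym, number) for a culture collection strain, else None."""
    match = _STRAIN.match(name.strip())
    if match is None or match.group(1) not in CULTURE_COLLECTIONS:
        return None
    # The number is kept as a string because some collections add prefixes.
    return match.group(1), match.group(2)


print(parse_strain("ATCC 25922"))
```

Names that do not follow this layout, such as laboratory strain names, yield `None` and would have to be handled separately.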

Additional bibliography

As a comprehensive and high-quality resource of protein sequence and functional information, UniProtKB strives to provide thorough literature citations associated with protein sequences and their characterization. Currently about two thirds of the UniProtKB PubMed citations are found in UniProtKB/Swiss-Prot, as a result of active integration in the course of manual curation.

In order to keep up with the explosive growth of literature and to give our users access to additional publications, we decided to integrate additional sources of literature from other annotated databases into UniProtKB. For this purpose we selected a number of external databases, e.g. Entrez Gene (GeneRIFs), SGD, MGI, GAD and PDB, and extracted citations that were mapped to UniProtKB entries. This additional protein bibliography information helps our users to better explore the existing knowledge of their proteins of interest.
