References
Last modified February 17, 2011
This section contains the literature citations that are the sources of data used to annotate the entry. Each reference is numbered and contains several subsections allowing a precise description of a given citation.
Up to 7 distinct subsections can compose a reference block: the ‘Number’ of the reference in the list, the ‘Title’, the ‘Author(s) names’, the ‘Reference information’ (or exact citation), the ‘Cross-references’ to bibliographic databases (including a link to the abstract, if available), the ‘Citation content’ and the ‘Sequence origin’.
Example: Q9XTY6
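As an illustration only, the seven subsections can be modelled as a small record type. This is a sketch and not a UniProt API: the class name, field names and default values are assumptions chosen for this example, and only the subsection labels come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reference:
    """Hypothetical container for one reference block (names are illustrative)."""

    number: int                                   # 1. Number
    title: str = ""                               # 2. Title
    authors: List[str] = field(default_factory=list)               # 3. Author(s) names
    reference_information: str = ""               # 4. Reference information
    cross_references: Dict[str, str] = field(default_factory=dict)  # 5. optional
    citation_content: List[str] = field(default_factory=list)       # 6. Cited for
    sequence_origin: Dict[str, List[str]] = field(default_factory=dict)  # 7. optional

    def present_subsections(self) -> List[str]:
        """Return the labels of the subsections that actually hold data."""
        labelled = [
            ("Number", self.number),
            ("Title", self.title),
            ("Author(s) names", self.authors),
            ("Reference information", self.reference_information),
            ("Cross-references", self.cross_references),
            ("Citation content", self.citation_content),
            ("Sequence origin", self.sequence_origin),
        ]
        return [label for label, value in labelled if value]


# Invented placeholder data, not taken from any entry.
ref = Reference(number=1, title="A placeholder title.", authors=["Doe J."])
print(ref.present_subsections())
```

Empty defaults reflect that a block contains "up to" seven subsections; which ones may really be absent in a given entry is not specified here beyond the two marked optional.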
1. Number
The reference number gives a sequential number to each reference citation in an entry. This number is used to link to the appropriate reference in the ‘General annotation (Comments)’ and ‘Sequence annotation (Features)’ sections.
2. Title
The title of the publication (or any other source) is cited as precisely as possible given the limitations of the computer character set.
Example: Q21507
- Major title words are not capitalized;
- The text of a title always ends with a period ‘.’, a question mark ‘?’ or an exclamation mark ‘!’;
- Double quotation marks (") in the text of the title are replaced by single quotation marks;
- Titles of articles published in a language other than English have been translated into English;
- Greek letters are written in full (alpha, beta, etc.).
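The mechanical parts of these conventions can be sketched in a few lines. The function name `normalize_title` and the `GREEK` table are hypothetical and deliberately incomplete; the sketch covers only quotation marks, Greek letters and final punctuation, and leaves capitalization and translation untouched.

```python
import re

# Hypothetical, partial table of Greek letters; extend as needed.
GREEK = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "κ": "kappa",
    "λ": "lambda",
}


def normalize_title(title: str) -> str:
    """Apply the quotation-mark, Greek-letter and final-punctuation rules."""
    text = title.strip()
    # Double quotation marks (straight or curly) become single quotation marks.
    text = re.sub(r'["\u201c\u201d]', "'", text)
    # Greek letters are written in full.
    for symbol, name in GREEK.items():
        text = text.replace(symbol, name)
    # A title always ends with a period, a question mark or an exclamation mark.
    if text and text[-1] not in ".?!":
        text += "."
    return text


print(normalize_title('The "α" subunit of a hypothetical kinase'))
```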
3. Author(s) names
We list the author(s) names in the order given in the original publication.
Example: P11071
Name initials can be followed by an abbreviation such as ‘Jr’ (Junior), ‘Sr’ (Senior), ‘II’, ‘III’ or ‘IV’ (2nd, 3rd and 4th).
Example: P00350
The author(s) names are cited as completely as possible: we keep all initials and hyphens between initials. The German umlaut is replaced by an ‘e’, which follows the modified vowel. If authors do not have any initial, we add an ‘X.’ after the name.
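A minimal sketch of the umlaut and missing-initial rules follows. The helper `format_author` and the `UMLAUTS` table are assumptions made for this example, and the "surname, then initials" output order is inferred from the wording above rather than from a format specification.

```python
# Each umlaut is replaced by the base vowel followed by an 'e'.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def format_author(surname: str, initials: str = "") -> str:
    """Return 'Surname Initials', transliterating umlauts in the surname.

    Initials are kept as given (including hyphens); if there are none,
    'X.' is added after the name.
    """
    for umlaut, replacement in UMLAUTS.items():
        surname = surname.replace(umlaut, replacement)
    initials = initials.strip() or "X."
    return f"{surname} {initials}"


print(format_author("Müller", "H.-J."))
print(format_author("Aristotle"))
```

The names used here are invented for illustration. A surname written entirely in capitals would need a different replacement table.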
We also try to be as consistent as possible with author(s) names: when an author name is misspelled, we correct it and homogenize it in the database.
In some cases, the name provided is the name of a consortium rather than of individual authors. This is mainly used for direct submissions to databases, but it can also be used in full references when the consortium is cited as an author. Note that consortium and author names may coexist in a single reference.
Examples: Q7TQA9, O60260
4. Reference information
The reference information contains the conventional citation information for the reference.
a) Journal citations
The reference information for a journal citation includes the journal abbreviation, the volume number, the page range and the year of publication.
Journal names are abbreviated according to the conventions used by the National Library of Medicine (NLM) and are based on the existing ISO and ANSI standards. A list of the abbreviations currently in use is given in the document ‘Controlled vocabulary of journals’.
Example: P03024
When a reference is made to a publication which is ‘in press’ at the time the database is released, the page range, and possibly the volume number, are indicated as ‘0’ (zero).
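The ‘in press’ convention lends itself to a simple check. The sketch below is illustrative only: the `JournalCitation` class, its field names and the assumption that the page range is held as a string such as "123-130" are not taken from the database format.

```python
from dataclasses import dataclass


@dataclass
class JournalCitation:
    """Hypothetical holder for the four parts of a journal citation."""

    journal: str  # abbreviated journal name
    volume: str   # may be "0" for an in-press publication
    pages: str    # page range, assumed here to look like "123-130"
    year: int

    def is_in_press(self) -> bool:
        # Assumption: every component of an in-press page range is "0".
        parts = [part.strip() for part in self.pages.split("-")]
        return all(part == "0" for part in parts)


# Invented values for illustration.
print(JournalCitation("J. Hypothetical Res.", "0", "0", 2011).is_in_press())
print(JournalCitation("J. Hypothetical Res.", "12", "123-130", 2010).is_in_press())
```

Only the page range is tested, since the volume number is described as only possibly being zero.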
b) Electronic publications
The reference information for an electronic publication includes an ‘(er)’ prefix.
Examples: O64948, Q09517
c) Book citations
The reference information for articles found in books or other types of publication includes the book name, the volume number, the page range, the publisher, the city and the year.
Examples: P00065, P04560, P02675
d) Unpublished observations
The reference information for unpublished observations includes the month and the year.
We use the expression ‘unpublished observations’ to cite communications directly submitted to UniProtKB/Swiss-Prot by scientists and concerning their unpublished biological information on various aspects of the entry.
Example: P24551
e) Thesis
The reference information for Ph.D. theses includes a ‘Thesis’ prefix, the year, the institution name, the city and the country.
Example: P60773
f) Patent applications
The reference information for patent applications includes the international publication number of the patent and the date.
Example: P29853
g) Submissions
The reference information for submissions includes the date and the database to which the data were submitted.
We report the data submitted to the following databases:
- the EMBL/GenBank/DDBJ databases
- UniProtKB
- the PDB data bank
- the PIR data bank
Examples: P50388, P83886, P38013
5. Cross-references
Cross-references from the ‘References’ section are optional. They indicate the identifier assigned to a specific reference in a bibliographic database and provide the link.
When present, they link to:
- PubMed (through the PubMed Unique Identifier – PMID)
- AGRICOLA
- the abstract as supplied by the publishers
- the article on the publisher's site, identified by its Digital Object Identifier (DOI)
Examples: P02675, Q10670, Q9LFB2, Q3EDJ0
6. Citation content (Cited for)
The citation content indicates the type of information that was retrieved from a given reference and used to annotate the entry (sequence, protein-protein interaction, variants or mutations, PTMs and 3D structure, etc.).
Sequence information retrieved from a reference is described in detail by its range and its source (translation of a nucleic acid sequence or amino acid sequence).
Example: P25719
The comment ‘NUCLEOTIDE SEQUENCE’ is usually tagged with a qualifier, indicating the source of the sequence data. There are 5 types of sources:
- GENOMIC DNA when the individual gene has been sequenced from DNA;
- GENOMIC RNA for RNA viruses, when the individual gene has been sequenced from RNA;
- MRNA when the individual cDNA has been sequenced;
- LARGE SCALE GENOMIC DNA when the gene has been sequenced as part of a large-scale genome sequencing project;
- LARGE SCALE MRNA when the cDNA has been sequenced as part of a large-scale cDNA sequencing project.
Example: Q9QY42
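Because the five qualifiers form a closed list, they can be represented as an enumeration. The names `SequenceSource`, `parse_qualifier` and `is_large_scale` are hypothetical; only the five qualifier strings come from the list above.

```python
from enum import Enum


class SequenceSource(Enum):
    """The five source qualifiers of a 'NUCLEOTIDE SEQUENCE' comment."""

    GENOMIC_DNA = "GENOMIC DNA"
    GENOMIC_RNA = "GENOMIC RNA"
    MRNA = "MRNA"
    LARGE_SCALE_GENOMIC_DNA = "LARGE SCALE GENOMIC DNA"
    LARGE_SCALE_MRNA = "LARGE SCALE MRNA"


def parse_qualifier(text: str) -> SequenceSource:
    """Map qualifier text to a SequenceSource; raises ValueError if unknown."""
    return SequenceSource(text.strip().upper())


def is_large_scale(source: SequenceSource) -> bool:
    """True for the two qualifiers that denote large-scale sequencing projects."""
    return source.value.startswith("LARGE SCALE")


print(parse_qualifier("mRNA"))
print(is_large_scale(parse_qualifier("Large scale genomic DNA")))
```

The function takes the qualifier text alone; how the qualifier is delimited within the full comment is left out of this sketch.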
When the sequence describes specific isoform(s), this is indicated in brackets, after the sequence source.
Example: Q8TCU6
The qualifier ‘[LARGE SCALE ANALYSIS]’ indicates that the reference reports large-scale analyses and that the individual results may therefore not have been manually reviewed.
Example: Q9JIX8
Information on protein-protein interactions (using the official gene name when it exists), mutagenesis experiments, natural variants and post-translational modifications (PTMs) is also precisely indicated.
Examples: Q96BI3, Q96EP1, P62739
When 3D structure information has been retrieved from the reference, we indicate the method used and – for X-ray crystallography – the highest resolution, the range of the domain, and the structure that has been determined.
Examples: P17427, P00831
7. Sequence origin
The sequence origin is optional and indicates the strain(s), tissue(s), plasmid(s) and transposon(s) from which the sequence is derived.
The strains listed in the ‘Strains’ token are sorted alphabetically. All frequently occurring strains in UniProtKB are listed in the document ‘Controlled vocabulary of strains’.
The tissues listed in the ‘Tissue’ token are sorted alphabetically. All tissues indicated in this token in UniProtKB/Swiss-Prot are listed in the document ‘Controlled vocabulary of tissues’. Whenever possible, UniProtKB/TrEMBL also makes use of this controlled tissue list, and efforts are made to automatically match tissues in UniProtKB/TrEMBL entries to tissues from this list. However, due to the nature of the data in UniProtKB/TrEMBL, this is not always possible.
The ‘Plasmid’ token is only used if an entry describes an identical sequence encoded on more than one plasmid. The document ‘Controlled vocabulary of plasmids’ lists all the plasmids that are used in UniProtKB/Swiss-Prot in the context of the ‘plasmid’ token.
Examples: P18445, Q28125, P30867, P12121, P00810, Q9EVG8.
Many bacterial or fungal strains have names composed of an acronym (ATCC, DSM, NRRL…) followed by a number. These strains are maintained in specific culture collections. The most frequently cited are the following:
| Acronym | Culture collection |
|---|---|
| ATCC | American Type Culture Collection; Rockville, USA |
| CBS | Centraalbureau voor Schimmelcultures; Baarn and Delft, Netherlands |
| CECT | Colección Española de Cultivos Tipo; Valencia, Spain |
| CCAP | Culture Collection of Algae and Protozoa; U.K. |
| CCMP | Culture Collection of Marine Phytoplankton |
| DSM | Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH; Germany |
| IAM | Institute of Applied Microbiology; University of Tokyo, Japan |
| IFO | Institute for Fermentation; Osaka, Japan |
| KCC | Culture collection of Actinomycetes, Kaken Chemical Co; Tokyo, Japan |
| NCDO | National Collection of Dairy Organisms; Reading, U.K. |
| NCIB | National Collection of Industrial Bacteria; Aberdeen, U.K. |
| NCPPB | National Collection of Plant Pathogenic Bacteria; U.K. |
| NCTC | National Collection of Type Cultures; London, U.K. |
| NRCC | National Research Council of Canada |
| NRRL | Agricultural Research Service Culture Collection, National Center for Agricultural Utilization Research |
| USDA | U.S. Department of Agriculture; USA |
| UTEX | Culture collection of Algae at the University of Texas at Austin; USA |
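Strain names of the "acronym followed by a number" kind can be split with a short pattern. This is a sketch under stated assumptions: the function `split_strain` is hypothetical, the separator is assumed to be an optional space or hyphen, and only the acronyms from the table above are recognized.

```python
import re
from typing import Optional, Tuple

# Acronyms taken from the table of culture collections above.
KNOWN_ACRONYMS = {
    "ATCC", "CBS", "CECT", "CCAP", "CCMP", "DSM", "IAM", "IFO", "KCC",
    "NCDO", "NCIB", "NCPPB", "NCTC", "NRCC", "NRRL", "USDA", "UTEX",
}

# Assumed shape: capital letters, optional space or hyphen, then a number
# that may carry a suffix.
_STRAIN = re.compile(r"^(?P<acronym>[A-Z]+)[ -]?(?P<number>\d\S*)$")


def split_strain(name: str) -> Optional[Tuple[str, str]]:
    """Return (acronym, number) for a culture-collection strain, else None."""
    match = _STRAIN.match(name.strip())
    if match is None or match["acronym"] not in KNOWN_ACRONYMS:
        return None
    return match["acronym"], match["number"]


# Strain names below are used purely as illustrative input.
print(split_strain("ATCC 25922"))
print(split_strain("K12"))
```

Strains whose names do not start with one of the listed acronyms yield `None`, so the function can be used as a filter before looking up a culture collection.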
