FASTA headers
The following is a description of FASTA headers for UniProtKB (including alternative isoforms), UniRef, UniParc, UniMES and archived UniProtKB versions. NCBI's program formatdb (in particular its -o option) is compatible with the UniProtKB fasta headers.
UniProtKB
>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName[ GN=GeneName] PE=ProteinExistence SV=SequenceVersion
Where:
- db is 'sp' for UniProtKB/Swiss-Prot and 'tr' for UniProtKB/TrEMBL.
- UniqueIdentifier is the primary accession number of the UniProtKB entry.
- EntryName is the entry name of the UniProtKB entry.
- ProteinName is the recommended name of the UniProtKB entry as annotated in the RecName field. For UniProtKB/TrEMBL entries without a RecName field, the SubName field is used. In case of multiple SubNames, the first one is used. The 'precursor' attribute is excluded, 'Fragment' is included with the name if applicable.
- OrganismName is the scientific name of the organism of the UniProtKB entry.
- GeneName is the first gene name of the UniProtKB entry. If there is no gene name, OrderedLocusName or ORFname, the GN field is not listed.
- ProteinExistence is the numerical value describing the evidence for the existence of the protein.
- SequenceVersion is the version number of the sequence.
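The field layout above can be read with a regular expression. The following is a minimal Python sketch, not an official UniProt tool: the function name parse_uniprotkb_header and the dictionary keys are hypothetical, and the pattern assumes that ProteinName does not itself contain the text " OS=".

```python
import re

# Hypothetical helper; the pattern mirrors the field layout described above.
# Assumption: the protein name never contains the literal text " OS=".
_UNIPROTKB_HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)"
    r" (?P<protein_name>.+?)"
    r" OS=(?P<organism>.+?)"
    r"(?: GN=(?P<gene>.+?))?"          # the GN field is optional
    r" PE=(?P<protein_existence>\d+)"
    r" SV=(?P<sequence_version>\d+)\s*$"
)


def parse_uniprotkb_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = _UNIPROTKB_HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # PE and SV are numeric, so convert them for easier comparison.
    fields["protein_existence"] = int(fields["protein_existence"])
    fields["sequence_version"] = int(fields["sequence_version"])
    return fields
```

When the GN field is absent, the "gene" key is None. Lines that do not follow the UniProtKB layout (for example UniRef or UniParc headers) return None.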
Examples:
>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_RALEH Acetoin catabolism protein X OS=Ralstonia eutropha (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus PE=1 SV=1
>tr|A3SA23|A3SA23_9RHOB TonB dependent, hydroxamate-type ferrisiderophore, outer membrane receptor OS=Sulfitobacter sp. EE-36 GN=EE36_08023 PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN CDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens PE=2 SV=1
Alternative isoforms (this only applies to UniProtKB/Swiss-Prot):
>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName[ GN=GeneName]
Where:
- IsoID is the isoform identifier as assigned in the ALTERNATIVE PRODUCTS section of the UniProtKB entry.
- IsoformName is the isoform name as annotated in the ALTERNATIVE PRODUCTS Name field of the UniProtKB entry.
Example:
>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis GN=YWHAB
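Isoform headers carry no PE or SV fields, so they need their own pattern. The Python sketch below is illustrative only: parse_isoform_header and its keys are hypothetical names. It assumes the IsoID has the form accession-number, as in the example, and it splits IsoformName from ProteinName at the first " of ", which would be wrong for an isoform name that itself contains " of ".

```python
import re

# Hypothetical helper for the isoform layout described above.
# Assumptions: IsoID looks like "<accession>-<number>", and the isoform
# name ends at the first " of ".
_ISOFORM_HEADER = re.compile(
    r"^>sp\|(?P<accession>[^|-]+)-(?P<isoform_number>\d+)\|(?P<entry_name>\S+)"
    r" Isoform (?P<isoform_name>.+?) of (?P<protein_name>.+?)"
    r" OS=(?P<organism>.+?)"
    r"(?: GN=(?P<gene>.+?))?\s*$"      # the GN field is optional
)


def parse_isoform_header(line):
    """Return the isoform header fields as a dict, or None if no match."""
    match = _ISOFORM_HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["isoform_number"] = int(fields["isoform_number"])
    return fields
```

Keeping the parent accession separate from the isoform number makes it easy to group isoform sequences with their canonical entry.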
UniRef
>UniqueIdentifier ClusterName n=Members Tax=Taxon RepID=RepresentativeMember
Where:
- UniqueIdentifier is the primary accession number of the UniRef cluster.
- ClusterName is the name of the UniRef cluster.
- Members is the number of UniRef cluster members.
- Taxon is the scientific name of the lowest common taxon shared by all UniRef cluster members.
- RepresentativeMember is the entry name of the representative member of the UniRef cluster.
Example:
>UniRef100_A5DI11 Elongation factor 2 n=1 Tax=Pichia guilliermondii RepID=EF2_PICGU
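A UniRef header can be split along its n=, Tax= and RepID= markers. This is a minimal Python sketch under that assumption; parse_uniref_header and the key names are hypothetical, and a cluster name containing the text " n=" followed by digits and " Tax=" would confuse it.

```python
import re

# Hypothetical helper for the UniRef layout described above.
_UNIREF_HEADER = re.compile(
    r"^>(?P<cluster_id>\S+) (?P<cluster_name>.+?)"
    r" n=(?P<members>\d+)"
    r" Tax=(?P<taxon>.+?)"
    r" RepID=(?P<representative>\S+)\s*$"
)


def parse_uniref_header(line):
    """Return the UniRef header fields as a dict, or None if no match."""
    match = _UNIREF_HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["members"] = int(fields["members"])  # cluster size as a number
    return fields
```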
UniParc
>UniqueIdentifier status=Status
Where:
- UniqueIdentifier is the primary accession number of the UniParc entry.
- Status is 'active' if the UniParc entry has at least one active cross-reference, and 'inactive' if it does not have any active cross-references.
Example:
>UPI0000000005 status=active
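Because a UniParc header has only two fields, reading it takes very little code. The Python sketch below uses a hypothetical function name, parse_uniparc_header, and returns the status as a boolean for convenience.

```python
import re

# Hypothetical helper for the UniParc layout described above.
_UNIPARC_HEADER = re.compile(
    r"^>(?P<identifier>\S+) status=(?P<status>active|inactive)\s*$"
)


def parse_uniparc_header(line):
    """Return (identifier, is_active), or None if the line does not match."""
    match = _UNIPARC_HEADER.match(line)
    if match is None:
        return None
    return match.group("identifier"), match.group("status") == "active"
```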
UniMES
>UniqueIdentifier ProteinName OS=OrganismName[ Pep=SourcePeptideIdentifier] SV=SequenceVersion
Where:
- UniqueIdentifier is the primary accession number of the UniMES entry.
- ProteinName is the protein name of the UniMES entry.
- OrganismName is the scientific name of the organism (group) of the UniMES entry.
- SourcePeptideIdentifier is the (optional) peptide identifier provided by the submitter.
- SequenceVersion is the version number of the sequence.
Example:
>MES00000000005 Putative uncharacterized protein GOS_3018412 (Fragment) OS=marine metagenome Pep=JCVI_PEP_1096688850003 SV=1
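The optional Pep field is the only irregular part of a UniMES header. The following Python sketch handles it with an optional group; parse_unimes_header and the key names are hypothetical, and the pattern assumes the peptide identifier contains no spaces, as in the example.

```python
import re

# Hypothetical helper for the UniMES layout described above.
# Assumption: the source peptide identifier contains no whitespace.
_UNIMES_HEADER = re.compile(
    r"^>(?P<identifier>\S+) (?P<protein_name>.+?)"
    r" OS=(?P<organism>.+?)"
    r"(?: Pep=(?P<source_peptide>\S+))?"   # the Pep field is optional
    r" SV=(?P<sequence_version>\d+)\s*$"
)


def parse_unimes_header(line):
    """Return the UniMES header fields as a dict, or None if no match."""
    match = _UNIMES_HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["sequence_version"] = int(fields["sequence_version"])
    return fields
```

When no peptide identifier was provided by the submitter, the "source_peptide" key is None.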
Archived UniProtKB sequence versions
>db|UniqueIdentifier archived from Release ReleaseNumber ReleaseDate SV=SequenceVersion
Where:
- db is 'sp' for UniProtKB/Swiss-Prot and 'tr' for UniProtKB/TrEMBL.
- UniqueIdentifier is the primary accession number of the UniProtKB entry.
- ReleaseNumber refers to the release from which the sequence was archived (Swiss-Prot or TrEMBL release numbers for releases prior to the first UniProt release, and both UniProt and Swiss-Prot or TrEMBL release numbers for releases after the first UniProt release).
- ReleaseDate is the date of the release from which the sequence was archived.
- SequenceVersion is the version number of the sequence.
Examples:
"pre-UniProt":
>sp|P05067 archived from Release 18.0 01-MAY-1991 SV=3
>tr|Q55167 archived from Release 17.0 01-JUN-2001 SV=1
"post-UniProt":
>sp|P05067 archived from Release 9.2/51.2 28-NOV-2006 SV=3
>tr|A0RTJ8 archived from Release 11.0/36.0 29-MAY-2007 SV=1
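Archived headers differ in whether ReleaseNumber holds one number or two. The Python sketch below separates the two cases and converts the date; parse_archived_header and its keys are hypothetical names. It assumes that in the two-number form the UniProt release comes first and the Swiss-Prot or TrEMBL release second, and that dates always use the DD-MMM-YYYY form with upper-case English month abbreviations, as in the examples.

```python
import re
from datetime import date

# English month abbreviations, mapped by hand to avoid locale-dependent parsing.
_MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

# Hypothetical helper for the archived-version layout described above.
_ARCHIVED_HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>\S+) archived from Release"
    r" (?P<release>\S+)"
    r" (?P<day>\d{2})-(?P<month>[A-Z]{3})-(?P<year>\d{4})"
    r" SV=(?P<sequence_version>\d+)\s*$"
)


def parse_archived_header(line):
    """Return the archived header fields as a dict, or None if no match."""
    match = _ARCHIVED_HEADER.match(line)
    if match is None or match.group("month") not in _MONTHS:
        return None
    releases = match.group("release").split("/")
    return {
        "db": match.group("db"),
        "accession": match.group("accession"),
        # Assumption: "A/B" means UniProt release A, database release B.
        "uniprot_release": releases[0] if len(releases) == 2 else None,
        "database_release": releases[-1],
        "release_date": date(
            int(match.group("year")),
            _MONTHS[match.group("month")],
            int(match.group("day")),
        ),
        "sequence_version": int(match.group("sequence_version")),
    }
```

For a "pre-UniProt" header the "uniprot_release" key is None, which gives a simple way to tell the two forms apart.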
