The Prokaryotic protein annotation program focuses on the manual annotation of proteins and protein families specific to bacteria and archaea.
Our major effort is currently directed towards the annotation of proteins from the already well-characterized model bacteria Escherichia coli and Bacillus subtilis, as well as the annotation of pathogens such as Mycobacterium tuberculosis.
- All manually reviewed Escherichia coli entries can be found here (statistics)
- All manually reviewed Bacillus subtilis entries can be found here (statistics)
- All manually reviewed Mycobacterium tuberculosis entries can be found here (statistics)
High-quality automated annotation propagation
Next-generation sequencing and the ever-increasing rate of complete genome sequencing now produce so much data that it is no longer possible to manually annotate even a small portion of these genomes, despite the considerable demand for corrected and annotated complete proteomes. To enrich the annotation of these proteomes in UniProtKB, we developed HAMAP (High-quality Automated and Manual Annotation of Proteins), whose goal is to automatically annotate a significant percentage of the large number of proteins originating from complete genome sequencing projects. This automatic annotation pipeline is based on a collection of family profiles and manually created annotation rules. It is applied only where it can match the quality of manual annotation, that is, to proteins that belong to well-defined families or subfamilies: those with a well-defined function that are well conserved at the sequence level.
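The idea of propagating annotation only when a protein clearly belongs to a well-defined family can be illustrated with a minimal sketch. This is not the HAMAP implementation: the class names, the `score_cutoff` field, the `propagate` function and the rule that ambiguous matches are left untouched are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyRule:
    """Hypothetical annotation rule attached to one family profile."""
    family_id: str       # illustrative identifier, not a real HAMAP ID
    score_cutoff: float  # assumed minimum profile score for a trusted match
    annotation: dict     # annotation fields to copy onto matching proteins


@dataclass
class Protein:
    accession: str
    profile_scores: dict  # family_id -> score against that family profile
    annotation: dict = field(default_factory=dict)


def propagate(protein, rules):
    """Copy annotation from a rule only if exactly one family matches.

    Returns True if annotation was propagated, False if the protein
    is left unannotated (no trusted match, or more than one).
    """
    hits = [
        rule for rule in rules
        if protein.profile_scores.get(rule.family_id, float("-inf"))
        >= rule.score_cutoff
    ]
    if len(hits) != 1:
        return False
    protein.annotation.update(hits[0].annotation)
    return True
```

In this sketch a protein that scores below every cutoff, or above the cutoff of more than one family, is simply skipped, which mirrors the principle that automatic annotation is applied only where it can be as reliable as manual annotation.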