Prokaryotic protein annotation project
The Prokaryotic protein annotation project focuses on the manual annotation of proteins and protein families specific to bacteria and archaea.
Our major effort is currently directed towards the annotation of proteins from the already well-characterized model bacteria Escherichia coli and Bacillus subtilis, as well as the annotation of pathogens such as Mycobacterium tuberculosis.
- All manually reviewed Escherichia coli entries can be found here (statistics)
- All manually reviewed Bacillus subtilis entries can be found here (statistics)
- All manually reviewed Mycobacterium tuberculosis entries can be found here (statistics)
High-quality automated annotation propagation
Next-generation sequencing and the ever-increasing rate of complete genome sequencing now produce so much data that it is no longer possible to manually annotate even a small portion of these genomes, despite the considerable demand for corrected and annotated complete proteomes. To enrich the annotation of these proteomes in UniProtKB, we developed HAMAP (High-quality Automated and Manual Annotation of Proteins), which aims to automatically annotate a significant percentage of the large number of proteins originating from complete genome sequencing projects. This automatic annotation pipeline is based on a collection of family profiles and manually created annotation rules. It is applied only where it can match the quality of manual annotation, that is, to proteins that belong to well-defined families or subfamilies. By this we mean protein families that have a well-defined function and are well conserved at the sequence level.
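The idea of pairing a family profile with a manually created annotation rule, and applying the rule only to confident family members, can be illustrated with a toy sketch. This is not HAMAP's implementation: the FamilyRule class, the motif_score stand-in for a real profile, the family identifier, the cutoff value and the annotation text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FamilyRule:
    """Hypothetical pairing of a family profile with its annotation rule."""
    family_id: str                  # illustrative identifier, not a real accession
    score: Callable[[str], float]   # stand-in for a profile match score
    cutoff: float                   # minimum score to count as a family member
    annotations: Dict[str, str] = field(default_factory=dict)


def motif_score(motif: str) -> Callable[[str], float]:
    """Toy 'profile': best fraction of motif positions matched, without gaps."""
    def score(sequence: str) -> float:
        best = 0.0
        for start in range(len(sequence) - len(motif) + 1):
            window = sequence[start:start + len(motif)]
            matches = sum(1 for a, b in zip(window, motif) if a == b)
            best = max(best, matches / len(motif))
        return best
    return score


def annotate(sequence: str, rules: List[FamilyRule]) -> Optional[Dict[str, str]]:
    """Return annotations from the best-scoring rule that meets its cutoff."""
    best_rule = None
    best_score = 0.0
    for rule in rules:
        s = rule.score(sequence)
        if s >= rule.cutoff and s > best_score:
            best_rule, best_score = rule, s
    if best_rule is None:
        return None  # no confident family assignment: nothing is propagated
    return {"family": best_rule.family_id, **best_rule.annotations}


# Made-up rule and sequences, for illustration only.
rules = [
    FamilyRule(
        family_id="FAM_A",
        score=motif_score("GSGKST"),
        cutoff=0.9,
        annotations={"function": "hypothetical nucleotide-binding protein"},
    )
]
member = annotate("MKTGSGKSTLL", rules)
non_member = annotate("MKTAAAAAALL", rules)
```

In this sketch a sequence that falls below the cutoff receives no annotation at all, which mirrors the principle stated above: automatic annotation is applied only where family membership is clear.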