Plant protein annotation project
The main goal of the Plant protein annotation project is the manual annotation of plant-specific proteins or protein families.
Because many plant genomes are polyploid (potato is tetraploid and wheat is hexaploid, for example) and genome duplications are frequent, plants are known to contain large gene families, some of which include up to 100 closely related members that can differ by only one nucleotide in the open reading frame.
Our major effort is currently focused on the manual annotation of the proteomes of both a dicot (Arabidopsis thaliana) and a monocot (Oryza sativa). We will later propagate the relevant annotation to orthologous proteins from other plant species. We will also annotate all the proteins involved in specific pathways that are not present in our two model plants, such as the nitrogen fixation pathway of Medicago truncatula, once the sequences and the gene predictions are reliable.
Arabidopsis thaliana is a 20-25 cm tall flowering plant native to Europe, Asia, and northwestern Africa, with a rapid life cycle (six weeks). It is used extensively as a model organism in plant biology and genetics. With about 157 million base pairs and five chromosomes, Arabidopsis has one of the smallest genomes among plants, and in 2000 it became the first plant genome to be sequenced.
We are working on the establishment and annotation of a comprehensive, non-redundant complete proteome of Arabidopsis. As a first step towards this goal, we compared the content of UniProtKB with the list of proteins established by The Arabidopsis Information Resource (TAIR). In several cases this led us to complement the sequence information already present in UniProtKB with data available at TAIR. Since then, UniProtKB and TAIR have been kept manually synchronized. Our family annotation sometimes shows that changes to the current gene model predictions are necessary, and all our gene model improvements are transmitted to TAIR for integration into their database.
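The comparison step described above can be pictured as a set comparison over gene locus identifiers. The Python sketch below is only an illustration under stated assumptions, not the procedure actually used by the project: the function name, the example identifiers, and the choice of locus codes as the comparison key are all hypothetical.

```python
def compare_locus_sets(uniprot_loci, tair_loci):
    """Compare two collections of gene locus identifiers.

    Hypothetical helper: assumes each resource can be reduced to a
    list of locus codes, which are normalized to upper case here.
    """
    uniprot = {locus.strip().upper() for locus in uniprot_loci}
    tair = {locus.strip().upper() for locus in tair_loci}
    return {
        "shared": sorted(uniprot & tair),
        "only_in_uniprot": sorted(uniprot - tair),
        "only_in_tair": sorted(tair - uniprot),
    }


# Made-up identifiers, used purely for illustration.
uniprot_example = ["AT1G01010", "At1g01020", "AT1G01030"]
tair_example = ["AT1G01020", "AT1G01030", "AT1G01040"]

report = compare_locus_sets(uniprot_example, tair_example)
for category, loci in report.items():
    print(category, loci)
```

In such a sketch, entries found in only one resource would be the candidates for manual review, for example to complement sequence information in one database with data from the other.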
Rice (Oryza sativa) comprises two major subspecies: the sticky, short-grained japonica or sinica variety, and the non-sticky, long-grained indica variety. First drafts of the complete sequences of both genomes were published in 2002, and improved versions were released in 2005.
Our major effort is currently directed towards the annotation of the proteins encoded by the genome of Oryza sativa subspecies japonica, cultivar Nipponbare.