
UniProt Metagenomic and Environmental Sequences (UniMES)

Last modified June 17, 2014


The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository developed specifically for metagenomic and environmental data. UniMES clusters are provided to give complete coverage of sequence space at different resolutions.

UniMES clusters

Clustered sets of sequences are available at two resolutions: 100% (unimes_cluster100.fasta) and 90% (unimes_cluster90.fasta). In unimes_cluster100.fasta, identical sequences and subfragments from unimes.fasta are placed into a single cluster. The unimes_cluster90.fasta file is built by clustering the representative sequences of unimes_cluster100.fasta (the representative is the longest sequence in a cluster) using the CD-HIT algorithm (Li W. and Godzik A., Bioinformatics, 22:1658-1659, 2006), such that each cluster is composed of sequences that have at least 90% sequence identity to the representative sequence. Only the representative sequences of the clusters are present in these files.
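
The 100% clustering rule described above (identical sequences and subfragments collapse into one cluster, represented by its longest member) can be illustrated with a minimal Python sketch. This is not the UniProt pipeline: the function name cluster_100 and the toy identifiers are hypothetical, the substring test is a naive stand-in for whatever method is actually used, and the quadratic loop is only suitable for small inputs.

```python
def cluster_100(seqs):
    """Group identical sequences and subfragments under the longest sequence.

    seqs: dict mapping a sequence identifier to its sequence string.
    Returns a dict mapping each representative identifier to the list of
    member identifiers (the representative itself is listed first).
    Hypothetical helper for illustration only.
    """
    # Process longest sequences first, so the first sequence that opens a
    # cluster is also the longest one in it. Ties are broken by identifier
    # to keep the result deterministic.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    clusters = {}
    for seq_id, seq in ordered:
        for rep_id in clusters:
            # An identical sequence or a subfragment is a substring of the
            # representative, so it joins that cluster.
            if seq in seqs[rep_id]:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No existing representative contains this sequence:
            # it becomes the representative of a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


# Toy input with made-up identifiers and sequences.
example = {
    "seqA": "MKTAYIAKQR",
    "seqB": "TAYIA",        # subfragment of seqA
    "seqC": "MKTAYIAKQR",   # identical to seqA
    "seqD": "GGGG",         # unrelated
}
result = cluster_100(example)
representatives = sorted(result)
```

In this toy run, seqB and seqC fall into the cluster represented by seqA, and seqD forms its own cluster. Writing out only the representatives mirrors the statement that the cluster files contain representative sequences only. The 90% step is not sketched here, since it depends on the sequence identity computation performed by CD-HIT.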