Protein

DNA-directed RNA polymerase II subunit RPB1

Gene

ama-1

Organism
Caenorhabditis elegans
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

DNA-dependent RNA polymerase catalyzes the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates. Largest and catalytic component of RNA polymerase II which synthesizes mRNA precursors and many functional non-coding RNAs. Forms the polymerase active center together with the second largest subunit. Pol II is the central component of the basal RNA polymerase II transcription machinery. It is composed of mobile elements that move relative to each other. RPB1 is part of the core element with the central large cleft, the clamp element that moves to open and close the cleft and the jaws that are thought to grab the incoming DNA template. At the start of transcription, a single-stranded DNA template strand of the promoter is positioned within the central active site cleft of Pol II. A bridging helix emanates from RPB1 and crosses the cleft near the catalytic site and is thought to promote translocation of Pol II by acting as a ratchet that moves the RNA-DNA hybrid through the active site by switching from straight to bent conformations at each step of nucleotide addition. During transcription elongation, Pol II moves on the template as the transcript elongates. Elongation is influenced by the phosphorylation status of the C-terminal domain (CTD) of Pol II largest subunit (RPB1), which serves as a platform for assembly of factors that regulate transcription initiation, elongation, termination and mRNA processing (By similarity). Involved in the transcription of several genes including those involved in embryogenesis (PubMed:14726532, PubMed:27541139).

Catalytic activity

Nucleoside triphosphate + RNA(n) = diphosphate + RNA(n+1).

Sites

Feature key    Position(s)  Description                                    Length
Metal binding  66           Zinc 1 (By similarity)                         1
Metal binding  69           Zinc 1 (By similarity)                         1
Metal binding  76           Zinc 1 (By similarity)                         1
Metal binding  79           Zinc 1 (By similarity)                         1
Metal binding  106          Zinc 2 (By similarity)                         1
Metal binding  109          Zinc 2 (By similarity)                         1
Metal binding  149          Zinc 2 (By similarity)                         1
Metal binding  177          Zinc 2 (By similarity)                         1
Metal binding  489          Magnesium 1; catalytic (By similarity)         1
Metal binding  489          Magnesium 2; shared with RPB2 (By similarity)  1
Metal binding  491          Magnesium 1; catalytic (By similarity)         1
Metal binding  491          Magnesium 2; shared with RPB2 (By similarity)  1
Metal binding  493          Magnesium 1; catalytic (By similarity)         1
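
The positions in the Sites table are 1-based indices into the sequence given in the Sequence section. As a minimal sketch of how to read them, the snippet below looks up the four Zinc 1 positions in the first 100 residues copied from this entry. The names `N_TERMINAL_100`, `ZINC_1_POSITIONS` and `residues_at` are illustrative and not part of any UniProt tool.

```python
# First 100 residues of P16356, copied from the Sequence section of this entry.
N_TERMINAL_100 = (
    "MALVGVDFQAPLRIVSRVQFGILGPEEIKRMSVAHVEFPEVYENGKPKLG"
    "GLMDPRQGVIDRRGRCMTCAGNLTDCPGHFGHLELAKPVFHIGFLTKTLK"
)

# Zinc 1 positions as listed in the Sites table (1-based).
ZINC_1_POSITIONS = [66, 69, 76, 79]


def residues_at(sequence, positions):
    """Map each 1-based position to the residue found there."""
    return {pos: sequence[pos - 1] for pos in positions}


zinc_1 = residues_at(N_TERMINAL_100, ZINC_1_POSITIONS)
print(zinc_1)  # {66: 'C', 69: 'C', 76: 'C', 79: 'H'}
```

The lookup returns three cysteines and one histidine, which are residue types commonly found coordinating zinc. That is consistent with the annotation but does not by itself confirm binding, since the entry marks these sites as inferred by similarity.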

GO - Molecular function

  • DNA binding Source: UniProtKB-KW
  • DNA-directed RNA polymerase activity Source: WormBase
  • metal ion binding Source: UniProtKB-KW
  • RNA polymerase II activity Source: WormBase

GO - Biological process

  • embryo development ending in birth or egg hatching Source: WormBase
  • gastrulation Source: WormBase
  • mRNA transcription from RNA polymerase II promoter Source: WormBase
  • positive regulation of gastrulation Source: UniProtKB
  • positive regulation of transcription, DNA-templated Source: UniProtKB
  • transcription, DNA-templated Source: WormBase

Keywords - Molecular function

Nucleotidyltransferase, Transferase

Keywords - Biological process

Transcription

Keywords - Ligand

DNA-binding, Magnesium, Metal-binding, Zinc

Enzyme and pathway databases

Reactome: R-CEL-112387. Elongation arrest and recovery.
R-CEL-113418. Formation of the Early Elongation Complex.
R-CEL-5578749. Transcriptional regulation by small RNAs.
R-CEL-674695. RNA Polymerase II Pre-transcription Events.
R-CEL-6781823. Formation of TC-NER Pre-Incision Complex.
R-CEL-6782135. Dual incision in TC-NER.
R-CEL-6782210. Gap-filling DNA repair synthesis and ligation in TC-NER.
R-CEL-6796648. TP53 Regulates Transcription of DNA Repair Genes.
R-CEL-6803529. FGFR2 alternative splicing.
R-CEL-6807505. RNA polymerase II transcribes snRNA genes.
R-CEL-72086. mRNA Capping.
R-CEL-72163. mRNA Splicing - Major Pathway.
R-CEL-72165. mRNA Splicing - Minor Pathway.
R-CEL-73776. RNA Polymerase II Promoter Escape.
R-CEL-73779. RNA Polymerase II Transcription Pre-Initiation And Promoter Opening.
R-CEL-75953. RNA Polymerase II Transcription Initiation.
R-CEL-75955. RNA Polymerase II Transcription Elongation.
R-CEL-76042. RNA Polymerase II Transcription Initiation And Promoter Clearance.
R-CEL-77075. RNA Pol II CTD phosphorylation and interaction with CE.

Names & Taxonomy

Protein names
Recommended name:
DNA-directed RNA polymerase II subunit RPB1 (EC:2.7.7.6)
Short name:
RNA polymerase II subunit B1
Alternative name(s):
DNA-directed RNA polymerase III largest subunit
Gene names
Name: ama-1 (Imported)
Synonyms: rpb-1 (Imported)
ORF Names: F36A4.7 (Imported)
Organism: Caenorhabditis elegans
Taxonomic identifier: 6239 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Ecdysozoa > Nematoda > Chromadorea > Rhabditida > Rhabditoidea > Rhabditidae > Peloderinae > Caenorhabditis
Proteomes
  • UP000001940 Component: Chromosome IV

Organism-specific databases

WormBase: F36A4.7; CE46402; WBGene00000123; ama-1.

Subcellular location

  • Nucleus (1 publication)
  • Chromosome (1 publication)

  • Note: Localizes to punctate nucleoplasmic structures in the nuclei of interphase somatic cells when phosphorylated at 'Ser-5' of the C-terminal heptapeptide repeat (PubMed:14726532). Localizes to two discrete foci in the transcriptionally silent germ line nucleus (PubMed:14726532). Co-localizes with transcriptionally active chromatin in all autosomes of mitotic and meiotic nuclei in germ cells (PubMed:27541139).

GO - Cellular component

  • DNA-directed RNA polymerase II, core complex Source: GO_Central
  • nucleus Source: WormBase
  • transcriptionally active chromatin Source: UniProtKB

Keywords - Cellular component

Chromosome, DNA-directed RNA polymerase, Nucleus

Pathology & Biotech

Disruption phenotype

RNAi-mediated knockdown results in embryonic arrest at the 100-cell stage and prevents the embryonic transcription of several genes (PubMed:14726532). Surviving embryos exhibit gastrulation defects with decreased expression of genes involved in gastrulation (PubMed:27541139).

PTM / Processing

Molecule processing

Feature key  Identifier      Position(s)  Description                                  Length
Chain        PRO_0000073935  1 – 1856     DNA-directed RNA polymerase II subunit RPB1  1856

Post-translational modification

The tandem 7-residue repeats in the C-terminal domain (CTD) can be highly phosphorylated. The phosphorylation activates Pol II (By similarity). Phosphorylation occurs mainly at residues 'Ser-2' and 'Ser-5' of the heptapeptide repeat and starts at the 3- to 4-cell embryonic stage (PubMed:14726532). Phosphorylation is likely mediated by cdk-7 (PubMed:11960010). The phosphorylation state is believed to result from the balanced action of site-specific CTD kinases and phosphatases, and a 'CTD code' that specifies the position of Pol II within the transcription cycle has been proposed (By similarity).
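
'Ser-2' and 'Ser-5' are numbered within each heptapeptide repeat, not along the full protein. A minimal sketch of the conversion to absolute sequence positions follows, using the first annotated repeat of this entry (residues 1593 to 1599, which read YSPTSPS in the Sequence section). The helper name `heptad_site` is hypothetical.

```python
# First annotated CTD repeat of P16356: residues 1593-1599, read from the
# Sequence section of this entry.
REPEAT_1_START = 1593
REPEAT_1 = "YSPTSPS"


def heptad_site(repeat_start, heptad_index):
    """Absolute position of residue `heptad_index` (1-7) of a repeat
    whose first residue sits at `repeat_start` (1-based)."""
    if not 1 <= heptad_index <= 7:
        raise ValueError("heptad_index must be between 1 and 7")
    return repeat_start + heptad_index - 1


ser2 = heptad_site(REPEAT_1_START, 2)  # 1594
ser5 = heptad_site(REPEAT_1_START, 5)  # 1597
print(ser2, REPEAT_1[2 - 1])  # 1594 S
print(ser5, REPEAT_1[5 - 1])  # 1597 S
```

The same offset applies to any other repeat once its start coordinate is known. Approximate repeats may carry a different residue at position 2 or 5, so the residue is worth checking before treating a position as a phosphorylation candidate.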

Keywords - PTM

Phosphoprotein

Proteomic databases

EPD: P16356.
PaxDb: P16356.
PeptideAtlas: P16356.
PRIDE: P16356.

PTM databases

iPTMnet: P16356.

Expression

Developmental stage

Expressed in embryo. During embryonic development, the form phosphorylated at 'Ser-2' of the C-terminal heptapeptide repeats is present only in transcriptionally active somatic cells (1 publication).

Gene expression databases

Bgee: WBGene00000123.

Interaction

Subunit structure

Component of the RNA polymerase II (Pol II) complex consisting of 12 subunits (By similarity). Interacts with sig-7 (PubMed:27541139).

Binary interactions

With   Entry   #Exp.  IntAct
mdt-6  Q9N337  2      EBI-1533906, EBI-1533827

Protein-protein interaction databases

IntAct: P16356. 4 interactors.
STRING: 6239.F36A4.7.2.

Structure

3D structure databases

ProteinModelPortal: P16356.
SMR: P16356.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Description      Length
Repeat       1593 – 1599  1                7
Repeat       1600 – 1606  2                7
Repeat       1616 – 1622  3                7
Repeat       1623 – 1629  4                7
Repeat       1630 – 1636  5                7
Repeat       1637 – 1643  6                7
Repeat       1644 – 1650  7                7
Repeat       1651 – 1657  8                7
Repeat       1658 – 1664  9                7
Repeat       1665 – 1671  10               7
Repeat       1672 – 1678  11               7
Repeat       1679 – 1685  12               7
Repeat       1686 – 1692  13               7
Repeat       1693 – 1699  14               7
Repeat       1700 – 1706  15               7
Repeat       1707 – 1713  16               7
Repeat       1720 – 1726  17               7
Repeat       1727 – 1733  18               7
Repeat       1734 – 1740  19               7
Repeat       1741 – 1747  20               7
Repeat       1748 – 1754  21               7
Repeat       1755 – 1761  22               7
Repeat       1769 – 1775  23               7
Repeat       1782 – 1788  24               7
Repeat       1789 – 1795  25               7
Repeat       1796 – 1802  26               7
Repeat       1803 – 1809  27               7
Repeat       1810 – 1816  28; approximate  7

Region

Feature key  Position(s)  Description                                                                                                  Length
Region       256 – 268    Lid loop                                                                                                     13
Region       314 – 331    Rudder loop                                                                                                  18
Region       827 – 839    Bridging helix                                                                                               13
Region       1593 – 1816  C-terminal domain (CTD); 28 X 7 AA approximate tandem repeats of Y-[ST]-P-[ST]-S-P-[AGKNQRST]  224
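
The repeat consensus Y-[ST]-P-[ST]-S-P-[AGKNQRST] translates directly into a regular expression. The sketch below scans residues 1593 to 1629 of this entry, copied from the Sequence section, and reports each exact match with 1-based coordinates. The names `CTD_REPEAT`, `CTD_FRAGMENT` and `find_ctd_repeats` are illustrative. Strict matching is a simplifying assumption here, because the entry describes the repeats as approximate.

```python
import re

# Consensus from the Region table, written as a regular expression.
CTD_REPEAT = re.compile(r"Y[ST]P[ST]SP[AGKNQRST]")

# Residues 1593-1629 of P16356, copied from the Sequence section.
FRAGMENT_START = 1593
CTD_FRAGMENT = "YSPTSPSYSPTSPAAGQSPVSPSYSPTSPSYSPTSPS"


def find_ctd_repeats(fragment, first_pos):
    """Return (start, end, heptad) for each non-overlapping exact match,
    using 1-based inclusive coordinates on the full-length protein."""
    return [
        (m.start() + first_pos, m.end() + first_pos - 1, m.group())
        for m in CTD_REPEAT.finditer(fragment)
    ]


for start, end, heptad in find_ctd_repeats(CTD_FRAGMENT, FRAGMENT_START):
    print(start, end, heptad)
# 1593 1599 YSPTSPS
# 1600 1606 YSPTSPA
# 1616 1622 YSPTSPS
# 1623 1629 YSPTSPS
```

The four matches coincide with repeats 1 to 4 in the Domains and Repeats table, including the nine-residue gap (1607 to 1615) between repeats 2 and 3. Heptads that deviate from the consensus would not be reported by this strict pattern.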

Domain

The C-terminal domain (CTD) serves as a platform for assembly of factors that regulate transcription initiation, elongation, termination and mRNA processing. (Curated)

Sequence similarities

Belongs to the RNA polymerase beta' chain family. (Curated)

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: KOG0260. Eukaryota.
COG0086. LUCA.
GeneTree: ENSGT00850000132392.
HOGENOM: HOG000222975.
InParanoid: P16356.
KO: K03006.
OMA: DEDNGPY.
OrthoDB: EOG091G00CH.

Family and domain databases

InterPro: IPR000722. RNA_pol_asu.
IPR000684. RNA_pol_II_repeat_euk.
IPR006592. RNA_pol_N.
IPR007080. RNA_pol_Rpb1_1.
IPR007066. RNA_pol_Rpb1_3.
IPR007083. RNA_pol_Rpb1_4.
IPR007081. RNA_pol_Rpb1_5.
IPR007075. RNA_pol_Rpb1_6.
IPR007073. RNA_pol_Rpb1_7.
Pfam: PF04997. RNA_pol_Rpb1_1. 1 hit.
PF00623. RNA_pol_Rpb1_2. 1 hit.
PF04983. RNA_pol_Rpb1_3. 1 hit.
PF05000. RNA_pol_Rpb1_4. 1 hit.
PF04998. RNA_pol_Rpb1_5. 1 hit.
PF04992. RNA_pol_Rpb1_6. 1 hit.
PF04990. RNA_pol_Rpb1_7. 1 hit.
PF05001. RNA_pol_Rpb1_R. 20 hits.
SMART: SM00663. RPOLA_N. 1 hit.
PROSITE: PS00115. RNA_POL_II_REPEAT. 26 hits.

Sequence

Sequence status: Complete.

P16356-1 [UniParc]

        10         20         30         40         50
MALVGVDFQA PLRIVSRVQF GILGPEEIKR MSVAHVEFPE VYENGKPKLG
60 70 80 90 100
GLMDPRQGVI DRRGRCMTCA GNLTDCPGHF GHLELAKPVF HIGFLTKTLK
110 120 130 140 150
ILRCVCFYCG RLLIDKSAPR VLEILKKTGT NSKKRLTMIY DLCKAKSVCE
160 170 180 190 200
GAAEKEEGMP DDPDDPMNDG KKVAGGCGRY QPSYRRVGID INAEWKKNVN
210 220 230 240 250
EDTQERKIML TAERVLEVFQ QITDEDILVI GMDPQFARPE WMICTVLPVP
260 270 280 290 300
PLAVRPAVVT FGSAKNQDDL THKLSDIIKT NQQLQRNEAN GAAAHVLTDD
310 320 330 340 350
VRLLQFHVAT LVDNCIPGLP TATQKGGRPL KSIKQRLKGK EGRIRGNLMG
360 370 380 390 400
KRVDFSARTV ITADPNLPID TVGVPRTIAQ NLTFPEIVTP FNVDKLQELV
410 420 430 440 450
NRGDTQYPGA KYIIRENGAR VDLRYHPRAA DLHLQPGYRV ERHMKDGDII
460 470 480 490 500
VFNRQPTLHK MSMMGHRVKI LPWSTFRMNL SVTSPYNADF DGDEMNLHLP
510 520 530 540 550
QSLETRAEIE EIAMVPRQLI TPQANKPVMG IVQDTLCAVR MMTKRDVFID
560 570 580 590 600
WPFMMDLLMY LPTWDGKVPQ PAILKPKPLW TGKQVFSLII PGNVNVLRTH
610 620 630 640 650
STHPDSEDSG PYKWISPGDT KVIIEHGELL SGIVCSKTVG KSAGNLLHVV
660 670 680 690 700
TLELGYEIAA NFYSHIQTVI NAWLIREGHT IGIGDTIADQ ATYLDIQNTI
710 720 730 740 750
RKAKQDVVDV IEKAHNDDLE PTPGNTLRQT FENKVNQILN DARDRTGSSA
760 770 780 790 800
QKSLSEFNNF KSMVVSGSKG SKINISQVIA CVGQQNVEGK RIPFGFRHRT
810 820 830 840 850
LPHFIKDDYG PESKGFVENS YLAGLTPSEF FFHAMGGREG LIDTAVKTAE
860 870 880 890 900
TGYIQRRLIK AMESVMVNYD GTVRNSLAQM VQLRYGEDGL DGMWVENQNM
910 920 930 940 950
PTMKPNNAVF ERDFRMDLTD NKFLRKNYSE DVVREIQESE DGISLVESEW
960 970 980 990 1000
SQLEEDRRLL RKIFPRGDAK IVLPCNLQRL IWNAQKIFKV DLRKPVNLSP
1010 1020 1030 1040 1050
LHVISGVREL SKKLIIVSGN DEISKQAQYN ATLLMNILLR STLCTKNMCT
1060 1070 1080 1090 1100
KSKLNSEAFD WLLGEIESRF QQAIAQPGEM VGALAAQSLG EPATQMTLNT
1110 1120 1130 1140 1150
FHYAGVSAKN VTLGVPRLKE IINVSKTLKT PSLTVFLTGA AAKDPEKAKD
1160 1170 1180 1190 1200
VLCKLEHTTL KKVTCNTAIY YDPDPKNTVI AEDEEWVSIF YEMPDHDLSR
1210 1220 1230 1240 1250
TSPWLLRIEL DRKRMVDKKL TMEMIADRIH GGFGNDVHTI YTDDNAEKLV
1260 1270 1280 1290 1300
FRLRIAGEDK GEAQEEQVDK MEDDVFLRCI EANMLSDLTL QGIPAISKVY
1310 1320 1330 1340 1350
MNQPNTDDKK RIIITPEGGF KSVADWILET DGTALLRVLS ERQIDPVRTT
1360 1370 1380 1390 1400
SNDICEIFEV LGIEAVRKAI EREMDNVISF DGSYVNYRHL ALLCDVMTAK
1410 1420 1430 1440 1450
GHLMAITRHG INRQEVGALM RCSFEETVDI LMEAAVHAEE DPVKGVSENI
1460 1470 1480 1490 1500
MLGQLARCGT GCFDLVLDVE KCKYGMEIPQ NVVMGGGFYG SFAGSPSNRE
1510 1520 1530 1540 1550
FSPAHSPWNS GVTPTYAGAA WSPTTGGMSP GAGFSPAGNT DGGASPFNEG
1560 1570 1580 1590 1600
GWSPASPGDP LGALSPRTPS YGGMSPGVYS PSSPQFSMTS PHYSPTSPSY
1610 1620 1630 1640 1650
SPTSPAAGQS PVSPSYSPTS PSYSPTSPSY SPTSPSYSPT SPSYSPTSPS
1660 1670 1680 1690 1700
YSPTSPSYSP SSPSYSPSSP SYSPSSPRYS PTSPTYSPTS PTYSPTSPTY
1710 1720 1730 1740 1750
SPTSPTYSPT SPSYESGGGY SPSSPKYSPS SPTYSPTSPS YSPTSPQYSP
1760 1770 1780 1790 1800
TSPQYSPSSP TYTPSSPTYN PTSPRGFSSP QYSPTSPTYS PTSPSYTPSS
1810 1820 1830 1840 1850
PQYSPTSPTY TPSPSEQPGT SAQYSPTSPT YSPSSPTYSP ASPSYSPSSP

TYDPNS
Length: 1,856
Mass (Da): 204,525
Last modified: March 21, 2012 - v3
Checksum: 379BB183B957D2D5

Experimental Info

Feature key        Position(s)  Description                                                   Length
Sequence conflict  215          V → D in AAA28126 (PubMed:2586513). (Curated)                 1
Sequence conflict  412 – 415    Missing in AAA28126 (PubMed:2586513). (Curated)               4
Sequence conflict  915          R → RVSVAQNAIKL in AAA28126 (PubMed:2586513). (Curated)       1
Sequence conflict  963          I → D in AAA28126 (PubMed:2586513). (Curated)                 1
Sequence conflict  978          Q → L in AAA28126 (PubMed:2586513). (Curated)                 1
Sequence conflict  994 – 995    KP → NA in AAA28126 (PubMed:2586513). (Curated)               2
Sequence conflict  1160 – 1162  Missing in AAA28126 (PubMed:2586513). (Curated)               3
Sequence conflict  1406 – 1407  IT → YS in AAA28126 (PubMed:2586513). (Curated)               2
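
Each conflict row describes how the AAA28126 translation differs from the canonical sequence at the given 1-based positions. The sketch below applies two of the listed conflicts to short fragments copied from the Sequence section (residues 211 to 220 and 411 to 420). The helper `apply_conflict` is a hypothetical illustration, not a UniProt function. It checks that the canonical residues match before editing.

```python
def apply_conflict(fragment, first_pos, start, end, expected, replacement):
    """Replace residues start..end (1-based, inclusive, numbered on the
    full protein) in a fragment whose first residue is at `first_pos`.
    An empty `replacement` models a 'Missing' conflict."""
    lo = start - first_pos
    hi = end - first_pos + 1
    observed = fragment[lo:hi]
    if observed != expected:
        raise ValueError(f"expected {expected!r} at {start}-{end}, found {observed!r}")
    return fragment[:lo] + replacement + fragment[hi:]


# Fragments copied from the canonical sequence of this entry.
FRAG_211_220 = "TAERVLEVFQ"
FRAG_411_420 = "KYIIRENGAR"

# 215: V -> D in AAA28126
v215d = apply_conflict(FRAG_211_220, 211, 215, 215, "V", "D")
# 412-415: Missing in AAA28126
missing_412_415 = apply_conflict(FRAG_411_420, 411, 412, 415, "YIIR", "")

print(v215d)            # TAERDLEVFQ
print(missing_412_415)  # KENGAR
```

When several conflicts are applied to one full-length sequence, processing them from the highest position to the lowest keeps the remaining coordinates valid, since insertions and deletions shift everything downstream.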

Sequence databases

EMBL / GenBank / DDBJ:
M29235 mRNA. Translation: AAA28126.1.
FO081153 Genomic DNA. Translation: CCD69532.1.
PIR: A34092.
T29959.
RefSeq: NP_500523.4. NM_068122.6.
UniGene: Cel.13014.

Genome annotation databases

EnsemblMetazoa: F36A4.7; F36A4.7; WBGene00000123.
GeneID: 177190.
KEGG: cel:CELE_F36A4.7.
UCSC: F36A4.7. c. elegans.

Cross-references

Organism-specific databases

CTD: 247749.
WormBase: F36A4.7; CE46402; WBGene00000123; ama-1.

Miscellaneous databases

PRO: P16356.

Entry information

Entry name: RPB1_CAEEL
Accession: Primary (citable) accession number: P16356
Secondary accession number(s): Q20090
Entry history
Integrated into UniProtKB/Swiss-Prot: August 1, 1990
Last sequence update: March 21, 2012
Last modified: November 30, 2016
This is version 147 of the entry and version 3 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Caenorhabditis annotation project

Miscellaneous

The binding of ribonucleoside triphosphate to the RNA polymerase II transcribing complex probably involves a two-step mechanism. The initial binding seems to occur at the entry (E) site and involves a magnesium ion temporarily coordinated by three conserved aspartate residues of the two largest RNA Pol II subunits. The ribonucleoside triphosphate is transferred by a rotation to the nucleotide addition (A) site for pairing with the template DNA. The catalytic A site involves three conserved aspartate residues of the RNA Pol II largest subunit which permanently coordinate a second magnesium ion.

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Caenorhabditis elegans
    Caenorhabditis elegans: entries, gene names and cross-references to WormBase
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.