Protein

Cystic fibrosis transmembrane conductance regulator

Gene

CFTR

Organism
Oryctolagus cuniculus (Rabbit)
Status
Reviewed; annotation score: 5 out of 5; experimental evidence at transcript level

Function

Involved in the transport of chloride ions. May regulate bicarbonate secretion and salvage in epithelial cells by regulating the SLC4A7 transporter. Can inhibit the chloride channel activity of ANO1. Plays a role in chloride and bicarbonate homeostasis during sperm epididymal maturation and capacitation (By similarity).

Catalytic activity

ATP + H2O = ADP + phosphate.

Regions

Feature key         Position(s)   Length  Description  Evidence
Nucleotide binding  458 – 465     8       ATP 1        PROSITE-ProRule annotation
Nucleotide binding  1245 – 1252   8       ATP 2        PROSITE-ProRule annotation

Keywords - Molecular function

Chloride channel, Hydrolase, Ion channel

Keywords - Biological process

Ion transport, Transport

Keywords - Ligand

ATP-binding, Chloride, Nucleotide-binding

Names & Taxonomy

Protein names
Recommended name: Cystic fibrosis transmembrane conductance regulator
Short name: CFTR
Alternative name(s):
ATP-binding cassette sub-family C member 7
Channel conductance-controlling ATPase (EC:3.6.3.49)
cAMP-dependent chloride channel
Gene names
Name: CFTR
Synonyms: ABCC7
Organism: Oryctolagus cuniculus (Rabbit)
Taxonomic identifier: 9986 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Lagomorpha > Leporidae > Oryctolagus
Proteomes
  • UP000001811 Component: Unplaced

Subcellular location

  • Early endosome membrane (By similarity); Multi-pass membrane protein (Sequence analysis)
  • Cell membrane (By similarity); Multi-pass membrane protein (Sequence analysis)

  • Note: In epithelial cells, detected on the apical side, but not associated with cilia (By similarity).

Topology

Feature key         Position(s)   Length  Description       Evidence
Topological domain  1 – 80        80      Cytoplasmic       Sequence analysis
Transmembrane       81 – 101      21      Helical; Name=1   PROSITE-ProRule annotation
Topological domain  102 – 117     16      Extracellular     Sequence analysis
Transmembrane       118 – 138     21      Helical; Name=2   PROSITE-ProRule annotation
Topological domain  139 – 194     56      Cytoplasmic       Sequence analysis
Transmembrane       195 – 215     21      Helical; Name=3   PROSITE-ProRule annotation
Topological domain  216 – 220     5       Extracellular     Sequence analysis
Transmembrane       221 – 241     21      Helical; Name=4   PROSITE-ProRule annotation
Topological domain  242 – 307     66      Cytoplasmic       Sequence analysis
Transmembrane       308 – 328     21      Helical; Name=5   PROSITE-ProRule annotation
Topological domain  329 – 331     3       Extracellular     Sequence analysis
Transmembrane       332 – 350     19      Helical; Name=6   PROSITE-ProRule annotation
Topological domain  351 – 859     509     Cytoplasmic       Sequence analysis
Transmembrane       860 – 880     21      Helical; Name=7   PROSITE-ProRule annotation
Topological domain  881 – 911     31      Extracellular     Sequence analysis
Transmembrane       912 – 932     21      Helical; Name=8   PROSITE-ProRule annotation
Topological domain  933 – 990     58      Cytoplasmic       Sequence analysis
Transmembrane       991 – 1011    21      Helical; Name=9   PROSITE-ProRule annotation
Topological domain  1012 – 1013   2       Extracellular     Sequence analysis
Transmembrane       1014 – 1034   21      Helical; Name=10  PROSITE-ProRule annotation
Topological domain  1035 – 1102   68      Cytoplasmic       Sequence analysis
Transmembrane       1103 – 1123   21      Helical; Name=11  PROSITE-ProRule annotation
Topological domain  1124 – 1128   5       Extracellular     Sequence analysis
Transmembrane       1129 – 1149   21      Helical; Name=12  PROSITE-ProRule annotation
Topological domain  1150 – 1481   332     Cytoplasmic       Sequence analysis
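
The topology features are listed with 1-based, inclusive positions, and together they should cover the whole 1,481-residue chain. As a minimal sketch of how that could be checked, the Python below transcribes the segments by hand and tests that they tile the chain without gaps or overlaps. The names TOPOLOGY, check_tiling and count_kind are illustrative and not part of any UniProt tool.

```python
# Topology segments transcribed from this entry: (kind, start, end),
# with 1-based inclusive coordinates on the canonical isoform.
TOPOLOGY = [
    ("Cytoplasmic", 1, 80), ("Transmembrane", 81, 101),
    ("Extracellular", 102, 117), ("Transmembrane", 118, 138),
    ("Cytoplasmic", 139, 194), ("Transmembrane", 195, 215),
    ("Extracellular", 216, 220), ("Transmembrane", 221, 241),
    ("Cytoplasmic", 242, 307), ("Transmembrane", 308, 328),
    ("Extracellular", 329, 331), ("Transmembrane", 332, 350),
    ("Cytoplasmic", 351, 859), ("Transmembrane", 860, 880),
    ("Extracellular", 881, 911), ("Transmembrane", 912, 932),
    ("Cytoplasmic", 933, 990), ("Transmembrane", 991, 1011),
    ("Extracellular", 1012, 1013), ("Transmembrane", 1014, 1034),
    ("Cytoplasmic", 1035, 1102), ("Transmembrane", 1103, 1123),
    ("Extracellular", 1124, 1128), ("Transmembrane", 1129, 1149),
    ("Cytoplasmic", 1150, 1481),
]
CHAIN_LENGTH = 1481


def check_tiling(segments, chain_length):
    """Return True if the segments cover 1..chain_length with no gaps or overlaps."""
    expected_start = 1
    for _kind, start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == chain_length + 1


def count_kind(segments, kind):
    """Count the segments of a given kind."""
    return sum(1 for k, _start, _end in segments if k == kind)


tiles = check_tiling(TOPOLOGY, CHAIN_LENGTH)
helices = count_kind(TOPOLOGY, "Transmembrane")
print(tiles, helices)
```

The same check could be reused after re-extracting the table, since a transcription slip in any boundary would break the tiling.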

Keywords - Cellular component

Cell membrane, Endosome, Membrane

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description                                          Feature identifier
Chain        1 – 1481     1481    Cystic fibrosis transmembrane conductance regulator  PRO_0000093427

Amino acid modifications

Feature key       Position(s)  Length  Description                                                              Evidence
Lipidation        524          1       S-palmitoyl cysteine                                                     By similarity
Modified residue  549          1       Phosphoserine                                                            By similarity
Modified residue  660          1       Phosphoserine                                                            By similarity
Modified residue  686          1       Phosphoserine                                                            By similarity
Cross-link        688          1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin)  By similarity
Modified residue  700          1       Phosphoserine                                                            By similarity
Modified residue  712          1       Phosphoserine                                                            By similarity
Modified residue  717          1       Phosphothreonine                                                         By similarity
Modified residue  737          1       Phosphoserine                                                            By similarity
Modified residue  768          1       Phosphoserine                                                            By similarity
Modified residue  790          1       Phosphoserine                                                            By similarity
Modified residue  795          1       Phosphoserine                                                            By similarity
Modified residue  813          1       Phosphoserine                                                            By similarity
Glycosylation     894          1       N-linked (GlcNAc...)                                                     Sequence analysis
Glycosylation     900          1       N-linked (GlcNAc...)                                                     Sequence analysis
Glycosylation     909          1       N-linked (GlcNAc...)                                                     Sequence analysis
Lipidation        1396         1       S-palmitoyl cysteine                                                     By similarity
Modified residue  1445         1       Phosphoserine                                                            By similarity
Modified residue  1457         1       Phosphoserine                                                            By similarity
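
The three glycosylation sites carry "Sequence analysis" evidence. One plausible reading, stated here as an assumption rather than as the annotation pipeline's actual method, is that they match the usual N-X-[S/T] sequon (X not proline). The sketch below scans residues 881-920 of the canonical sequence, copied from the Sequences section of this entry, for that pattern. SEGMENT, SEQUON and sequon_positions are illustrative names.

```python
import re

# Residues 881-920 of the canonical sequence, copied from this entry.
SEGMENT_START = 881
SEGMENT = "LWLFGNTAPQ" "DKENSTKSGN" "SSYAVIITNT" "SSYYFFYIYV"

# N-X-[S/T] with X != P. The lookahead keeps the match zero-width so that
# overlapping sequons would not be skipped.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(segment, first_position):
    """Return the 1-based chain positions of Asn residues that start a sequon."""
    return [first_position + m.start() for m in SEQUON.finditer(segment)]


sites = sequon_positions(SEGMENT, SEGMENT_START)
print(sites)  # [894, 900, 909]
```

Asn-886 in the same stretch is followed by Thr-Ala, so it does not fit the pattern, which is consistent with it being absent from the table.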

Post-translational modification

Ubiquitinated, leading to its degradation in the lysosome. Deubiquitination by USP10 in early endosomes enhances its endocytic recycling. Ubiquitinated by RNF185 during ER stress (By similarity).
Phosphorylation activates the channel. It is not clear whether PKC phosphorylation itself activates the channel or permits activation by phosphorylation at PKA sites. Phosphorylated by AMPK (By similarity).

Keywords - PTM

Glycoprotein, Isopeptide bond, Lipoprotein, Palmitate, Phosphoprotein, Ubl conjugation

Expression

Tissue specificity

Isoform 1 is expressed in the pancreas. Isoform 2 is specifically expressed in the ventricle (2 publications).

Interaction

Subunit structure

Interacts with MYO6 and GOPC. Interacts with SLC4A7 through SLC9A3R1. Interacts with SHANK2 (By similarity). Found in a complex with MYO5B and RAB11A (By similarity). Interacts with ANO1 (By similarity). Interacts with SLC26A3, SLC26A6 and SLC9A3R1. Interacts with SLC26A8. Interacts with AHCYL1; the interaction increases CFTR activity (By similarity).

Protein-protein interaction databases

STRING: 9986.ENSOCUP00000009248.

Structure

3D structure databases

ProteinModelPortal: Q00554.
SMR: Q00554. Positions 389-671.

Family & Domains

Domains and Repeats

Feature key  Position(s)   Length  Description                Evidence
Domain       81 – 365      285     ABC transmembrane type-1 1  PROSITE-ProRule annotation
Domain       423 – 646     224     ABC transporter 1           PROSITE-ProRule annotation
Domain       859 – 1155    297     ABC transmembrane type-1 2  PROSITE-ProRule annotation
Domain       1199 – 1444   246     ABC transporter 2           PROSITE-ProRule annotation

Motif

Feature key  Position(s)   Length  Description
Motif        1479 – 1481   3       PDZ-binding

Domain

The PDZ-binding motif mediates interactions with GOPC and with the SLC4A7, SLC9A3R1/EBP50 complex (By similarity).
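
Feature coordinates in this entry are 1-based and inclusive, whereas most programming languages slice with 0-based, half-open ranges. As a rough sketch of the conversion, the Python below pulls the PDZ-binding motif (positions 1479-1481) out of the final display block of the canonical sequence. The helper name feature_residues is hypothetical.

```python
# Final display block of the canonical sequence (residues 1451-1481),
# copied from the Sequences section of this entry.
C_TERMINAL_START = 1451
C_TERMINAL = "FPHRNSSKHK" "SRPQITALKE" "EAEEEVQGTR" "L"


def feature_residues(segment, segment_start, start, end):
    """Slice a feature given 1-based inclusive coordinates on the full chain."""
    lo = start - segment_start          # 0-based index of the first residue
    hi = end - segment_start + 1        # half-open upper bound
    if lo < 0 or hi > len(segment):
        raise ValueError("feature lies outside the supplied segment")
    return segment[lo:hi]


pdz_motif = feature_residues(C_TERMINAL, C_TERMINAL_START, 1479, 1481)
print(pdz_motif)  # TRL
```

The same helper returns a serine for position 1457, in line with the phosphoserine annotated there.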

Sequence similarities

Contains 2 ABC transmembrane type-1 domains (PROSITE-ProRule annotation).
Contains 2 ABC transporter domains (PROSITE-ProRule annotation).

Keywords - Domain

Repeat, Transmembrane, Transmembrane helix

Phylogenomic databases

eggNOG: KOG0054. Eukaryota.
  COG1132. LUCA.
HOVERGEN: HBG004169.
InParanoid: Q00554.
KO: K05031.
OrthoDB: EOG7C2R0B.

Family and domain databases

Gene3D: 3.40.50.300. 2 hits.
InterPro: IPR003593. AAA+_ATPase.
  IPR011527. ABC1_TM_dom.
  IPR003439. ABC_transporter-like.
  IPR017871. ABC_transporter_CS.
  IPR009147. CFTR/ABCC7.
  IPR025837. CFTR_reg_dom.
  IPR027417. P-loop_NTPase.
PANTHER: PTHR24223:SF19. 3 hits.
Pfam: PF00664. ABC_membrane. 2 hits.
  PF00005. ABC_tran. 2 hits.
  PF14396. CFTR_R. 1 hit.
PRINTS: PR01851. CYSFIBREGLTR.
SMART: SM00382. AAA. 2 hits.
SUPFAM: SSF52540. 2 hits.
  SSF90123. 2 hits.
TIGRFAMs: TIGR01271. CFTR_protein. 1 hit.
PROSITE: PS50929. ABC_TM1F. 2 hits.
  PS00211. ABC_TRANSPORTER_1. 1 hit.
  PS50893. ABC_TRANSPORTER_2. 2 hits.

Sequences (2)

Sequence status: Complete.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q00554-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MQKSPLEKAG VLSKLFFSWT RPILRKGYRQ RLELSDIYQI PSADSADNLS
60 70 80 90 100
EKLEREWDRE LASKKKPKLI NALRRCFFWR FMFYGILLYL GEVTKAVQPL
110 120 130 140 150
LLGRIIASYD PDNKVERSIA IYLGIGLCLL FVVRTLLLHP AIFGLHHIGM
160 170 180 190 200
QMRIAMFSLI YKKTLKLSSR VLDKISIGQL ISLLSNNLNK FDEGLALAHF
210 220 230 240 250
VWISPLQVTL LMGLLWELLQ ASAFCGLAFL IVLALVQAGL GRMMMKYRDQ
260 270 280 290 300
RAGKINERLV ITSEMIENIQ SVKAYCWEEA MEKMIENLRQ TELKLTRKAA
310 320 330 340 350
YVRYFNSSAF FFSGFFVVFL SVLPYALTKG IILRKIFTTI SFCIVLRMAV
360 370 380 390 400
TRQFPWAVQT WYDSLGAINK IQDFLQKQEY KTLEYNLTTT EVVMDNVTAF
410 420 430 440 450
WEEGFGELFE KAKQNNSDRK ISNGDNNLFF SNFSLLGAPV LKDISFKIER
460 470 480 490 500
GQLLAVAGST GAGKTSLLMM IMGELEPSEG KIKHSGRISF CSQFSWIMPG
510 520 530 540 550
TIKENIIFGV SYDEYRYKSV IKACQLEEDI SKFTEKDNTV LGEGGITLSG
560 570 580 590 600
GQRARISLAR AVYKDADLYL LDSPFGYLDV LTEKEIFESC VCKLMANKTR
610 620 630 640 650
ILVTSKMEHL KKADKILILH EGSSYFYGTF SELQSLRPDF SSKLMGYDSF
660 670 680 690 700
DQFSAERRNS ILTETLRRFS LEGDASISWN DTRKQSFKQN GELGEKRKNS
710 720 730 740 750
ILNPVNSMRK FSIVPKTPLQ MNGIEEDSDA SIERRLSLVP DSEQGEAILP
760 770 780 790 800
RSNMINTGPM LQGCRRQSVL NLMTHSVSQG PSIYRRTTTS ARKMSLAPQT
810 820 830 840 850
NLTEMDIYSR RLSQESGLEI SEEINEEDLK ECFIDDVDSI PTVTTWNTYL
860 870 880 890 900
RYITVHRSLI FVLIWCIVIF LAEVAASLVV LWLFGNTAPQ DKENSTKSGN
910 920 930 940 950
SSYAVIITNT SSYYFFYIYV GVADTLLALG LFRGLPLVHT LITVSKILHH
960 970 980 990 1000
KMLHSVLQAP MSTLNTLKAG GILNRFSKDI AILDDLLPLT IFDFIQLLLI
1010 1020 1030 1040 1050
VVGAIAVVSV LQPYIFLATV PVIAAFILLR AYFLHTSQQL KQLESEGRSP
1060 1070 1080 1090 1100
IFTHLVTSLK GLWTLRAFGR QPYFETLFHK ALNLHTANWF LYLSTLRWFQ
1110 1120 1130 1140 1150
MRIEMIFVLF FIAVAFISIL TTGEGEGRVG IILTLAMNIM STLQWAVNSS
1160 1170 1180 1190 1200
IDVDSLMRSV SRVFKFIDMP TEETKSTKSI KPSSNCQLSK VMIIENQHVK
1210 1220 1230 1240 1250
KDDVWPSGGQ MTVKGLTAKY IDSGNAILEN ISFSISPGQR VGLLGRTGSG
1260 1270 1280 1290 1300
KSTLLSAFLR LLNTEGEIQI DGVSWDSITL QQWRKAFGVI PQKVFIFSGT
1310 1320 1330 1340 1350
FRKNLDPYEQ WSDQEIWKVA DEVGLRSVIE QFPGKLDFVL VDGGYVLSHG
1360 1370 1380 1390 1400
HKQLMCLARS VLSKAKILLL DEPSAHLDPI TYQIIRRTLK QAFADCTVIL
1410 1420 1430 1440 1450
CEHRIEAMLE CQRFLVIEEN TVRQYESIQK LLSEKSLFRQ AISSSDRAKL
1460 1470 1480
FPHRNSSKHK SRPQITALKE EAEEEVQGTR L
Length: 1,481
Mass (Da): 168,042
Last modified: November 28, 2006 - v4
Checksum: 1B217AAE75DDFE8A
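
The canonical sequence is displayed in blocks of ten residues with position rulers between the lines. To work with it programmatically, the display first has to be flattened into one residue string. The Python below is a minimal sketch of that step, using only the first two sequence lines of this entry as input. The function name parse_sequence_display is illustrative.

```python
def parse_sequence_display(lines):
    """Join a ruler-annotated sequence display into one residue string."""
    residues = []
    for line in lines:
        tokens = line.split()
        # Ruler lines consist only of integers; skip them.
        if tokens and all(token.isdigit() for token in tokens):
            continue
        # Keep alphabetic blocks only, dropping any stray markup.
        residues.extend(token for token in tokens if token.isalpha())
    return "".join(residues)


display = [
    "        10         20         30         40         50",
    "MQKSPLEKAG VLSKLFFSWT RPILRKGYRQ RLELSDIYQI PSADSADNLS",
    "60 70 80 90 100",
    "EKLEREWDRE LASKKKPKLI NALRRCFFWR FMFYGILLYL GEVTKAVQPL",
]
sequence = parse_sequence_display(display)
print(len(sequence))  # 100
```

Run over the full display, the result should have the stated length of 1,481 residues, which is a convenient sanity check on the extraction.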
Isoform 2 (identifier: Q00554-2)

The sequence of this isoform differs from the canonical sequence as follows:
     164-193: Missing.

Length: 1,451
Mass (Da): 164,701
Checksum: 019A26093E876259
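
Isoform 2 is described only as a difference from the canonical sequence: residues 164-193 are missing. A sketch of how such a deletion could be applied is given below. It uses a placeholder string of the canonical length instead of the real sequence, so it demonstrates the coordinate arithmetic only. delete_range is a hypothetical helper name.

```python
def delete_range(sequence, start, end):
    """Remove residues start..end (1-based, inclusive) from a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range outside sequence")
    return sequence[:start - 1] + sequence[end:]


CANONICAL_LENGTH = 1481

# Placeholder sequence: 'X' marks the 30 residues absent from isoform 2.
placeholder = "A" * 163 + "X" * 30 + "A" * (CANONICAL_LENGTH - 193)
isoform2 = delete_range(placeholder, 164, 193)
print(len(isoform2))  # 1451
```

The resulting length, 1,451, agrees with the length reported for isoform 2.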

Experimental Info

Feature key        Position(s)   Length  Description                                        Evidence
Sequence conflict  3             1       K → R in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  66            1       K → N in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  87            1       L → F in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  115           1       V → E in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  426 – 427     2       NN → DS no nucleotide entry (PubMed:1381291).      Curated
Sequence conflict  438           1       A → T no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  442           1       K → E in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  445           1       S → N no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  469 – 470     2       MM → II no nucleotide entry (PubMed:1381291).      Curated
Sequence conflict  472           1       M → T in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  483           1       K → N no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  518           1       K → R in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  518           1       K → R no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  534           1       T → A no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  539           1       T → I no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  546           1       I → V no nucleotide entry (PubMed:1381291).        Curated
Sequence conflict  602           1       L → M in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  677           1       I → V in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  715           1       P → L in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  731           1       S → T in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  749           1       L → V in AAA31200 (PubMed:1719001).                Curated
Sequence conflict  791           1       A → T in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  888 – 889     2       AP → PL no nucleotide entry (PubMed:7686720).      Curated
Sequence conflict  1158          1       R → Q in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  1165          1       K → M in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  1173 – 1174   2       ET → A in AAC48608 (PubMed:8692817).               Curated
Sequence conflict  1263          1       N → S in AAC48608 (PubMed:8692817).                Curated
Sequence conflict  1478          1       G → E in AAY89018 (Ref. 3).                        Curated
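
Each sequence conflict gives a position on the canonical sequence, the residue found there, and the residue reported by the conflicting source. As an illustrative sketch, the Python below applies the four single-residue conflicts reported for AAC48608 within the first 120 residues and verifies the reference residue before each substitution. Only that N-terminal stretch is included, so this does not reconstruct the full AAC48608 translation. apply_substitutions is a hypothetical helper name.

```python
# Residues 1-120 of the canonical sequence, copied from this entry.
N_TERMINUS = (
    "MQKSPLEKAGVLSKLFFSWTRPILRKGYRQRLELSDIYQIPSADSADNLS"
    "EKLEREWDRELASKKKPKLINALRRCFFWRFMFYGILLYLGEVTKAVQPL"
    "LLGRIIASYDPDNKVERSIA"
)

# (position, canonical residue, conflicting residue), 1-based positions.
CONFLICTS_AAC48608 = [(3, "K", "R"), (66, "K", "N"), (87, "L", "F"), (115, "V", "E")]


def apply_substitutions(sequence, substitutions):
    """Apply single-residue substitutions, checking the reference residue first."""
    residues = list(sequence)
    for position, reference, alternative in substitutions:
        found = residues[position - 1]
        if found != reference:
            raise ValueError(f"expected {reference} at {position}, found {found}")
        residues[position - 1] = alternative
    return "".join(residues)


variant = apply_substitutions(N_TERMINUS, CONFLICTS_AAC48608)
differences = sum(1 for a, b in zip(N_TERMINUS, variant) if a != b)
print(differences)  # 4
```

Checking the reference residue first guards against off-by-one errors when positions are transcribed from the table.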

Alternative sequence

Feature key           Position(s)  Length  Description                             Feature identifier
Alternative sequence  164 – 193    30      Missing in isoform 2 (2 publications).  VSP_028888

Sequence databases

EMBL / GenBank / DDBJ:
U40227 mRNA. Translation: AAC48608.1.
AF189720 mRNA. Translation: AAF01067.1.
DP000006 Genomic DNA. Translation: AAY89018.1.
AF186108 mRNA. Translation: AAD56415.1.
M96681 Genomic DNA. Translation: AAA31200.1.
PIR: JC6139.
RefSeq: NP_001076185.1. NM_001082716.1. [Q00554-1]
UniGene: Ocu.2383.

Genome annotation databases

GeneID: 100009471.
KEGG: ocu:100009471.

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 1080.

Publications

  1. Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 2).
    Tissue: Heart ventricle.
  2. "Oryctolagus cuniculus cornea epithelium CFTR chloride channel mRNA."
    Rae J.L.
    Submitted (SEP-1999) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1).
    Strain: New Zealand white.
    Tissue: Cornea.
  3. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  4. "Alternative splicing of CFTR Cl- channels in heart."
    Horowitz B., Tsung S.S., Hart P., Levesque P.C., Hume J.R.
    Am. J. Physiol. 264:H2214-H2220(1993)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 90-339 AND 847-1156 (ISOFORM 2), TISSUE SPECIFICITY.
    Tissue: Heart ventricle.
  5. "Partial cDNA sequence of the rabbit colonic chloride channel, CFTR."
    Selvaraj N., Prasad R., Rao M.C.
    Submitted (SEP-1999) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 127-287 (ISOFORM 1).
    Strain: New Zealand white.
    Tissue: Colon.
  6. "Expression of cystic fibrosis transmembrane regulator Cl- channels in heart."
    Levesque P.C., Hart P.J., Hume J.R., Kenyon J.L., Horowitz B.
    Circ. Res. 71:1002-1007(1992)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 423-592, TISSUE SPECIFICITY.
    Tissue: Heart ventricle.
  7. "A cross-species analysis of the cystic fibrosis transmembrane conductance regulator. Potential functional domains and regulatory sites."
    Diamond G., Scanlin T.F., Zasloff M.A., Bevins C.L.
    J. Biol. Chem. 266:22761-22769(1991)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 604-776.

Entry information

Entry name: CFTR_RABIT
Accession: Primary (citable) accession number: Q00554
Secondary accession number(s): Q09YM9, Q9TSD4, Q9TTX9, Q9TTY9
Entry history
Integrated into UniProtKB/Swiss-Prot: February 1, 1994
Last sequence update: November 28, 2006
Last modified: January 20, 2016
This is version 125 of the entry and version 4 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
  • 100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
  • 90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
  • 50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.