Protein

Sodium- and chloride-dependent betaine transporter

Gene

SLC6A12

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Transports betaine and GABA. May have a role in regulation of GABAergic transmission in the brain through the reuptake of GABA into presynaptic terminals, as well as in osmotic regulation.

GO - Molecular function

GO - Biological process

Keywords - Biological process

Neurotransmitter transport, Symport, Transport

Enzyme and pathway databases

Reactome: R-HSA-352230. Amino acid transport across the plasma membrane.
R-HSA-442660. Na+/Cl- dependent neurotransmitter transporters.
R-HSA-71288. Creatine metabolism.
R-HSA-888593. Reuptake of GABA.

Protein family/group databases

TCDB: 2.A.22.3.1. The neurotransmitter:sodium symporter (NSS) family.

Names & Taxonomy

Protein names
Recommended name:
Sodium- and chloride-dependent betaine transporter
Alternative name(s):
BGT-1
Na(+)/Cl(-) betaine/GABA transporter
Solute carrier family 6 member 12
Gene names
Name: SLC6A12
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 12

Organism-specific databases

HGNC: HGNC:11045. SLC6A12.

Subcellular location

Topology

Feature key         Position(s)  Length  Description       Evidence
Topological domain  1 – 44       44      Cytoplasmic       Sequence analysis
Transmembrane       45 – 65      21      Helical; Name=1   Sequence analysis
Transmembrane       73 – 92      20      Helical; Name=2   Sequence analysis
Transmembrane       117 – 137    21      Helical; Name=3   Sequence analysis
Topological domain  138 – 210    73      Extracellular     Sequence analysis
Transmembrane       211 – 229    19      Helical; Name=4   Sequence analysis
Transmembrane       238 – 255    18      Helical; Name=5   Sequence analysis
Transmembrane       291 – 308    18      Helical; Name=6   Sequence analysis
Transmembrane       320 – 341    22      Helical; Name=7   Sequence analysis
Transmembrane       374 – 393    20      Helical; Name=8   Sequence analysis
Transmembrane       423 – 441    19      Helical; Name=9   Sequence analysis
Transmembrane       458 – 478    21      Helical; Name=10  Sequence analysis
Transmembrane       499 – 518    20      Helical; Name=11  Sequence analysis
Transmembrane       538 – 556    19      Helical; Name=12  Sequence analysis
Topological domain  557 – 614    58      Cytoplasmic       Sequence analysis
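
The Length value in each topology row follows from the 1-based, inclusive position range (end - start + 1). The following is a minimal Python sketch of that arithmetic, assuming the rows are transcribed by hand from this entry. The `Feature` type and both helper functions are hypothetical names used only for illustration and are not part of any UniProt tool or API.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """One topology row (hypothetical helper type, not a UniProt API)."""
    key: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    length: int  # value listed in the Length column

    def computed_length(self) -> int:
        # Inclusive range, so both end points count.
        return self.end - self.start + 1


# Rows transcribed by hand from the Topology section of this entry.
FEATURES = [
    Feature("Topological domain", 1, 44, 44),
    Feature("Transmembrane", 45, 65, 21),
    Feature("Transmembrane", 73, 92, 20),
    Feature("Transmembrane", 117, 137, 21),
    Feature("Topological domain", 138, 210, 73),
    Feature("Transmembrane", 211, 229, 19),
    Feature("Transmembrane", 238, 255, 18),
    Feature("Transmembrane", 291, 308, 18),
    Feature("Transmembrane", 320, 341, 22),
    Feature("Transmembrane", 374, 393, 20),
    Feature("Transmembrane", 423, 441, 19),
    Feature("Transmembrane", 458, 478, 21),
    Feature("Transmembrane", 499, 518, 20),
    Feature("Transmembrane", 538, 556, 19),
    Feature("Topological domain", 557, 614, 58),
]


def inconsistent_rows(features: List[Feature]) -> List[Feature]:
    """Return rows whose listed length disagrees with end - start + 1."""
    return [f for f in features if f.computed_length() != f.length]


def count_helices(features: List[Feature]) -> int:
    """Count rows annotated as transmembrane segments."""
    return sum(1 for f in features if f.key == "Transmembrane")


if __name__ == "__main__":
    print("rows with mismatched length:", inconsistent_rows(FEATURES))
    print("transmembrane helices:", count_helices(FEATURES))
```

A check like this is mainly useful for catching transcription slips when a flattened feature table is rebuilt by hand.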

GO - Cellular component

  • integral component of membrane Source: ProtInc
  • integral component of plasma membrane Source: GO_Central
  • neuron projection Source: GO_Central
  • plasma membrane Source: Reactome

Keywords - Cellular component

Membrane

Pathology & Biotech

Organism-specific databases

PharmGKB: PA35908.

Chemistry

ChEMBL: CHEMBL3715.
GuidetoPHARMACOLOGY: 932.

Polymorphism and mutation databases

BioMuta: SLC6A12.
DMDM: 257050987.

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description                                         Feature identifier
Chain        1 – 614      614     Sodium- and chloride-dependent betaine transporter  PRO_0000214788

Amino acid modifications

Feature key     Position(s)  Length  Description           Evidence
Disulfide bond  157 ↔ 166                                  By similarity
Glycosylation   171 – 171    1       N-linked (GlcNAc...)  Sequence analysis
Glycosylation   183 – 183    1       N-linked (GlcNAc...)  Sequence analysis
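
The modification positions can be cross-checked against the sequence: a disulfide bond should join two cysteines, and N-linked glycosylation normally occurs at an asparagine in an N-x-S/T sequon (x being any residue except proline). The sketch below assumes only residues 151-200 of P48065, copied from the sequence display in this entry. The names `SEGMENT`, `residue_at`, and `is_n_glyc_sequon` are hypothetical helpers for illustration.

```python
# Residues 151-200 of P48065, copied from the sequence display in this entry.
SEGMENT_START = 151
SEGMENT = "ELPWTTCNNF WNTEHCTDFL NHSGAGTVTP FENFTSPVME FWERRVLGIT".replace(" ", "")


def residue_at(position: int) -> str:
    """Return the residue at a 1-based sequence position inside the segment."""
    index = position - SEGMENT_START
    if not 0 <= index < len(SEGMENT):
        raise ValueError(f"position {position} is outside residues 151-200")
    return SEGMENT[index]


def is_n_glyc_sequon(position: int) -> bool:
    """True if the position starts an N-x-S/T motif, with x not proline."""
    index = position - SEGMENT_START
    motif = SEGMENT[index:index + 3] if index >= 0 else ""
    return (
        len(motif) == 3
        and motif[0] == "N"
        and motif[1] != "P"
        and motif[2] in "ST"
    )


if __name__ == "__main__":
    # Annotated disulfide bond 157 <-> 166: both residues should be cysteine.
    print("157, 166:", residue_at(157), residue_at(166))
    # Annotated N-linked glycosylation sites.
    for site in (171, 183):
        print(site, residue_at(site), "sequon:", is_n_glyc_sequon(site))
```

Only the 50-residue window is used so that the example stays short; the same indexing applies to the full 614-residue sequence with `SEGMENT_START = 1`.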

Keywords - PTM

Disulfide bond, Glycoprotein

Proteomic databases

MaxQB: P48065.
PaxDb: P48065.
PRIDE: P48065.

PTM databases

PhosphoSite: P48065.

Expression

Tissue specificity

Expressed in liver, heart, skeletal muscle and placenta, and widely distributed in the brain.

Gene expression databases

Bgee: P48065.
CleanEx: HS_SLC6A12.
ExpressionAtlas: P48065. baseline and differential.
Genevisible: P48065. HS.

Organism-specific databases

HPA: HPA034973.

Interaction

Subunit structure

Interacts with LIN7C (by similarity).

Binary interactions

With  Entry   #Exp.  IntAct                  Notes
REL   Q04864  3      EBI-3843589,EBI-307352

Protein-protein interaction databases

BioGrid: 112430. 2 interactions.
IntAct: P48065. 2 interactions.
STRING: 9606.ENSP00000352702.

Chemistry

BindingDB: P48065.

Structure

3D structure databases

ProteinModelPortal: P48065.
SMR: P48065. Positions 35-575.

Family & Domains

Sequence similarities

Keywords - Domain

Transmembrane, Transmembrane helix

Phylogenomic databases

eggNOG: KOG3659. Eukaryota.
COG0733. LUCA.
GeneTree: ENSGT00760000118857.
HOGENOM: HOG000116406.
HOVERGEN: HBG071421.
InParanoid: P48065.
KO: K05039.
OMA: NFTSPVM.
OrthoDB: EOG793B71.
PhylomeDB: P48065.
TreeFam: TF343812.

Family and domain databases

InterPro: IPR000175. Na/ntran_symport.
IPR002983. Na/ntran_symport_betaine.
PANTHER: PTHR11616. PTHR11616. 1 hit.
Pfam: PF00209. SNF. 1 hit.
PRINTS: PR01198. BETTRANSPORT.
PR00176. NANEUSMPORT.
PROSITE: PS00610. NA_NEUROTRAN_SYMP_1. 1 hit.
PS00754. NA_NEUROTRAN_SYMP_2. 1 hit.
PS50267. NA_NEUROTRAN_SYMP_3. 1 hit.

Sequence

Sequence status: Complete.

P48065-1 [UniParc]

        10         20         30         40         50
MDGKVAVQEC GPPAVSWVPE EGEKLDQEDE DQVKDRGQWT NKMEFVLSVA
        60         70         80         90        100
GEIIGLGNVW RFPYLCYKNG GGAFFIPYFI FFFVCGIPVF FLEVALGQYT
       110        120        130        140        150
SQGSVTAWRK ICPLFQGIGL ASVVIESYLN VYYIIILAWA LFYLFSSFTS
       160        170        180        190        200
ELPWTTCNNF WNTEHCTDFL NHSGAGTVTP FENFTSPVME FWERRVLGIT
       210        220        230        240        250
SGIHDLGSLR WELALCLLLA WVICYFCIWK GVKSTGKVVY FTATFPYLML
       260        270        280        290        300
VILLIRGVTL PGAYQGIIYY LKPDLFRLKD PQVWMDAGTQ IFFSFAICQG
       310        320        330        340        350
CLTALGSYNK YHNNCYKDCI ALCFLNSATS FVAGFVVFSI LGFMSQEQGV
       360        370        380        390        400
PISEVAESGP GLAFIAFPKA VTMMPLSQLW SCLFFIMLIF LGLDSQFVCV
       410        420        430        440        450
ECLVTASIDM FPRQLRKSGR RELLILTIAV MCYLIGLFLV TEGGMYIFQL
       460        470        480        490        500
FDYYASSGIC LLFLSLFEVV CISWVYGADR FYDNIEDMIG YRPWPLVKIS
       510        520        530        540        550
WLFLTPGLCL ATFLFSLSKY TPLKYNNVYV YPPWGYSIGW FLALSSMVCV
       560        570        580        590        600
PLFVVITLLK TRGPFRKRLR QLITPDSSLP QPKQHPCLDG SAGRNFGPSP
       610
TREGLIAGEK ETHL
Length: 614
Mass (Da): 69,368
Last modified: September 1, 2009 - v2
Checksum: D8FE8D259CE64B62

Experimental Info

Feature key        Position(s)  Length  Description                               Evidence
Sequence conflict  10 – 10      1       C → Y in AAA66574 (PubMed:7861179).       Curated
Sequence conflict  571 – 572    2       QL → HV in AAA87029 (PubMed:7589472).     Curated
Sequence conflict  576 – 576    1       D → N in BAG36439 (PubMed:14702039).      Curated

Natural variant

Feature key      Position(s)  Length  Description                                     Evidence        Feature identifier
Natural variant  10 – 10      1       C → R. Corresponds to variant rs557881.         3 publications  VAR_058704

Sequence databases

EMBL / GenBank / DDBJ:
U27699 mRNA. Translation: AAA87029.1.
L42300 mRNA. Translation: AAA66574.1.
AK313690 mRNA. Translation: BAG36439.1.
AC007406 mRNA. No translation available.
CH471116 Genomic DNA. Translation: EAW88975.1.
CH471116 Genomic DNA. Translation: EAW88976.1.
CH471116 Genomic DNA. Translation: EAW88977.1.
CH471116 Genomic DNA. Translation: EAW88978.1.
BC126215 mRNA. Translation: AAI26216.1.
BC126217 mRNA. Translation: AAI26218.1.
CCDS: CCDS8501.1.
PIR: S68236.
RefSeq: NP_001116319.1. NM_001122847.2.
NP_001116320.1. NM_001122848.2.
NP_001193860.1. NM_001206931.1.
NP_003035.3. NM_003044.4.
XP_005253803.1. XM_005253746.1.
XP_011519312.1. XM_011521010.1.
UniGene: Hs.437174.
Hs.737267.

Genome annotation databases

Ensembl: ENST00000359674; ENSP00000352702; ENSG00000111181.
ENST00000397296; ENSP00000380464; ENSG00000111181.
ENST00000424061; ENSP00000399136; ENSG00000111181.
ENST00000536824; ENSP00000444268; ENSG00000111181.
GeneID: 6539.
KEGG: hsa:6539.
UCSC: uc001qhz.4. human.

Keywords - Coding sequence diversity

Polymorphism

Cross-references

Protocols and materials databases

DNASU: 6539.

Organism-specific databases

CTD: 6539.
GeneCards: SLC6A12.
H-InvDB: HIX0037041.
HIX0201858.
HGNC: HGNC:11045. SLC6A12.
HPA: HPA034973.
MIM: 603080. gene.
neXtProt: NX_P48065.
PharmGKB: PA35908.

Miscellaneous databases

GenomeRNAi: 6539.
PRO: P48065.

Publications

  1. "Molecular cloning and functional characterization of a GABA/betaine transporter from human kidney."
    Rasola A., Galietta L.J.V., Barone V., Romeo G., Bagnasco S.
    FEBS Lett. 373:229-233(1995)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA], VARIANT ARG-10.
    Tissue: Kidney.
  2. "Cloning and expression of a betaine/GABA transporter from human brain."
    Borden L.A., Smith K.E., Gustafson E.L., Branchek T.A., Weinshank R.L.
    J. Neurochem. 64:977-984(1995)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA].
    Tissue: Corpus striatum.
  3. "Complete sequencing and characterization of 21,243 full-length human cDNAs."
    Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R., Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H., Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S., Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K., Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A., Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M., Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y., Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M., Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K., Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S., Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J., Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y., Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N., Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S., Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S., Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O., Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H., Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B., Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y., Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T., Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y., Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S., Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T., Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M., Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T., Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K., Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R., Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.
    Nat. Genet. 36:40-45(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA], VARIANT ARG-10.
    Tissue: Fetal brain.
  4. "The finished DNA sequence of human chromosome 12."
    Scherer S.E., Muzny D.M., Buhay C.J., Chen R., Cree A., Ding Y., Dugan-Rocha S., Gill R., Gunaratne P., Harris R.A., Hawes A.C., Hernandez J., Hodgson A.V., Hume J., Jackson A., Khan Z.M., Kovar-Smith C., Lewis L.R., Lozado R.J., Metzker M.L., Milosavljevic A., Miner G.R., Montgomery K.T., Morgan M.B., Nazareth L.V., Scott G., Sodergren E., Song X.-Z., Steffen D., Lovering R.C., Wheeler D.A., Worley K.C., Yuan Y., Zhang Z., Adams C.Q., Ansari-Lari M.A., Ayele M., Brown M.J., Chen G., Chen Z., Clerc-Blankenburg K.P., Davis C., Delgado O., Dinh H.H., Draper H., Gonzalez-Garay M.L., Havlak P., Jackson L.R., Jacob L.S., Kelly S.H., Li L., Li Z., Liu J., Liu W., Lu J., Maheshwari M., Nguyen B.-V., Okwuonu G.O., Pasternak S., Perez L.M., Plopper F.J.H., Santibanez J., Shen H., Tabor P.E., Verduzco D., Waldron L., Wang Q., Williams G.A., Zhang J., Zhou J., Allen C.C., Amin A.G., Anyalebechi V., Bailey M., Barbaria J.A., Bimage K.E., Bryant N.P., Burch P.E., Burkett C.E., Burrell K.L., Calderon E., Cardenas V., Carter K., Casias K., Cavazos I., Cavazos S.R., Ceasar H., Chacko J., Chan S.N., Chavez D., Christopoulos C., Chu J., Cockrell R., Cox C.D., Dang M., Dathorne S.R., David R., Davis C.M., Davy-Carroll L., Deshazo D.R., Donlin J.E., D'Souza L., Eaves K.A., Egan A., Emery-Cohen A.J., Escotto M., Flagg N., Forbes L.D., Gabisi A.M., Garza M., Hamilton C., Henderson N., Hernandez O., Hines S., Hogues M.E., Huang M., Idlebird D.G., Johnson R., Jolivet A., Jones S., Kagan R., King L.M., Leal B., Lebow H., Lee S., LeVan J.M., Lewis L.C., London P., Lorensuhewa L.M., Loulseged H., Lovett D.A., Lucier A., Lucier R.L., Ma J., Madu R.C., Mapua P., Martindale A.D., Martinez E., Massey E., Mawhiney S., Meador M.G., Mendez S., Mercado C., Mercado I.C., Merritt C.E., Miner Z.L., Minja E., Mitchell T., Mohabbat F., Mohabbat K., Montgomery B., Moore N., Morris S., Munidasa M., Ngo R.N., Nguyen N.B., Nickerson E., Nwaokelemeh O.O., Nwokenkwo S., Obregon M., Oguh M., Oragunye N., Oviedo R.J., Parish B.J., Parker D.N., Parrish J., Parks K.L., Paul H.A., Payton B.A., Perez A., Perrin W., Pickens A., Primus E.L., Pu L.-L., Puazo M., Quiles M.M., Quiroz J.B., Rabata D., Reeves K., Ruiz S.J., Shao H., Sisson I., Sonaike T., Sorelle R.P., Sutton A.E., Svatek A.F., Svetz L.A., Tamerisa K.S., Taylor T.R., Teague B., Thomas N., Thorn R.D., Trejos Z.Y., Trevino B.K., Ukegbu O.N., Urban J.B., Vasquez L.I., Vera V.A., Villasana D.M., Wang L., Ward-Moore S., Warren J.T., Wei X., White F., Williamson A.L., Wleczyk R., Wooden H.S., Wooden S.H., Yen J., Yoon L., Yoon V., Zorrilla S.E., Nelson D., Kucherlapati R., Weinstock G., Gibbs R.A.
    Nature 440:346-351(2006)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
    Tissue: Corpus striatum.
  5. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  6. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA], VARIANT ARG-10.
    Tissue: Cerebellum.

Entry information

Entry name: S6A12_HUMAN
Accession: Primary (citable) accession number: P48065
Secondary accession number(s): A0AV52, B2R992, D3DUN8
Entry history
Integrated into UniProtKB/Swiss-Prot: February 1, 1996
Last sequence update: September 1, 2009
Last modified: June 8, 2016
This is version 141 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 12
    Human chromosome 12: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
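
The two clustering criteria above (minimum identity plus minimum overlap with the seed) can be written as a simple threshold test. This is only a sketch of the stated criteria, not the UniRef pipeline itself: it assumes the identity and overlap percentages have already been computed by some alignment step, and `UNIREF_THRESHOLDS` and `meets_threshold` are hypothetical names.

```python
# Thresholds in percent, as stated in the UniRef descriptions:
# (minimum sequence identity, minimum overlap with the seed sequence).
UNIREF_THRESHOLDS = {
    "UniRef90": (90.0, 80.0),
    "UniRef50": (50.0, 80.0),
}


def meets_threshold(level: str, identity_pct: float, overlap_pct: float) -> bool:
    """Check whether a sequence satisfies the stated criteria for a cluster level.

    identity_pct and overlap_pct are assumed to be measured against the
    cluster's seed (longest) sequence by an upstream alignment step.
    """
    min_identity, min_overlap = UNIREF_THRESHOLDS[level]
    return identity_pct >= min_identity and overlap_pct >= min_overlap


if __name__ == "__main__":
    # Illustrative values only; they do not describe any real protein pair.
    print(meets_threshold("UniRef90", 93.0, 85.0))
    print(meets_threshold("UniRef90", 93.0, 70.0))
    print(meets_threshold("UniRef50", 55.0, 80.0))
```

UniRef100 is left out of the table because it is defined by identical sequences and sub-fragments rather than by a percentage cut-off.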