Protein

Luc7-like protein 3

Gene

LUC7L3

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Binds the cAMP regulatory element DNA sequence. May play a role in RNA splicing. (1 publication)

GO - Molecular function

  1. DNA binding Source: UniProtKB-KW
  2. mRNA binding Source: UniProtKB
  3. poly(A) RNA binding Source: UniProtKB

GO - Biological process

  1. mRNA splice site selection Source: InterPro
  2. RNA splicing Source: UniProtKB

Keywords - Biological process

mRNA processing, mRNA splicing

Keywords - Ligand

DNA-binding

Names & Taxonomy

Protein names
Recommended name: Luc7-like protein 3
Alternative name(s):
Cisplatin resistance-associated-overexpressed protein
Luc7A
Okadaic acid-inducible phosphoprotein OA48-18
cAMP regulatory element-associated protein 1
Short name: CRE-associated protein 1
Short name: CREAP-1

Gene names
Name: LUC7L3
Synonyms: CREAP1, CROP, O48

Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes: UP000005640: Chromosome 17

Organism-specific databases

HGNC: HGNC:24309. LUC7L3.

Subcellular location

Nucleus speckle (2 publications)
Note: The subnuclear localization is affected by cisplatin.

GO - Cellular component

  1. nuclear speck Source: UniProtKB-SubCell
  2. nucleoplasm Source: HPA
  3. nucleus Source: UniProtKB
  4. U1 snRNP Source: InterPro

Keywords - Cellular component

Nucleus

Pathology & Biotech

Organism-specific databases

PharmGKB: PA165432062.

PTM / Processing

Molecule processing

Feature key   Position(s)   Length   Description           Feature identifier
Chain         1 – 432       432      Luc7-like protein 3   PRO_0000058013

Amino acid modifications

Feature key        Position(s)   Length   Description
Modified residue   1 – 1         1        N-acetylmethionine (3 publications)
Modified residue   3 – 3         1        Phosphoserine (1 publication)
Modified residue   110 – 110     1        Phosphoserine (1 publication)
Modified residue   231 – 231     1        N6-acetyllysine (1 publication)
Modified residue   420 – 420     1        Phosphoserine (1 publication)
Modified residue   425 – 425     1        Phosphoserine (4 publications)
Modified residue   431 – 431     1        Phosphoserine (2 publications)
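As a sanity check on the positions above, the following minimal sketch looks up each annotated position in the canonical isoform 1 sequence (copied from the Sequences section of this entry) and confirms that the residue there is compatible with the modification: Met for N-acetylmethionine, Ser for phosphoserine, Lys for N6-acetyllysine. The names `CANONICAL`, `EXPECTED` and `residue_at` are illustrative only and are not part of any UniProt tool.

```python
# Canonical isoform 1 (O95232-1), copied from this entry in blocks of ten.
# "".join(text.split()) strips the spaces and line breaks.
CANONICAL = "".join("""
MISAAQLLDE LMGRDRNLAP DEKRSNVRWD HESVCKYYLC GFCPAELFTN
TRSDLGPCEK IHDENLRKQY EKSSRFMKVG YERDFLRYLQ SLLAEVERRI
RRGHARLALS QNQQSSGAAG PTGKNEEKIQ VLTDKIDVLL QQIEELGSEG
KVEEAQGMMK LVEQLKEERE LLRSTTSTIE SFAAQEKQME VCEVCGAFLI
VGDAQSRVDD HLMGKQHMGY AKIKATVEEL KEKLRKRTEE PDRDERLKKE
KQEREEREKE REREREERER KRRREEEERE KERARDRERR KRSRSRSRHS
SRTSDRRCSR SRDHKRSRSR ERRRSRSRDR RRSRSHDRSE RKHRSRSRDR
RRSKSRDRKS YKHRSKSRDR EQDRKSKEKE KRGSDDKKSS VKSGSREKQS
EDTNTESKES DTKNEVNGTS EDIKSEGDTQ SN
""".split())

# Annotated position (1-based) -> residue expected for that modification.
EXPECTED = {1: "M", 3: "S", 110: "S", 231: "K", 420: "S", 425: "S", 431: "S"}


def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based sequence position."""
    return seq[position - 1]


# Collect any position whose residue does not match the annotation.
mismatches = {
    pos: residue_at(CANONICAL, pos)
    for pos, expected in EXPECTED.items()
    if residue_at(CANONICAL, pos) != expected
}

print("sequence length:", len(CANONICAL))
print("mismatched positions:", mismatches)
```

An empty `mismatches` dictionary means every annotated position carries a residue of the expected type; it says nothing about whether the modification is actually present in a given sample.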

Post-translational modification

Phosphorylated in vitro by SRPK1, SRPK2 and CLK1. (1 publication)

Keywords - PTM

Acetylation, Phosphoprotein

Proteomic databases

MaxQB: O95232.
PaxDb: O95232.
PeptideAtlas: O95232.
PRIDE: O95232.

PTM databases

PhosphoSite: O95232.

Expression

Tissue specificity

Widely expressed. Highest levels in heart, brain, pancreas, thymus, ovary, small intestine and peripheral blood leukocytes, as well as cerebellum, putamen and pituitary gland. Lowest levels in lung, liver and kidney. Also expressed in fetal tissues, including brain, heart, kidney, thymus and lung. (2 publications)

Gene expression databases

Bgee: O95232.
ExpressionAtlas: O95232. baseline and differential.
Genevestigator: O95232.

Organism-specific databases

HPA: HPA018475, HPA018484, HPA020017.

Interaction

Subunit structure

May interact with SFRS1 and form homodimers. Interacts with JMJD6 and RBM25. Interacts with RSRC1 (via Arg/Ser-rich domain). (4 publications)

Protein-protein interaction databases

BioGrid: 119710. 47 interactions.
IntAct: O95232. 6 interactions.
MINT: MINT-1683180.
STRING: 9606.ENSP00000240304.

Structure

3D structure databases

ProteinModelPortal: O95232.

Family & Domains

Coiled coil

Feature key   Position(s)   Length   Description
Coiled coil   124 – 181     58       Sequence Analysis

Compositional bias

Feature key          Position(s)   Length   Description
Compositional bias   228 – 282     55       Glu-rich
Compositional bias   235 – 395     161      Arg/Ser-rich

Sequence similarities

Belongs to the Luc7 family. (Curated)

Keywords - Domain

Coiled coil

Phylogenomic databases

eggNOG: COG5200.
GeneTree: ENSGT00730000110670.
HOGENOM: HOG000215956.
HOVERGEN: HBG062167.
InParanoid: O95232.
OrthoDB: EOG78H3TX.
PhylomeDB: O95232.
TreeFam: TF354312.

Family and domain databases

InterPro: IPR004882. Luc7-rel.
PANTHER: PTHR12375. PTHR12375. 1 hit.
Pfam: PF03194. LUC7. 1 hit.

Sequences (2)

Sequence status: Complete.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: O95232-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MISAAQLLDE LMGRDRNLAP DEKRSNVRWD HESVCKYYLC GFCPAELFTN
        60         70         80         90        100
TRSDLGPCEK IHDENLRKQY EKSSRFMKVG YERDFLRYLQ SLLAEVERRI
       110        120        130        140        150
RRGHARLALS QNQQSSGAAG PTGKNEEKIQ VLTDKIDVLL QQIEELGSEG
       160        170        180        190        200
KVEEAQGMMK LVEQLKEERE LLRSTTSTIE SFAAQEKQME VCEVCGAFLI
       210        220        230        240        250
VGDAQSRVDD HLMGKQHMGY AKIKATVEEL KEKLRKRTEE PDRDERLKKE
       260        270        280        290        300
KQEREEREKE REREREERER KRRREEEERE KERARDRERR KRSRSRSRHS
       310        320        330        340        350
SRTSDRRCSR SRDHKRSRSR ERRRSRSRDR RRSRSHDRSE RKHRSRSRDR
       360        370        380        390        400
RRSKSRDRKS YKHRSKSRDR EQDRKSKEKE KRGSDDKKSS VKSGSREKQS
       410        420        430
EDTNTESKES DTKNEVNGTS EDIKSEGDTQ SN
Length: 432
Mass (Da): 51,466
Last modified: May 2, 2006 - v2
Checksum: E75F55EC0137310C

Isoform 2 (identifier: O95232-2)

The sequence of this isoform differs from the canonical sequence as follows:
     56-79: GPCEKIHDENLRKQYEKSSRFMKV → DVFGRGDNISDVSKFLEDDKWMEE
     80-432: Missing.

Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay. No experimental confirmation available.

Length: 79
Mass (Da): 9,194
Checksum: D204FE4F76DF2FD2
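The two differences listed for isoform 2 (a replacement at 56-79 and a deletion of 80-432) can be applied mechanically to the canonical sequence. The sketch below does this with 1-based inclusive coordinates as used in this entry; `VARIANTS` and `apply_variants` are hypothetical names chosen for illustration, and the approach (applying edits from the C-terminal end backwards so that earlier coordinates are not shifted) is one simple way to do it, not a description of how UniProt builds isoform sequences.

```python
# Canonical isoform 1 (O95232-1), copied from this entry in blocks of ten.
CANONICAL = "".join("""
MISAAQLLDE LMGRDRNLAP DEKRSNVRWD HESVCKYYLC GFCPAELFTN
TRSDLGPCEK IHDENLRKQY EKSSRFMKVG YERDFLRYLQ SLLAEVERRI
RRGHARLALS QNQQSSGAAG PTGKNEEKIQ VLTDKIDVLL QQIEELGSEG
KVEEAQGMMK LVEQLKEERE LLRSTTSTIE SFAAQEKQME VCEVCGAFLI
VGDAQSRVDD HLMGKQHMGY AKIKATVEEL KEKLRKRTEE PDRDERLKKE
KQEREEREKE REREREERER KRRREEEERE KERARDRERR KRSRSRSRHS
SRTSDRRCSR SRDHKRSRSR ERRRSRSRDR RRSRSHDRSE RKHRSRSRDR
RRSKSRDRKS YKHRSKSRDR EQDRKSKEKE KRGSDDKKSS VKSGSREKQS
EDTNTESKES DTKNEVNGTS EDIKSEGDTQ SN
""".split())

# (start, end, replacement), 1-based inclusive; "" stands for "Missing".
VARIANTS = [
    (56, 79, "DVFGRGDNISDVSKFLEDDKWMEE"),
    (80, 432, ""),
]


def apply_variants(seq: str, variants) -> str:
    """Apply non-overlapping sequence edits given in canonical coordinates."""
    # Work from the highest start position down so that coordinates of
    # edits closer to the N-terminus remain valid after each change.
    for start, end, replacement in sorted(variants, reverse=True):
        seq = seq[:start - 1] + replacement + seq[end:]
    return seq


# Guard: the segment being replaced should match the one quoted in the entry.
assert CANONICAL[55:79] == "GPCEKIHDENLRKQYEKSSRFMKV"

isoform2 = apply_variants(CANONICAL, VARIANTS)
print(isoform2)
print("length:", len(isoform2))
```

The derived sequence should be 79 residues long, matching the length listed for isoform 2 above.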

Experimental Info

Feature key         Position(s)   Length   Description
Sequence conflict   217 – 217     1        H → Y in BAA91981 (PubMed:14702039). (Curated)
Sequence conflict   378 – 379     2        EK → HE in AAC79807 (PubMed:10754390). (Curated)

Alternative sequence

Feature key            Position(s)   Length   Description                                                                Feature identifier
Alternative sequence   56 – 79       24       GPCEK…RFMKV → DVFGRGDNISDVSKFLEDDKWMEE in isoform 2. (1 publication)       VSP_018136
Alternative sequence   80 – 432      353      Missing in isoform 2. (1 publication)                                      VSP_018137

Sequence databases

EMBL / GenBank / DDBJ:
AB034205 mRNA. Translation: BAA90542.1.
DQ013876 mRNA. Translation: AAY26238.1.
AK001925 mRNA. Translation: BAA91981.1.
AK023672 mRNA. Translation: BAG51216.1.
CH471109 Genomic DNA. Translation: EAW94583.1.
CH471109 Genomic DNA. Translation: EAW94585.1.
CH471109 Genomic DNA. Translation: EAW94587.1.
BC056409 mRNA. Translation: AAH56409.1.
AF069250 mRNA. Translation: AAC79807.1. Sequence problems.
CCDS: CCDS11573.1. [O95232-1]
RefSeq: NP_006098.2. NM_006107.3. [O95232-1]
NP_057508.2. NM_016424.4. [O95232-1]
UniGene: Hs.130293.

Genome annotation databases

Ensembl: ENST00000240304; ENSP00000240304; ENSG00000108848. [O95232-1]
ENST00000505658; ENSP00000425092; ENSG00000108848. [O95232-1]
GeneID: 51747.
KEGG: hsa:51747.
UCSC: uc002isr.3. human. [O95232-1]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Protocols and materials databases

DNASU: 51747.

Organism-specific databases

CTD: 51747.
GeneCards: GC17P048796.
HGNC: HGNC:24309. LUC7L3.
HPA: HPA018475, HPA018484, HPA020017.
MIM: 609434. gene.
neXtProt: NX_O95232.
PharmGKB: PA165432062.

Miscellaneous databases

ChiTaRS: LUC7L3. human.
GeneWiki: CROP_(gene).
GenomeRNAi: 51747.
NextBio: 55830.
PRO: O95232.

Publications

  1. "CROP/Luc7A, a novel serine/arginine-rich nuclear protein, isolated from cisplatin-resistant cell line."
    Nishii Y., Morishima M., Kakehi Y., Umehara K., Kioka N., Terano Y., Amachi T., Ueda K.
    FEBS Lett. 465:153-156(2000)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), SUBCELLULAR LOCATION, TISSUE SPECIFICITY.
  2. "Identification of a family of DNA-binding proteins with homology to RNA splicing factors."
    Shipman K.L., Robinson P.J., King B.R., Smith R., Nicholson R.C.
    Biochem. Cell Biol. 84:9-19(2006)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), FUNCTION, TISSUE SPECIFICITY.
    Tissue: Placenta.
  3. "Complete sequencing and characterization of 21,243 full-length human cDNAs."
    Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R., Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H., Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S., Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K., Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A., Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M., Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y., Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M., Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K., Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S., Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J., Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y., Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N., Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S., Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S., Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O., Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H., Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B., Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y., Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T., Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y., Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S., Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T., Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M., Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T., Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K., Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R., Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.
    Nat. Genet. 36:40-45(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Tissue: Placenta.
  4. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  5. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
    Tissue: PNS.
  6. "Identification of okadaic-acid-induced genes by mRNA differential display in glioma cells."
    Chin L.S., Singh S.K., Wang Q., Murray S.F.
    J. Biomed. Sci. 7:152-159(2000)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 378-432.
    Tissue: Fetal brain.
  7. "Effect of cisplatin treatment on speckled distribution of a serine/arginine-rich nuclear protein CROP/Luc7A."
    Umehara H., Nishii Y., Morishima M., Kakehi Y., Kioka N., Amachi T., Koizumi J., Hagiwara M., Ueda K.
    Biochem. Biophys. Res. Commun. 301:324-329(2003)
    Cited for: INTERACTION WITH SFRS1, PHOSPHORYLATION, SUBCELLULAR LOCATION.
  8. "A novel SR-related protein is required for the second step of pre-mRNA splicing."
    Cazalla D., Newton K., Caceres J.F.
    Mol. Cell. Biol. 25:2969-2980(2005)
    Cited for: INTERACTION WITH RSRC1.
  9. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-110, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Embryonic kidney.
  10. "Novel splicing factor RBM25 modulates Bcl-x pre-mRNA 5' splice site selection."
    Zhou A., Ou A.C., Cho A., Benz E.J. Jr., Huang S.C.
    Mol. Cell. Biol. 28:5924-5936(2008)
    Cited for: INTERACTION WITH RBM25.
  11. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-425 AND SER-431, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Cervix carcinoma.
  12. "Lys-N and trypsin cover complementary parts of the phosphoproteome in a refined SCX-based approach."
    Gauci S., Helbig A.O., Slijper M., Krijgsveld J., Heck A.J., Mohammed S.
    Anal. Chem. 81:4493-4501(2009)
    Cited for: ACETYLATION [LARGE SCALE ANALYSIS] AT MET-1, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  13. "Quantitative phosphoproteomic analysis of T cell receptor signaling reveals system-wide modulation of protein-protein interactions."
    Mayya V., Lundgren D.H., Hwang S.-I., Rezaul K., Wu L., Eng J.K., Rodionov V., Han D.K.
    Sci. Signal. 2:RA46-RA46(2009)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-425, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Leukemic T-cell.
  14. "Lysine acetylation targets protein complexes and co-regulates major cellular functions."
    Choudhary C., Kumar C., Gnad F., Nielsen M.L., Rehman M., Walther T.C., Olsen J.V., Mann M.
    Science 325:834-840(2009)
    Cited for: ACETYLATION [LARGE SCALE ANALYSIS] AT LYS-231, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  15. Cited for: INTERACTION WITH JMJD6.
  16. "Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis."
    Olsen J.V., Vermeulen M., Santamaria A., Kumar C., Miller M.L., Jensen L.J., Gnad F., Cox J., Jensen T.S., Nigg E.A., Brunak S., Mann M.
    Sci. Signal. 3:RA3-RA3(2010)
    Cited for: ACETYLATION [LARGE SCALE ANALYSIS] AT MET-1, PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-3; SER-425 AND SER-431, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Cervix carcinoma.
  17. Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  18. "System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation."
    Rigbolt K.T., Prokhorova T.A., Akimov V., Henningsen J., Johansen P.T., Kratchmarova I., Kassem M., Mann M., Olsen J.V., Blagoev B.
    Sci. Signal. 4:RS3-RS3(2011)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-420 AND SER-425, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  19. "Comparative large-scale characterisation of plant vs. mammal proteins reveals similar and idiosyncratic N-alpha acetylation features."
    Bienvenut W.V., Sumpton D., Martinez A., Lilla S., Espagne C., Meinnel T., Giglione C.
    Mol. Cell. Proteomics 11:M111.015131-M111.015131(2012)
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  20. Cited for: ACETYLATION [LARGE SCALE ANALYSIS] AT MET-1, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  21. "An enzyme assisted RP-RPLC approach for in-depth analysis of human liver phosphoproteome."
    Bian Y., Song C., Cheng K., Dong M., Wang F., Huang J., Sun D., Wang L., Ye M., Zou H.
    J. Proteomics 96:253-262(2014)
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Liver.

Entry information

Entry name: LC7L3_HUMAN
Accession: Primary (citable) accession number: O95232
Secondary accession number(s): B3KN54, D3DTY1, Q6PHR9, Q9NUY0, Q9P2S7
Entry history
Integrated into UniProtKB/Swiss-Prot: May 30, 2000
Last sequence update: May 2, 2006
Last modified: March 4, 2015
This is version 109 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 17
    Human chromosome 17: entries, gene names and cross-references to MIM
  2. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into a single UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.