Protein

Nuclear autoantigenic sperm protein

Gene

Nasp

Organism
Mus musculus (Mouse)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Required for DNA replication, normal cell cycle progression and cell proliferation. Forms a cytoplasmic complex with HSP90 and linker H1 histones and stimulates HSP90 ATPase activity. NASP and H1 histone are subsequently released from the complex and translocate to the nucleus where the histone is released for binding to DNA. (3 publications)

GO - Molecular function

  • histone binding Source: MGI
  • Hsp90 protein binding Source: UniProtKB
  • protein complex binding Source: MGI

GO - Biological process

  • blastocyst development Source: UniProtKB
  • cell cycle Source: UniProtKB
  • cell proliferation Source: UniProtKB
  • DNA replication Source: UniProtKB
  • DNA replication-dependent nucleosome assembly Source: MGI
  • DNA replication-independent nucleosome assembly Source: MGI
  • histone exchange Source: UniProtKB
  • nucleosome assembly Source: MGI
  • protein transport Source: UniProtKB-KW

Keywords - Biological process

Cell cycle, DNA replication, Protein transport, Transport

Names & Taxonomy

Protein names
Recommended name: Nuclear autoantigenic sperm protein
Short name: NASP
Gene names
Name: Nasp (Imported)
Organism: Mus musculus (Mouse)
Taxonomic identifier: 10090 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Mus > Mus
Proteomes
  • UP000000589 Component: Chromosome 4

Organism-specific databases

MGI: MGI:1355328. Nasp.

Subcellular location

GO - Cellular component

  • cytoplasm Source: UniProtKB-SubCell
  • nuclear chromatin Source: MGI
  • nucleoplasm Source: MGI
  • nucleus Source: MGI
  • protein complex Source: MGI

Keywords - Cellular component

Cytoplasm, Nucleus

Pathology & Biotech

Disruption phenotype

Mice develop to blastocyst stage, probably as a result of maternally-derived Nasp, and then die. (1 publication)

PTM / Processing

Molecule processing

Feature key           Position(s)  Length  Description                          Feature identifier
Initiator methionine                       Removed (by similarity)
Chain                 2 – 773      772     Nuclear autoantigenic sperm protein  PRO_0000261597

Amino acid modifications

Feature key       Position  Length  Description       Evidence
Modified residue  2         1       N-acetylalanine   By similarity
Modified residue  33        1       N6-acetyllysine   By similarity
Modified residue  123       1       Phosphothreonine  By similarity
Modified residue  127       1       Phosphoserine     By similarity
Modified residue  188       1       Phosphoserine     By similarity
Modified residue  241       1       N6-acetyllysine   Combined sources
Modified residue  242       1       Phosphoserine     By similarity
Modified residue  251       1       N6-acetyllysine   Combined sources
Modified residue  284       1       N6-acetyllysine   Combined sources
Modified residue  304       1       Phosphoserine     By similarity
Modified residue  319       1       Phosphoserine     By similarity
Modified residue  377       1       Phosphothreonine  By similarity
Modified residue  395       1       Phosphoserine     By similarity
Modified residue  396       1       Phosphoserine     By similarity
Modified residue  450       1       Phosphothreonine  By similarity
Modified residue  463       1       Phosphothreonine  Combined sources
Modified residue  466       1       Phosphoserine     Combined sources
Modified residue  483       1       Phosphoserine     By similarity
Modified residue  489       1       Phosphoserine     By similarity
Modified residue  669       1       Phosphothreonine  By similarity
Modified residue  712       1       Phosphoserine     By similarity
Modified residue  737       1       Phosphoserine     By similarity
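
The modified-residue annotations above can be summarised programmatically. The following is a minimal sketch in Python, assuming the table is transcribed by hand into a list of tuples; the names MODIFIED_RESIDUES, counts_by_type and experimentally_supported are illustrative and are not part of any UniProt tool. It tallies the modifications by type and lists the positions whose evidence is "Combined sources" rather than "By similarity".

```python
from collections import Counter

# (position, modification, evidence) transcribed from the table in this entry.
MODIFIED_RESIDUES = [
    (2, "N-acetylalanine", "By similarity"),
    (33, "N6-acetyllysine", "By similarity"),
    (123, "Phosphothreonine", "By similarity"),
    (127, "Phosphoserine", "By similarity"),
    (188, "Phosphoserine", "By similarity"),
    (241, "N6-acetyllysine", "Combined sources"),
    (242, "Phosphoserine", "By similarity"),
    (251, "N6-acetyllysine", "Combined sources"),
    (284, "N6-acetyllysine", "Combined sources"),
    (304, "Phosphoserine", "By similarity"),
    (319, "Phosphoserine", "By similarity"),
    (377, "Phosphothreonine", "By similarity"),
    (395, "Phosphoserine", "By similarity"),
    (396, "Phosphoserine", "By similarity"),
    (450, "Phosphothreonine", "By similarity"),
    (463, "Phosphothreonine", "Combined sources"),
    (466, "Phosphoserine", "Combined sources"),
    (483, "Phosphoserine", "By similarity"),
    (489, "Phosphoserine", "By similarity"),
    (669, "Phosphothreonine", "By similarity"),
    (712, "Phosphoserine", "By similarity"),
    (737, "Phosphoserine", "By similarity"),
]

# Number of annotated sites per modification type.
counts_by_type = Counter(mod for _, mod, _ in MODIFIED_RESIDUES)

# Positions whose evidence tag is "Combined sources".
experimentally_supported = [
    pos for pos, _, evidence in MODIFIED_RESIDUES if evidence == "Combined sources"
]

for mod, count in counts_by_type.most_common():
    print(f"{mod}: {count}")
print("Combined sources:", experimentally_supported)
```

The positions with "Combined sources" evidence are the same ones cited in the large-scale phosphorylation and acetylation publications listed at the end of this entry.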

Keywords - PTM

Acetylation, Phosphoprotein

Proteomic databases

EPD: Q99MD9.
MaxQB: Q99MD9.
PaxDb: Q99MD9.
PRIDE: Q99MD9.

2D gel databases

REPRODUCTION-2DPAGE: Q99MD9.

PTM databases

iPTMnet: Q99MD9.
SwissPalm: Q99MD9.

Expression

Tissue specificity

Isoform 1 is found in gametes, embryonic cells and transformed cells. Isoform 2 is found in dividing somatic cells (at protein level). (1 publication)

Developmental stage

During the cell cycle, levels increase during S-phase. (1 publication)

Gene expression databases

Bgee: Q99MD9.
ExpressionAtlas: Q99MD9. baseline and differential.
Genevisible: Q99MD9. MM.

Interaction

Subunit structure

Binds to linker H1 histones but not to core histones. Also binds to HSP90 in the cytoplasm. This interaction stimulates binding of NASP to HIST1H1T/H1T. (3 publications)

Binary interactions

With      Entry   #Exp.  IntAct
Hist1h1c  P15864  3      EBI-913410, EBI-913436

Protein-protein interaction databases

BioGrid: 206159. 2 interactions.
IntAct: Q99MD9. 8 interactions.
MINT: MINT-4104750.
STRING: 10090.ENSMUSP00000030456.

Structure

3D structure databases

ProteinModelPortal: Q99MD9.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description  Evidence
Repeat       43 – 76      34      TPR 1        Sequence analysis
Repeat       237 – 270    34      TPR 2        Sequence analysis
Repeat       528 – 561    34      TPR 3        Sequence analysis
Repeat       570 – 603    34      TPR 4        Sequence analysis

Region

Feature key  Position(s)  Length  Description      Evidence
Region       116 – 127    12      Histone-binding  By similarity
Region       210 – 242    33      Histone-binding  By similarity
Region       455 – 498    44      Histone-binding  By similarity

Coiled coil

Feature key  Position(s)  Length  Evidence
Coiled coil  126 – 160    35      Sequence analysis
Coiled coil  595 – 648    54      Sequence analysis

Motif

Feature key  Position(s)  Length  Description                  Evidence
Motif        702 – 708    7       Nuclear localization signal  Sequence analysis
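
Feature positions in this entry are 1-based and inclusive, so a feature spanning 702 – 708 has length 708 - 702 + 1 = 7. The following minimal Python sketch shows how such coordinates translate to 0-based slicing. The helper name slice_feature is hypothetical, and the string SEGMENT_701_750 is assumed to be residues 701-750 of the canonical sequence as displayed in the sequence section of this entry.

```python
def slice_feature(segment: str, segment_start: int, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) from a segment
    whose first residue sits at position segment_start of the full sequence."""
    segment_end = segment_start + len(segment) - 1
    if start > end or start < segment_start or end > segment_end:
        raise ValueError("feature lies outside the supplied segment")
    offset = start - segment_start          # 0-based index of the first residue
    return segment[offset:offset + (end - start + 1)]


# Assumed to be residues 701-750 of the canonical isoform (copied from this entry).
SEGMENT_701_750 = "LVRKKRKPEEESPRKDDAKKAKQEPEVNGGSGDAVSSGKEVSENMEAEAE"

# Nuclear localization signal annotated at 702 - 708.
nls = slice_feature(SEGMENT_701_750, 701, 702, 708)
print(nls, len(nls))
```

Passing only a segment plus its start position keeps the example short; with the full 773-residue sequence the same helper can be called with segment_start set to 1.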

Compositional bias

Feature key         Position(s)  Length  Description  Evidence
Compositional bias  111 – 644    534     Glu-rich     Sequence analysis

Sequence similarities

Belongs to the NASP family. (Curated)
Contains 4 TPR repeats. (PROSITE-ProRule annotation)

Keywords - Domain

Coiled coil, Repeat, TPR repeat

Phylogenomic databases

eggNOG: KOG4563. Eukaryota.
  ENOG4110P5E. LUCA.
GeneTree: ENSGT00390000016650.
HOGENOM: HOG000013120.
HOVERGEN: HBG002186.
InParanoid: Q99MD9.
KO: K11291.
PhylomeDB: Q99MD9.

Family and domain databases

Gene3D: 1.25.40.10. 2 hits.
InterPro: IPR019544. Tetratricopeptide_SHNi-TPR_dom.
  IPR013026. TPR-contain_dom.
  IPR011990. TPR-like_helical_dom.
  IPR019734. TPR_repeat.
Pfam: PF10516. SHNi-TPR. 1 hit.
SMART: SM00028. TPR. 3 hits.
SUPFAM: SSF48452. SSF48452. 1 hit.
PROSITE: PS50005. TPR. 3 hits.
  PS50293. TPR_REGION. 2 hits.

Sequences (2)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q99MD9-1) (2 publications)

Also known as: Testicular NASP (1 publication), tNASP (1 publication)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MATESTAAAA IAAELVSADK IEDAPAPSTS ADKMESLDVD SEAKKLLGLG
        60         70         80         90        100
QKHLVMGDIP AAVNAFQEAA SLLGKKYGET ANECGEAFFF YGKSLLELAR
       110        120        130        140        150
MENGVLGNAL EGVHVEEEEG EKTEDESLVE NNDNVDEEAR EELREQVYDA
       160        170        180        190        200
MGEKEAKKAE GKSLTKPETD KEQESEVEKG GREDMDISEP EEKLQETVEP
       210        220        230        240        250
TSKQLTESSE EAKEAAIPGL NEDEVASGKT EQESLCTEKG KSISGAYVQN
       260        270        280        290        300
KEFRETVEEG EEIISLEKKP KETSEDQPIR AAEKQGTLMK VVEIEAEIDP
       310        320        330        340        350
QVKSADVGGE EPKDQVATSE SELGKAVLME LSGQDVEASP VVAAEAGAEV
       360        370        380        390        400
SEKPGQEITV IPNNGPVVGQ STVGDQTPSE PQTSAERLTE TKDGSSVEEV
       410        420        430        440        450
KAELVPEQEE AMLPVEESEA AGDGVETKVA QKATEKAPED KFKIAANEET
       460        470        480        490        500
PERDEQMKEG EETEGSEEED RENDKAEETP NESVLEKKSL QENEEEEIGN
       510        520        530        540        550
LELAWDMLDL AKIIFKRQET KEAQLYAAQA HLKLGEVSVE SENYIQAVEE
       560        570        580        590        600
FQACLSLQEQ YLEAHDRLLA ETHYQLGLAY GYNSQYDEAV AQFGKSIDVI
       610        620        630        640        650
EKRMAVLHEQ MKEAEGSFTE YEKEIEELKE LLPEIREKIE DAKESQRSGN
       660        670        680        690        700
VAELALKATL VESSTSGFTP SGAGASVSMI ASRKPTDGAS SSNCVTDISH
       710        720        730        740        750
LVRKKRKPEE ESPRKDDAKK AKQEPEVNGG SGDAVSSGKE VSENMEAEAE
       760        770
NQAESQTAEG TVESAATIKS TAC
Length: 773
Mass (Da): 83,954
Last modified: November 28, 2006 - v2
Checksum: 63C5CCA025972390

Isoform 2 (identifier: Q99MD9-2) (2 publications)

Also known as: Somatic NASP (1 publication), sNASP (1 publication)

The sequence of this isoform differs from the canonical sequence as follows:
     74-100: Missing.
     138-462: Missing.

Length: 421
Mass (Da): 45,754
Checksum: 1892256A471A5F19
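
The isoform 2 description can be checked with simple arithmetic: removing residues 74-100 (27 residues) and 138-462 (325 residues) from the 773-residue canonical sequence leaves 773 - 27 - 325 = 421 residues, which matches the stated length. The Python sketch below applies such deletions to a sequence and maps canonical positions to isoform coordinates. The function names apply_deletions and map_position are hypothetical, and a placeholder string stands in for the real sequence.

```python
from typing import List, Optional, Tuple

# 1-based inclusive ranges listed as "Missing" in isoform 2 of this entry.
ISOFORM2_DELETIONS: List[Tuple[int, int]] = [(74, 100), (138, 462)]


def apply_deletions(seq: str, deletions: List[Tuple[int, int]]) -> str:
    """Return seq with the given 1-based inclusive ranges removed."""
    parts = []
    cursor = 0  # 0-based index of the next residue to keep
    for start, end in sorted(deletions):
        parts.append(seq[cursor:start - 1])
        cursor = end
    parts.append(seq[cursor:])
    return "".join(parts)


def map_position(pos: int, deletions: List[Tuple[int, int]]) -> Optional[int]:
    """Map a canonical 1-based position to the isoform coordinate.

    Returns None when the residue is absent from the isoform.
    """
    shift = 0
    for start, end in sorted(deletions):
        if start <= pos <= end:
            return None
        if pos > end:
            shift += end - start + 1
    return pos - shift


# Placeholder of canonical length; substitute the real sequence when available.
canonical = "A" * 773
isoform2 = apply_deletions(canonical, ISOFORM2_DELETIONS)
print(len(isoform2))
print(map_position(463, ISOFORM2_DELETIONS))
print(map_position(241, ISOFORM2_DELETIONS))
```

Under these assumptions, the phosphothreonine annotated at canonical position 463 is the first residue retained after the second deletion, while the acetyllysine at 241 falls inside the deleted region and has no counterpart in isoform 2.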

Experimental Info

Feature key        Position  Length  Description
Sequence conflict  432       1       K → R in BAE21303 (PubMed:16141072). (Curated)
Sequence conflict  451       1       P → Q in BAE21303 (PubMed:16141072). (Curated)
Sequence conflict  462       1       Missing in AAK31170 (PubMed:16728391). (Curated)

Alternative sequence

Feature key           Position(s)  Length  Description                             Feature identifier
Alternative sequence  74 – 100     27      Missing in isoform 2. (3 publications)  VSP_052237
Alternative sequence  138 – 462    325     Missing in isoform 2. (3 publications)  VSP_052238

Sequence databases

EMBL / GenBank / DDBJ:
AF034610 mRNA. Translation: AAB87567.2.
AF095722 mRNA. Translation: AAC64195.1.
AF349432 Genomic DNA. Translation: AAK31170.1.
AF349432 Genomic DNA. Translation: AAK31171.1.
AK083333 mRNA. Translation: BAC38871.1.
AK132690 mRNA. Translation: BAE21303.1.
BC004693 mRNA. Translation: AAH04693.1.
CCDS: CCDS18513.1. [Q99MD9-1]
  CCDS71451.1. [Q99MD9-2]
RefSeq: NP_001074944.1. NM_001081475.1.
  NP_001271158.1. NM_001284229.1. [Q99MD9-2]
UniGene: Mm.257181.

Genome annotation databases

Ensembl: ENSMUST00000081182; ENSMUSP00000079946; ENSMUSG00000028693. [Q99MD9-2]
GeneID: 50927.
KEGG: mmu:50927.
UCSC: uc008ugw.1. mouse. [Q99MD9-2]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 4678.
MGI: MGI:1355328. Nasp.

Miscellaneous databases

ChiTaRS: Nasp. mouse.
NextBio: 307937.
PRO: Q99MD9.

Publications

  1. "Characterization of the histone H1-binding protein, NASP, as a cell cycle-regulated somatic protein."
    Richardson R.T., Batova I.N., Widgren E.E., Zheng L.-X., Whitfield M., Marzluff W.F., O'Rand M.G.
    J. Biol. Chem. 275:30378-30386(2000)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS 1 AND 2), INTERACTION WITH HISTONES, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE.
    Strain: BALB/cJ.
    Tissue: Ovary, Spleen and Testis.
  2. "Nuclear autoantigenic sperm protein (NASP), a linker histone chaperone that is required for cell proliferation."
    Richardson R.T., Alekseev O.M., Grossman G., Widgren E.E., Thresher R., Wagner E.J., Sullivan K.D., Marzluff W.F., O'Rand M.G.
    J. Biol. Chem. 281:21526-21534(2006)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA] (ISOFORMS 1 AND 2), FUNCTION, SUBCELLULAR LOCATION, DISRUPTION PHENOTYPE.
    Strain: 129/SvJ.
  3. "The transcriptional landscape of the mammalian genome."
    Carninci P., Kasukawa T., Katayama S., Gough J., Frith M.C., Maeda N., Oyama R., Ravasi T., Lenhard B., Wells C., Kodzius R., Shimokawa K., Bajic V.B., Brenner S.E., Batalov S., Forrest A.R., Zavolan M., Davis M.J., Wilming L.G., Aidinis V., Allen J.E., Ambesi-Impiombato A., Apweiler R., Aturaliya R.N., Bailey T.L., Bansal M., Baxter L., Beisel K.W., Bersano T., Bono H., Chalk A.M., Chiu K.P., Choudhary V., Christoffels A., Clutterbuck D.R., Crowe M.L., Dalla E., Dalrymple B.P., de Bono B., Della Gatta G., di Bernardo D., Down T., Engstrom P., Fagiolini M., Faulkner G., Fletcher C.F., Fukushima T., Furuno M., Futaki S., Gariboldi M., Georgii-Hemming P., Gingeras T.R., Gojobori T., Green R.E., Gustincich S., Harbers M., Hayashi Y., Hensch T.K., Hirokawa N., Hill D., Huminiecki L., Iacono M., Ikeo K., Iwama A., Ishikawa T., Jakt M., Kanapin A., Katoh M., Kawasawa Y., Kelso J., Kitamura H., Kitano H., Kollias G., Krishnan S.P., Kruger A., Kummerfeld S.K., Kurochkin I.V., Lareau L.F., Lazarevic D., Lipovich L., Liu J., Liuni S., McWilliam S., Madan Babu M., Madera M., Marchionni L., Matsuda H., Matsuzawa S., Miki H., Mignone F., Miyake S., Morris K., Mottagui-Tabar S., Mulder N., Nakano N., Nakauchi H., Ng P., Nilsson R., Nishiguchi S., Nishikawa S., Nori F., Ohara O., Okazaki Y., Orlando V., Pang K.C., Pavan W.J., Pavesi G., Pesole G., Petrovsky N., Piazza S., Reed J., Reid J.F., Ring B.Z., Ringwald M., Rost B., Ruan Y., Salzberg S.L., Sandelin A., Schneider C., Schoenbach C., Sekiguchi K., Semple C.A., Seno S., Sessa L., Sheng Y., Shibata Y., Shimada H., Shimada K., Silva D., Sinclair B., Sperling S., Stupka E., Sugiura K., Sultana R., Takenaka Y., Taki K., Tammoja K., Tan S.L., Tang S., Taylor M.S., Tegner J., Teichmann S.A., Ueda H.R., van Nimwegen E., Verardo R., Wei C.L., Yagi K., Yamanishi H., Zabarovsky E., Zhu S., Zimmer A., Hide W., Bult C., Grimmond S.M., Teasdale R.D., Liu E.T., Brusic V., Quackenbush J., Wahlestedt C., Mattick J.S., Hume D.A., Kai C., Sasaki D., Tomaru Y., Fukuda S., Kanamori-Katayama M., Suzuki M., Aoki J., Arakawa T., Iida J., Imamura K., Itoh M., Kato T., Kawaji H., Kawagashira N., Kawashima T., Kojima M., Kondo S., Konno H., Nakano K., Ninomiya N., Nishio T., Okada M., Plessy C., Shibata K., Shiraki T., Suzuki S., Tagami M., Waki K., Watahiki A., Okamura-Oho Y., Suzuki H., Kawai J., Hayashizaki Y.
    Science 309:1559-1563(2005)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2), NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 1-453 (ISOFORM 1).
    Strain: C57BL/6J.
    Tissue: Testis and Thymus.
  4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
    Strain: FVB/N.
    Tissue: Mammary gland.
  5. "Overexpression of the linker histone-binding protein tNASP affects progression through the cell cycle."
    Alekseev O.M., Bencic D.C., Richardson R.T., Widgren E.E., O'Rand M.G.
    J. Biol. Chem. 278:8846-8852(2003)
    Cited for: FUNCTION, SUBCELLULAR LOCATION, INTERACTION WITH HISTONES.
  6. "Association of NASP with HSP90 in mouse spermatogenic cells: stimulation of ATPase activity and transport of linker histones into nuclei."
    Alekseev O.M., Widgren E.E., Richardson R.T., O'Rand M.G.
    J. Biol. Chem. 280:2904-2911(2005)
    Cited for: FUNCTION, INTERACTION WITH HIST1H1T AND HSP90.
  7. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT THR-463 AND SER-466, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Brain, Brown adipose tissue, Heart, Kidney, Liver, Lung, Pancreas, Spleen and Testis.
  8. "SIRT5-mediated lysine desuccinylation impacts diverse metabolic pathways."
    Park J., Chen Y., Tishkoff D.X., Peng C., Tan M., Dai L., Xie Z., Zhang Y., Zwaans B.M., Skinner M.E., Lombard D.B., Zhao Y.
    Mol. Cell 50:919-930(2013)
    Cited for: ACETYLATION [LARGE SCALE ANALYSIS] AT LYS-241; LYS-251 AND LYS-284, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Embryonic fibroblast.

Entry information

Entry name: NASP_MOUSE
Accession: Primary (citable) accession number: Q99MD9
Secondary accession number(s): O35499, O88993, Q3V150, Q99KE9
Entry history
Integrated into UniProtKB/Swiss-Prot: November 28, 2006
Last sequence update: November 28, 2006
Last modified: May 11, 2016
This is version 112 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. MGD cross-references
    Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
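
The UniRef90 and UniRef50 membership thresholds described above can be expressed as a small predicate. This Python sketch encodes only the stated identity and overlap cut-offs; it is not the clustering procedure UniRef actually runs, and the names UNIREF_THRESHOLDS and meets_uniref_criteria are hypothetical. Identity and overlap are assumed to be given as fractions relative to the seed sequence.

```python
# level -> (minimum identity to the seed, minimum overlap with the seed),
# taken from the descriptions in this entry.
UNIREF_THRESHOLDS = {
    90: (0.90, 0.80),
    50: (0.50, 0.80),
}


def meets_uniref_criteria(identity: float, overlap: float, level: int) -> bool:
    """Check whether a sequence satisfies the stated thresholds for a UniRef level.

    identity and overlap are fractions in [0, 1] measured against the seed.
    """
    min_identity, min_overlap = UNIREF_THRESHOLDS[level]
    return identity >= min_identity and overlap >= min_overlap


print(meets_uniref_criteria(0.93, 0.85, 90))
print(meets_uniref_criteria(0.60, 0.85, 90))
print(meets_uniref_criteria(0.60, 0.85, 50))
```

UniRef100 is left out of the table because it is defined by identical sequences and sub-fragments of 11 or more residues rather than by an identity threshold.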