
Reviewed, UniProtKB/Swiss-Prot P17140 (CO4A2_CAEEL)

Last modified November 3, 2009. Version 100.

Names and origin · Protein attributes · General annotation (Comments) · Ontologies · Alternative products · Sequence annotation (Features) · Sequences · References · Cross-references · Entry information · Relevant documents

Names and origin

Protein names
Recommended name:
    Collagen alpha-2(IV) chain
Alternative name(s):
    Lethal protein 2
Gene names
Name: let-2
Synonyms: clb-1
ORF Names: F01G12.5
Organism: Caenorhabditis elegans [Complete proteome]
Taxonomic identifier: 6239 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Nematoda > Chromadorea > Rhabditida > Rhabditoidea > Rhabditidae > Peloderinae > Caenorhabditis

Protein attributes

Sequence length: 1758 AA.
Sequence status: Complete.
Sequence processing: The displayed sequence is further processed into a mature form.
Protein existence: Evidence at protein level.

General annotation (Comments)

Function

Collagen type IV is specific for basement membranes. Vital for embryonic development. Ref.1

Subunit structure

Trimers of two alpha 1(IV) and one alpha 2(IV) chain. Type IV collagen forms a mesh-like network linked through intermolecular interactions between 7S domains and between NC1 domains.

Subcellular location

Secreted > extracellular space > extracellular matrix > basement membrane.

Developmental stage

Isoform I is predominant in embryos and isoform II is predominant in the larvae and adults.

Domain

Alpha chains of type IV collagen have a non-collagenous domain (NC1) at their C-terminus, frequent interruptions of the G-X-Y repeats in the long central triple-helical domain (which may cause flexibility in the triple helix), and a short N-terminal triple-helical 7S domain.
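
The G-X-Y periodicity described above can be checked directly against the sequence. The following is a minimal sketch and not part of the entry: the function name `first_gxy_break` and the choice of fragment (residues 42-170 of the canonical sequence listed under Sequences) are illustrative assumptions. It reads the fragment in triplets and reports the first triplet that does not begin with glycine.

```python
# Residues 42-170 of the canonical sequence (isoform I), copied from this entry.
FRAGMENT_START = 42
FRAGMENT = (
    "GEKGSMGAPGPQGPPGTQG"             # 42-60
    "IRGFPGPEGLAGPKGLKGAQGPPGPVGIKG"  # 61-90
    "DRGAVGVPGFPGNDGGNGRPGEPGPPGAPG"  # 91-120
    "WDGCNGTDGAPGIPGRPGPPGMPGFPGPPG"  # 121-150
    "MDGLKGEPAIGYAGAPGEKG"            # 151-170
)


def first_gxy_break(seq, first_pos):
    """Return the 1-based position of the first triplet that does not start
    with Gly, reading seq in frame from its first residue; None if no break."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i] != "G":
            return first_pos + i
    return None


break_pos = first_gxy_break(FRAGMENT, FRAGMENT_START)
print(break_pos)
```

For this fragment the frame holds from Gly-42 through Gly-156 and first breaks at position 159 (the Ala in GEPAIG). The check is deliberately simple: it only locates the first break, because later triplets fall out of phase once an interruption occurs.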

Post-translational modification

Prolines at the third position of the tripeptide repeating unit (G-X-Y) are hydroxylated in some or all of the chains.

Type IV collagens contain numerous cysteine residues which are involved in inter- and intramolecular disulfide bonding. 12 of these, located in the NC1 domain, are conserved in all known type IV collagens.

The trimeric structure of the NC1 domains is stabilized by covalent bonds between Lys and Met residues (by similarity).

Sequence similarities

Belongs to the type IV collagen family.

Contains 1 collagen IV NC1 (C-terminal non-collagenous) domain.

Alternative products

This entry describes 2 isoforms produced by alternative splicing.
Isoform I (identifier: P17140-1)

Also known as: a;

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.
Isoform II (identifier: P17140-2)

Also known as: b;

The sequence of this isoform differs from the canonical sequence as follows:
     229-264: GDLGSVGPPGPPGPREFTGSGSIVGPRGNPGEKGDK → GDIGAMGPAGPPGPIASTMSKGTIIGPKGDLGEKGEK
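
The substitution above can be applied mechanically to derive isoform II from the canonical sequence. The sketch below is illustrative only: the helper `apply_alternative_sequence`, its `offset` parameter, and the 50-residue window (positions 221-270 of isoform I) are assumptions made to keep the example short; the same call with the full 1758-residue sequence and `offset=0` would apply.

```python
def apply_alternative_sequence(seq, start, end, replacement, offset=0):
    """Replace residues start..end (1-based, inclusive, in full-protein
    coordinates) with `replacement`. `offset` is the number of residues
    that precede `seq` in the full protein (0 for a full-length sequence)."""
    i = start - 1 - offset
    j = end - offset
    return seq[:i] + replacement + seq[j:]


# Positions 221-270 of isoform I, copied from the canonical sequence.
WINDOW_START = 221
WINDOW = "SIGPKGDPGDLGSVGPPGPPGPREFTGSGSIVGPRGNPGEKGDKGEPGEG"

ORIGINAL = "GDLGSVGPPGPPGPREFTGSGSIVGPRGNPGEKGDK"      # 229-264 in isoform I
REPLACEMENT = "GDIGAMGPAGPPGPIASTMSKGTIIGPKGDLGEKGEK"  # isoform II

isoform_ii_window = apply_alternative_sequence(
    WINDOW, 229, 264, REPLACEMENT, offset=WINDOW_START - 1
)
print(isoform_ii_window)
```

The replaced segment is 36 residues and the replacement is 37, which accounts for isoform II being one residue longer than the canonical sequence (1,759 versus 1,758).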

Sequence annotation (Features)

Feature key           Position(s)     Length  Description                  Feature identifier

Molecule processing

Signal peptide        1 – 26          26      Potential
Chain                 27 – 1758       1732    Collagen alpha-2(IV) chain   PRO_0000005829

Regions

Domain                1531 – 1754     224     Collagen IV NC1
Region                27 – 42         16      7S domain
Region                42 – 1527       1486    Triple-helical region
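
Positions in the feature table are 1-based and inclusive, so each length equals end minus start plus one. A small sketch, assuming the rows are transcribed by hand into a hypothetical `FEATURES` list, confirms that the listed lengths agree with the listed positions:

```python
# (feature, start, end, listed length) transcribed from the tables above.
FEATURES = [
    ("Signal peptide", 1, 26, 26),
    ("Chain", 27, 1758, 1732),
    ("Domain: Collagen IV NC1", 1531, 1754, 224),
    ("Region: 7S domain", 27, 42, 16),
    ("Region: Triple-helical region", 42, 1527, 1486),
]


def feature_length(start, end):
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


# Collect any rows whose listed length disagrees with the coordinates.
mismatches = [
    name for name, start, end, listed in FEATURES
    if feature_length(start, end) != listed
]
print(mismatches)
```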

Amino acid modifications

Glycosylation         248             1       O-linked (Xyl...) (glycosaminoglycan) Potential
Disulfide bond        1546 ↔ 1635             By similarity
Disulfide bond        1579 ↔ 1632             By similarity
Disulfide bond        1591 ↔ 1597             By similarity
Disulfide bond        1654 ↔ 1750             By similarity
Disulfide bond        1688 ↔ 1747             By similarity
Disulfide bond        1700 ↔ 1707             By similarity
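
As a consistency check, every position named in a disulfide bond should be a cysteine in the canonical sequence. The sketch below covers the first three bonds only, using residues 1541-1640 copied from the sequence in this entry; the names `NC1_FRAGMENT`, `residue_at`, and `non_cysteine_positions` are hypothetical helpers introduced for illustration.

```python
# Residues 1541-1640 of the canonical sequence (part of the NC1 domain).
NC1_FRAGMENT_START = 1541
NC1_FRAGMENT = (
    "TEVPRCPEGQTKLWDGYSLL"            # 1541-1560
    "YIEGNEKSHNQDLGHAGSCLQRFSTMPFLF"  # 1561-1590
    "CDFNNVCNYASRNEKSYWLSTSEAIPMMPV"  # 1591-1620
    "NEREIEPYISRCAVCEAPAN"            # 1621-1640
)

# The three disulfide bonds that fall entirely inside the fragment.
DISULFIDES = [(1546, 1635), (1579, 1632), (1591, 1597)]


def residue_at(pos):
    """Residue at a 1-based full-protein position inside the fragment."""
    return NC1_FRAGMENT[pos - NC1_FRAGMENT_START]


def non_cysteine_positions(pairs):
    """Return every bonded position whose residue is not Cys."""
    return [pos for pair in pairs for pos in pair if residue_at(pos) != "C"]


print(non_cysteine_positions(DISULFIDES))
```

An empty list means all six positions are cysteines, as expected for the bonds annotated by similarity.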

Natural variations

Alternative sequence  229 – 264       36      GDLGS…EKGDK → GDIGAMGPAGPPGPIASTMSKGTIIGPKGDLGEKGEK in isoform II.   VSP_001160

Experimental info

Mutagenesis           48              1       G → E in MN114; 73% lethal.
Mutagenesis           366             1       A → T in MN126; 100% lethal.
Mutagenesis           570             1       G → E in MN109; 37% lethal.
Mutagenesis           588             1       G → R in MN103 and MN151; 96% lethal.
Mutagenesis           597             1       G → R in MN152; 50% lethal.
Mutagenesis           690             1       G → E in MN129; 100% lethal.
Mutagenesis           690             1       G → R in MN101; 100% lethal.
Mutagenesis           737             1       G → E in MN143; 100% lethal.
Mutagenesis           877             1       G → R in G30; 90% lethal.
Mutagenesis           904             1       G → R in E1470; 94% lethal.
Mutagenesis           1003            1       G → E in MN139; 20% lethal.
Mutagenesis           1125            1       G → D in G25; 2% lethal.
Mutagenesis           1152            1       G → D in MN147; 7% lethal.
Mutagenesis           1286            1       G → D in G37 and B246; 9% lethal.
Sequence conflict     1604            1       E → D in AAA96215. Ref.2
Sequence conflict     1604            1       E → D in AAA96216. Ref.2
Sequence conflict     1682            1       P → L in AAA96215. Ref.2
Sequence conflict     1682            1       P → L in AAA96216. Ref.2

Sequences

Isoform I (a) [UniParc].

Last modified October 1, 1994. Version 2.
Checksum: 97EE3F3DBB2D2AC5

Length: 1,758 AA. Mass: 167,751 Da.
        10         20         30         40         50         60 
MKQRAALGPV LRLAILALLA VSYVQSQATC RDCSNRGCFC VGEKGSMGAP GPQGPPGTQG 

        70         80         90        100        110        120 
IRGFPGPEGL AGPKGLKGAQ GPPGPVGIKG DRGAVGVPGF PGNDGGNGRP GEPGPPGAPG 

       130        140        150        160        170        180 
WDGCNGTDGA PGIPGRPGPP GMPGFPGPPG MDGLKGEPAI GYAGAPGEKG DGGMPGMPGL 

       190        200        210        220        230        240 
PGPSGRDGYP GEKGDRGDTG NAGPRGPPGE AGSPGNPGIG SIGPKGDPGD LGSVGPPGPP 

       250        260        270        280        290        300 
GPREFTGSGS IVGPRGNPGE KGDKGEPGEG GQRGYPGNGG LSGQPGLPGM KGEKGLSGPA 

       310        320        330        340        350        360 
GPRGKEGRPG NAGPPGFKGD RGLDGLGGIP GLPGQKGEAG YPGRDGPKGN SGPPGPPGGG 

       370        380        390        400        410        420 
TFNDGAPGPP GLPGRPGNPG PPGTDGYPGA PGPAGPIGNT GGPGLPGYPG NEGLPGPKGD 

       430        440        450        460        470        480 
KGDGGIPGAP GVSGPSGIPG LPGPKGEPGY RGTPGQSIPG LPGKDGKPGL DGAPGRKGEN 

       490        500        510        520        530        540 
GLPGVRGPPG DSLNGLPGAP GQRGAPGPNG YDGRDGVNGL PGAPGTKGDR GGTCSACAPG 

       550        560        570        580        590        600 
TKGEKGLPGY SGQPGPQGDR GLPGMPGPVG DAGDDGLPGP AGRPGSPGPP GQDGFPGLPG 

       610        620        630        640        650        660 
QKGEPTQLTL RPGPPGYPGL KGENGFPGQP GVDGLPGPSG PVGPPGAPGY PGEKGDAGLP 

       670        680        690        700        710        720 
GLSGKPGQDG LPGLPGNKGE AGYGQPGQPG FPGAKGDGGL PGLPGTPGLQ GMPGEPAPEN 

       730        740        750        760        770        780 
QVNPAPPGQP GLPGLPGTKG EGGYPGRPGE VGQPGFPGLP GMKGDSGLPG PPGLPGHPGV 

       790        800        810        820        830        840 
PGDKGFGGVP GLPGIPGPKG DVGNPGLPGL NGQKGEPGVG VPGQPGSPGF PGLKGDAGLP 

       850        860        870        880        890        900 
GLPGTPGLEG QRGFPGAPGL KGGDGLPGLS GQPGYPGEKG DAGLPGVPGR EGSPGFPGQD 

       910        920        930        940        950        960 
GLPGVPGMKG EDGLPGLPGV TGLKGDLGAP GQSGAPGLPG APGYPGMKGN AGIPGVPGFK 

       970        980        990       1000       1010       1020 
GDGGLPGLPG LNGPKGEPGV PGMPGTPGMK GNGGLPGLPG RDGLSGVPGM KGDRGFNGLP 

      1030       1040       1050       1060       1070       1080 
GEKGEAGPAA RDGQKGDAGL PGQPGLRGPQ GPSGLPGVPG FKGETGLPGY GQPGQPGEKG 

      1090       1100       1110       1120       1130       1140 
LPGIPGKAGR QGAPGSPGQD GLPGFPGMKG ESGYPGQDGL PGRDGLPGVP GQKGDLGQSG 

      1150       1160       1170       1180       1190       1200 
QPGLSGAPGL DGQPGVPGIR GDKGQGGLPG IPGDRGMDGY PGQKGENGYP GQPGLPGLGG 

      1210       1220       1230       1240       1250       1260 
EKGFAGTPGF PGLKGSPGYP GQDGLPGIPG LKGDSGFPGQ PGQEGLPGLS GEKGMGGLPG 

      1270       1280       1290       1300       1310       1320 
MPGQPGQSIA GPVGPPGAPG LQGKDGFPGL PGQKGESGLS GLPGAPGLKG ESGMPGFPGA 

      1330       1340       1350       1360       1370       1380 
KGDLGANGIP GKRGEDGLPG VPGRDGQPGI PGLKGEVGGA GLPGQPGFPG IPGLKGEGGL 

      1390       1400       1410       1420       1430       1440 
PGFPGAKGEA GFPGTPGVPG YAGEKGDGGL PGLPGRDGLP GADGPVGPPG PSGPQNLVEP 

      1450       1460       1470       1480       1490       1500 
GEKGLPGLPG APGLRGEKGM PGLDGPPGND GPPGLPGQRG NDGYPGAPGL SGEKGMGGLP 

      1510       1520       1530       1540       1550       1560 
GFPGLDGQPG GPGAPGLPGA PGAAGPAYRD GFVLVKHSQT TEVPRCPEGQ TKLWDGYSLL 

      1570       1580       1590       1600       1610       1620 
YIEGNEKSHN QDLGHAGSCL QRFSTMPFLF CDFNNVCNYA SRNEKSYWLS TSEAIPMMPV 

      1630       1640       1650       1660       1670       1680 
NEREIEPYIS RCAVCEAPAN TIAVHSQTIQ IPNCPAGWSS LWIGYSFAMH TGAGAEGGGQ 

      1690       1700       1710       1720       1730       1740 
SPSSPGSCLE DFRATPFIEC NGARGSCHYF ANKFSFWLTT IDNDSEFKVP ESQTLKSGNL 

      1750 
RTRVSRCQVC VKSTDGRH 
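
The sequence above is displayed in blocks of ten residues under a position ruler, so it has to be flattened before it can be used in any computation. The following is a minimal sketch, assuming the display format shown here; `clean_sequence_block` and `EXAMPLE_BLOCK` are hypothetical names, and only the first two rows are included to keep the example short.

```python
import re

# First two displayed rows of the canonical sequence, rulers included.
EXAMPLE_BLOCK = """
        10         20         30         40         50         60
MKQRAALGPV LRLAILALLA VSYVQSQATC RDCSNRGCFC VGEKGSMGAP GPQGPPGTQG

        70         80         90        100        110        120
IRGFPGPEGL AGPKGLKGAQ GPPGPVGIKG DRGAVGVPGF PGNDGGNGRP GEPGPPGAPG
"""


def clean_sequence_block(block):
    """Drop blank lines and ruler lines (digits and spaces only), then
    remove the spaces between ten-residue groups and join the rows."""
    residues = []
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped or re.fullmatch(r"[\d\s]+", stripped):
            continue
        residues.append(re.sub(r"\s+", "", stripped))
    return "".join(residues)


cleaned = clean_sequence_block(EXAMPLE_BLOCK)
signal_peptide = cleaned[:26]  # positions 1-26 per the feature table
print(len(cleaned), signal_peptide)
```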

Isoform II (b).

Checksum: 816D72C403FB81CB

Length: 1,759 AA. Mass: 167,813 Da.

References

[1] "Genetic identification, sequence, and alternative splicing of the Caenorhabditis elegans alpha 2(IV) collagen gene."
Sibley M.H., Johnson J.J., Mello C.C., Kramer J.M.
J. Cell Biol. 123:255-264(1993) [PubMed: 7691828]
Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA], ALTERNATIVE SPLICING, FUNCTION.
Strain: Bristol N2.
[2] "Genome sequence of the nematode C. elegans: a platform for investigating biology."
The C. elegans sequencing consortium
Science 282:2012-2018(1998) [PubMed: 9851916]
Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA], ALTERNATIVE SPLICING.
Strain: Bristol N2.
[3] "The two Caenorhabditis elegans basement membrane (type IV) collagen genes are located on separate chromosomes."
Guo X., Kramer J.M.
J. Biol. Chem. 264:17574-17582(1989) [PubMed: 2793871]
Cited for: PRELIMINARY NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1495-1758.
Strain: Bristol N2.
[4] "Mutations in the alpha 2(IV) basement membrane collagen gene of Caenorhabditis elegans produce phenotypes of differing severities."
Sibley M.H., Graham P.L., von Mende N., Kramer J.M.
EMBO J. 13:3278-3285(1994) [PubMed: 8045258]
Cited for: MUTAGENESIS.

Cross-references

Sequence databases

Z22964 Genomic DNA. Translation: CAA80536.1.
Z22964 Genomic DNA. Translation: CAA80537.1.
U22327 Genomic DNA. Translation: AAA64312.1. Sequence problems.
U53342 Genomic DNA. Translation: AAA96215.1.
U53342 Genomic DNA. Translation: AAA96216.1.
J05066 Genomic DNA. Translation: AAA27989.1.
PIR: A34476.
     T29350.
     T29351.
RefSeq: NP_510663.1.
        NP_510664.1.
UniGene: Cel.17195

3D structure databases

HSSP: HSSP built from PDB template 1LI1 based on UniProtKB P08572.
SMR: P17140. Positions 1531-1753.

Protein-protein interaction databases

STRING: P17140.

Proteomic databases

PRIDE: P17140.

Genome annotation databases

Ensembl: F01G12.5a; F01G12.5a; F01G12.5; Caenorhabditis elegans.
         F01G12.5b.1; F01G12.5b.1; F01G12.5; Caenorhabditis elegans.
         F01G12.5b.2; F01G12.5b.2; F01G12.5; Caenorhabditis elegans.
GeneID: 181708.
KEGG: cel:F01G12.5.
UCSC: F01G12.5b.1. c. elegans.

Organism-specific databases

WormBase: WBGene00002280. let-2.
WormPep: F01G12.5a. CE04334. [WorfDB]
         F01G12.5b. CE04335. [WorfDB]

Gene expression databases

ArrayExpress: P17140.

Family and domain databases

InterPro: IPR008160. Collagen.
          IPR001442. Procollagn4_C.
Gene3D: G3DSA:2.170.240.10. Procollagn4_C. 1 hit.
Pfam: PF01413. C4. 2 hits.
      PF01391. Collagen. 22 hits.
ProDom: PD000007. Clg_helix. 14 hits.
        PD003923. Procollagn4_C. 2 hits.
        PD003992. XGLTT_domain. 2 hits.
SMART: SM00111. C4. 2 hits.
PROSITE: PS51403. NC1_IV. 1 hit.

Entry information

Entry name: CO4A2_CAEEL
Accession
Primary (citable) accession number: P17140
Secondary accession number(s): Q19098, Q19099
Entry history
Integrated into UniProtKB/Swiss-Prot: August 1, 1990
Last sequence update: October 1, 1994
Last modified: November 3, 2009
This is version 100 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation project: Caenorhabditis annotation project

Relevant documents

Caenorhabditis elegans

Caenorhabditis elegans: entries, gene names and cross-references to WormPep

SIMILARITY comments

Index of protein domains and families
