Conceived and designed the experiments: APG DPR. Performed the experiments: APG. Analyzed the data: APG WS DPR. Wrote the paper: APG DPR.
The authors have declared that no competing interests exist.
Fibrillins constitute the major backbone of multifunctional microfibrils in elastic and non-elastic extracellular matrices, and are known to interact with several binding partners including tropoelastin and integrins. Here, we study the evolution of fibrillin proteins. Following sequence collection from 39 organisms representative of the major evolutionary groups, molecular evolutionary genetics and phylogeny inference software were used to generate a series of evolutionary trees using distance-based and maximum likelihood methods. The resulting trees support the concept of gene duplication as a means of generating the three vertebrate fibrillins. Beginning with a single fibrillin sequence found in invertebrates and jawless fish, a gene duplication event, which coincides with the appearance of elastin, led to the creation of two genes. One of these genes evolved substantially to become the gene for present-day fibrillin-1, while the other underwent evolutionary changes, including a second duplication, to produce present-day fibrillin-2 and fibrillin-3. Detailed analysis of several sequences and domains within the fibrillins reveals distinct similarities and differences across various species. The RGD integrin-binding site in TB4 of all fibrillins is conserved in cephalochordates and vertebrates, while the integrin-binding site within cbEGF18 of fibrillin-3 is a recent evolutionary change. The proline-rich domain in fibrillin-1, the glycine-rich domain in fibrillin-2 and the proline-/glycine-rich domain in fibrillin-3 are found in all analyzed tetrapod species, whereas this region is completely replaced with an EGF-like domain in cnidarians, arthropods, molluscs and urochordates. All collected sequences contain the first 9-cysteine hybrid domain and the second 8-cysteine hybrid domain, with the exception of arthropods, which contain an atypical 10-cysteine hybrid domain 2.
Furin cleavage sites within the N- and C-terminal unique domains were found for all analyzed fibrillin sequences, indicating an essential role for processing of the fibrillin pro-proteins. The four cysteines in the unique N-terminus and the two cysteines in the unique C-terminus are also highly conserved.
Fibrillins constitute a family of large extracellular proteins that form the core of highly ordered, extended and ubiquitously distributed aggregates, termed microfibrils.
Most of the important structural and functional properties of fibrillins were identified and described using fibrillin nucleotide and protein sequences from mammalian organisms, in particular humans. Thus, the following introductory description of fibrillin properties refers primarily to information available on human fibrillins. The fibrillin family consists of three homologous isoforms, fibrillin-1, -2 and -3. Fibrillins are composed of a typical sequence of individual domains containing between 40 and 80 amino acid residues.
Numbers above cbEGF domains and within TB and hybrid domains indicate the relative number of the respective domain within the human fibrillin molecule. RGD sites, proprotein processing sites in the unique N- and C-terminal domains (arrows) and predicted N-glycosylation sites (inverted Y symbols) are indicated.
The most prominent domain in fibrillins is the epidermal growth factor-like (EGF) domain, which is present 46–47 times in fibrillins. This domain contains six characteristic cysteine residues which form three stabilizing disulfide bonds in a 1–3, 2–4, 5–6 arrangement.
Tandem arrays of EGF and cbEGF domains in fibrillins separate two other types of domains, the transforming growth factor (TGF)-β binding protein domains (TB) and the hybrid domains. These two domains are unique to fibrillins and to the latent TGF-β binding proteins (LTBPs).
Fibrillins can be structurally distinguished by a characteristic domain without cysteine residues immediately following the first TB domain. This unique domain is rich in proline residues in fibrillin-1, rich in glycine residues in fibrillin-2 and rich in both proline and glycine residues in fibrillin-3.
Recently, a global evolutionary analysis of TB domain-containing proteins highlighted that the characteristic fibrillin domain organization emerged over 600 million years ago, prior to the divergence of cnidarians and bilaterians and before LTBPs emerged.
In the present study we have retrieved and extensively curated 78 fibrillin gene and protein sequences from 39 organisms ranging from cnidarians to mammals. We have developed evolutionary trees suggesting that a single ancestral fibrillin gene still present in invertebrates and jawless fish (agnathans) underwent duplication. One of the resulting genes evolved to become fibrillin-1, while the other underwent evolutionary changes, including a second duplication, to produce fibrillin-2 and fibrillin-3. We have further analyzed the evolution of critical functional motifs and domains in fibrillins.
Fibrillin nucleotide and protein sequences for 39 organisms were sequentially retrieved from publicly available databases including the DOE Joint Genome Institute (
Many of the protein database entries are predicted by automated computation from genomic nucleotide sequences. As a consequence, certain sequences were either incomplete or appeared to contain errors such as insertions and deletions due to misinterpretations of intron/exon boundaries and open reading frames. To compare these critical regions with other sequences, corresponding raw genomic chromosomes, scaffolds or contigs were downloaded from databases and translated into all six reading frames. Upon determining the appropriate direction, the three corresponding reading frames were simultaneously viewed and a known fibrillin sequence was used as a template to highlight and extract individual domains from the raw sequence. With this method, we successfully reconstructed six complete fibrillin sequences, as well as filled in partial or missing regions for various other sequences. For some sequences, precise intron/exon boundaries were difficult to determine and as a consequence, the sequences around certain intron/exon boundaries are uncertain. However, by analyzing these reconstructed sequences with ClustalX, we confirmed the validity of the sequences based on homology. A complete list of manually reconstructed and modified sequences is provided as part of
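The six-frame translation step described above can be illustrated with a minimal Python sketch. It is not the software used in this study: the function names and the toy input are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are simply reported as X.

```python
from itertools import product

# Standard genetic code, with amino acids listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy input, not a real fibrillin sequence.
    for label, protein in six_frame_translation("ATGTGCGGCTAA").items():
        print(label, protein)
```

Once the coding strand is known, the three frames of that strand can be inspected side by side against a known fibrillin template, as was done manually here.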
Given that signal peptides are often species-specific
Phylogenetic trees for fibrillin proteins were initially generated in three principal ways. First, separate distance-based phylogenetic reconstructions were computed for each of the seven TB domains; sequences with gaps or apparent missense regions in the corresponding domains were not included in the analysis. This allowed us to determine the evolutionary history of each individual TB domain and to include the largest number (59–69) of fibrillin sequences in the analysis. Second, a distance-based phylogenetic tree was generated from the seven concatenated TB domains in order to calculate an “average” tree. However, since not all fibrillin sequences are completely available, the total number of fibrillin sequences in this analysis was reduced to 48. Third, a distance-based phylogenetic tree was generated for the entire sequence using 37 fibrillins; partial and highly gapped sequences were not included in the analysis. The second TB domain from human LTBP-2 was used as the outgroup for the seven individual fibrillin TB domain analyses. For the tree generated from the seven concatenated TB domains, the same outgroup was used; however, it was repeated seven times to generate a comparable 7-TB-domain structure. Alternatively, the phylogenetic analyses of the seven concatenated fibrillin TB domains were performed with an outgroup consisting of the three individual TB domains of human LTBP-2 concatenated together (rather than a repeated single TB domain). This method produced very similar results (not shown). The full human LTBP-2 sequence was used as the outgroup for the tree generated from the complete fibrillin sequences. Phylogenetic reconstructions were repeated using the maximum likelihood method for both the concatenated TB domain sequences and the complete fibrillin sequences. Non-fibrillin outgroup sequences were omitted. Final consensus trees were re-rooted around the base of the single fibrillin clade.
Similar results were obtained from an additional maximum likelihood phylogenetic reconstruction of the concatenated TB domains using the corresponding 7-TB-domain human LTBP-2 sequence as an outgroup.
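As a rough illustration of how the concatenated TB domain input could be assembled, the following Python sketch joins the seven TB domains of each fibrillin, skips sequences with a missing domain, and builds the outgroup by repeating a single TB domain seven times. The function names, the dictionary layout and the placeholder strings are assumptions made for this example; the sketch does not reproduce the actual alignment or tree-building software.

```python
def concatenate_tb_domains(domains, expected=7):
    """Join TB domain sequences in order; return None if any domain is missing."""
    if len(domains) != expected or any(not d for d in domains):
        return None
    return "".join(domains)


def build_concatenated_dataset(fibrillins, outgroup_tb, n_domains=7):
    """Build {name: concatenated TB sequence}, excluding incomplete entries.

    fibrillins  -- dict mapping a sequence name to its list of TB domains
    outgroup_tb -- a single outgroup TB domain, repeated n_domains times
    """
    dataset = {}
    for name, domains in fibrillins.items():
        joined = concatenate_tb_domains(domains, expected=n_domains)
        if joined is not None:
            dataset[name] = joined
    # Repeating the outgroup domain mimics a comparable 7-TB-domain structure.
    dataset["outgroup"] = outgroup_tb * n_domains
    return dataset


if __name__ == "__main__":
    # Placeholder strings stand in for real TB domain sequences.
    toy = {
        "complete": ["CA", "CB", "CC", "CD", "CE", "CF", "CG"],
        "partial": ["CA", "", "CC", "CD", "CE", "CF", "CG"],
    }
    print(build_concatenated_dataset(toy, "CX"))
```

Excluding incomplete entries in this way is what reduces the number of usable sequences in the concatenated analysis relative to the single-domain analyses.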
The FindPatterns program from the GCG software package was used to search for matching and homologous short sequence motifs, including integrin-binding sites and furin-cleavage sites. The pattern RGD was used to identify RGD-dependent integrin-binding sites. Furin-cleavage sites were located using the pattern RX(K,R)R, interpreted as arginine followed by any amino acid, followed by either lysine or arginine, followed by arginine.
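The two patterns can also be expressed as regular expressions. The Python sketch below is an illustrative equivalent, not the GCG FindPatterns program itself; the function name and the toy sequence are hypothetical. A lookahead is used so that overlapping furin-type matches are all reported.

```python
import re

# RGD integrin-binding motif: the literal tripeptide.
RGD_PATTERN = re.compile(r"(?=(RGD))")
# Furin-type site RX(K,R)R: Arg, any residue, Lys or Arg, Arg.
FURIN_PATTERN = re.compile(r"(?=(R.[KR]R))")


def find_motif(pattern, protein):
    """Return (1-based start position, matched residues) for every match."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "AARGDCCRSKRGT"  # toy sequence, not a real fibrillin fragment
    print("RGD sites:  ", find_motif(RGD_PATTERN, toy))
    print("Furin sites:", find_motif(FURIN_PATTERN, toy))
```

A pattern match only flags a candidate site; whether the site is accessible and functional cannot be inferred from the sequence pattern alone.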
We have retrieved 78 fibrillin sequences from publicly available databases, covering a total of 39 organisms that represent several taxonomic groups, including chordates, arthropods, annelids, echinoderms, molluscs and cnidarians.
All fibrillins from all species are characterized by a typical fibrillin domain signature containing seven isolated TB and one or two hybrid domains interspersed by characteristic numbers of cbEGF domains, as originally identified in human fibrillins (
Phylogenetic trees were generated with the entire fibrillin sequences, with the seven concatenated TB domains from each fibrillin and with the individual TB domains. Generally, all generated phylogenetic trees support the existence of an ancestral fibrillin gene, consistently placed outside of the fibrillin-1, -2, and -3 clades, and corresponding with species containing a single fibrillin gene (
The fibrillin protein family is grouped into four separate clades (from bottom to top): single fibrillin (Fbn), fibrillin-1 (Fbn-1), fibrillin-2 (Fbn-2), and fibrillin-3 (Fbn-3). Bootstrap values shown on each node represent the percentage of trees (out of 1,000) yielding the same two-set partition of sequences. The Jones-Taylor-Thornton model with invariant sites and a gamma distribution with four discrete categories was used. The tree was re-rooted around the base of the single fibrillin clade. The scale bar indicates the estimated average number of expected substitutions per site. Analyses of full length fibrillins or of full length fibrillins lacking the unique region resulted in similar phylogenetic trees.
All distance-based and maximum likelihood phylogenetic reconstructions support the concept of two gene duplications as a means of generating the three present-day fibrillin sequences from a single ancestral fibrillin (
TB1 | TB2 | TB3 | TB4 | TB5 | TB6 | TB7 | Concatenated TB domains (DB/ML) | Complete protein (DB/ML) |
94.3 | 98.0 | 80.5 | 46.4 | 45.4 | 76.5 | 53.2 | 100/100 | 99.9/100 |
47.5 | 66.5 | 31.4 | 21.0 | 26.3 | 44.4 | 29.8 | 100/87 | 96.7/100 |
59 | 65 | 62 | 67 | 60 | 65 | 69 | 48 | 37 |
Bootstrap values, that is, the percentage of trees yielding the same two-set partition of sequences for a given node, are given as percentages (top two rows). Bootstrap values were generated from 1,000 data sets. Distance-based phylogenetic trees were generated with individual TB domains (TB1–TB7). Distance-based (DB) and maximum likelihood (ML) trees were generated with all TB domains of a fibrillin protein concatenated together and with the entire fibrillin sequences, as indicated. The numbers of fibrillin sequences included in the respective analyses are given in the bottom row.
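The definition of a bootstrap value used here can be made concrete with a short Python sketch. It assumes, for illustration only, that each bootstrap replicate tree has already been reduced to a list of two-set partitions, each given as the set of sequence names on one side of a branch. The function names and toy data are hypothetical, and this is not the phylogeny software used in the study.

```python
def normalize_split(side, all_taxa):
    """Represent a two-set partition by one canonical side.

    A branch splits the taxa into two sets; either set describes the same
    partition, so the smaller (or alphabetically first) set is returned.
    """
    side = frozenset(side)
    other = frozenset(all_taxa) - side
    return min(side, other, key=lambda s: (len(s), sorted(s)))


def bootstrap_support(split, replicate_splits, all_taxa):
    """Percentage of replicate trees that contain the given partition."""
    target = normalize_split(split, all_taxa)
    hits = sum(
        1
        for replicate in replicate_splits
        if target in {normalize_split(s, all_taxa) for s in replicate}
    )
    return 100.0 * hits / len(replicate_splits)


if __name__ == "__main__":
    taxa = {"Fbn", "Fbn1", "Fbn2", "Fbn3"}
    # Four toy replicates, each with its single internal partition.
    replicates = [
        [{"Fbn2", "Fbn3"}],
        [{"Fbn", "Fbn1"}],   # same partition, written from the other side
        [{"Fbn1", "Fbn2"}],
        [{"Fbn2", "Fbn3"}],
    ]
    print(bootstrap_support({"Fbn2", "Fbn3"}, replicates, taxa))
```

In this toy case three of the four replicates recover the partition that groups Fbn2 with Fbn3, so its support is 75%.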
A characteristic feature that distinguishes amongst the fibrillin isoforms in human and other mammals is the differential presence and location of cell surface integrin-binding sites, characterized by the amino acid sequence RGD (
Only the relevant loop containing the RGD sequence is shown. RGD sequences are highlighted by a box and an arrow. The relative numbering of the cysteine residues within the respective TB domain is indicated at the bottom. Note for TB3 the presence of an RGD integrin-binding site in sea lamprey, ray-finned fish fibrillin-1 and fibrillin-2/3, all fibrillin-2 species and several tetrapod fibrillin-3 sequences. For TB4, the RGD integrin-binding site is present in all vertebrate fibrillins, with the exception of armadillo fibrillin-2. Invertebrate fibrillin sequences do not contain an integrin-binding site in TB4.
With respect to fibrillin-3, only human and chimpanzee fibrillin-3 have RGD sites in cbEGF18, while KGD is found in this domain in the Rhesus macaque, a primate, and NAD, NAE, NAN, or RAD are found in other fibrillin-3 sequences. This finding strongly suggests that the presence of an integrin-binding site in fibrillin-3 within cbEGF18 is a very recent evolutionary change. It is currently not known whether the RGD site in cbEGF18 of human and chimpanzee fibrillin-3 is functional with respect to cell binding.
The integrin-binding site analysis further revealed additional characteristics of the fibrillin family as it evolved over time. Integrin-binding sites were found in TB3 of fibrillin-1 and fibrillin-2/3 sequences, at the homologous fibrillin-2 RGD site, for all surveyed ray-finned fish species (
Fibrillin isoforms in humans and other mammals can also be distinguished by a characteristic unique domain located towards the N-terminus of the protein immediately following the TB1 domain. In humans and other mammals, fibrillin-1 is characterized by a proline-rich region, fibrillin-2 by a glycine-rich region and fibrillin-3 by a mixed proline-glycine-rich region. Here, we use the term “unique region” to refer collectively to this domain in all fibrillin isoforms.
The end of TB1, the unique region, and the EGF4 domain found characteristically in mammalian fibrillins are indicated at the bottom. For invertebrate organisms, the cbEGF-like domain (purple box) and the new cbEGF domain (orange box) are highlighted, and cysteine residues in these domains are circled in red. The relative numbering of the cysteine residues within the respective EGF domain is indicated on top. Note that the unique region does not exist in invertebrate fibrillin proteins, and is instead replaced by a cbEGF-like domain. The non-calcium binding EGF4 domain that typically follows the unique region is replaced in invertebrate fibrillins by a cbEGF domain.
As expected, the alignment shows clustering of the three vertebrate fibrillin isoforms into proline-rich (fibrillin-1), glycine-rich (fibrillin-2) and proline-glycine-rich (fibrillin-3) domains. Regarding fibrillin-1, the overall structure of the proline-rich region is relatively well conserved within tetrapod species, while in all ray-finned fish species, this region begins with a 10-residue insert and ends with an additional insert of variable length. In fibrillin-2, the chicken, zebra finch and lizard all contain a glycine-rich region that varies slightly in comparison to mammalian fibrillin-2 sequences. The proline-glycine-rich region in fibrillin-3 shows a differential structure, whereby the surveyed tetrapod species, including the chicken, opossum, platypus and frog, all contain a unique region that more closely resembles that of fibrillin-2. In mammalian species that contain a functional fibrillin-3 gene, including humans, the structure of this unique region is significantly altered and includes numerous amino acid sequence changes, in addition to the deletion of a 13-residue segment near the C-terminal portion of this domain. Thus, by demonstrating that the tetrapod fibrillin-3 sequences found in amphibians, birds, marsupials and monotremes contain unique regions that more closely resemble the unique region of fibrillin-2, with significant evolutionary changes only present in eutherians, we confirm the hypothesis that fibrillin-2 and fibrillin-3 are paralogues.
In an effort to better characterize the differences in residues amongst the three fibrillin isoforms, the unique regions of all collected high quality sequences were examined to identify the amino acids that occur most commonly on average.
Sequence | Most common amino acid (number of sequences/percentage of unique region) | Second-most common amino acid (number of sequences/percentage of unique region) |
 | Proline (18/40%) | Valine (14/14%) |
 | Glycine (3/46%) | Asparagine (3/16%) |
 | Glycine (15/38%) | Proline (14/12%) |
 | Glycine (4/31%) | Proline (4/20%) |
 | Glycine (7/24%) | Proline (4/18%); Leucine & Proline (3/16%) |
The most common and second-most common amino acids are shown for each of the 3 modern fibrillins, as well as for ray-finned fish fibrillin-2/3 and pre-eutherian fibrillin-3. The use of the ampersand symbol (&) indicates that both amino acids occur at the same frequency.
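A per-sequence version of this composition analysis can be sketched in a few lines of Python. The function name and the toy sequence are hypothetical, and the averaging across sequences reported in the table is not reproduced here; the sketch only shows how the most common residues and their percentages can be obtained for one unique region.

```python
from collections import Counter


def top_residues(region, n=2):
    """Return the n most common residues as (residue, percentage) pairs."""
    counts = Counter(region.upper())
    total = sum(counts.values())
    return [(aa, 100.0 * count / total) for aa, count in counts.most_common(n)]


if __name__ == "__main__":
    # Toy proline-rich stretch, not a real fibrillin-1 unique region.
    for residue, percent in top_residues("PPPPGVVPAP"):
        print(f"{residue}: {percent:.0f}%")
```

When two residues occur equally often, `Counter.most_common` returns them in the order first encountered, so ties such as those marked with an ampersand in the table need to be checked explicitly.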
Surprisingly, the analysis of the region following TB1 revealed the absence of a characteristic unique region from all organisms containing a single fibrillin sequence. It should be noted, however, that we could not determine with confidence the absence of the unique region in hydra, sea urchin, lamprey and amphioxus due to sequence uncertainties in this region. The sequence of amino acids immediately downstream of TB1 does not contain any prevalent amino acid. Domain analysis reveals that this region consists of a linker followed by an EGF domain.
It was speculated that the proline-rich region in human fibrillin-1 serves as a flexible hinge region that may be critical for folding of fibrillin-1 into highly ordered microfibrils.
The proline-rich region in human fibrillin-1 may be involved in the interaction with tropoelastin.
The first hybrid domain in human fibrillin-1 and -2 has been demonstrated to contain a free cysteine located on the surface of the domain and was thus suggested to contribute to potential intermolecular disulfide bonds.
All high quality fibrillin sequences contain the first 9-cysteine hybrid domain, confirming previous findings.
The hybrid domain 2 of all analyzed organisms, including cnidarians but excluding arthropods, contains 8 cysteines with the typical tandem Cys-Cys in the relative 4–5 position. While the corresponding domain in arthropods has considerable homology with typical hybrid domains, it contains 10 cysteines instead of 8, as observed previously by Robertson et al.
Relative numbering of the cysteine residues in the hybrid 2 domain as identified in human fibrillins is indicated at the bottom. In arthropods, this domain has two additional cysteine residues (black boxes indicated by arrows), annotated here as cysteines 4a and 6a.
Cleavage of the tribasic proprotein processing sites located in the unique N- and C-terminal domains of fibrillins has been described, based on experiments with human cells, as a regulatory mechanism of fibrillin-1 assembly into microfibrils.
The unique N- and C-terminal fibrillin domains of human and other mammalian organisms contain four and two cysteines, respectively. The status of these cysteines with regard to disulfide-bond patterns is not known. Analyses of the evolutionary conservation of these cysteines showed that all six are completely conserved in all high quality fibrillin sequences, suggesting that they have crucial functions in fibrillins.
Note the high conservation of the six cysteine residues and the surrounding sequences in both domains.
It has been shown that two cysteine residues in the penultimate TB domain of human LTBP-1, 3 and 4 form covalent bonds with the latency associated protein that keeps TGF-β in its inactive state
Using the results of these analyses, we traced the fibrillins through key points in evolution. The earliest form of an ancestral fibrillin sequence, still present in cnidarians, molluscs, annelids, arthropods, echinoderms, urochordates, cephalochordates and the vertebrate sea lamprey, already displays most of the characteristic domain pattern seen in human fibrillins. Notable differences include the lack of a unique region, which is replaced by a cbEGF-like domain followed by another cbEGF domain instead of EGF4; the absence of integrin-binding sites until the emergence of cephalochordates and agnathans; and the presence of an ancestral hybrid-like domain in place of the second hybrid domain in arthropods. At the time of the first gene duplication, which coincides with the divergence of jawed vertebrates and agnathans, the single fibrillin sequence had already acquired a unique region, the second mature hybrid domain and RGD sites in TB3 and TB4. The coincidence of the first gene duplication and the appearance of the unique regions with the evolutionary appearance of elastin highlights potentially important functions of the unique regions and of the availability of more than one fibrillin for the development of elastic fibers and a closed circulatory system. Following the first duplication, one sequence underwent the loss of the TB3 RGD site, as well as several other changes, to form fibrillin-1, while the second sequence, designated fibrillin-2/3, retained the characteristics of the parent sequence and sustained the introduction of a glycine-rich region. The second duplication event coincides with the evolution of tetrapods, thus allowing ray-finned fish to evolve in parallel and develop the present-day form of fibrillin-2/3 found in these species.
Following this duplication, one fibrillin copy maintained the features of the parent sequence to become fibrillin-2, while the other underwent significant evolutionary changes to form fibrillin-3. The transition of the initial tetrapod fibrillin-3 sequence to the fibrillin-3 sequence found in humans and other mammals includes the loss of the RGD site in TB3, the novel introduction of an RGD site in cbEGF18, as well as a reduction of the glycine∶proline ratio in the unique region. Divergence of the fibrillin-3 gene is observed with the evolution of mammals, evident through the loss of the RGD site in TB3 of monotremes and marsupials. Highly conserved sequences throughout the evolution of fibrillins include the pro-protein furin-type processing sites within the unique N- and C-terminal domains, indicating an evolutionarily conserved microfibril assembly mechanism with regard to pro-protein processing. Other notable highly conserved regions in all analyzed fibrillins are the cysteine patterns in the unique N- and C-terminal domains. The results from this study were used to propose key events for the evolution of the three human fibrillins.
An ancestral fibrillin (red) existed prior to the split of bilaterians from non-bilaterians. A gene duplication event (white triangle), presumably at the divergence of gnathostomes (jawed vertebrates) from agnathans (jawless fish), led to the genes for fibrillin-1 (orange) and the ancestral fibrillin-2 and -3 gene, coined as fibrillin-2/3 (purple), and further coincides with the appearance of elastin and the evolution of closed circulatory systems. A second duplication event (black triangle), coinciding with the evolution of tetrapods, led to the genes for fibrillin-2 (blue) and fibrillin-3 (green). The pre-eutherian fibrillin-3 gene underwent significant evolutionary changes to become present-day fibrillin-3. Taxonomic classifications and evolutionary divergence events are found in blue boxes and indicated by fork symbols, with dotted light gray connectors to the individual fibrillins. The upper classification shares a common ancestor with humans. Question marks indicate uncertainties as to whether the unique region appeared before or after the divergence of vertebrates from other chordates. Spacing between taxonomic classifications is not proportional to their evolutionary timeline. Identified conserved characteristic features and evolutionary events found in all analyzed fibrillins are highlighted in an orange-colored box with a black arrow. Evolutionary characteristics and events specific to individual fibrillins are shown in green boxes with solid connectors and circles. Important related evolutionary events are shown in a yellow box.
Additional information about naming fibrillins, zebrafish fibrillins, and sea urchin fibrillin.
(PDF)
Sequences analyzed in this study.
(PDF)
Additional RGD integrin-binding sites in ancestral fibrillin sequences.
(PDF)
We are grateful to Glen Corson for insightful discussions.