Conceived and designed the experiments: AM MKC MLW. Performed the experiments: AM JLS THM. Analyzed the data: AM SAP JLS LKG MKC MLW. Contributed reagents/materials/analysis tools: THM MKC. Wrote the paper: AM SAP JLS LKG MKC MLW.
Current address: Optigen Inc., Ithaca, New York, United States of America
The authors have declared that no competing interests exist.
Scleroderma is a clinically heterogeneous disease with a complex phenotype. The disease is characterized by vascular dysfunction, tissue fibrosis, internal organ dysfunction, and immune dysfunction resulting in autoantibody production.
We analyzed the genome-wide patterns of gene expression with DNA microarrays in skin biopsies from distinct scleroderma subsets including 17 patients with systemic sclerosis (SSc) with diffuse scleroderma (dSSc), 7 patients with SSc with limited scleroderma (lSSc), 3 patients with morphea, and 6 healthy controls. 61 skin biopsies were analyzed in a total of 75 microarray hybridizations. Analysis by hierarchical clustering demonstrates nearly identical patterns of gene expression in 17 out of 22 of the forearm and back skin pairs of SSc patients. Using this property of the gene expression, we selected a set of ‘intrinsic’ genes and analyzed the inherent data-driven groupings. Distinct patterns of gene expression separate patients with dSSc from those with lSSc and both are easily distinguished from normal controls. Our data show three distinct patient groups among the patients with dSSc and two groups among patients with lSSc. Each group can be distinguished by unique gene expression signatures indicative of proliferating cells, immune infiltrates and a fibrotic program. The intrinsic groups are statistically significant (p<0.001) and each has been mapped to clinical covariates of modified Rodnan skin score, interstitial lung disease, gastrointestinal involvement, digital ulcers, Raynaud's phenomenon and disease duration. We report a 177-gene signature that is associated with severity of skin disease in dSSc.
Genome-wide gene expression profiling of skin biopsies demonstrates that the heterogeneity in scleroderma can be measured quantitatively with DNA microarrays. The diversity in gene expression demonstrates multiple distinct gene expression programs in the skin of patients with scleroderma.
Scleroderma is a systemic autoimmune disease with a heterogeneous and complex phenotype that encompasses several distinct subtypes. The disease has an estimated prevalence of 276 cases per million adults in the United States
Scleroderma is divided into distinct clinical subsets. One subset is the localized form, which affects only the skin and includes morphea, linear scleroderma and eosinophilic fasciitis. The other major type is systemic sclerosis (SSc) and its subsets. The most widely recognized classification system for SSc divides patients into two subtypes, diffuse and limited, a distinction made primarily by the degree of skin involvement
Disease classification based largely on the extent of skin involvement does not reflect the true heterogeneity of scleroderma
Skin thickening is one of the earliest manifestations of the disease; it remains the most sensitive and specific finding
DNA microarrays have been used to characterize the changes in gene expression that occur in dSSc skin when compared to normal controls
Previous studies have demonstrated that the skin of patients with dSSc can be easily distinguished from normal controls at the level of gene expression
We studied skin biopsies from 34 subjects: 24 patients with SSc (17 dSSc and 7 lSSc), 3 patients with morphea, 1 patient with eosinophilic fasciitis and 6 healthy controls (
Subject | Age/Sex | Duration, yrs | Skin Score (0–51) | Raynaud's severity (0–10) | Digital Ulcers (0–3) | GI | ILD | Renal | ANA/Scl-70/ACA |
dSSc 1 | 41/F | 2 | 28 | - | 0 | + | + | − | +/+/− |
dSSc 2 | 49/M | 2.5 | 26 | 3 | 0 | + | − | − | ND |
dSSc 3 | 33/F | 2.5 | 35 | 7 | 0 | − | − | − | +/+/− |
dSSc 4 | 47/F | 3 | 35 | 7 | 0 | + | − | − | +/−/− |
dSSc 5 | 52/F | 1 | 10 | 4 | 1 | + | − | − | +/+/− |
dSSc 6 | 63/F | 0.5 | 26 | 10 | 0 | − | − | − | +/−/− |
dSSc 7 | 42/F | 2.5 | 23 | 10 | 3 | + | − | − | ND |
dSSc 8 | 58/M | 2 | 43 | 7 | 0 | − | − | − | +/−/− |
dSSc 9 | 56/F | 8 | 21 | 5 | 0 | + | + | − | +/−/− |
dSSc 10 | 35/F | 7 | 35 | 8 | 2 | + | + | − | −/−/− |
dSSc 11 | 47/F | 8.5 | 30 | 8 | 1 | + | + | − | +/+/− |
dSSc 12 | 58/M | 9 | 15 | 5 | 0 | + | − | − | −/−/− |
dSSc 13 | 47/F | 6 | 15 | 3 | 0 | + | − | − | +/−/− |
dSSc 14 | 49/F | 10 | 15 | 8 | 0 | − | + | − | +/−/− |
dSSc 15 | 58/F | 20 | 18 | 2 | 1 | + | + | − | ND |
dSSc 16 | 65/F | 10 | 20 | 4 | 0 | + | + | + | ND |
dSSc 17 | 40/F | 20 | 15 | 2 | 1 | + | + | + | ND |
lSSc 1 | 67/F | 3 | 8 | 5 | 0 | + | − | − | +/−/+ |
lSSc 2 | 57/F | 2 | 8 | 2 | 0 | + | − | − | +/−/+ |
lSSc 3 | 35/F | 3 | 6 | 6 | 3 | + | − | − | +/−/− |
lSSc 4 | 63/F | 13 | 8 | 6 | 0 | − | + | − | +/−/− |
lSSc 5 | 60/F | 28 | 9 | 6 | 0 | + | + | + | +/−/− |
lSSc 6 | 55/F | 17 | 9 | 6 | 1 | + | + | − | +/−/− |
lSSc 7 | 67/F | 5 | 8 | 5 | 0 | + | + | − | +/+/− |
Clinical characteristics of the 24 systemic sclerosis subjects from which skin biopsies were taken are shown. Indicated for each subject are the age, sex, disease duration since first onset of non-Raynaud's symptoms, modified Rodnan skin score on a 51-point scale, a self-reported Raynaud's severity score on a 10-point scale, and the presence or absence of digital ulcers on a 3-point scale. Also indicated are the presence (+) or absence (−) of gastrointestinal involvement (GI), interstitial lung disease (ILD) as determined by high-resolution computerized tomography (HRCT), and renal disease. The age and sex of subjects with morphea are: Morph1 (49 yrs, female, disease duration 16 yrs), Morph2 (54 yrs, female, disease duration 7 yrs), and Morph3 (49 yrs, female, disease duration 4 yrs). The age and sex of healthy control subjects are as follows: Nor1, 53 yrs, female; Nor2, 47 yrs, female; Nor3, 41 yrs, female; Nor4, 26 yrs, female; Nor5, 45 yrs, male; Nor6, 29 yrs, female. ND = Not determined
Diagnosis | Patients | Biopsies | Microarrays |
Diffuse SSc | 17 | 30 | 38 |
Limited SSc | 7 | 14 | 16 |
Morphea | 3 | 4 | 5 |
Normal | 6 | 12 | 15 |
Eosinophilic fasciitis | 1 | 1 | 1 |
We identified 4,149 probes whose expression varied from their median values in these samples by more than 2-fold in at least two of the 75 arrays and analyzed them by two-dimensional hierarchical clustering
4,149 probes that changed at least 2-fold from their median value on at least two microarrays were selected from 75 microarray hybridizations representing 61 biopsies. Probes and microarrays were ordered by 2-dimensional average linkage hierarchical clustering. This clustering shows that the dSSc, lSSc and morphea samples form distinct groups largely stratified by their clinical diagnosis. A. The unsupervised hierarchical clustering dendrogram shows the relationship among the samples using this list of 4,149 probes. Sample names have been color-coded by their clinical diagnosis: dSSc in red, lSSc in orange, morphea and EF in black, and healthy controls (Nor) in green. Forearm (FA) and Back (B) are indicated for each sample. Solid arrows indicate the 14 of 22 forearm-back pairs that cluster next to one another; dashed arrows indicate the additional 3 forearm-back pairs that cluster with only a single sample between them. Technical replicates are indicated by the labels (a), (b) or (c). 9 out of 14 technical replicates cluster immediately beside one another. B. Overview of the gene expression profiles for the 4,149 probes. Each probe has been centered on its median expression value across all samples analyzed. Measurements that are above the median are colored red and those below the median are colored green. The intensity of the color is directly proportional to the fold change. Groups of genes on the right hand side indicated with colored bars are shown in greater detail in panels C–H. C. Immunoglobulin genes expressed highly in a subset of patients with dSSc and in patients with morphea, D. proliferation signature, E. collagen and extracellular matrix components, F. genes typically associated with the presence of T-lymphocytes and macrophages, G. Genes showing low expression in dSSc, H. Heterogeneous expression cluster that is high in lSSc and a subset of dSSc. In each case only a subset of the genes in each cluster are shown.
The precise location of each gene in the cluster can be viewed in Supplemental
Patient | Cluster 3.0 | SigClust | Consensus Cluster Assignment | | |
dSSc2 | Diffuse 1 | 1 | [1 or 3] | [1 or 5] | [1 or 5] |
dSSc12 | Diffuse 1 | 1 | 1 | 1 | 1 |
dSSc1 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc10 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc11 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc15 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc16 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc17 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc3 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc4 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc9 | Diffuse 2 | 1 | 1 | 1 | 1 |
dSSc8 | Inflammatory | 2 | 2 | 2 | |
dSSc5 | Inflammatory | 2 | 2 | 2 | 2 |
dSSc6 | Inflammatory | 2 | 2 | 2 | 2 |
lSSc6 | Inflammatory | 2 | 2 | 2 | 2 |
lSSc7 | Inflammatory | 2 | 2 | 2 | 2 |
Morph1 | Inflammatory | 2 | 2 | 2 | 2 |
Morph2 | Inflammatory | 2 | 2 | 2 | 2 |
Morph3 | Inflammatory | 2 | 2 | 2 | 2 |
lSSc1 | Limited | 4 | 4 | 4 | 4 |
lSSc4 | Limited | 4 | 4 | 4 | 4 |
lSSc5 | Limited | 4 | 4 | 4 | 4 |
Nor1 | Limited | 4 | 4 | 4 | 4 |
lSSc2 | Normal-like | 3 | 4 | 4 | 4 |
Nor2 | Normal-like | 3 | 4 | 4 | 4 |
Nor3 | Normal-like | 3 | 4 | 4 | 4 |
dSSc14 | Normal-like | 3 | 3 | 3 | 3 |
dSSc7 | Normal-like | 3 | 3 | 3 | 3 |
lSSc3 | Normal-like | 3 | 3 | 3 | 3 |
Nor4 | Normal-like | 3 | 3 | 3 | 3 |
Nor5 | Normal-like | 3 | 3 | 3 | 3 |
Nor6 | Normal-like | 3 | 3 | 3 | 3 |
dSSc13 | Unclassified | 1 | |||
EF | Unclassified | 1 | 1 | 1 |
Inconsistently classified
Multiple distinct gene expression programs are evident in each subgroup. Some of these recapitulate the major themes in our prior microarray study of dSSc skin
Immunoglobulins typically associated with B lymphocytes and plasma cells are expressed in a subset of the dSSc skin biopsies (
Previous studies have identified infiltrating T cells in the skin of dSSc patients
Genes typically associated with the process of fibrosis were co-expressed with markers of T lymphocytes and macrophages. These genes showed increased expression in the central group of samples that included patients with dSSc, lSSc and morphea (
A surprising result in this study is the differential expression of a ‘proliferation signature’ (
Another cluster of genes is expressed at low levels in the dSSc skin biopsies but at higher levels in all other biopsies; however, it is not clearly associated with a single biological function or process. Included in this cluster are the genes WIF1, Tetranectin, IGFBP6, and IGFBP5 found in our original study
Since the skin of lSSc patients does not show any clinical or histologic manifestations at the biopsy site, it was possible that the skin of those patients would not show significant differences in gene expression when compared to normal controls. In fact, lSSc skin showed a distinct, disease-specific gene expression profile. This novel finding demonstrates that microarrays are sensitive enough to identify the limited subset of SSc even when discernable skin fibrosis was not present. There is a signature of genes that is expressed at high levels in a subset of lSSc patients, and variably expressed in dSSc and normal controls (
We previously demonstrated that skin biopsies from patients with early dSSc show nearly identical patterns of gene expression at a clinically affected forearm site and a clinically unaffected back site, and the gene expression profiles are distinct from those found in healthy controls
A list of genes selected by fold change alone is not ideal for classifying samples because it emphasizes differences between samples rather than the intrinsic differences between patients
The 995 most ‘intrinsic’ genes selected from 75 microarray hybridizations analyzing 34 individuals. Two major branches of the dendrogram tree are evident which divide a subset of the dSSc samples from all other samples. Within these major groups are smaller branches with identifiable biological themes, which have been colored accordingly: blue for
The gene expression signatures further subdivide samples within existing clinical groups. We find a consistent set of genes that are highly expressed in a subset of the dSSc samples, which occupy the left branch of the dendrogram tree
To examine the robustness of these groups, we performed two separate analyses: Statistical Significance of Clustering (SigClust)
To perform a second validation of the intrinsic groups, we used consensus clustering
The robustness of the sample classifications was analyzed by consensus clustering, which uses multiple iterations of K-means clustering with random restart. 500 subsets of the data were sampled without replacement. The results of consensus clustering and Principal Component Analysis (PCA) applied to the 75 arrays and 995 intrinsic genes are shown. A. Consensus matrices are shown for K = 4, 5 and 6. Cluster numbers are shown and cluster assignments are summarized in
Based on this analysis and the SigClust analysis, we propose that there are approximately four to five statistically significant clusters in the data. The statistically significant cluster assignments from both SigClust and consensus clustering are summarized in
To determine how sensitive the clustering was to the selection of the intrinsic genes, we analyzed the clustering results using a larger list of 2071 intrinsic genes and compared that clustering to that obtained with 995 intrinsic genes (
Principal Component Analysis (PCA) was used to confirm the sample grouping found by hierarchical clustering. PCA is an analytic technique used to reduce high dimensional data into more easily interpretable principal components by determining the direction of maximum variation in the data
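The idea of finding the direction of maximum variation can be illustrated with a minimal sketch. This is not the MeV implementation used in the study: the function name, the toy data and the use of power iteration on the sample covariance matrix are assumptions made purely for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of expression values (one per gene).
    Power iteration is an illustrative choice; it fails if the starting
    vector happens to be orthogonal to the leading component.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between genes (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    # Project each centered sample onto the component.
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores

# Toy data (hypothetical): the second "gene" is always twice the first,
# so all variation lies along the direction (1, 2).
direction, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
```

In this toy case the returned direction is proportional to (1, 2), and samples that lie close together along it would be expected to group together in a PCA plot.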
In order to systematically investigate the biological processes found in the gene expression profiles of SSc, we created a module map using Genomica software
A. Module map of the Gene Ontology (GO) Biological Processes differentially expressed among the scleroderma samples is shown. Each column represents a single microarray and each row represents a single GO Biological process. Patient samples are organized as described in
Modules with significantly enriched genes (p<0.05, hypergeometric distribution), corrected for multiple hypothesis testing at an FDR of 0.1%, are shown (
Expressed in the
In order to better define the proliferation signature observed, we created gene sets representing the genes periodically expressed in the human cell division cycle as defined by Whitfield et al.
To better characterize the lymphocyte infiltrates we generated gene sets representing lymphocyte subsets using results reported by Palmer and coworkers
In order to verify that the gene expression reflected increased numbers of infiltrating lymphocytes or proliferating cells, we performed IHC for T cells (anti-CD3), B cells (anti-CD20) and cycling cells (anti-KI67). Summarized in
Patient | Assignment a | KI67 Append | KI67 Epiderm | KI67 Derm | CD3 Append | CD3 Epiderm | CD3 Derm |
Nor2 | Normal-like | 10 | 11 | 0 | 14 | 0 | 3 |
Nor3 | Normal-like | 0 | 11 | 0 | 22 | 0 | 0 |
Morph3 | Inflammatory | 1 | 13 | 0 | 205 | 18 | 107 |
Morph1 | Inflammatory | 0 | 21 | 0 | 36 | 5 | 14 |
dSSc5 | Inflammatory | 4 | 11 | 0 | 68 | 1 | 5 |
dSSc6 | Inflammatory | 7 | 0 | 0 | 83 | 2 | 15 |
dSSc1 | Prolif (2) | 4 | 20 | 0 | 56 | 0 | 0 |
dSSc11 | Prolif (2) | 8 | 14 | 0 | 12 | 0 | 7 |
dSSc2 | Prolif (1) | 0 | 22 | 1 | 31 | 0 | 2 |
dSSc12 | Prolif (1) | 2 | 85 | 0 | 55 | 10 | 16 |
Shown is the summary of total counts per skin biopsy as determined by IHC staining for KI67, which stains cycling cells, and CD3, which stains T cells. Each biopsy was also analyzed for CD20 and only a small number of cells were found around dermal appendages for Morph3 (3), dSSc6 (2) and dSSc12 (2). All other samples were negative for CD20 cells. (Append = dermal appendages (hair follicles, vascular structures, eccrine glands); Epiderm = epidermis; Derm = dermis). a. Intrinsic group to which each sample was assigned. b. Average of total counts per category.
Few CD20+ B cells were observed in the SSc skin biopsies. The immunoglobulin gene expression signature was observed in eight diffuse patients (dSSc1, dSSc3, dSSc6, dSSc7, dSSc8, dSSc10, dSSc11, dSSc12) and one limited patient (lSSc7;
The presence of the proliferation signature is correlated with an increase in the mitotic index or number of dividing cells in microarray studies of cancer
To map the intrinsic groups to specific clinical covariates, Pearson correlations were calculated between the gene expression of each of the 995 intrinsic genes and different clinical covariates. Shown are the results for three different covariates: the modified Rodnan skin score (MRSS; 0–51 scale), a self-reported Raynaud's severity score (0–10 scale), and the extent of skin involvement (dSSc, lSSc and unaffected). Each group was analyzed for correlation to each of the clinical parameters listed in
A. Shown is the color-coded heatmap of the 75 arrays and 995 intrinsic genes. The graph on the right of the heat map shows disease duration for each sample. Disease duration was set to zero for normal controls and morphea samples. B. Pearson correlations were calculated between skin score and the expression values for each gene in the list. The moving average of the Pearson correlation (10-gene window) was plotted. Regions of high negative and high positive correlations to the three different clinical parameters are indicated (regions I–III shaded grey). C. Moving average of the Pearson correlation coefficients (10-gene window) between the self-reported Raynaud's severity score and the expression of each gene, D. Moving average of the Pearson Correlations (10-gene window) between extent of skin involvement and a diagnosis vector (see Methods) for dSSc(red), lSSc (orange) and healthy controls (green). E. Box plot of disease duration for dSSc patients. The patients included in the
Areas of high positive or high negative correlation are highlighted in three different panels (
One initial hypothesis was that there would be an obvious trend in the gene expression data reflecting the progressive nature of SSc in some patients. To examine this more carefully, disease duration in years since first onset of non-Raynaud's symptoms is plotted along the X-axis of the heat map (
Since no obvious clinical covariate was identified that differentiated the dSSc group 1 from dSSc group 2, we selected the genes that most differentiated the two groups. Genes were selected that differentiated group 1 from group 2 using a non-parametric t-test implemented in Significance Analysis of Microarrays (SAM)
To identify genes associated with MRSS we selected the subset of genes most highly correlated with each covariate from the intrinsic list using Pearson correlations. 177 genes were selected from the 995 intrinsic genes that had Pearson correlations with MRSS >0.5 or <−0.5. We then used this list of 177 genes to organize the skin biopsies by average linkage hierarchical clustering (
We selected the genes from the 995 intrinsic list that had a correlation greater than 0.5 or less than −0.5 to the MRSS. This list of 177 genes was then used to organize the skin biopsies. Forearm-back pairs from 14 patients with dSSc (mean MRSS of 26.34±9.42) clustered onto one branch of the dendrogram tree. The forearm-back pairs of 4 patients with dSSc (Mean MRSS 18.11±6.45) clustered onto a different branch of the dendrogram tree. The difference in skin score between these two groups is statistically significant (p<0.0197).
From this analysis, 62 genes were expressed at high levels and 115 genes were expressed at low levels in the patients with the highest skin score. Genes highly expressed include the cell cycle genes CENPE, CDC7 and CDT1, the mitogen Fibroblast Growth Factor 5 (FGF5), the immediate early gene Tumor Necrosis Factor Receptor Superfamily member 12A (TNFRSF12A) and TRAF interacting protein (TRIP). Since skin score is considered to be an effective measure for disease outcome, this 177-gene group may contain genes that could be further developed into surrogate markers for skin score.
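The correlation-based selection described above (Pearson correlation with MRSS greater than 0.5 or less than −0.5) can be sketched as follows. This is a minimal illustration, not the code used in the study; the function names, the gene labels and the skin scores in the example are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (sx * sy)

def select_by_correlation(expression, scores, threshold=0.5):
    """Split genes into positively and negatively correlated lists.

    expression: dict mapping gene name -> expression values, one per sample.
    scores: clinical covariate (e.g. skin score) in the same sample order.
    """
    positive, negative = [], []
    for gene, values in expression.items():
        r = pearson(values, scores)
        if r > threshold:
            positive.append(gene)
        elif r < -threshold:
            negative.append(gene)
    return positive, negative

# Hypothetical toy data: four samples with made-up skin scores.
toy_scores = [10, 20, 30, 40]
toy_expression = {
    "gene_up": [0.1, 0.2, 0.3, 0.4],      # rises with skin score
    "gene_down": [4.0, 3.0, 2.0, 1.0],    # falls with skin score
    "gene_flat": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with skin score
}
high, low = select_by_correlation(toy_expression, toy_scores)
```

Applied to the intrinsic gene list, the two returned lists would correspond to the genes expressed at high and at low levels in patients with the highest skin scores.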
In order to validate the gene expression in the major groups found in this study, we performed quantitative real time PCR (qRT-PCR) on three genes selected from the intrinsic subsets (
The mRNA levels of three genes, TNFRSF12A (A), CD8A (B) and WIF1 (C) were analyzed by Taqman quantitative real time PCR. Each was analyzed in two representative forearm skin biopsies from each of the major subsets of proliferation, inflammatory, limited and normal controls. In the case of TNFRSF12A, patient dSSc11 was replaced by patient dSSc10, which cluster next to one another in the intrinsic subsets and show similar clinical characteristics (
Each gene is shown with the fold change relative to the median value for the eight samples analyzed. TNFRSF12A shows highest expression in the patients with dSSc and the lowest in patients with limited SSc and normal controls. The three patients with highest expression are dSSc and include the proliferation group (
We have used DNA microarrays to determine if the heterogeneity in scleroderma can be captured quantitatively and objectively using gene expression profiling. We used an experimental design that has previously been used with great success to identify molecular subsets in tumors
Our results show that the diversity in the gene expression patterns of SSc is much greater than demonstrated in two prior studies of dSSc skin
It is unlikely that the underlying gene expression groups result from technical artifacts or heterogeneity at the site of biopsy. First, we created a standardized sample-processing pipeline, which was extensively tested on skin collected from surgical discards prior to beginning this study and included strict protocols that were used throughout with the goal of eliminating variability in sample handling and preparation. Second, all gene expression groups were analyzed for correlation to date of hybridization, date of sample collection and other technical variables that might have affected the groupings. Also, heterogeneity at the site of biopsy is unlikely to account for the findings as the signatures used to classify the samples were selected by virtue of their being expressed in both the forearm and back samples of each patient. The inflammatory group is unlikely to be a result of active infection in patients as individuals with active infections were excluded from the study. Finally, the gene expression signatures we found are supported by both the IHC findings (
We were able to associate our gene expression signatures with changes in specific cell markers. We have confirmed infiltration of T cells in the dermis of the ‘inflammatory’ subgroup, and have confirmed an increase in the number of proliferating cells in the epidermis in the ‘proliferation’ group. The increase in the number of proliferating cells in the epidermis could result from paracrine influences on the resident keratinocytes, possibly activated by the profibrotic cytokine TGFβ. We were not able to find significant numbers of CD20 positive B-cells.
An open question is how these gene expression changes correlate with more specific histological changes in the skin. Two studies of gene expression in liver
The detection of subsets in the gene expression of SSc raises questions as to their etiology. Do these subsets represent distinct groups with stable patterns of gene expression or do the groups represent different time-dependent phases of the disease? We have found a clear relationship between severity of disease and gene expression (
The multiple groups observed in our gene expression data may correspond to patients that will have distinct clinical outcomes. This is supported by recent work analyzing the relationship between change in skin score and outcome in a large single center cohort of 225 patients
This study allows us to propose two different models that could account for the gene expression subsets we have found in scleroderma. The first model is that there are multiple distinct groups of scleroderma patients, each exhibiting distinct gene expression profiles. The aberrant gene expression patterns may be established early in the disease and remain stable during disease progression. In this case, serial biopsies taken over time from the same patient would always remain in the same group. It would likely be possible to identify the clinical endpoints and complications to which each group would progress. The implication is that it may be possible to predict patient outcome based on the gene expression profile. The reports of three different groups of diffuse patients with different outcome trajectories or different skin thickness progress rates support this model
The second model is that the different gene expression subgroups represent different disease stages. This is supported in part by the analysis of disease duration since the first onset of non-Raynaud's symptoms between the group we labeled
The gene expression profiles in scleroderma hold the promise of identifying markers of disease activity that could be used as surrogate markers in clinical trials. Therefore, the analysis of skin biopsies before and after treatment may be useful in testing the efficacy of novel therapeutics. To this end, we have identified 177 genes that are strongly correlated with the severity of skin disease. These genes may point to a novel pathway involved in skin fibrosis that includes TNFRSF12A (Tweak Receptor (TweakR); Fn14), which is a TNF receptor family member expressed on both fibroblasts
Ethics approval was obtained for this study from the University of California at San Francisco's Committee on Human Research (CHR) and from Dartmouth College's Committee for the Protection of Human Subjects (CPHS). All subjects signed consent forms approved by the CHR at the University of California, San Francisco (UCSF). All patients met the American College of Rheumatology classification criteria for SSc
Skin biopsies were taken from a total of 34 individuals: 17 patients with dSSc, 7 patients with lSSc, 3 patients with morphea (MORPH), 6 healthy volunteers (NORM) and one patient with eosinophilic fasciitis (EF) (
In most cases, two 5-mm punch biopsies were taken from the lateral forearm, 8 cm proximal to the ulnar styloid on the exterior surface of the non-dominant forearm, for clinically involved skin. Two 5-mm punch biopsies were also taken from the lower back (flank or buttock) for clinically uninvolved skin. Thirteen dSSc patients provided forearm and back biopsies; four dSSc patients provided only single forearm biopsies. The seven lSSc patients and all six healthy controls also underwent two 5-mm punch biopsies at the identical forearm and back sites. Three subjects with morphea underwent two 5-mm punch biopsies at the clinically affected areas of the leg (MORPH1), abdomen (MORPH2), and back (MORPH3).
For each patient, one biopsy was immediately stored in 1.5 mL RNAlater (Ambion) and frozen at −80°C; a second biopsy was bisected, with half placed in 10% formalin for routine histology and half fresh frozen. In total, 61 biopsies were collected for microarray hybridization: 30 from dSSc, 14 from lSSc, 4 from morphea, 1 eosinophilic fasciitis, and 12 from healthy controls (
RNA was prepared from each biopsy by mechanical disruption with a PowerGen125 tissue homogenizer (Fisher Scientific) followed by isolation of total RNA using an RNeasy Kit for Fibrous Tissue (Qiagen). Approximately 2–5 µg of total RNA was obtained from each biopsy.
200 ng of total RNA from each biopsy was converted to Cy3-CTP (Perkin Elmer) labeled cRNA, and Universal Human Reference (UHR) RNA (Stratagene) was converted to Cy5-CTP (Perkin Elmer) labeled cRNA using a low input linear amplification kit (Agilent Technologies). Labeled cRNA targets were then purified using RNeasy columns (Qiagen). Cy3-labeled cRNA from each skin biopsy was competitively hybridized against Cy5-CTP labeled cRNA from Universal Human Reference (UHR) RNA pool, to 44,000 element DNA oligonucleotide microarrays (Agilent Technologies) representing more than 33,000 known and novel human genes in a common reference design
After hybridization, arrays were washed following Agilent 60-mer oligo microarray processing protocols (6× SSC, 0.005% Triton X-102 for 10 min at room temperature; 0.1× SSC, 0.005% Triton X-102 for 5 min at 4°C; rinse in 0.1× SSC). Microarray hybridizations were performed for each RNA sample, resulting in 61 hybridizations. Fourteen replicate hybridizations were added, resulting in a total of 75 microarray hybridizations.
Microarrays were scanned using a dual laser GenePix 4000B scanner (Axon Instruments). The pixel intensities of the acquired images were then quantified using GenePix Pro 5.0 software. Arrays were visually inspected for defects or technical artifacts, and poor quality spots were manually flagged and excluded from further analysis. Only spots with fluorescent signal at least twofold greater than local background in both the Cy3 and Cy5 channels were included in the analysis. Probes missing more than 20% of their data points were excluded, resulting in 28,495 probes that passed the filtering criteria. The data were displayed as log2 of the LOWESS-normalized Cy5/Cy3 ratio. Since a common reference experimental design was used, each probe was centered on its median value across all arrays.
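The probe-level filtering and median centering described above, together with the 2-fold selection used for the 4,149-probe list, can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names and toy data are hypothetical, missing spots are assumed to be represented as None, and "at least 2-fold" is assumed to mean an absolute log2 ratio of 1 or more after centering.

```python
import math
from statistics import median

def filter_and_center(probes, max_missing=0.20):
    """Drop probes with too many missing values, then median-center the rest.

    probes: dict mapping probe id -> list of log2 ratios, one per array,
    with None marking a flagged or missing spot.
    """
    centered = {}
    for probe_id, values in probes.items():
        present = [v for v in values if v is not None]
        missing_fraction = (len(values) - len(present)) / len(values)
        if not present or missing_fraction > max_missing:
            continue
        mid = median(present)
        centered[probe_id] = [None if v is None else v - mid for v in values]
    return centered

def select_variable_probes(centered, fold=2.0, min_arrays=2):
    """Keep probes that deviate from their median by >= fold on >= min_arrays arrays."""
    cutoff = math.log2(fold)
    selected = []
    for probe_id, values in centered.items():
        hits = sum(1 for v in values if v is not None and abs(v) >= cutoff)
        if hits >= min_arrays:
            selected.append(probe_id)
    return selected

# Hypothetical toy data: three probes measured on five arrays.
toy = {
    "probe_a": [0.0, 0.1, 2.0, -1.5, 0.2],   # varies strongly on two arrays
    "probe_b": [0.1, 0.0, 0.2, 0.1, 0.3],    # nearly constant
    "probe_c": [None, None, 1.0, 2.0, 3.0],  # 40% missing, so excluded
}
centered = filter_and_center(toy)
variable = select_variable_probes(centered)
```

With these toy values only `probe_a` survives both steps; `probe_c` is removed by the missing-data filter before centering.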
An intrinsic gene identifier algorithm was used to select a set of intrinsic scleroderma genes. Detailed methods on the selection of intrinsic genes are available in
In order to estimate False Discovery Rate (FDR) at a given intrinsic weight, the analysis was repeated on data randomized in rows (i.e. across each gene). The FDR at a given weight was estimated by determining the number of genes that received the same weight or lower in the randomized data. 995 genes were selected that had an intrinsic weight <0.3; in randomized data 39±7 genes (calculated from 10 independent randomizations) had a weight of 0.3 or less, resulting in an FDR of approximately 4%. We found that a cutoff of 0.3 balanced the number of genes selected with an acceptable FDR, while retaining reproducible hierarchical clustering of technical replicate samples. Although it is possible to select a more or less restrictive list of genes with FDRs of 5% (weight <0.35; 2071 genes), 3.4% (weight <0.25; 425 genes) or 2.4% (weight <0.20; 171 genes), these smaller lists of genes resulted in less reproducible hierarchical clustering suggesting overfitting (
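The FDR estimate described above is the average number of genes passing the cutoff in randomized data divided by the number passing in the real data. The sketch below illustrates that arithmetic; the function names and toy weights are hypothetical, and the cutoff is applied as "less than or equal to" in both real and randomized data, which is an assumption.

```python
from statistics import mean

def count_at_or_below(weights, cutoff):
    """Number of genes whose intrinsic weight is at or below the cutoff."""
    return sum(1 for w in weights if w <= cutoff)

def estimate_fdr(observed_weights, randomized_runs, cutoff):
    """Estimate FDR as mean(randomized count) / observed count at a cutoff."""
    selected = count_at_or_below(observed_weights, cutoff)
    if selected == 0:
        return 0.0
    expected_false = mean(count_at_or_below(run, cutoff) for run in randomized_runs)
    return expected_false / selected

# Hypothetical toy weights: 4 genes pass in the real data; the two
# randomized runs pass 1 and 0 genes, so the estimate is 0.5 / 4.
toy_observed = [0.1, 0.2, 0.25, 0.28, 0.5, 0.9]
toy_randomized = [
    [0.2, 0.6, 0.7, 0.8, 0.9, 1.0],
    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
]
toy_fdr = estimate_fdr(toy_observed, toy_randomized, cutoff=0.3)

# Check against the counts reported in the text: a mean of 39 genes in
# randomized data and 995 selected genes give roughly 4%.
reported_fdr = 39 / 995
```

The same ratio applied at other cutoffs would yield the alternative gene lists and FDRs quoted in the paragraph above.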
Average linkage hierarchical clustering was performed in both the gene and experiment dimensions using either Cluster 3.0 software (
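For readers unfamiliar with average linkage, a minimal sketch of the agglomeration rule follows. It is not the Cluster 3.0 implementation: the function name is hypothetical, the distance matrix is supplied by the caller, and the distance metric used in the study is not assumed here.

```python
def average_linkage(dist):
    """Agglomerative clustering with average linkage.

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns the merges in order as (members_a, members_b, distance), where
    the distance between two clusters is the mean of all pairwise distances
    between their members.
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical toy example: three items on a line, absolute difference as distance.
values = [0.0, 1.0, 10.0]
toy_dist = [[abs(x - y) for y in values] for x in values]
toy_merges = average_linkage(toy_dist)
```

In the toy example items 0 and 1 merge first at distance 1.0, and the resulting pair joins item 2 at the average distance (10 + 9) / 2 = 9.5. Clustering in both dimensions amounts to running the same procedure once on genes and once on arrays.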
The statistical significance of clustering was assessed using Statistical Significance of Clustering (SigClust)
In addition, we analyzed the 995 intrinsic genes using Consensus Cluster
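The consensus clustering procedure (repeated K-means with random restarts on subsets sampled without replacement) can be sketched as below. This is an illustration under stated assumptions, not the software used in the study: the function names, the 80% subsample fraction, the number of restarts and the toy points are all hypothetical choices.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, restarts=5, iterations=50):
    """Lloyd's K-means with random restarts; returns labels of the best run."""
    best_labels, best_inertia = None, float("inf")
    for _ in range(restarts):
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(iterations):
            new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                          for p in points]
            if new_labels == labels:
                break
            labels = new_labels
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:  # keep the old centroid if a cluster empties
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def consensus_matrix(points, k, n_resamples=500, fraction=0.8, seed=0):
    """Fraction of resamples in which each pair of items shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(fraction * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)  # subset drawn without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, la in zip(idx, labels):
            for b, lb in zip(idx, labels):
                sampled[a][b] += 1
                if la == lb:
                    together[a][b] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_consensus = consensus_matrix(toy_points, k=2, n_resamples=50)
```

Entries near 1 indicate pairs of samples that are consistently clustered together and entries near 0 indicate pairs that are consistently separated; intermediate values flag unstable assignments.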
Principal Component Analysis was performed using Multiexperiment Viewer (MeV) software version 4.0.01 (
Module maps were created using the Genomica software package
All 75 microarray experiments and 28,495 DNA probes were included in the module map analysis. The 28,495 probes were collapsed to 14,448 unique LocusLink Ids (LLIDs)
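The enrichment p-values reported for the module map are based on the hypergeometric distribution. A minimal sketch of such a test is shown below; it is not Genomica's implementation, and the function name, argument names and toy numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation (gene set)
    drawn:      genes in the list being tested (sampled without replacement)
    overlap:    annotated genes observed in the tested list
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total

# Hypothetical toy numbers: 10 genes, 5 annotated, 5 drawn, all 5 annotated.
toy_p = hypergeom_enrichment_p(10, 5, 5, 5)
```

In the toy case only one of the 252 possible draws contains all five annotated genes, so the p-value is 1/252. In an analysis like the one described here the population would be the collapsed set of unique gene identifiers, and the resulting p-values would still need correction for multiple testing.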
Pearson correlations were calculated between each clinical parameter and the gene expression data in Microsoft Excel. Pearson correlations between the diagnosis of dSSc, lSSc and healthy controls and the gene expression data were calculated by creating a ‘diagnosis vector’. The diagnosis vector was created by assigning a value 1.0 to all dSSc samples and 0.0 to all remaining samples for the dSSc vector; lSSc and healthy controls were treated similarly creating a vector for each. Pearson correlations were calculated between the gene expression vector and the diagnosis vector for dSSc, lSSc and healthy controls. Correlations between the gene expression and clinical data were plotted as a moving average of a 10-gene window.
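The diagnosis-vector correlation and the moving average described above can be sketched as follows. The study performed these calculations in Microsoft Excel; the Python functions, their names and the toy data below are hypothetical, and the alignment of the 10-gene window (here, consecutive full windows) is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (sx * sy)

def diagnosis_vector(labels, target):
    """1.0 for samples with the target diagnosis, 0.0 for all others."""
    return [1.0 if label == target else 0.0 for label in labels]

def moving_average(values, window=10):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical toy data: four samples and one gene that is high only in dSSc.
toy_labels = ["dSSc", "dSSc", "lSSc", "Nor"]
toy_gene = [2.0, 2.0, 0.0, 0.0]
r_dssc = pearson(toy_gene, diagnosis_vector(toy_labels, "dSSc"))
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```

Computing one correlation per gene in clustered order and then smoothing with `moving_average` would give curves of the kind plotted alongside the heat map.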
IHC was performed on paraffin-embedded sections at the University of California, San Francisco in the Immunohistochemistry and Molecular Pathology core facility. All immunostaining was completed via a semi-automated protocol utilizing an automated immunostainer (DAKO Corp, Carpinteria, CA). Slides were heated, deparaffinized and then hydrated. Protease digestion was completed followed by antigen retrieval via pressure cooker as per standard protocols. After an endogenous peroxidase block with 3% H2O2, slides were loaded onto the automated immunostainer. A primary antibody cycle of 30 min was followed by a secondary antibody cycle using the ENVISION+ system. Color development was completed using DAB followed by counterstaining with Gill's #2 hematoxylin. Specific conditions for the antibodies utilized were as follows: anti-CD20 (DAKO) was used at 1∶600 for 30 minutes in citrate buffer (pH 6.0); anti-CD3 (DAKO) at 1∶400 for 30 minutes in Tris buffer (pH 9.0), and anti-Ki67 (MiB1; DAKO) was used at 1∶1000 for 30 minutes in Tris buffer (pH 9.0). Marker positive cells were enumerated by tissue compartment in equal-sized images of
Each quantitative real time PCR assay
The full dataset, figures in both red/green and blue/yellow format, as well as searchable versions of
Gene expression signatures in scleroderma. 4,149 probes that changed at least 2-fold from their median value on at least two microarrays were selected from 75 microarray hybridizations representing 61 biopsies. Probes and microarrays were ordered by 2-dimensional average linkage hierarchical clustering. This clustering shows that the dSSc, lSSc and morphea samples form distinct groups largely stratified by their clinical diagnosis. A. The unsupervised hierarchical clustering dendrogram shows the relationship among the samples using this list of 4,149 probes. Sample names have been color-coded by their clinical diagnosis: dSSc in red, lSSc in orange, morphea and EF in black, and healthy controls (Nor) in green. Forearm (FA) and Back (B) are indicated for each sample. Solid arrows indicate the 14 of 22 forearm-back pairs that cluster next to one another; dashed arrows indicate the additional 3 forearm-back pairs that cluster with only a single sample between them. Technical replicates are indicated by the labels (a), (b) or (c). 9 out of 14 technical replicates cluster immediately beside one another. B. Overview of the gene expression profiles for the 4,149 probes. Each probe has been centered on its median expression value across all samples analyzed. Measurements that are above the median are colored red and those below the median are colored green. The intensity of the color is directly proportional to the fold change. Groups of genes on the right hand side indicated with colored bars are shown in greater detail in panels C – H. C. Immunoglobulin genes expressed highly in a subset of patients with dSSc and in patients with morphea, D. proliferation signature, E. collagen and extracellular matrix components, F. genes typically associated with the presence of T-lymphocytes and macrophages, G. genes showing low expression in dSSc, H. heterogeneous expression cluster that is high in lSSc and a subset of dSSc. This figure shows all gene names associated with the panels in
(3.00 MB PDF)
Cluster analysis using the scleroderma intrinsic gene set. The 995 most ‘intrinsic’ genes were selected from 75 microarray hybridizations analyzing 34 individuals. Two major branches of the dendrogram tree are evident, which divide a subset of the dSSc samples from all other samples. Within these major groups are smaller branches with identifiable biological themes, which have been colored accordingly: blue for diffuse 1, red for diffuse 2, purple for inflammatory, orange for limited and green for normal-like. Statistically significant clusters (p<0.001) identified by SigClust are indicated by an asterisk (*) at the lowest significant branch. A. Experimental sample hierarchical clustering dendrogram. Black bars indicate forearm-back pairs which cluster together based on this analysis. B. Scaled down overview of the intrinsic gene expression signatures. C. Limited SSc gene expression cluster. D. Proliferation cluster. E. Immunoglobulin gene expression cluster. F. T-lymphocyte and IFNγ gene expression cluster. This file shows all gene names associated with the panels in
(6.69 MB PDF)
Robustness of intrinsic clustering. Hierarchical clustering was performed with two different sets of intrinsic genes. A. 995 intrinsic genes (weight <0.3; 4% FDR), B. 2071 intrinsic genes (weight <0.35; 5% FDR). Statistically significant clusters (p<0.05) as determined by SigClust are indicated by an asterisk (*). Transparent bars indicate the movement of groups of samples. The major clusters are recapitulated with this larger set of genes.
(1.07 MB TIF)
Scleroderma Module Map. Module map of the Gene Ontology (GO) Biological Processes differentially expressed among the scleroderma samples is shown. Each column represents a single microarray and each row represents a single GO Biological process. Patient samples are organized as described in
(2.41 MB TIF)
Immunohistochemistry for lymphocyte subsets and proliferating cells in scleroderma skin. Lymphocyte subsets in forearm biopsies of six dSSc patients, the leg and back specimens of two morphea patients and forearm samples of two healthy controls were analyzed by immunohistochemistry. Paraffin sections were stained for T cells (CD3), B cells (CD20) and proliferating cells (Ki67). (Magnification: ×200). See
(9.60 MB TIF)
4,149 probes shown in
(2.44 MB TXT)
995 intrinsic genes shown in
(0.59 MB TXT)
Supporting data file for
(0.09 MB TDS)
Supporting data file for
(0.10 MB TXT)
We thank Joel Parker, Victor Weigman and Charles Perou (University of North Carolina, Chapel Hill) for aid in the SigClust and Consensus Cluster analysis, J. Stephen Marron and Yufeng Liu (University of North Carolina, Chapel Hill) for access to the SigClust software prior to publication, Howard Y. Chang (Stanford University School of Medicine) for helpful discussions and assistance with Genomica software, Max Diehn (Stanford University School of Medicine) for providing the Intrinsic Gene Identifier Algorithm, and Todd Bersaglieri for construction of the web supplement. We would also like to thank the reviewers whose suggestions greatly improved the manuscript.