Conceived and designed the experiments: MFF. Performed the experiments: VC AH AH BS AFF EL SC CB RR LB ST NJH. Analyzed the data: VC AH AFF MA. Contributed reagents/materials/analysis tools: AH AH BS PM JGC PL LB NJH HM OB CLL PA BS. Wrote the paper: PA BS ME MFF.
The authors have declared that no competing interests exist.
Developmental genes are silenced in embryonic stem cells by a bivalent histone-based chromatin mark. It has been proposed that this mark also confers a predisposition to aberrant DNA promoter hypermethylation of tumor suppressor genes (TSGs) in cancer. We report here that silencing of a significant proportion of these TSGs in human embryonic and adult stem cells is associated with promoter DNA hypermethylation. Our results indicate a role for DNA methylation in the control of gene expression in human stem cells and suggest that, for genes repressed by promoter hypermethylation in stem cells
In the course of embryonic development, cells are initially totipotent but, after a few divisions, begin to lose potency and are transformed into pluripotent cells, finally becoming terminally differentiated somatic cells. The progressive loss of potency during differentiation has fundamental implications for disease because recovery of pluripotency through nuclear reprogramming is one of the major challenges in regenerative medicine
Differentiation of human embryonic stem cells (hESCs) requires the repression of transcription factors involved in maintaining pluripotency and the activation of developmental genes. Both processes are directed by specific epigenetic mechanisms. An example of the first type is the promoter hypermethylation-dependent repression of pluripotency-maintaining genes such as NANOG and OCT4 as stem cells differentiate
We used Illumina GoldenGate Methylation Arrays© to compare the DNA methylation status of 1,505 sequences (from 807 genes) in eight independently isolated hESC lines, 21 normal human primary tissues (NPTs) corresponding to six normal tissue types (NTTs) and 21 human cancer cell lines (CCLs) (see
(A) Methylation profiles of Class A-I (350), A-II (94), B-I (20), and B-II (107) genes in hESCs (8), normal (21), and cancer (21) samples obtained by Illumina arrays. The methylation levels vary from fully methylated (red) to fully unmethylated (white). The right-hand columns show the methylation status of histone H3 and Polycomb occupancy of the same genes obtained from previously published data
Total: 1,421 sequences
Methylation in hESCs | Methylation in CCLs | Methylation in NTTs | Proposed biological role | Name of the category | Group of genes from Supplementary |
Hypermethylated in hESCs (>0.7 in ≥2/8 samples); 493 sequences (34.69%) | Hypermethylated in hCCLs (>0.7 in ≥6/21 samples); 393 sequences (27.66%) | 159 sequences (11.19%) hypermethylated in all NTTs (>0.7 in 6/6 samples) | Genes constitutively hypermethylated | - | G1 |
Hypermethylated in hESCs (>0.7 in ≥2/8 samples); 493 sequences (34.69%) | Hypermethylated in hCCLs (>0.7 in ≥6/21 samples); 393 sequences (27.66%) | 20 sequences (1.41%) unmethylated in all NTTs (<0.3 in 6/6 samples) | Genes that become demethylated early during hESC differentiation or that become aberrantly hypermethylated during | Class B-I | G2 |
Hypermethylated in hESCs (>0.7 in ≥2/8 samples); 493 sequences (34.69%) | Hypermethylated in hCCLs (>0.7 in ≥6/21 samples); 393 sequences (27.66%) | 107 sequences (7.53%) sometimes unmethylated (≤0.3 signal in ≥1/6 and ≤5/6 samples) | Genes whose demethylation during hESC differentiation might be important for lineage specification. Their hypermethylation might provide advantages to the cancer cells. | Class B-II | G3 |
Hypermethylated in hESCs (>0.7 in ≥2/8 samples); 493 sequences (34.69%) | Not hypermethylated in hCCLs (not >0.7 in ≥6/21 samples); 100 sequences (7.04%) | 16 sequences (1.13%) hypermethylated in all NTTs (>0.7 in 6/6 samples) | Genes that become frequently demethylated in cancer | - | G4 |
Hypermethylated in hESCs (>0.7 in ≥2/8 samples); 493 sequences (34.69%) | Not hypermethylated in hCCLs (not >0.7 in ≥6/21 samples); 100 sequences (7.04%) | 11 sequences (0.77%) unmethylated in all NTTs (<0.3 in 6/6 samples) | Genes that become demethylated early during hESC differentiation. Their hypermethylation might not provide advantages to the cancer cells. | - | G5 |
Hypermethylated in hESCs (>0.7 in ≥2/8 samples); 493 sequences (34.69%) | Not hypermethylated in hCCLs (not >0.7 in ≥6/21 samples); 100 sequences (7.04%) | 39 sequences (5.07%) sometimes unmethylated (≤0.3 signal in ≥1/6 and ≤5/6 samples) | Genes whose demethylation during hESC differentiation might be important for lineage specification. Their hypermethylation might not provide advantages to the cancer cells. | - | G6 |
Not hypermethylated in hESCs (not >0.7 in ≥2/8 samples); 928 sequences (65.31%) | Hypermethylated in hCCLs (>0.7 in ≥6/21 samples); 464 sequences (32.65%) | 1 sequence (0.07%) hypermethylated in all NTTs (>0.7 in 6/6 samples) | Genes hypermethylated early during hESC differentiation. Their hypermethylation should not provide advantages to the cancer cells. | - | G7 |
Not hypermethylated in hESCs (not >0.7 in ≥2/8 samples); 928 sequences (65.31%) | Hypermethylated in hCCLs (>0.7 in ≥6/21 samples); 464 sequences (32.65%) | 350 sequences (24.63%) unmethylated in all NTTs (<0.3 in 6/6 samples) | Genes constitutively unmethylated during normal development. Their aberrant hypermethylation provides advantages to the cancer cells. | Class A-I | G8 |
Not hypermethylated in hESCs (not >0.7 in ≥2/8 samples); 928 sequences (65.31%) | Hypermethylated in hCCLs (>0.7 in ≥6/21 samples); 464 sequences (32.65%) | 94 sequences (6.61%) sometimes unmethylated (≤0.3 signal in ≥1/6 and ≤5/6 samples) | Genes whose hypermethylation during hESC differentiation might be important for lineage specification. Their aberrant hypermethylation provides advantages to the cancer cells. | Class A-II | G9 |
Not hypermethylated in hESCs (not >0.7 in ≥2/8 samples); 928 sequences (65.31%) | Not hypermethylated in hCCLs (not >0.7 in ≥6/21 samples); 464 sequences (32.65%) | 1 sequence (0.07%) hypermethylated in all NTTs (>0.7 in 6/6 samples) | Genes hypermethylated early during hESC differentiation. These genes could be aberrantly hypomethylated in cancer. | - | G10 |
Not hypermethylated in hESCs (not >0.7 in ≥2/8 samples); 928 sequences (65.31%) | Not hypermethylated in hCCLs (not >0.7 in ≥6/21 samples); 464 sequences (32.65%) | 404 sequences (28.43%) unmethylated in all NTTs (<0.3 in 6/6 samples) | Genes constitutively hypomethylated | - | G11 |
Not hypermethylated in hESCs (not >0.7 in ≥2/8 samples); 928 sequences (65.31%) | Not hypermethylated in hCCLs (not >0.7 in ≥6/21 samples); 464 sequences (32.65%) | 52 sequences (3.66%) sometimes unmethylated (≤0.3 signal in ≥1/6 and ≤5/6 samples) | Genes whose hypermethylation during hESC differentiation might be important for lineage specification. These genes could be aberrantly hypomethylated in cancer. | - | G12 |
The classification criteria are described in the
Significantly, we found that 34.69% (493/1,421) of the sequences were frequently hypermethylated in hESCs (array signal >0.7 in ≥25% (2/8) of the samples). Most of these (79.72%, 393/493) were also frequently hypermethylated in CCLs (array signal >0.7 in ≥25% (6/21) of the samples). Again, many of them (32.32%, 127/393) were unmethylated in at least one of the NTTs analyzed (array signal <0.3 in ≥17% (1/6) of the NTTs) (
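The proportions quoted in this paragraph can be recomputed from the sequence counts. The short Python sketch below does so; the variable names and the `percent` helper are illustrative and not part of the original analysis, and the count of 127 is taken as the sum of the Class B-I (20) and Class B-II (107) sequences listed in the table.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

total_sequences = 1421        # sequences retained for classification
hesc_hyper = 493              # frequently hypermethylated in hESCs
hesc_and_ccl_hyper = 393      # of those, also frequently hypermethylated in CCLs
unmeth_in_some_ntt = 20 + 107 # Class B-I plus Class B-II sequences

print(percent(hesc_hyper, total_sequences))             # 34.69
print(percent(hesc_and_ccl_hyper, hesc_hyper))          # 79.72
print(percent(unmeth_in_some_ntt, hesc_and_ccl_hyper))  # 32.32
```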
Intriguingly, not all the genes frequently hypermethylated in CCLs were completely unmethylated in all the NTTs analyzed (
It is important to note here that all the previously described percentages refer to the 807 genes included in methylation arrays, whereas the overall percentage of genes in each group might be different if the entire genome were considered. The classification threshold that we employed to identify genes frequently hypermethylated in hESCs (more than 70% of promoter CpG methylation in more than 25% of samples analyzed) is that which is commonly used to define a gene as being frequently hypermethylated in cancer
It has recently been shown that prolonged
As previously stated, it has recently been proposed that developmental genes are silenced in embryonic stem cells by a Polycomb-dependent bivalent histone-based chromatin mark
When we compared the chromatin patterns and Polycomb occupancy in the Class A-I, A-II, B-I, and B-II genes we found each group to have a specific chromatin signature (p<0.00001). Class A genes were more enriched in Polycomb and bivalent marks (47.5% and 45.5–57.3% of genes, respectively) than Class B genes (19.7% and 21.4–32.7%, respectively) (p<0.00001) (
To test the hypotheses formulated on the basis of the data obtained from the methylation arrays, we focused our attention on four Class B genes (frequently hypermethylated in both cancer and hESCs) that have been widely reported to have tumor suppressor properties. We selected two (MGMT and SLC5A8)
(A) Bisulfite genomic sequencing of multiple clones of the MGMT promoter in hESCs (I3, H14), normal primary tissues (Pool lymphocytes, normal breast) and two CCLs of lymphoid and breast origin (U937 and MDA-MB-231, respectively). Black, methylated CpG; white, unmethylated CpG; red, CpG not present. The green bar above the diagram of the MGMT CpG island indicates the location of the probe used in the methylation arrays. (B) Relationship between MGMT promoter hypermethylation and expression in hESC, normal, and cancer samples. The upper panel shows the relative methylation signal obtained with the methylation arrays and the lower panel the expression levels of MGMT mRNA relative to GAPDH.
To demonstrate further that the differentiation of hESCs is associated with less DNA methylation at the promoter region of certain genes, we induced the
(A) Left-hand images, Shef-1 stem cell line (upper) and the same cells after neural differentiation (middle) and spontaneous differentiation to fibroblast-like cells (lower). The right-hand panels show the relative mRNA levels of pluripotency (NANOG, OCT4), neuroectodermal (PAX6, NEUROD1), and mesodermal (COL1A1, FN1) markers before and after Shef-1 differentiation. (B) Number of sequences hypomethylated during Shef-1 neural (red circle) and spontaneous (blue circle) differentiation, and their overlap with Class B-I and Class B-II genes (black circles). (C) Bisulfite genomic sequencing of multiple clones of the DLC1 promoter in Shef-1 stem cell line (upper) and the same cells after neural differentiation (middle) and spontaneous differentiation to fibroblast-like cells (lower). The color code is as for
To demonstrate that some TSGs that are frequently hypermethylated in cancer and hESCs can lose methylation during differentiation, we focused our attention on DLC1. We chose this gene because the methylation arrays had shown that it lost promoter methylation during spontaneous differentiation of Shef-1, and because it is known to be a TSG that is frequently hypermethylated in cancer (
Having demonstrated that some cancer genes are hypermethylated and repressed in hESCs and that they can lose methylation during
(A) The left-hand panel shows the numbers of sequences that are hypermethylated in the somatic stem cells CD34+, and hypermethylated in hESCs and CCLs. Note that most of the sequences hypermethylated in somatic stem cells are also hypermethylated in embryonic stem cells. The right-hand panel shows the number of sequences hypermethylated in CD34+ cells (black circle) classified as Class B-II genes (red circle). Sequences hypermethylated in CD34+ cells were never classified as Class B-I genes (blue circle). (B) Bisulfite genomic sequencing of multiple clones of the AIM2 promoter in Shef-1 and I3 stem cell lines (upper), CD34+ hematopoietic stem cell progenitors (middle), and terminally differentiated hematopoietic cells (peripheral lymphocytes and neutrophils). The color code is as for
Finally, to demonstrate that some cancer methylated genes are also frequently methylated in somatic progenitor stem cells and that their methylation is important for lineage specification, we considered two genes:
Aberrant promoter hypermethylation of TSGs and differentiation factors is a central epigenetic alteration in cancer
On the basis of the methylation status in hESCs we established two categories of cancer methylated genes: Class A genes, which are frequently unmethylated in hESCs, and Class B genes, which are frequently hypermethylated in hESCs. As we unexpectedly found that a substantial proportion of the genes included in both groups were also frequently hypermethylated in normal differentiated tissues, we established two new subcategories of cancer methylated genes: subcategory I, for genes that are mostly unmethylated in normal tissues, and subcategory II, for genes that are sometimes hypermethylated in normal tissues. The biological interpretation of aberrant methylation within Classes A and B cancer methylated genes and their two subcategories is completely different. Class A-I genes are frequently hypermethylated in cancer but not in normal tissues or hESCs. These genes are not supposed to be regulated by DNA methylation during normal development and thus the hypermethylation in cancer should always be interpreted as an aberrant process. Class A-II genes are frequently methylated in CCLs and sometimes in normal tissues, but rarely in hESCs. Methylation of these genes may be important for lineage specification and should be considered aberrant in cancer when it occurs in a tumor type in whose corresponding normal tissue it is not hypermethylated. Class B-I genes (excluding
We found the percentages of Class A-II and Class B-II genes to be quite similar (6.61% and 7.53%, respectively), which suggests that the probability of aberrant hypermethylation or improper loss of methylation is similar in genes in which hypermethylation or loss of methylation, respectively, is necessary for lineage specification. However, the percentage of genes in Class A-I is much higher than that in Class B-I (24.63% and 1.41%, respectively), implying that it is much easier for a gene that is not naturally regulated by DNA methylation to become aberrantly hypermethylated than for there to be loss of methylation of developmental genes during hESC differentiation.
Comparing our DNA methylation data with those previously published on the histone modification profile and Polycomb occupancy of the same genes in embryonic stem cells
Within our four categories of genes we found those of Class A to be more enriched in Polycomb and bivalent marks than Class B genes, which suggests that the previously described scenario involving bivalent chromatin domains and Polycomb occupancy of cancer methylated genes in embryonic stem cells
To investigate further the role of promoter DNA methylation of genes aberrantly hypermethylated in cancer in hESCs, we compared the DNA methylation and expression status of four of the genes identified in the methylation arrays (MGMT and SLC5A8
By forcing the
One of the genes that we identified using this approach is
Finally, we wondered whether the methylation-dependent repression of cancer genes in hESCs is a molecular process associated with embryonic development or if, by contrast, it is an epigenetic mechanism involved in the maintenance of stemness status. The fact that the CD34+ somatic stem cell progenitors featured numerous genes frequently hypermethylated in cancer that are repressed by promoter hypermethylation suggests that, at least for these genes, the process could be associated with stemness status regardless of the ontogenetic stage of the cell, rather than being an event restricted to embryonic development. Since CD34+ cells are primary non-cultured cells, we can also discount the possibility that
By comparing the DNA methylation status of CD34+ progenitor cells with those of two types of primary cells that are terminally differentiated from the former (peripheral blood lymphocytes and neutrophils) we identified several genes that lost methylation specifically in just one of the lineages. This, in conjunction with the knowledge that most of the sequences identified were sometimes hypermethylated in NPTs and most were previously classified as Class B-II genes (those whose regulation by methylation is important for lineage specification and that present aberrant methylation in cancer), suggests that the genes hypermethylated in CD34+ progenitor cells that become unmethylated during differentiation are those primarily involved in lineage specification. That none of the sequences identified in the CD34+ progenitor cells was from Class B-I may well be because the CD34+ cells are not the primary hematopoietic progenitor cells and because Class B-I genes lose methylation in the transition from earlier progenitor stem cells to CD34+ cells. This explanation is consistent with the putative role of these genes in early development
The loss of promoter hypermethylation might be necessary for overexpression of a subset of Class B genes during differentiation. The aberrant process in cancer for these genes should be understood as a defect in establishing an unmethylated promoter during differentiation, rather than as an anomalous process of
Using the above approach, we identified two genes,
The results presented here are important for four reasons: i) we unexpectedly found a subset of cancer methylated genes that are also frequently methylated in hESCs; ii) the pattern of expression of these genes implies that DNA methylation might have an important role in the control of their expression in hESCs; iii) determining DNA methylation status in hESCs allowed us to define two categories of cancer methylated genes: Class A, containing genes that are not frequently hypermethylated in hESCs, and Class B, containing genes that are frequently hypermethylated in hESCs; and, probably most important, iv) the hypermethylation of some Class B genes in adult stem cells
Cell pellets and/or DNA/RNA were obtained from the following laboratories: Shef-1 (Servicio de Inmunologia, HUCA, Oviedo, Spain); Shef-4, Shef-5, Shef-7, H7, and H14 (CSCB, University of Sheffield, Sheffield, UK); H181 (CABIMER, Seville, Spain); and I3 (Institute of Reconstructive Neurobiology, University of Bonn, Germany). The cells were cultured and passaged by each laboratory following established protocols. The laboratories that were involved in the establishment and maintenance of these cell lines are members of the European project ESTOOLS (LSHG-CT-2006-018739). The laboratories participating in ESTOOLS only use embryonic stem cell lines derived from IVF embryos that will not be transferred into the womb. These embryos were donated for research according to the legal requirements of the country of origin. All donors gave their written informed consent. Profiling epigenetic regulation in hESCs is one of the research objectives of the ESTOOLS research program, which is supervised by the ethics advisory panel of the ESTOOLS project. The cell lines were established from different embryos and were maintained under different conditions, so that our results do not depend on a particular cell line or set of culture conditions.
Primary CD34+ hematopoietic somatic stem cells were purified from cord blood (CB) samples obtained from healthy newborns with the informed consent of the parents. CB harvesting procedures and informed consents were approved by the Local Hospital Ethics Board. Mononuclear cells were isolated using Ficoll-Hypaque (Amersham Biosciences, Baie d'Urfé, Quebec, Canada). CD34+ cells were purified by positive selection using anti-CD34 microbeads (Miltenyi Biotec, Madrid, Spain). Immunomagnetically labeled cell suspensions containing CD34+ cells were passed through Pro-MACS immunomagnetic columns (Miltenyi Biotec). The flow-through contained the purified CD34+ fraction. The purity was 80% ± 12% (n = 2) (
MDA-MB-231, HeLa, CaSki, SiHa, HCC1937, BT-474, LoVo, HCT115, DLD1, Co115, HT29, SW48, HCT116, RKO, U937, HL60, AKATA, Raji, Ramos, Karpas, and Farage (ATCC) cell lines were maintained in DMEM medium supplemented with 10% FBS and grown at 37°C under 5% CO2.
Lymphocytes and neutrophils were separated from the peripheral blood of healthy volunteers by centrifugation using Histopaque®-1077 (SIGMA). Lymphocyte-enriched fractions were obtained by collecting the upper layer of mononuclear cells, and granulocytes (mainly neutrophils) were recovered following hemolysis of the remaining pellet. RNA from breast, liver, heart, muscle, lung, colon, and lymph node samples was obtained from Ambion (Austin, TX). DNA from breast, heart, brain, and muscle was obtained from Biochain (Hayward, CA). The subjects who participated in this study gave written consent to being subjected to the procedures.
Methylation was assessed at 1,505 CpG sites using Illumina GoldenGate Methylation Arrays©, as described in Bibikova
To identify gene promoters that could be hypermethylated in a significant number of samples of a particular group (human embryonic stem cells, normal tissue types, and CCLs), we selected all sequences whose hybridization signal was ≥0.7 in at least 25% of the samples of each group. In general, sequences were classified by the following stepwise algorithm. First, sequences were classified according to the percentage of hESC samples that were hypermethylated in each specific probe set: sequences that were hypermethylated in ≥25% and <25% of samples were considered to be hypermethylated and not hypermethylated, respectively. Sequences were then tested for hypermethylation in hCCLs and classified according to the percentage (≥25% or <25%) of hypermethylated samples in each probe set. Finally, the percentages of normal tissue types that were hypermethylated in each probe set were calculated, and sequences were classified as hypermethylated in all normal tissue types (100% of samples with signal ≥0.7), unmethylated in all normal tissue types (100% of samples with signal <0.3) or unmethylated in some of the samples but not in all samples (signal <0.3 in at least one, but not all, samples). This algorithm allowed most sequences in the array to be assigned to one of the 12 groups described in
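The stepwise classification described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function names, the constant names, and the example signal values are hypothetical, and the sketch assumes the ≥0.7 and <0.3 cut-offs exactly as worded in this paragraph and the G1 to G12 ordering of the classification table.

```python
from typing import Optional, Sequence

HYPER = 0.7          # signal at or above this counts as hypermethylated
UNMETH = 0.3         # signal below this counts as unmethylated
MIN_FRACTION = 0.25  # fraction of samples required for "frequently hypermethylated"

# Named categories attached to four of the twelve groups.
CLASS_NAMES = {"G2": "Class B-I", "G3": "Class B-II",
               "G8": "Class A-I", "G9": "Class A-II"}

def frequently_hypermethylated(signals: Sequence[float]) -> bool:
    """True if at least 25% of the samples have a signal >= 0.7."""
    hyper = sum(s >= HYPER for s in signals)
    return hyper / len(signals) >= MIN_FRACTION

def ntt_status(signals: Sequence[float]) -> Optional[int]:
    """0: hypermethylated in all NTTs, 1: unmethylated in all NTTs,
    2: unmethylated in some but not all NTTs, None: not classifiable."""
    n_unmeth = sum(s < UNMETH for s in signals)
    if all(s >= HYPER for s in signals):
        return 0
    if n_unmeth == len(signals):
        return 1
    if 0 < n_unmeth < len(signals):
        return 2
    return None

def classify(hesc: Sequence[float], ccl: Sequence[float],
             ntt: Sequence[float]) -> Optional[str]:
    """Assign one sequence to a group G1..G12, or None if unassigned."""
    status = ntt_status(ntt)
    if status is None:
        return None
    # Step 1 (hESCs) selects the half of the table, step 2 (CCLs) the block.
    block = (0 if frequently_hypermethylated(hesc) else 6) \
          + (0 if frequently_hypermethylated(ccl) else 3)
    return f"G{block + status + 1}"

# Hypothetical signals: 2/8 hESC lines and 6/21 CCLs hypermethylated,
# all six normal tissue types unmethylated.
group = classify([0.9, 0.8] + [0.1] * 6, [0.9] * 6 + [0.1] * 15, [0.1] * 6)
print(group, CLASS_NAMES.get(group, "-"))  # G2 Class B-I
```

Sequences whose normal-tissue signals all fall between the two cut-offs, or that are hypermethylated in only some tissues without any unmethylated one, return `None` in this sketch, which mirrors the statement that most, but not all, sequences could be assigned to a group.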
We next determined whether any of the groups was significantly enriched in a specific type of histone modification. To this end, all sequences were classified according to publicly available data on histone-modification and Polycomb occupancy
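As an illustration of how enrichment of a chromatin mark in one gene class relative to another might be tested, the sketch below computes a two-sided Fisher's exact test on a 2x2 contingency table. The choice of Fisher's exact test is an assumption made here for illustration, since the statistical test is not named in this passage, and the counts in the example are hypothetical rather than taken from the study.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows could be two gene classes; columns could be genes with and
    without a given mark (hypothetical layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x marked genes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: 8 of 10 genes marked in one class, 1 of 6 in the other.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

An exact test is used in the sketch because it stays valid when some classes are small (for example, a class with only 20 sequences); a chi-square test would be a reasonable alternative for larger counts.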
DNA methylation was determined by PCR analysis after bisulfite modification of the DNA. Bisulfite genomic sequencing was carried out as previously described
RNA was isolated with TRIzol Reagent (Invitrogen) according to the manufacturer's instructions. For RT-PCR, 1 μg of total RNA was reverse-transcribed using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Quantitative real-time RT-PCR was performed using TaqMan® Gene Expression Assays and the ABI PRISM® 7900 sequence-detection system (Applied Biosystems). Data are expressed as means ± SD of three replicates of each experiment.
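The summary of triplicate measurements as mean ± SD can be sketched as below. The 2^(-ΔCt) normalization to a reference gene is an assumption made for illustration (the quantification formula is not stated in this section), the Ct values are hypothetical, and the sample standard deviation is used.

```python
from statistics import mean, stdev
from typing import List, Sequence

def relative_expression(ct_target: Sequence[float],
                        ct_reference: Sequence[float]) -> List[float]:
    """Expression of the target relative to the reference gene, per replicate,
    computed as 2 ** -(Ct_target - Ct_reference)."""
    return [2 ** -(t - r) for t, r in zip(ct_target, ct_reference)]

# Hypothetical Ct values for three replicates of a target gene and GAPDH.
target_ct = [26.1, 25.9, 26.0]
gapdh_ct = [18.0, 18.1, 17.9]

values = relative_expression(target_ct, gapdh_ct)
print(f"{mean(values):.4f} ± {stdev(values):.4f}")
```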
Unsupervised cluster analysis of human embryonic stem cells (hESCs), human cancer cell lines (CCLs), and normal primary tissues based on correlation of methylation profiles of 1,421 sequences. The methylation levels vary from fully methylated (red) to fully unmethylated (white) sequences. The final two rows correspond to in vitro-methylated DNA (IVD), used as a positive control for methylation.
(0.20 MB PDF)
Methylation status of MGMT in hESCs, normal tissues, and CCLs. (A) Methylation profiles of MGMT gene obtained by Illumina arrays and expressed as relative methylation from fully unmethylated (0) to fully methylated (1). (B) Bisulfite genomic sequencing of multiple clones of the MGMT promoter in hESCs and normal primary tissues. Color code as for
(0.04 MB PDF)
Hypermethylation of SLC5A8 in hESCs. (A) Methylation profiles of SLC5A8 gene obtained by Illumina arrays and expressed as relative methylation, from fully unmethylated (0) to fully methylated (1). (B) Bisulfite genomic sequencing of multiple clones of the SLC5A8 promoter in hESCs and normal primary tissues. Color code as for
(0.03 MB PDF)
Hypermethylation of PYCARD in hESCs. (A) Methylation profiles of PYCARD gene obtained by Illumina arrays and expressed as relative methylation from fully unmethylated (0) to fully methylated (1). (B) Bisulfite genomic sequencing of multiple clones of the PYCARD promoter in hESCs and normal primary tissues. Color code as for
(0.03 MB PDF)
Hypermethylation of RUNX3 in hESCs. (A) Methylation profiles of RUNX3 gene obtained by Illumina arrays and expressed as relative methylation from fully unmethylated (0) to fully methylated (1). Red arrow indicates methylation levels in normal lymphocytes purified from blood. (B) Bisulfite genomic sequencing of multiple clones of the RUNX3 promoter in hESCs and normal primary tissues. Color code as for
(0.02 MB PDF)
Hypermethylation of DLC1 in hESCs. Methylation profiles of DLC1 gene obtained by Illumina arrays and expressed as relative methylation, from fully unmethylated (0) to fully methylated (1).
(0.02 MB PDF)
Flow cytometry analysis of the purity of CD34+ cells after purification by positive selection using anti-CD34 microbeads. Detection signals were obtained using a fluorochrome-conjugated anti-CD34 antibody (BD). Purity was 80% ± 12% (n = 2).
(0.01 MB PDF)
Hypermethylation of AIM2 in hESCs. Methylation profiles of AIM2 gene obtained by Illumina arrays and expressed as relative methylation, from fully unmethylated (0) to fully methylated (1).
(0.02 MB PDF)
List of genes belonging to each group defined in Supplementary
(1.20 MB XLS)
List of genes identified as being hypermethylated in hESCs using different classification thresholds than that used in
(0.32 MB XLS)
Methylation data, histone marks, and Polycomb occupancy for genes in the four main categories: A-I, A-II, B-I, and B-II. Raw data from the methylation array for each sample are included. The final three columns summarize information about histone marks and Polycomb occupation published elsewhere. In the HK4/K27 methylation column, K4 stands for 3me-lysine 4 of histone H3, while K27 stands for 3me-lysine 27 of histone H3. In the Polycomb occupation column (+) and (−) respectively refer to the presence and absence of the protein SUZ12.
(0.89 MB XLS)
Histone marks and Polycomb occupation in Class A-I, A-II, B-I, and B-II genes. The first table shows each group separately, the second shows Class A vs. Class B genes, and the third shows subcategory I vs. subcategory II genes. The number of genes is presented with the probability of each modification on the right and the percentage on the left.
(0.04 MB XLS)
List of genes that are hypomethylated during in vitro differentiation of the embryonic stem cell line Shef-1. Methylation levels from the Illumina array are reported.
(0.02 MB XLS)
Genes that are hypermethylated in CD34+ hematopoietic stem cell progenitors. Methylation levels from the Illumina array are reported.
(0.39 MB XLS)
Genes that are hypomethylated in peripheral blood lymphocytes and neutrophils relative to CD34+ hematopoietic stem cell progenitors.
(0.02 MB XLS)
Summary of the Gene Ontology (GO) terms associated with the genes of Classes A-I, A-II, B-I, and B-II. The analysis was done using the web tool of the PANTHER database. Corresponding probabilities of each term and the chromatin-associated gene function (right), based on Zhao et al. (2007), are presented.
(0.02 MB XLS)
Primers used for bisulfite sequencing.
(0.02 MB XLS)
We gratefully acknowledge the Genotyping Unit at the CNIO for their assistance with the methylation arrays technology. The I3 cell line was originally derived by Joseph Itskovitz-Eldor.