Conceived and designed the experiments: VAP SD MJM. Performed the experiments: VAP MW SD MJM. Analyzed the data: VAP JRW MJM. Contributed reagents/materials/analysis tools: MW SD JCA DCL. Wrote the paper: VAP JRW MJM.
SD has received research funding from Bristol Myers Squibb. This company played no role in the planning, conduct, or analysis of the experiments in this manuscript. This relationship does not alter the authors' adherence to PLoS ONE policies on sharing data and materials.
Effects of diet on the structure and function of gut microbial communities in newborn infants are poorly understood. High-resolution molecular studies are needed to definitively ascertain whether gut microbial communities are distinct in milk-fed and formula-fed infants.
Pyrosequencing-based whole transcriptome shotgun sequencing (RNA-seq) was used to evaluate community-wide gut microbial gene expression in 21-day-old neonatal piglets fed either with sow's milk (mother fed, MF; n = 4) or with artificial formula (formula fed, FF; n = 4). Microbial DNA and RNA were harvested from cecal contents for each animal. cDNA libraries and 16S rDNA amplicons were sequenced on the Roche 454 GS-FLX Titanium system. Communities were similar at the level of phylum but were dissimilar at the level of genus;
Abundant transcripts identified in this study likely contribute to a core microbial metatranscriptome in the distal intestine. Although microbial community gene expression was generally similar in the cecal contents of MF and FF neonatal piglets, several differentially abundant gene clusters were identified. Further investigations of gut microbial gene expression will contribute to a better understanding of normal and abnormal enteric microbiology in animals and humans.
The principles that govern early microbial colonization in the mammalian intestine are poorly understood. Environmental exposures such as early infant diet are believed to impact the development and function of gut microbial consortia
Cultivation-independent techniques have made it possible to identify many or most of the gut microbes present within biological specimens such as fecal samples or gut mucosal biopsies
RNA-sequencing (RNA-seq)
Here, an unbiased RNA-Seq approach utilizing massively parallel pyrosequencing was used to study microbial gene expression in cecal contents from 21-day-old mother-fed (MF) and formula-fed (FF) neonatal piglets. Because nutrient availability is tightly linked to global transcriptional control and upregulation of bacterial virulence programs
Two parallel approaches were used to identify the microbial organisms present within cecal contents from the 8 piglets studied. We amplified and sequenced 16S ribosomal DNA sequences from the DNA samples, and we also characterized unamplified fragments of 16S rRNA sequences present within the whole genome cDNA libraries (
Group | Mother-Fed | | | | Formula-Fed | | | | | |
Measure | MF1 | MF2 | MF3 | MF4 | FF1 | FF2 | FF3 | FF4 | Total | Average |
Total cDNA sequences | 108,960 | 87,406 | 87,257 | 85,377 | 93,935 | 93,073 | 97,970 | 113,446 | 767,424 | 95,928 |
Average read length (bp) | 376 | 384 | 373 | 376 | 367 | 374 | 352 | 339 | n.a. | 367 |
Total library size (Mb) | 41 | 33.6 | 32.5 | 32.1 | 34.5 | 34.8 | 34.4 | 38.4 | 281.3 | 35.2 |
Total number of sequences <100 bp | 6606 | 4820 | 5757 | 5393 | 5891 | 5563 | 6456 | 8361 | 48847 | 6106 |
Total rRNA sequences screened | 70,724 | 62,765 | 55,630 | 58,592 | 70,178 | 81,279 | 83,962 | 96,333 | 579,463 | 72,433 |
Total non-ribosomal RNA sequences | 31,630 | 19,821 | 25,870 | 21,392 | 17,866 | 6,231 | 7,552 | 8,752 | 139,114 | 17,389 |
% of non ribosomal RNA sequences | 30.90 | 24.00 | 31.74 | 26.75 | 20.29 | 7.12 | 8.25 | 8.33 | n.a. | 19.67 |
Total size of non-rRNA library (Mb) | 11.89 | 7.61 | 9.65 | 8.04 | 6.56 | 2.33 | 2.66 | 2.97 | 51.71 | 6.46 |
Total sequences matched to proteins in SEED subsystems | 6,208 | 7,097 | 4,007 | 3,017 | 9,199 | 2,284 | 2,858 | 2,430 | 37,100 | 4,638 |
% total sequences assigned to SEED subsystems | 5.7 | 8.1 | 4.6 | 3.5 | 9.8 | 2.5 | 2.9 | 2.1 | n.a. | 4.8 |
% non rRNA sequences assigned to SEED subsystems | 19.6 | 35.8 | 25.5 | 14.1 | 51.5 | 36.7 | 37.8 | 27.8 | n.a. | 31.1 |
Total sequences matched to COG database | 6,361 | 7,140 | 4,068 | 3,044 | 8,708 | 2,382 | 2,827 | 2,507 | 37,037 | 4,630 |
% total sequences assigned to COG | 5.84 | 8.17 | 4.66 | 3.57 | 9.27 | 2.56 | 2.89 | 2.21 | n.a. | 4.90 |
% non rRNA sequences assigned to COGs | 20.11 | 36.02 | 15.72 | 14.23 | 48.74 | 38.23 | 37.43 | 28.64 | n.a. | 29.90 |
Total non-ribosomal sequences annotated to level of phylum | 4,824 | 5,266 | 3,236 | 2,172 | 10,079 | 2,002 | 2,518 | 2,864 | 32,961 | 4,120 |
% non rRNA sequences annotated to level of phylum | 15.3 | 26.6 | 12.5 | 10.2 | 56.4 | 32.1 | 33.3 | 32.7 | n.a. | 27.4 |
Total 16S rDNA amplicon sequences >100 bp | 756 | 760 | 990 | 634 | 860 | 578 | 767 | 734 | 6,079 | 760 |
Average amplicon sequence length (bp) | 437.7 | 431.8 | 446.6 | 445.1 | 413.7 | 415.6 | 458.8 | 469.3 | n.a. | 439.8 |
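The percentage rows in the table above can be checked against the raw counts. The Python sketch below assumes, from the arithmetic of the table rather than from any statement in the text, that the denominator for "% of non ribosomal RNA sequences" is the number of reads remaining after reads shorter than 100 bp are removed, and that "% non rRNA sequences assigned to SEED subsystems" is SEED matches divided by non-rRNA reads. The function and variable names are illustrative only.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def non_rrna_percent(total_reads, short_reads, non_rrna_reads):
    """Percentage of length-screened reads that are non-ribosomal.

    Assumption: short reads (<100 bp) are dropped before the
    rRNA / non-rRNA split, so they are excluded from the denominator.
    """
    screened = total_reads - short_reads
    return percent(non_rrna_reads, screened)


# Counts copied from the MF1 column of the table.
mf1 = {
    "total": 108_960,    # Total cDNA sequences
    "short": 6_606,      # Total number of sequences <100 bp
    "rrna": 70_724,      # Total rRNA sequences screened
    "non_rrna": 31_630,  # Total non-ribosomal RNA sequences
    "seed": 6_208,       # Sequences matched to SEED subsystems
}

# Reads surviving the length screen split exactly into rRNA and non-rRNA.
screened = mf1["total"] - mf1["short"]
assert screened == mf1["rrna"] + mf1["non_rrna"]

# Table reports 30.90 for MF1.
print(round(non_rrna_percent(mf1["total"], mf1["short"], mf1["non_rrna"]), 2))

# Table reports 19.6 for MF1.
print(round(percent(mf1["seed"], mf1["non_rrna"]), 1))
```

Under these assumptions the MF1 values reproduce the reported 30.90% and 19.6%; the same check can be repeated for the other columns.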
Analysis of 16S rDNA amplicons demonstrated the presence of microbes from 11 total phyla. However, 98% of 16S rDNA amplicons classified at the level of phylum were derived from the phyla Bacteroidetes (56.8% of total), Firmicutes (36.6%), and Proteobacteria (4.8%). At the level of phylum, no taxa were differentially abundant within either the MF or FF group. However, several genera were differentially abundant in one of the two animal groups (
(A) Mean relative abundances of bacterial taxa within cecal microbiota from 4 MF animals (
The sequenced cDNA libraries (after one cycle of mRNA enrichment) contained 80.6% ribosomal RNA sequences, and taxonomic assignments were successfully made for 4.0% of these sequences (25,398 total sequences) using the Ribosomal Database Project classification algorithm
While increasing attention has been paid recently to “core” indispensable genes and organisms that are present within mammalian gut microbial communities
Consistent with other studies of gene expression within marine and soil microbial consortia
(A) Mean relative abundances of annotated sequences within cDNA libraries from all 8 animals studied. Displayed are the automated SEED Level 1 Subsystem assignments, as determined by MG-RAST
SEED Annotations | | | COG hits | | | |
Rank | SEED Level 3 Subsystem | Raw number of hits | Rank | COG | Annotation | Raw number of hits |
1 | Ribosome_LSU_bacterial | 1725 | 1 | COG0057 | Glyceraldehyde-3-phosphate dehydrogenase/erythrose-4-phosphate dehydrogenase | 620 |
2 | Ribosome_SSU_bacterial | 1226 | 2 | COG0050 | GTPases - translation elongation factors | 593 |
3 | Universal_GTPases | 1149 | 3 | COG1592 | Rubrerythrin | 544 |
4 | Oxidative_stress | 943 | 4 | COG0480 | Translation elongation factors (GTPases) | 525 |
5 | Pyruvate_metabolism_I:_anaplerotic_reactions__PEP | 814 | 5 | COG0662 | Mannose-6-phosphate isomerase | 368 |
6 | tRNA_aminoacylation | 786 | 6 | COG0448 | ADP-glucose pyrophosphorylase | 306 |
7 | Sialic_Acid_Metabolism | 778 | 7 | COG0574 | Phosphoenolpyruvate synthase/pyruvate phosphate dikinase | 302 |
8 | Entner-Doudoroff_Pathway | 673 | 8 | COG0334 | Glutamate dehydrogenase/leucine dehydrogenase | 302 |
9 | Maltose_and_Maltodextrin_Utilization | 672 | 9 | COG0191 | Fructose/tagatose bisphosphate aldolase | 290 |
10 | Mannose_Metabolism | 621 | 10 | COG0542 | ATPases with chaperone activity, ATP-binding subunit | 280 |
11 | Ton_and_Tol_transport_systems | 621 | 11 | COG1653 | ABC-type sugar transport system, periplasmic component | 236 |
12 | Sucrose_Metabolism | 600 | 12 | COG0674 | Pyruvate∶ferredoxin oxidoreductase and related 2-oxoacid∶ferredoxin oxidoreductases, alpha subunit | 232 |
13 | Ribosome_activity_modulation | 593 | 13 | COG0443 | Molecular chaperone | 229 |
14 | Lactose_and_Galactose_Uptake_and_Utilization | 564 | 14 | COG1544 | Ribosome-associated protein Y (PSrp-1) | 227 |
15 | Pyridoxin_(Vitamin_B6)_Biosynthesis | 548 | 15 | COG1109 | Phosphomannomutase | 214 |
The table lists the most heavily represented SEED Level 3 Subsystems (
Carbohydrate utilization profiles (
(A) Relative abundance of cDNA sequences assigned by MG-RAST to the Level 3 SEED Subsystem of carbohydrate utilization. (B) Relative abundance of cDNA sequences assigned by MG-RAST to the Level 3 SEED Subsystem of virulence. Values for mean relative abundances in both (A) and (B) reflect average values across all 8 animals studied.
Transcripts associated with virulence, stress response, and cell wall metabolism were also relatively abundant across all animals (mean relative abundances 0.055±0.007, 0.047±0.01, and 0.044±0.01, respectively). The most abundant sequences classified within the SEED subsystem of virulence were associated with Ton and Tol transport systems, fluoroquinolone resistance, and other iron transport systems (
Several of the commonly expressed microbial genes in the piglet cecum have crossover functions linking their primary function with binding to host epithelium, lipopolysaccharide (LPS) metabolism, oxidative stress response, and virulence. GAPDH, beta-galactosidase, enolase (COG0148; central carbohydrate metabolism), phosphoglycerate mutase (COG0588; central carbohydrate metabolism) and elongation factor Tu (COG0050) have all been noted to mediate binding to host mucins, fibrinogen, and/or plasminogen
An advantage of digital studies of the community transcriptome is that the data can inform explanations of how metabolic tasks are divided among the members of gut microbial communities. Little is currently known about whether fundamental processes such as carbohydrate utilization are performed by a range of organisms in vivo or rather by a limited group of specialized community members. Furthermore, although it is widely recognized that numerous low abundance bacterial species reside in the GI tract, little is known about the contribution of such organisms to community metabolism and function. To begin to address these questions, taxonomic assignments were made, where possible, for each gene transcript by identifying the best BLASTN hit against an in-house database of 1054 finished microbial genomes (
(A) Mean relative abundances of bacterial taxa within cecal microbiota from 4 MF animals (
These data highlight the importance of available reference microbial genomes when attempting to profile mixed population microbial communities
Gene expression among gut microbes was relatively similar in MF and FF animals, despite taxonomic differences at the level of genus in the observed microbial communities. MG-RAST generated assignments of protein-coding nucleotide sequences to 29 broad functional categories of SEED subsystems and 637 narrow SEED subsystems (Level 1 and Level 3, respectively). In 24 of the 29 major subsystems, the abundance of assigned sequences was not statistically different in the MF and FF groups. Sequences assigned to the Level 1 subsystem of carbohydrates were similarly abundant in the two groups (p = 0.129). Remarkably, the abundance of MF and FF sequences was not significantly different in 94% of Level 3 Subsystems (600 of 637).
Significant differences were observed in five Level 1 SEED subsystems: prophage (p = 0.005), amino acids and derivatives (p = 0.009), potassium metabolism (p = 0.017), respiration (p = 0.018), and motility and chemotaxis (p = 0.04) subsystems.
SEED subsystem | MF mean relative abundance | MF standard error | FF mean relative abundance | FF standard error | p value |
Prophage | 0.0001 | 0.0001 | 0.0004 | 0.0000 | 0.005 |
Amino_Acids_and_Derivatives | 0.0741 | 0.0029 | 0.0611 | 0.0022 | 0.009 |
Potassium_metabolism | 0.0037 | 0.0004 | 0.0018 | 0.0005 | 0.017 |
Respiration | 0.0361 | 0.0013 | 0.0423 | 0.0018 | 0.018 |
Motility_and_Chemotaxis | 0.0146 | 0.0032 | 0.0065 | 0.0008 | 0.040 |
Arginine_Biosynthesis | 0.0065 | 0.0007 | 0.0020 | 0.0003 | 0.001 |
LOS_core_oligosaccharide_biosynthesis | 0.0011 | 0.0001 | 0.0003 | 0.0001 | 0.003 |
Aromatic_amino_acid_interconversions_with_aryl_acids | 0.0008 | 0.0002 | 0.0020 | 0.0002 | 0.004 |
High_affinity_phosphate_transporter | 0.0006 | 0.0000 | 0.0004 | 0.0000 | 0.004 |
Bacterial_Chemotaxis | 0.0057 | 0.0006 | 0.0026 | 0.0005 | 0.005 |
Proline__4-hydroxyproline_uptake_and_utilization | 0.0116 | 0.0005 | 0.0081 | 0.0007 | 0.007 |
Mannitol_Utilization | 0.0005 | 0.0001 | 0.0012 | 0.0001 | 0.008 |
L-Arabinose_utilization | 0.0015 | 0.0003 | 0.0034 | 0.0005 | 0.013 |
dTDP-rhamnose_synthesis | 0.0015 | 0.0001 | 0.0007 | 0.0002 | 0.013 |
Aromatic_Amin_Catabolism | 0.0008 | 0.0002 | 0.0002 | 0.0001 | 0.015 |
Glycine_and_Serine_Utilization | 0.0037 | 0.0005 | 0.0056 | 0.0005 | 0.016 |
Methanogenesis | 0.0029 | 0.0009 | 0.0002 | 0.0002 | 0.017 |
Glycine_cleavage_system | 0.0023 | 0.0005 | 0.0006 | 0.0004 | 0.020 |
Aromatic_amino_acid_degradation | 0.0009 | 0.0001 | 0.0003 | 0.0002 | 0.020 |
Alanine_biosynthesis | 0.0011 | 0.0003 | 0.0003 | 0.0001 | 0.024 |
Purine_Utilization | 0.0015 | 0.0003 | 0.0048 | 0.0012 | 0.026 |
Ton_and_Tol_transport_systems | 0.0095 | 0.0012 | 0.0196 | 0.0034 | 0.027 |
Phosphoenolpyruvate_phosphomutase | 0.0000 | 0.0000 | 0.0006 | 0.0002 | 0.028 |
Terminal_cytochrome_oxidases | 0.0006 | 0.0002 | 0.0015 | 0.0002 | 0.029 |
Oxidative_stress | 0.0267 | 0.0042 | 0.0140 | 0.0021 | 0.029 |
Relative abundance of transcripts represents the number of sequences assigned to a subsystem for an individual animal divided by the total number of sequences for that animal. Mean relative abundance represents the average of these values in either the MF or FF treatment groups. A treatment group was considered to be enriched in transcripts assigned to a SEED subsystem if the p value was less than 0.05 for Level 1 subsystems and less than 0.03 for Level 3 subsystems. Subsystems with a mean relative abundance less than 0.0005 in both MF and FF groups were excluded. The table includes only a partial list of differentially abundant subsystems; a complete table is provided in
Significant differences between the MF and FF groups were evident in the abundance of transcripts related to amino acid metabolism. The MF group was markedly enriched in sequences encoding enzymes that contribute to arginine metabolism, e.g. arginine deaminase (COG2235; p = 0.001) and ornithine aminotransferase (COG4992; p = 0.003). In neonates, arginine is synthesized in the gut from proline
The MF data set was enriched with sequences assigned to the oxidative stress subsystem (p = 0.029). This may be consistent with long-held claims of the antioxidant properties of maternal milk in human neonates
It is now well established that gut microbes contribute to mammalian physiology
Here we have presented a metatranscriptomic evaluation of intestinal bacteria in 8 neonatal piglets. In this study, we used an RNA pyrosequencing platform to profile gut microbial gene expression in MF and FF piglets without prior knowledge of which organisms were present. To our knowledge, this study represents the largest number of independent samples (8 subjects) used to date for analysis of community-wide gene expression in the gut. Several methodological considerations of this study warrant mention. First, we performed a single step for mRNA enrichment prior to construction of cDNA libraries, and cDNA sequencing results indicated that the degree of enrichment was modest (19.7% of all RNA sequences were non-ribosomal). In the future, this step in sample processing may not be necessary given the volume and length of sequencing reads available with current sequencing platforms. Second, amplification of cDNA sequences was not necessary in this study, although it has been required in prior metatranscriptomic studies due to low yields of RNA
A subset of specific transcripts was relatively abundant in all samples studied, indicating that we have begun to define a core neonatal gut microbial metatranscriptome. As expected, a preponderance of mRNA transcripts corresponded to genes related to metabolism of carbohydrates and proteins. These results align well with recent papers that have defined the nature of carbohydrate utilization genes within a core metagenome (9,31,32). Additionally, commonly observed gut microbial transcripts in our study encode proteins that enable binding to host epithelium, regulate processing of extracellular polysaccharides, and mediate microbial stress response. The validity of our results is supported by a recent proteomic-based study that characterized circulating antibodies against gut bacteria in human subjects
A primary goal of this study was to identify differences in the gut microbial communities of MF and FF neonatal piglets. MF samples were enriched with 16S sequences from the taxa
Profiles of gene expression were similar. The abundance of sequences from more than 90% of SEED subsystems and COG clusters was not statistically different between the MF and FF datasets. These results suggest that sow's milk and artificial formula, although chemically distinct, induce relatively subtle changes in gut microbial gene expression. However, several important differences were noted in the transcriptomes of the MF and FF animals. Marked differences were noted in the expression of genes involved in amino acid metabolism. Interestingly, we observed a clear abundance of enzymes linked to arginine metabolism in the MF group of animals. This finding may have clinical relevance because several reports have indicated that an abnormally low serum concentration of arginine, a precursor for nitric oxide production, confers an increased risk for NEC
Recent advances in completing microbial genomes and completing metagenomic surveys have demonstrated the vast metabolic potential of the bacteria and archaea present within the mammalian intestine. As DNA-based studies continue, parallel studies of gut microbial function will be essential. Functional studies of gene expression, protein expression, and metabolite production will make it possible to define what is “normal” in the field of enteric microbiology. Eventually, continued progress in this area will allow us to better understand the contributions of microbes to diseases such as NEC, Crohn's disease, and obesity.
Animals were managed throughout the study in accordance with requirements of the Institutional Animal Care and Use Committee at the University of Illinois (IACUC protocols 08070 and 08015), in accordance with approved NIH guidelines. Vaginally-delivered piglets were allowed to suckle for 48 h postnatal to obtain colostrum. Four piglets (mother fed, MF) from a single litter remained with the sow throughout the study. Four additional piglets (formula fed, FF), from two separate litters, were fed with formula for the remainder of the study. FF piglets were transported to the animal facilities and housed individually in metabolism cages with a 12 h light/dark cycle as previously described
Samples were collected on postnatal day 21. Piglets were first sedated with an intramuscular injection of Telazol (7 mg/kg body weight; Fort Dodge Animal Health, Fort Dodge, IA) and then euthanized by intracardiac administration of sodium pentobarbital (Fatal Plus: 72 mg/kg body weight; Vortech Pharmaceuticals, Dearborn, MI). The large intestine was excised and separated into cecum and colon at the cecocolic junction. Tissues and luminal contents from cecum were immediately collected into sterile tubes and snap frozen in liquid nitrogen. The samples were stored at −80°C until processing.
DNA was isolated from frozen cecal contents according to QIAamp DNA Stool mini Kit (Qiagen, Alameda, CA) with modifications
The frozen ceca were cut into discs ∼0.5–1 cm thick using a cold sterile pruner. Residual intestinal tissue was removed with the pruner and a handheld drill (Dremel, Racine, WI). The samples were kept deeply frozen during the procedure. Total microbial RNA isolation was performed using the RiboPure-Bacteria Kit (Ambion, Austin, TX). The MICROBExpress Bacterial mRNA Purification Kit (Ambion) was used to deplete the pool of 16S and 23S rRNA molecules present in the sample. cDNA libraries were constructed according to the random-primer cDNA synthesis protocol of the Double-Stranded cDNA Synthesis Kit (Invitrogen, Carlsbad, CA). Libraries were size selected in a 1% low melting point agarose gel. The 250–800 bp region was excised from the gel and purified. Approximately 1 µg of randomly-primed, size-selected cDNA was blunt-ended, adaptor ligated and converted to a single-stranded template DNA library using the GS Titanium General Library Prep Kit (Roche Applied Science, Indianapolis, IN). Libraries were prepared using barcode-containing adaptors in place of the standard Titanium adaptors, following Roche's instructions for preparing barcoded adaptors. The barcode sequences used for each library are listed above (MID 1–8). Libraries were quantified using Qubit reagents (Invitrogen, CA) and average fragment sizes were determined by analyzing 1 µl of the sized cDNA samples on the Bioanalyzer (Agilent, CA) using a DNA 7500 chip. The libraries were pooled in equimolar concentration into a single library. Processing for emulsion PCR, titration and sequencing on a GS FLX was done following the manufacturer's protocols (Roche Applied Science) on a GS Titanium 70×75 picotiter plate. The resulting library reads were sorted by barcode using SFF software tools (Roche Applied Science).
The total pool of initial 454 pyrosequencing reads was first screened for length (>100 bp) and rRNA contamination. Replicate reads were not removed, as a preliminary analysis of COG hits and taxonomic assignments indicated that roughly 1% of sequences were perfect replicates and that their presence did not significantly affect the relative abundance of transcripts (data not shown). Sequences were searched against an in-house database of 18,283 annotated large subunit (LSU) and small subunit (SSU) rRNA sequences compiled from the SILVA database
Statistical tests for differentially abundant COG functions, cDNA-based taxonomic annotations, and 16S-based taxonomic annotations between populations (e.g. 21-day MF, 21-day FF) were made using the Metastats methodology with 1000 permutations to compute nonparametric p-values. Thresholds for significance were determined according to the specific comparison. For general COG categories and Level 1 and Level 2 SEED subsystems, the p-value threshold was 0.05. For individual COGs and Level 3 SEED subsystems, the significance thresholds were 0.02 and 0.03, respectively.
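The general idea of a permutation-based nonparametric p-value can be illustrated with a short Python sketch. This is a simplified stand-in and not the Metastats implementation: it computes a two-sample t statistic on per-animal relative abundances, shuffles the group labels 1000 times, and reports the fraction of shuffles whose statistic is at least as extreme as the observed one. It omits other parts of the published method, such as multiple-testing correction. The function names, the random seed, and the example abundance values are hypothetical and are not data from this study.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t statistic allowing unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided p-value from random relabelling of the pooled samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(t_statistic(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Count shuffles at least as extreme as the observed statistic.
        if abs(t_statistic(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical relative abundances for one subsystem (4 MF, 4 FF animals).
mf = [0.0065, 0.0058, 0.0072, 0.0066]
ff = [0.0020, 0.0017, 0.0023, 0.0021]
p = permutation_p_value(mf, ff)
print(f"permutation p-value: {p:.3f}")
```

With four animals per group there are only 70 distinct ways to split the eight animals into two groups of four, so the random shuffles repeatedly sample from a small set of possible statistics, and the attainable p-values are correspondingly coarse.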
Principal component analysis of genus-level taxonomic assignments in MF and FF animals. MF animals (n = 4) represented by blue dots; FF animals (n = 4) represented by black dots.
(0.21 MB PDF)
Gut microbial community structure in MF and FF piglets. Relative abundance of organisms was determined by amplification and sequencing of 16S rDNA present within microbial DNA isolated from the cecal contents of each animal. Relative abundance of sequences represents the number of sequences assigned to a given taxon for an individual animal divided by the total number of sequences for that animal. Mean relative abundance represents the average of these values in either the MF or FF groups.
(0.03 MB XLS)
In-house collection of bacterial genomes used for taxonomic assignment of cDNA sequences.
(0.08 MB XLSX)
Complete list of differentially abundant transcripts assigned to SEED subsystems in MF and FF piglets. Relative abundance of transcripts represents the number of sequences assigned to a subsystem for an individual animal divided by the total number of sequences for that animal. Mean relative abundance represents the average of these values in either the MF or FF treatment groups. A treatment group was considered to be enriched in transcripts assigned to a SEED subsystem if the p value was less than 0.05 for Level 1 subsystems and less than 0.03 for Level 3 subsystems. Subsystems with a mean relative abundance less than 0.0005 in both MF and FF groups were excluded.
(0.03 MB XLS)
Differentially abundant COG hits in MF and FF piglets. Relative abundance is defined as the number of hits to an individual COG divided by the total number of hits to individual COGs for each individual subject. COGs were excluded if the mean relative abundance in both MF and FF groups was less than 0.0005. A treatment group was considered to be enriched in transcripts from an individual COG if the p value was less than 0.02.
(0.03 MB XLS)
The authors wish to thank Alvaro Hernandez and the staff of the W.M. Keck Center for Comparative and Functional Genomics at The University of Illinois (Urbana-Champaign) for sequencing support, the Initiative in Bioinformatics staff at the Computation Institute of The University of Chicago, and Vincent Denef, PhD for important insights regarding analysis of community gene expression data.