Stefan Bertilsson is an editor of PLOS ONE. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
Contributed data: ESL JP KS PZ SB SP YY. Conceived and designed the experiments: AE. Analyzed the data: AE SD CR. Wrote the paper: AE SD SB ESL.
The recognition and discrimination of phytoplankton species is one of the foundations of freshwater biodiversity research and environmental monitoring. This step is frequently a bottleneck in the analytical chain from sampling to data analysis and subsequent environmental status evaluation. Here we present phytoplankton diversity data from 49 lakes, including three seasonal surveys, assessed by next generation sequencing (NGS) of 16S ribosomal RNA chloroplast and cyanobacterial gene amplicons, and compare part of these datasets with identification based on morphology. Direct comparison of NGS to microscopic data from three time-series showed that NGS was able to capture the seasonality in phytoplankton succession as observed by microscopy. Still, the PCR-based approach was only semi-quantitative, and detailed NGS and microscopy taxa lists had only low taxonomic correspondence. This is probably due to both methodological constraints and current discrepancies in taxonomic frameworks. Discrepancies included Euglenophyta and Heterokonta, which were scarce in the NGS data but frequently detected by microscopy, and Cyanobacteria, which were in general more abundant and classified with high resolution by NGS. A deep-branching taxonomically unclassified cluster was frequently detected by NGS but could not be linked to any group identified by microscopy. NGS-derived phytoplankton composition differed significantly among lakes with different trophic status, showing that our approach can resolve phytoplankton communities at a level relevant for ecosystem management. The high reproducibility and potential for standardization and parallelization make our NGS approach an excellent candidate for simultaneous monitoring of prokaryotic and eukaryotic phytoplankton in inland waters.
Phytoplankton are essential for biogeochemical cycles
So far, most studies on the diversity, distribution, and abundance of phytoplankton taxa have been based on morphological characteristics using different microscopic techniques. To date there are no studies on monitoring of combined phytoplankton communities (i.e. both cyanobacteria and eukaryotic algae) with molecular methods, but separate monitoring of eukaryotic phytoplankton communities has been attempted using single-strand conformation polymorphism and microarrays
In the aquatic environment, these new sequencing technologies have already been introduced in studies on the diversity of other organisms lacking morphological detail for identification e.g. bacteria
Here, we use the 16S rRNA gene as a marker because it is universal in prokaryotes, including cyanobacteria, and is also universally present in the chloroplasts of eukaryotes. This enables simultaneous detection of prokaryotic and eukaryotic phytoplankton taxa. Using datasets based on 16S rRNA gene amplicons that have been sequenced by 454 pyrosequencing, we describe temporal patterns in three lakes and compare phytoplankton communities among an additional 46 lakes from temperate, boreal and polar regions. Our sequence-based data reveal that phytoplankton composition differs significantly among lakes with different trophic status, showing that our approach can resolve phytoplankton communities and act as a tool for monitoring the trophic status of aquatic systems. Our study illustrates the potential of DNA sequencing-based analyses as powerful tools in environmental monitoring by offering accurate, reliable and rapid identification of phytoplankton taxa from complex environmental samples.
Water samples were taken from a range of lakes of different nutrient content (including also some saline Antarctic lakes) as described previously for Erken (ER
Samples for assessment of phytoplankton abundance and biomass were preserved with Lugol's solution. This was done for time series data from AM, ER and RI. Phytoplankton were enumerated using inverted microscopes at 100–1000× magnification, after sedimentation of a known volume of sample in a counting chamber
Genomic DNA extraction from filters (0.2 µm) was performed using the Ultra clean Soil DNA extraction kit as recommended by the manufacturer (MoBio Laboratories, Solana Beach, CA, USA). Except for lakes AM, MJ, N, VK and VM a modified protocol originally described by Griffiths et al. was used
Output from the sequencer in the form of SFF files, together with a list of samples including their corresponding barcodes, was used for the analyses. First, ambiguous sequences were removed from the data set, including reads with low quality as inferred from their flowgrams and those that did not carry the exact primer sequence (reverse primer 805R)
To obtain a higher taxonomic resolution than provided by the classifier, a representative sequence from each OTU was aligned in MOTHUR
To assign phytoplankton reads into operational taxonomic units (OTUs) prior to ordination procedures, sequences were clustered based on 97% sequence similarity using UCLUST
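The idea of grouping reads into OTUs at a 97% similarity cutoff can be illustrated with a minimal greedy, centroid-based sketch. This is not UCLUST itself, which relies on fast heuristic database search and pairwise alignment; here identity is simply the fraction of matching positions between sequences that are assumed to be pre-aligned to equal length, and the function names `identity` and `greedy_cluster` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid,
    so the result depends on input order (an assumption of this sketch).
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


# Toy input: 100-bp sequences with 98% and 90% identity to the first one.
base = "A" * 100
near = "A" * 98 + "CC"
far = "A" * 90 + "C" * 10
toy_clusters = greedy_cluster([base, near, far])
```

In this toy input the second sequence joins the first centroid and the third founds its own OTU; real amplicon data would additionally require alignment and handling of length variation.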
All statistical analyses were conducted using R (
After quality filtering and preprocessing, 1,116,833 reads were obtained from the 259 sequenced samples included in the study, of which nine percent, or a total of 89,982 reads, could be assigned to cyanobacteria or chloroplasts (hereafter termed phytoplankton). The sequencing effort was highly variable among the samples, ranging from 106 to 32,832 total reads per sample. Heterotrophic bacteria usually occur in higher numbers than phytoplankton, which is reflected in the ratio between phytoplankton reads and the total number of reads. This ratio was on average 0.098 (range from 0 to 0.58) and a distribution as depicted in
The lines depict different ratios (phytoplankton reads∶total number of reads) and the points represent the samples.
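As a small worked example, the phytoplankton-to-total read ratio can be computed as a simple fraction. The sketch below uses per-lake totals from the lake summary table, so the values are pooled per lake rather than the per-sample ratios shown in the figure; the function name `phyto_ratio` is a hypothetical helper.

```python
def phyto_ratio(phyto_reads: int, total_reads: int) -> float:
    """Fraction of reads assigned to cyanobacteria or chloroplasts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return phyto_reads / total_reads


# (total reads, phytoplankton reads) per lake, copied from the lake summary table.
lakes = {
    "Erken": (75173, 11050),
    "Rimov": (14894, 3396),
    "Fibysjon": (15610, 109),
}
ratios = {name: phyto_ratio(phyto, total) for name, (total, phyto) in lakes.items()}
```

The three lakes span roughly two orders of magnitude in this ratio, which illustrates why the number of usable phytoplankton reads varies so strongly at a fixed sequencing depth.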
Lake | Lake type | #samples | #reads | #phyto reads | #OTUs | #phyto OTUs | Longitude | Latitude | reference |
Lake Abraxas | antarctic | 2 | 27652 | 5884 | 393 | 35 | 78.3 | −68.5 | Logares et al. 2012 |
Ace Lake | antarctic | 1 | 31835 | 2121 | 2540 | 30 | 78.2 | −68.5 | Logares et al. 2012 |
Alinen Mustajarvi | dysotrophic | 19 | 65380 | 3822 | 2133 | 166 | 25.1 | 61.2 | Peura et al. 2012 |
Alstasjon | eutrophic | 1 | 15612 | 739 | 860 | 49 | 12.0 | 63.0 | Severin et al. |
Atvandtjarnen | oligotrophic | 1 | 30666 | 1693 | 1726 | 91 | 12.0 | 63.0 | Logue et al. 2012 |
Bodsjon | oligotrophic | 1 | 31330 | 1469 | 1748 | 119 | 15.4 | 62.8 | Logue et al. 2012 |
Bredsjon | mesotrophic | 1 | 3016 | 1226 | 225 | 94 | 13.9 | 61.8 | Severin et al. |
Bustadtjarnen | oligotrophic | 1 | 18987 | 661 | 278 | 72 | 12.7 | 63.6 | Logue et al. 2012 |
Crooked Lake | antarctic | 1 | 31333 | 1971 | 1246 | 23 | 78.2 | −68.6 | Logares et al. 2012 |
Digernastjarnen | oligotrophic | 1 | 5671 | 1951 | 246 | 104 | 12.7 | 63.6 | Logue et al. 2012 |
Lake Druzhby | antarctic | 1 | 2819 | 491 | 188 | 18 | 78.3 | −68.6 | Logares et al. 2012 |
Erken | mesotrophic | 49 | 75173 | 11050 | 2269 | 196 | 18.6 | 59.8 | Eiler et al. 2012 |
Fibysjon | mesotrophic | 1 | 15610 | 109 | 788 | 39 | 17.4 | 59.9 | Severin et al. |
Funbosjon | eutrophic | 1 | 30221 | 624 | 952 | 59 | 17.9 | 59.9 | Severin et al. |
Gravatjarnen | oligotrophic | 1 | 30391 | 1273 | 939 | 95 | 12.3 | 63.6 | Logue et al. 2012 |
Haggsjon | oligotrophic | 1 | 30334 | 1774 | 1330 | 97 | 12.7 | 63.5 | Logue et al. 2012 |
Hallastjarnen | oligotrophic | 1 | 7641 | 1621 | 265 | 93 | 12.6 | 63.5 | Logue et al. 2012 |
Lake Hand | antarctic | 1 | 31348 | 1702 | 968 | 25 | 78.3 | −68.6 | Logares et al. 2012 |
Hassellasjon | dysotrophic | 1 | 728 | 118 | 215 | 38 | 16.1 | 62.1 | Comte et al. |
Hensjon | oligotrophic | 1 | 7846 | 1604 | 190 | 85 | 15.1 | 56.5 | Logue et al. 2012 |
Highway Lake | antarctic | 1 | 30564 | 853 | 1024 | 18 | 78.2 | −68.5 | Logares et al. 2012 |
Holmtjarnen | oligotrophic | 1 | 25213 | 1589 | 1562 | 89 | 12.2 | 62.5 | Logue et al. 2012 |
Lang-Bjorsjon | oligotrophic | 1 | 31242 | 824 | 1096 | 83 | 12.3 | 63.6 | Logue et al. 2012 |
Langsjon | mesotrophic | 1 | 2857 | 311 | 386 | 40 | 17.6 | 60.1 | Severin et al. |
Lille Jonsvatn | oligotrophic | 1 | 1617 | 162 | 213 | 33 | 10.6 | 63.4 | Comte et al. |
Lotsjon | mesotrophic | 1 | 14967 | 226 | 867 | 23 | 18.0 | 59.9 | Severin et al. |
Marine Coastal site | antarctic | 1 | 10136 | 1175 | 223 | 30 | 77.9 | −68.6 | Logares et al. 2012 |
Lake McNeil | antarctic | 2 | 41432 | 4611 | 1171 | 33 | 78.4 | −68.5 | Logares et al. 2012 |
Medstugusjon | oligotrophic | 1 | 8025 | 2373 | 313 | 114 | 12.4 | 63.6 | Logue et al. 2012 |
Norrsjon | eutrophic | 1 | 3605 | 451 | 316 | 36 | 18.0 | 59.9 | Severin et al. |
Organic Lake | antarctic | 2 | 32895 | 2043 | 473 | 6 | 78.2 | −68.5 | Logares et al. 2012 |
Oster-Noren | oligotrophic | 1 | 24734 | 2192 | 262 | 90 | 12.8 | 63.4 | Logue et al. 2012 |
Ovre Langsjon | eutrophic | 1 | 3446 | 374 | 296 | 61 | 18.0 | 59.9 | Severin et al. |
Pendant Lake | antarctic | 2 | 18187 | 6662 | 591 | 43 | 78.2 | −68.5 | Logares et al. 2012 |
Ramsjon | mesotrophic | 1 | 2922 | 626 | 221 | 33 | 17.5 | 59.8 | Severin et al. |
Rimov | mesotrophic | 17 | 14894 | 3396 | 1958 | 203 | 14.5 | 48.8 | This study |
Rookery Lake | antarctic | 1 | 27224 | 438 | 1029 | 12 | 78.1 | −68.5 | Logares et al. 2012 |
Ryssjon | eutrophic | 1 | 3144 | 1147 | 339 | 93 | 17.2 | 59.8 | Severin et al. |
Lake Shield | antarctic | 1 | 666 | 350 | 132 | 14 | 78.3 | −68.5 | Logares et al. 2012 |
Siggeforasjon | dysotrophic | 1 | 1997 | 242 | 221 | 38 | 17.2 | 60.0 | Severin et al. |
Skalsvattnet | oligotrophic | 1 | 18177 | 1505 | 198 | 101 | 12.2 | 63.6 | Logue et al. 2012 |
Strandsjon | mesotrophic | 1 | 3251 | 1174 | 443 | 69 | 17.2 | 59.9 | Severin et al. |
Tannsjon | oligotrophic | 1 | 15094 | 1546 | 1933 | 89 | 12.7 | 63.4 | Logue et al. 2012 |
Valloxen | eutrophic | 1 | 2586 | 743 | 259 | 30 | 17.8 | 59.7 | Severin et al. |
Vaster-Noren | oligotrophic | 1 | 6967 | 1746 | 149 | 99 | 12.8 | 63.5 | Logue et al. 2012 |
Vereteno Lake | antarctic | 1 | 8358 | 1344 | 326 | 20 | 78.4 | −68.5 | Logares et al. 2012 |
Lake Watts | antarctic | 1 | 9292 | 467 | 194 | 18 | 78.2 | −68.6 | Logares et al. 2012 |
Lake Williams | antarctic | 2 | 9694 | 1234 | 294 | 22 | 78.2 | −68.5 | Logares et al. 2012 |
Zurich | mesotrophic | 4 | 1962 | 1118 | 401 | 35 | 8.8 | 47.2 | This study |
For each of these 139 samples, the average number of reads annotated as cyanobacteria and chloroplasts was 596. This is in the same range as the average number of cells counted and classified by microscopy (at least 500)
Inner ring indicates the similarity of sequences to the nt/nr database (NCBI) as determined by BLAST search. Outer ring (bars) indicates the number of reads assigned to each node when using the resampled dataset (100 reads); note that nodes where all reads were removed by resampling are still given. Colored branches indicate group assignments from Bayesian classifier against a phytoplankton database.
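Resampling every sample to a common depth of 100 reads can be sketched as random subsampling without replacement. The exact resampling procedure is not described here, so the approach, the fixed seed, and the function name `rarefy` are assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth=100, seed=0):
    """Randomly draw `depth` reads without replacement from per-OTU read counts.

    Returns a Counter of subsampled counts, or None when the sample holds
    fewer than `depth` reads (such a sample would have to be excluded).
    """
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


# Toy sample with one dominant, one common and one rare OTU.
counts = {"otu1": 300, "otu2": 150, "otu3": 5}
sub = rarefy(counts)
```

Rare OTUs such as `otu3` can disappear entirely after subsampling, which corresponds to the nodes in the figure where all reads were removed by resampling.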
To obtain the position of the USC reads in a phylogenetic framework, sequences were aligned and inserted into the SILVA106 phylogenetic tree. This analysis showed that the USC sequences form a deeply-branching sequence cluster and fall outside previously characterized entries (see
Among the lakes, cyanobacterial reads dominated in samples from eutrophic systems (45.5%) and were also abundant in oligotrophic lakes (36.0%), while these lakes also featured a high proportion of USC reads (43.4%). Other OTUs affiliated with the USC dominated in humic lakes (32.9%) and were accompanied by almost equal relative amounts of reads (approx 12%) annotated to Chlorophyta, Cryptophyta, Cyanobacteria and Heterokonta. In samples from mesotrophic lakes most reads were annotated to Heterokonta (30.1%), Cyanobacteria (23.5%) and Cryptophyta (22.0%). Analysis of phytoplankton community composition by ordination of NGS data confirmed the clear differences described above in phylum composition among systems (see
Stress value was 0.20. Permutational ANOVA confirmed visual inspection of significant differences in community composition between lakes of different status (p<0.001; R2 = 0.254).
Lake type | antarctic F | antarctic R2 | antarctic p | oligotrophic F | oligotrophic R2 | oligotrophic p | mesotrophic F | mesotrophic R2 | mesotrophic p | eutrophic F | eutrophic R2 | eutrophic p |
oligotrophic | 22.14 | 0.39 | <0.001 | | | | | | | | | |
mesotrophic | 17.36 | 0.16 | <0.001 | 13.62 | 0.13 | <0.001 | | | | | | |
eutrophic | 8.48 | 0.26 | <0.001 | 8.71 | 0.3 | <0.001 | 2.56 | 0.03 | <0.007 | | | |
dysotrophic | 10.37 | 0.21 | <0.001 | 9.71 | 0.22 | <0.001 | 10.23 | 0.1 | <0.001 | 3.89 | 0.13 | <0.001 |
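The F statistics, R2 values and p-values of a permutational ANOVA can be illustrated with a compact NumPy sketch. It assumes Bray-Curtis dissimilarities and a one-factor design with freely permuted labels; the dissimilarity measure, the number of permutations and all function names are assumptions for illustration, not the settings of the R analysis reported here.

```python
import numpy as np


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows of a count matrix."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den


def pseudo_f(D, groups):
    """Pseudo-F and R2 from a square distance matrix and group labels."""
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    # The full matrix holds every pair twice, hence the factor 2.
    ss_total = (D ** 2).sum() / (2 * n)
    ss_within = 0.0
    for g in labels:
        idx = np.where(groups == g)[0]
        ss_within += (D[np.ix_(idx, idx)] ** 2).sum() / (2 * len(idx))
    ss_among = ss_total - ss_within
    a = len(labels)
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total


def permanova(D, groups, n_perm=999, seed=0):
    """Observed pseudo-F, R2 and permutation p-value."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs, r2 = pseudo_f(D, groups)
    hits = sum(pseudo_f(D, rng.permutation(groups))[0] >= f_obs
               for _ in range(n_perm))
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Toy OTU table: two lake types with clearly different communities.
X = [[10, 0, 0], [9, 1, 0], [10, 1, 0], [8, 2, 0],
     [0, 0, 10], [0, 1, 9], [0, 2, 8], [1, 0, 9]]
types = ["A"] * 4 + ["B"] * 4
f_stat, r2, p_val = permanova(bray_curtis(X), types, n_perm=199)
```

Pairwise comparisons such as those in the table would repeat this test on each pair of lake types; note that the smallest attainable p-value is 1/(n_perm + 1).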
Seasonal dynamics were analyzed in three lakes using both NGS and microscopy. Samples with both microscopic and NGS data available were 14 for AM, 34 for ER and 16 for RI. Using microscopy, the total number of taxa was 58 in AM, 84 in ER and 107 in RI (see
Statistical comparisons of seasonal phytoplankton dynamics in the three lakes (AM, ER, RI) by, on the one hand, cell abundance and biovolume data from microscopic counts and, on the other hand, NGS derived read numbers, revealed significant correspondence in the dynamics of community composition between the two methods, especially between microscopic abundance and NGS data. Here, both Procrustes superimposition and Mantel's test were significant (
Plots show the ratio between relative read numbers and biovolumes (as determined by microscopy) for each phylum. (AM) Alinen Mustajarvi, (ER) Lake Erken, and (RI) Rimov Reservoir. A ratio above zero indicates that a specific phylum is preferentially detected by NGS, whereas a ratio below zero indicates an overrepresentation in the biovolume data relative to NGS. The part of the plot indicated in grey represents the area where the ratio results from a phylum being detected by only one of the two methods.
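Because values above and below zero carry meaning, the plotted ratio is presumably on a logarithmic scale. A minimal sketch of such a ratio follows; the base-10 logarithm, the use of infinite values for phyla found by only one method, and the function name `log_ratio` are all assumptions, since the caption does not specify them.

```python
import math


def log_ratio(read_fraction, biovolume_fraction):
    """log10 of relative reads over relative biovolume for one phylum.

    Returns +inf or -inf when the phylum was detected by only one method
    (the grey area of the plot) and None when neither method detected it.
    """
    if read_fraction == 0 and biovolume_fraction == 0:
        return None
    if biovolume_fraction == 0:
        return math.inf   # detected by NGS only
    if read_fraction == 0:
        return -math.inf  # detected by microscopy only
    return math.log10(read_fraction / biovolume_fraction)


# Hypothetical phylum making up 20% of reads but 2% of biovolume.
example = log_ratio(0.20, 0.02)
```

A phylum with equal relative contributions in both datasets gives a value of zero, matching the interpretation given in the caption.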
Testing 454 data against | Mantel's test R | Mantel's test p | Procrustes superimposition R | Procrustes superimposition p |
AM biovolumes | 0.259 | <0.013 | 0.851 | <0.012 |
AM abundances | 0.26 | <0.007 | 0.89 | <0.005 |
ER biovolumes | 0.268 | <0.001 | 0.756 | <0.001 |
ER abundances | 0.532 | <0.001 | 0.842 | <0.001 |
RI biovolumes | 0.083 | 0.289 | 0.617 | 0.371 |
RI abundances | 0.654 | <0.001 | 0.922 | <0.001 |
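The Mantel statistics in the table compare two dissimilarity matrices computed from the same samples. A minimal NumPy sketch of a Mantel test is given below, assuming Pearson correlation on the upper triangles and a one-sided permutation test; the function name `mantel` and the toy data are hypothetical, and the R routines used for the reported values may differ in detail.

```python
import numpy as np


def mantel(D1, D2, n_perm=999, seed=0):
    """Pearson correlation between two distance matrices with a permutation p-value."""
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    iu = np.triu_indices_from(D1, k=1)  # each sample pair counted once
    r_obs = np.corrcoef(D1[iu], D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(D1.shape[0])
        # Permute rows and columns of one matrix together.
        r = np.corrcoef(D1[np.ix_(perm, perm)][iu], D2[iu])[0, 1]
        hits += int(r >= r_obs)
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: six samples along a gradient; the second matrix is a rescaled copy.
x = np.arange(6)
D_ngs = np.abs(x[:, None] - x[None, :])
D_mic = 2.0 * D_ngs
r_val, p_val = mantel(D_ngs, D_mic, n_perm=199)
```

Rows and columns are permuted jointly so that the dependence among pairwise distances sharing a sample is preserved under the null hypothesis.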
Looking at the dynamics in greater detail revealed further discrepancies but also correspondence between microscopy and NGS data. In AM, high abundance of Cryptophyta belonging to the genus
Colors indicate the abundance of each taxon at each time point in relation to its maximum abundance in the respective time-series.
For ER, the NGS data showed that the succession started with a
For RI, the peak of
As primary producers, phytoplankton use nutrients directly as a resource and are therefore early responders to environmental changes, making them especially suitable as eutrophication indicators. Our massive NGS dataset from 46 lakes revealed a clear separation of the phytoplankton communities from lakes of different trophy, suggesting that this metric has potential as a tool for water quality status assessments and thus provides the means to efficiently monitor one of the main environmental problems in surface waters: eutrophication. Picophytoplankton are particularly useful as early indicators of increase in phosphorus concentration
Rarefaction curves clearly show that our sampling efforts only scratched the surface of the phytoplankton diversity present in most studied systems. Increasing sampling efforts can provide a deeper insight into these communities, but this is limited by the actual proportion of phytoplankton 16S rRNA genes in the total pool of amplified 16S rRNAs in a sample. As visualized in
The weaker correspondence of NGS data to microscopic biovolume estimates compared to abundances (
Moreover, we are in the middle of revising the phylogeny of many phytoplankton groups. For example in diatoms
Our analyses identified potential novel taxa and the lack of sequenced freshwater taxa in current databases. A BLASTn search revealed that more than 50% of the cyanobacteria and chloroplast reads in our dataset have no closely related neighbor (more than 97% similarity to a database entry) among 16S rRNA sequences from isolated phytoplankton strains (for more details see
Phylogenetic analysis also shows that the taxonomic resolution provided by the 16S rRNA gene of chloroplasts can at best provide classification to the genus level. Another marker gene that has been used as a pre-marker for protists is the 18S rRNA gene
There is a need for improvement in environmental monitoring, both because of international regulations and because of public concern about blooms of toxic or nuisance algae and other environmental pressures. Our analyses suggest that NGS-based characterization of 16S rRNA genes holds great promise as a tool for phytoplankton monitoring, as it allows the simultaneous monitoring of bacteria and most eukaryotes with plastids in a high-throughput, reproducible and cost-efficient manner. Still, many challenges lie ahead before NGS-based methods can be implemented in monitoring programs. Furthermore, NGS-based approaches will only be semi-quantitative. Barcoding initiatives and thorough systematics using both genetic and morphological information will be required to improve sequence databases and existing taxonomic frameworks for tracking phytoplankton groups and monitoring phytoplankton communities by NGS-facilitated approaches. Both alternative marker genes and multiplexing need to be explored to improve taxonomic resolution. Most importantly, taxonomists and molecular biologists must come together and move the field forward to fully embrace and exploit NGS technologies for phytoplankton ecology and the quality management of inland waters.
Metadata of the 259 samples available to this study, including the 139 samples used in the analyses of this study.
(XLS)
List of taxa found from Lakes Alinen Mustajärvi (AM), Erken (ER) and Římov (RI) when analyzing phytoplankton samples by microscopy.
(DOC)
We are grateful to Ester Eckert, Thomas Posch, Ramiro Logares, Sami Taipale, Ina Severin, Jurg B. Logue and Jerome Comte for help with sampling, and for providing additional data for some of the lakes. We thank E. Zapomělová and Minna Hiltunen for counting phytoplankton in RI and AM, respectively. We also want to thank the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for help with data storage and analysis.