The authors have declared that no competing interests exist.
Conceived and designed the experiments: BS XS JX KN. Performed the experiments: BS XS. Analyzed the data: BS XS KN. Contributed reagents/materials/analysis tools: BS XS KN. Wrote the paper: BS XS JX KN.
NGS (next-generation sequencing)-based metagenomic data analysis is becoming the mainstream approach for studying microbial communities. Faced with the large amounts of data produced in metagenomic research, scientists need effective data visualization to explore, interpret and manipulate such rich information. The visualization of metagenomic data, especially multi-sample data, is one of the most critical challenges. Differences in sample sources and sequencing approaches, together with heterogeneous data formats, make robust and seamless data visualization difficult. Moreover, researchers focus on different aspects of metagenomic studies: taxonomical or functional, sample-centric or genome-centric, single sample or multiple samples, etc. However, current efforts in metagenomic data visualization cannot fulfill all of these needs, and it is extremely hard to organize all of these visualization effects in a systematic manner. An extendable, interactive visualization tool would be the method of choice to fulfill all of these visualization needs. In this paper, we present MetaSee, an extendable toolbox that facilitates the interactive visualization of metagenomic samples of interest. The main components of MetaSee include: (I) a core visualization engine composed of different views for the comparison of multiple samples (Global view, Phylogenetic view, Sample view and Taxa view), as well as link-outs for more in-depth analysis; (II) a front-end user interface with real metagenomic models that connect to the above core visualization engine; and (III) an open-source portal for the development of plug-ins for MetaSee. This integrative visualization tool not only provides the visualization effects, but also enables researchers to perform in-depth analysis of the metagenomic samples of interest. Moreover, its open-source portal allows for the design of plug-ins for MetaSee, which would facilitate the development of additional visualization effects.
Microbes are everywhere around us on the planet, and the total number of microbial cells on earth is huge
Understanding the taxonomical structure of a microbial community (alpha diversity) and the differences in taxa among microbial communities (beta diversity) have been two of the most important problems in metagenomic research
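To make the two notions concrete, the following is a minimal sketch that summarizes alpha diversity with the Shannon index and beta diversity with the Bray-Curtis dissimilarity. Both measures are common choices assumed here for illustration; they are not stated to be the measures used by MetaSee. The function names and the taxon counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> count mappings.

    0 means identical composition, 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Hypothetical genus-level read counts for two samples (illustrative only).
sample_a = {"Streptococcus": 50, "Prevotella": 30, "Neisseria": 20}
sample_b = {"Streptococcus": 20, "Prevotella": 30, "Veillonella": 50}

alpha_a = shannon_index(sample_a)        # within-sample diversity
beta_ab = bray_curtis(sample_a, sample_b)  # between-sample dissimilarity
```

In this sketch alpha diversity is a single number per sample, whereas beta diversity is defined for a pair of samples, which is why multi-sample comparison benefits from dedicated visualization.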
Advances in sequencing technologies have equipped researchers with the ability to sequence collective genomes of entire microbial communities, commonly referred to as the metagenome, in an inexpensive and high-throughput manner
Extensive collaboration between microbiologists and bioinformaticians is needed for large-scale metagenomic data analysis and interpretation. To facilitate this collaboration, an easy-to-use, cross-platform system for the interactive visualization of metagenomes is urgently needed.
Firstly, although many metagenomic data analysis tasks can be accomplished with automated processes, some steps continue to require human judgments and are frequently rate limiting, for example the comparison between two or more samples. Visualization can augment our ability to reason about complex data, and increase the efficiency of manual analyses. Given the importance of human interpretation, visualization tools also provide a valuable complement to automated computational techniques, particularly in the early hypothesis generation stages of biological research, enabling us to derive scientific insight from large-scale data sets
Secondly, researchers need high-quality figures to facilitate in-depth analysis and interpretation of their work. It is also important that software designers continue to provide scientists with tools that are useful, effective and illustrative
Finally, current metagenomic research is becoming a multi-region, multi-discipline and multi-expertise collaborative effort, with many researchers working on microbiology-related energy, medicine, environment, etc. The common theme of these studies is that the data are produced around the world but analyzed in a data analysis center. As such, the need for a cross-platform visualization toolbox to serve such collaborations is becoming more and more urgent.
Metagenomic samples are usually presented as a diverse hierarchy, for which only a few visualization tools have been designed. In addition, they need to be examined from different angles and at different levels: phylogenetic information, taxonomical structure and functional structure. However, current visualization tools are limited in that they show only one or two of these angles for the metagenomic samples of interest.
Current metagenomic visualization tools could be categorized as independent or dependent (as a component in comprehensive software) by their dependencies on other software, or as open-source or closed-source by their software distribution strategy.
Based on the NCBI Taxonomy database, MEGAN
(A) The visualization result of MEGAN, showing the taxonomic components of two metagenomic samples. (B) The user interface of Strainer (micro-perspective visualization of metagenomic components): comparison between a component of a metagenome sample and a genome. (C) The user interface of BLASTatlas (macro-perspective visualization of metagenomic components): comparison between a component of a metagenome and multiple genomes. (D) Contig and gene annotation visualization of IMG/M. (E) The user interface of Krona, showing a comparison of four saliva microbiomes. (F) The metagenome visualization result of iTOL with default parameters.
Strainer
IMG/M
Krona
iTOL
There are several other visualization tools for metagenomic sample visualization
The foundation of most metagenomic studies is the assignment of observed nucleic acids to taxonomic or functional hierarchies
It should be noted that some of the visualization effects required by users cannot easily be realized clearly on a single interface. For example, placing the comparison results of multiple metagenomes on the same page would not be clear. For such visualization effects, an integrated, interactive visualization tool might be the method of choice.
In this work, we have developed an integrative visualization system, MetaSee, based on advanced computer visualization techniques. The MetaSee system is composed of (1) the core visualization engine, (2) the front-end interactive analysis interface, and (3) the API portal for plug-in development.
(1) The core visualization engine includes: visualization of the taxonomical structure of the metagenomic samples globally and at different levels, comparison of different metagenomic samples, link-out to different annotations for the taxa and/or functions, etc. (2) The front-end user interface is specifically designed for real metagenomic models (such as the oral microbial community models) that connect to the above core visualization engine. (3) The open-source API portal for plug-in development is designed for easy extension of the MetaSee system.
MetaSee is implemented based on all of the taxonomical and functional information that can be retrieved from metagenomic samples, and takes advantage of modern computer visualization technology, including HTML5 canvas, JavaScript, SVG and modern web browsers. The only requirement for viewing the results of MetaSee is an up-to-date web browser, and the results can be viewed (online or off-line) on almost all operating systems (OS) with a Graphical User Interface (GUI).
Visualization tools are particularly powerful when used in combination with high-throughput automated analysis software (e.g., Parallel-META
The core visualization engine is composed of multiple viewing components: the viewing components include (not exclusive of each other): overall framework (
Each pie stands for an element of MetaSee, and each directed arrow stands for a front-end link from one component to another.
The framework includes the left sidebar
(A) Left side bar for navigation, (B) Main window is the working area for visualization.
The MetaSee visualization panel is the main interactive operation panel of MetaSee, designed for interactive analysis of the structure of a metagenome. This panel is a pie chart; when a sector (representing a taxon) is selected, that area is highlighted and turns to the right. The right side bar of the MetaSee visualization panel then displays the detailed information of this node and links to other views. The lengths of the layers of these charts indicate which part of the dataset was classified more precisely, and the color of each sector indicates the abundance of that sector (taxon), with red indicating more abundant taxa.
For each sample, a Global view is a hierarchical tree that contains every taxon and its proportion in the sample. Two or more samples can be shown in a single Global view, with each node carrying a bar-plot that shows the relative abundance of the different samples at that taxon. Thus, the Global view shows the whole picture of all samples being compared. In the Global view, all taxonomy units at the same level are placed in the same rank, so it is easy to find which part of the input dataset was enriched (classified in more detail). The height of each bar stands for the relative abundance of one sample at this taxonomy unit. The small bar chart of a taxonomy unit links to its Taxa view (a pair of pie-charts and a pair of bar-charts), which gives the detailed information: relative abundance, absolute abundance and legend. In the Global view each color indicates a sample (as indicated in the figure legend), which makes it convenient to find the differences among multiple samples at a certain taxonomy unit at the global level.
For one sample or a set of samples, the Taxa view focuses on the detailed information of one node (taxon) in the Global view, which is a taxonomical hierarchy tree; it is opened by clicking the bar-plot of that node. This detailed information includes the abundance at the specific taxon, which is useful for comparing different samples at that taxon. It can be shown in either pie-chart or bar-chart format.
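The data structure behind such a multi-sample view can be sketched as follows. This is a minimal illustration, not MetaSee's implementation: it assumes each sample arrives as a mapping from a lineage (a tuple of taxon names from the highest rank downwards) to a read count, and the names `TaxonNode`, `build_global_tree` and `relative_abundance` are hypothetical.

```python
from collections import defaultdict


class TaxonNode:
    """One taxonomy unit; holds per-sample counts accumulated from its subtree."""

    def __init__(self, name, depth):
        self.name = name
        self.depth = depth            # rank level: nodes of equal depth share a row
        self.children = {}            # child name -> TaxonNode
        self.counts = defaultdict(int)  # sample name -> reads at or below this node


def build_global_tree(samples):
    """Merge several samples into one tree.

    samples: {sample_name: {lineage_tuple: read_count}}
    """
    root = TaxonNode("root", 0)
    for sample, lineages in samples.items():
        for lineage, count in lineages.items():
            node = root
            node.counts[sample] += count
            for depth, taxon in enumerate(lineage, start=1):
                if taxon not in node.children:
                    node.children[taxon] = TaxonNode(taxon, depth)
                node = node.children[taxon]
                node.counts[sample] += count
    return root


def relative_abundance(node, root):
    """Bar heights for one node: the share of each sample's reads that it holds."""
    return {s: node.counts.get(s, 0) / total
            for s, total in root.counts.items() if total}


# Hypothetical two-sample input (illustrative counts only).
samples = {
    "S1": {("Bacteria", "Firmicutes"): 60, ("Bacteria", "Bacteroidetes"): 40},
    "S2": {("Bacteria", "Firmicutes"): 25, ("Bacteria", "Proteobacteria"): 75},
}
root = build_global_tree(samples)
firmicutes = root.children["Bacteria"].children["Firmicutes"]
bars = relative_abundance(firmicutes, root)
```

Here `bars` holds one value per sample for the chosen node, which corresponds to the small per-node bar-plot; a taxon that is absent from a sample simply gets a bar of height zero.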
For each sample, a Phylogenetic view is an unweighted phylogenetic tree. It elucidates the evolutionary relationships of all microbes in a microbial community.
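An unweighted tree of this kind is commonly exchanged as a Newick string without branch lengths. The sketch below shows one way such a string could be produced from a nested dictionary; the function name `to_newick` and the example taxa are assumptions for illustration and do not reflect MetaSee's own exporter.

```python
def to_newick(name, children):
    """Serialize an unweighted tree (no branch lengths) to a Newick fragment.

    children: {child_name: {grandchild_name: {...}}}, empty dict for a leaf.
    """
    # Unquoted Newick labels cannot contain spaces; underscores stand in for them.
    label = name.replace(" ", "_")
    if not children:
        return label
    inner = ",".join(to_newick(child, sub)
                     for child, sub in sorted(children.items()))
    return f"({inner}){label}"


# Hypothetical taxonomy fragment (illustrative only).
tree = {
    "Firmicutes": {"Streptococcus": {}, "Veillonella": {}},
    "Bacteroidetes": {"Prevotella": {}},
}
newick = to_newick("Bacteria", tree) + ";"
# newick == "((Prevotella)Bacteroidetes,(Streptococcus,Veillonella)Firmicutes)Bacteria;"
```

Children are sorted by name only so that the output is deterministic; the order of siblings carries no meaning in an unweighted tree.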
The unweighted phylogenetic tree file is written in Newick format. It can also be imported into other phylogenetic tree visualization tools (e.g. Phylogenetic tree Maker (
For each sample, the taxonomical community structure is represented as a dynamic multi-layer pie-chart, so that the proportion of each taxon (at each level) can be seen clearly by interactively zooming in or out. Moreover, the pie-charts of multiple samples can be smoothly shifted from one to another for comparison of structure and proportion. The Sample view is implemented by the Krona software
Each taxon or function can be linked out to its annotation in external sources from the MetaSee visualization panel, the Global view (by clicking the name of that node), the Sample view or the Taxa view. Here we use the taxonomy browser database of NCBI
Metagenomic data are often generated at discrete points across multiple locations or times. MetaSee is able to store the data from multiple samples in a single framework. Individual samples may then be stepped through. Thus, it makes the comparison among samples coming from different time points or conditions easy (
(A) Global view, (B) Taxa view with pie-chart format, (C) Taxa view with bar-chart format.
The front-end interface mainly serves a set of real metagenome projects based on the MetaSee visualization system. Two areas may need this metagenomic visualization system: dentistry and field experimental studies. For dentists, this system would support quick diagnosis by using newly collected samples as queries to search a database of known microbial community samples. This has been proven to be workable for dentists so far
Open-source portals were designed to extend the usability of the MetaSee visualization system. Firstly, community structure files in many formats can be imported into MetaSee. As XML is easy to extend, it was selected as the default format. At run time, however, community structure files in any supported format are stored in random access memory (RAM) as doubly linked trees by an independent tree-building component. Based on this design, it is very easy to develop other APIs for new input file formats. As examples, we developed APIs for importing output files from parallel-META
Secondly, the workflow of MetaSee builds a tree structure and then outputs this tree to a variety of views. Therefore, adding new APIs for other views (such as back-to-back sample views) or modifying existing views is straightforward.
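The separation between format-specific importers, a shared in-memory tree and pluggable views can be sketched as below. This is a minimal illustration under stated assumptions, not MetaSee's API: the XML element and attribute names (`taxon`, `name`, `count`), the registries `IMPORTERS` and `VIEWS`, and all function names are hypothetical, and the sketch is written in Python rather than in MetaSee's own implementation language.

```python
import xml.etree.ElementTree as ET


class Node:
    """Doubly linked tree node: each node knows its parent and its children."""

    def __init__(self, name, count, parent=None):
        self.name = name
        self.count = count
        self.parent = parent   # upward link
        self.children = []     # downward links
        if parent is not None:
            parent.children.append(self)


def from_xml(text):
    """Importer for a hypothetical XML layout of nested <taxon name= count=> elements."""
    def walk(elem, parent):
        node = Node(elem.get("name"), int(elem.get("count", "0")), parent)
        for child in elem.findall("taxon"):
            walk(child, node)
        return node
    return walk(ET.fromstring(text), None)


IMPORTERS = {"xml": from_xml}  # new input formats would be registered here
VIEWS = {}                     # view name -> function(tree root) -> output


def register_view(name):
    """Decorator that adds a view to the registry without touching the importers."""
    def decorator(func):
        VIEWS[name] = func
        return func
    return decorator


@register_view("outline")
def outline_view(root, depth=0):
    """A trivial text view: one indented line per taxon."""
    lines = ["  " * depth + f"{root.name} ({root.count})"]
    for child in root.children:
        lines.extend(outline_view(child, depth + 1))
    return lines


def lineage(node):
    """Follow the parent links upwards to recover the full lineage of a node."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names[::-1]


def render(text, fmt, view):
    """Build the shared tree once, then hand it to the requested view."""
    return VIEWS[view](IMPORTERS[fmt](text))


doc = ('<taxon name="Bacteria" count="100">'
       '<taxon name="Firmicutes" count="60">'
       '<taxon name="Streptococcus" count="45"/></taxon>'
       '<taxon name="Bacteroidetes" count="40"/></taxon>')
outline = render(doc, "xml", "outline")
```

Because every importer returns the same `Node` tree and every view consumes only that tree, a new file format or a new view can be added by registering a single function.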
Thirdly, the search function of the MetaSee toolbox (
Finally, we have established a repository (
The online version (
Additionally, a stand-alone MetaSee application can be downloaded as a virtual machine, which was developed in Java, and can run on almost all OS using both GUI (
Firstly, select the format of the input data from the drop-down list. Secondly, click “input file” to select the input file; multiple files are accepted, but they must all be in the same format. Thirdly, press the “output folder” button to assign the output path. Finally, press the “submit” button to run MetaSee.
We released our source code and development documents. With these documents, example source code, sample data and our discussion group (
To evaluate the ability of metagenome visualization, we compared MetaSee with other metagenomic visualization tools based on 450 metagenomic samples
MetaSee is designed to augment our ability to reason about complex data. In the Framework and Global view, MetaSee ranks nodes neatly, making the hierarchy of taxa within metagenomic data self-evident. MetaSee also provides a solution for visualizing multi-sample metagenomic datasets and makes visual comparison among metagenomes possible.
Here we take the visualization of four saliva metagenomic samples
The Strainer
As showcases of the front-ends of MetaSee, we developed two applications: Digital Mouth and Metagenome global survey.
There have been several studies on different oral microbial communities, and a series of reports have been published showing their relationship with oral disease, like gingivitis, periodontitis, caries,
In this work, we focus on the construction of “Digital Mouth” from two perspectives: firstly, the 3D structure of the mouth itself; and secondly, the microbial community’s structure in oral samples (
Following the idea that nature can be understood by reducing its complexity to the molecular level and analyzing the interactions between a small number of molecules to explain simple causal relationships, systems biology has become the modern approach to understanding what life is. With the development of sampling technology, expeditions have emerged that carry out comprehensive worldwide sample collection campaigns with a coherent strategy to record all the information necessary for studying the emergent properties of plankton ecosystems, and a new concept “Oceans Systems Biology”
In this work, we tried to visualize these data with MetaSee. We collected 380 metagenomic samples with Geographic Information System (GIS) information, and analyzed them with Parallel-META
The visualization of metagenomic samples has proven to be very important for increasing the efficiency of manual analyses. However, metagenomic data are very complex, and researchers generally focus on the similarities as well as the differences among multiple samples, so the visualization of metagenomic data is a difficult task. MetaSee partially solves this problem with an interactive and dynamic visualization toolbox.
The MetaSee toolbox proposed in this paper is an easy-to-use, interactive, cross-platform data visualization toolbox. It addresses the problem of comparing multiple metagenomic samples. Moreover, the open-source portal for plug-in development makes it possible to modify MetaSee and embed it in other applications.
When the WGS (whole genome sequencing)-based sequencing coverage is deep enough,
Additionally, MetaSee is not only for metagenomics. It is a flexible framework that can also take other tree-structured datasets as input and produce clear visualization results. Examples of such data would include global health statistics data from WHO (
MetaSee was released under The MIT License
The standalone version and online service of MetaSee could be found at:
URL of Digital Mouth:
URL of Metagenome global survey:
We thank Shuncheng Lu (Email: