Introduction
Mycobacteria form a group of over one hundred species, ranging from harmless saprophytic organisms to major human pathogens. The well-known pathogenic species, such as Mycobacterium tuberculosis, Mycobacterium leprae and Mycobacterium ulcerans, belong to the subgroup of slowly growing mycobacteria (SGM). By contrast, rapidly growing mycobacteria (RGM), of which almost 60 species have been identified, usually live in the soil or water and only rarely cause human infections [1]. Mycobacterium abscessus is one of the few RGM able to infect humans and is undoubtedly the most frequently isolated and the most difficult to combat [2].
M. abscessus was first described by Moore and Frerichs in 1953 [3]. These authors reported the isolation of a previously unknown mycobacterium from a human knee infection with subcutaneous abscess-like lesions (type strain M. abscessus ATCC 19977T), hence the name “abscessus”. With the recognition of Mycobacterium chelonei (now M. chelonae) in 1972, these two RGM organisms were classified as two subspecies of the same species. Over two decades, they were collectively designated “M. chelonae”, or even grouped with the RGM Mycobacterium fortuitum under the designation “M. fortuitum complex” [4]. It was only in 1992 that M. abscessus was separated from M. chelonae [5], and this separation soon resulted in the recognition that M. abscessus has a particular pathogenicity in humans [6]. Very recently, M. abscessus itself (now M. abscessus sensu lato) was shown to consist of three species: M. abscessus sensu stricto, M. massiliense and M. bolletii [7], [8]. These species are very closely related and cause a similar spectrum of human infections [9], [10]. Thus, hereafter, unless otherwise stated, they will be collectively referred to as “M. abscessus”.
Following its recognition as a distinct entity, and the development of molecular methods of identification for mycobacteria, M. abscessus has emerged as an important human pathogen over the last 10 years, causing many more cases of infection than M. chelonae and M. fortuitum—historically the most important pathogenic RGM [2], [11]. M. abscessus is responsible for more than 80% of all pulmonary infections due to RGM in the United States and is associated with a much higher fatality rate than any other RGM [6]. M. abscessus lung infection usually, but not exclusively, develops in subjects with underlying lung disorders (e.g. bronchiectasis, cystic fibrosis [CF]) [6]. The infection of CF patients is becoming a major issue: M. abscessus is being recovered with increasing frequency from CF patients, including young children. It causes a serious, life-threatening lung disease and is responsible for disseminated, often fatal infections following lung transplantation [12]–[18]. M. abscessus is also a leading cause of sporadic and epidemic cases of skin and soft-tissue RGM infections, following the use of contaminated syringes or needles, and after plastic or cardiac surgery [19], [20]. M. abscessus is not only pathogenic, it is also one of the most antibiotic-resistant RGM species [1]. It is resistant to most disinfectants and biocides and thrives in the most hostile environments—a feature associated with its propensity to cause outbreaks of healthcare-associated disease [1].
The pathogenicity of M. abscessus has been investigated in recent studies in various cell and mouse models. M. abscessus is an intracellular bacterium able to grow in macrophages and free-living amebas [8], [21]. M. abscessus infection in mice is associated with granulomatous lesions spontaneously evolving toward caseous lesions [22]. Interferon gamma (IFN-γ) and tumor necrosis factor (TNF) are the key cytokines of the murine host response, and are absolutely required to control infection [22]. Studies have also identified major differences in pathogenic profile between the two forms in which M. abscessus is isolated from humans: the S (smooth) form and the R (rough) form [21]. The R form lacks a surface polyketide compound, glycopeptidolipid (GPL) [23], [24], and causes more severe infections in mice, strongly inducing TNF secretion by macrophages [23].
Over the last decade, genomic studies have shown how the ecological and pathogenic characteristics of certain SGM have changed through evolution. For example, M. leprae, the causal agent of leprosy, represents a model case of adaptation through massive genome reduction [25]. Gene deletion and decay have resulted in the elimination from M. leprae of many of the major metabolic activities present in the closely related species, M. tuberculosis, the tubercle bacillus. This process of gene deletion is associated with the divergent evolution of M. leprae towards an obligate intracellular lifestyle. Other mycobacteria have acquired plasmid-borne virulence factors. The presence of a giant plasmid involved in the synthesis of a potent macrolide toxin forms the basis, for example, of the unique pathogenic properties of M. ulcerans, the causal agent of Buruli ulcer [26]. Genomic studies have also revealed how the deletion of large chromosomal regions led to the attenuation of Mycobacterium bovis bacillus Calmette-Guérin, the only vaccine against tuberculosis currently available [27], [28].
Very few genomic studies have been performed in the RGM group, and none has dealt with a “pathogenic” RGM. The first RGM to be sequenced, M. smegmatis, is a model mycobacterium widely used in research laboratories as a surrogate host for the expression of heterologous mycobacterial genes. The other RGM organisms sequenced (e.g., M. vanbaalenii) have been studied because they are able to degrade polycyclic aromatic hydrocarbons and are therefore of potential interest for use in environmental bioremediation [29]. We report here the complete genome sequence of M. abscessus (sensu stricto) and the insights it has provided into the genetic basis of the pathogenicity of this bacterium, which is highly unusual among RGM. Whole-genome analysis not only revealed the presence of many “mycobacterial” virulence genes, but also showed that M. abscessus had a large series of specific genes in common with two of the pathogens most frequently isolated from CF patients, Pseudomonas aeruginosa and Burkholderia cepacia. These genes were presumably acquired from distantly related environmental bacteria via horizontal gene transfer (HGT).
Discussion
Deciphering the ecology and biology of M. abscessus
The genetic information contained in the genome of M. abscessus tells us a great deal about the lifestyle of this microorganism in natural conditions. The presence of a large number of genes and operons involved in resistance to arsenic or encoding cysteine desulfurases is clearly a hallmark of an environmental organism living in soil or aquatic environments. However, M. abscessus also contains a whole series of genes known to be involved in intracellular survival (e.g., mgtC, msrA, plc), and is well-equipped to obtain energy from the degradation of eukaryotic host-derived lipids (numerous lipase-encoding genes), as observed for mycobacteria adapted to an intracellular lifestyle [58]. The low level of metabolic versatility (e.g., far fewer ABC transporters or two-component sensor histidine kinases than M. smegmatis) suggests that this bacterium tends to specialize in intracellular parasitism.
The most plausible hypothesis is that M. abscessus has evolved to escape predators, such as the free-living amebas [8] that share its ecosystem. Soil-dwelling amebas are known to be most abundant at plant-soil interfaces, because these interfaces support the growth of various plant parasites, including bacteria, on which amebas feed [59]. Consistent with this hypothesis, the genome of M. abscessus encodes a particularly large number of salicylate hydroxylases, enabling this bacterium to resist the salicylic acid-mediated defense mechanisms of plants [60]. This suggests that M. abscessus lives in close contact with plants and therefore has to deal with amebas. This hypothesis may explain an extraordinary paradox in the epidemiology of M. abscessus: despite all the evidence suggesting that M. abscessus lives in soil and water (our own genomic data and the large number of epidemics linked to the direct or indirect use of non-sterile water), this bacterium is detected much less frequently in such environments than other closely related RGM, such as M. chelonae [61].
We analyzed an S phenotype strain. A major challenge for the future will be to determine the role of S↔R switches in the natural lifecycle of M. abscessus (controlling whether this bacterium grows in the form of a biofilm) and its interaction with its hosts, including humans (modulation of the host response). GPL may be required for biofilm establishment or for escape from amebas in aquatic environments [62], but seems to hinder the development of infection, probably by acting as a target of the specific immune response of the host [22]. We recently reported the in vivo isolation of an R variant from the type strain CIP 104536T [22]. Transcriptomic studies are currently underway to determine the mechanisms responsible for the loss of GPL production in this R variant and the associated events potentially accounting for its “hypervirulence” in mice. The data obtained should make it possible to identify the external signals involved in triggering the switching process.
Evolutionary mechanisms
This study highlights the major role of horizontal gene transfers in the evolution of RGM. It is hardly surprising that this evolutionary mechanism, which has also been described in SGM [63], [64], is particularly important in RGM, and that it involves a reservoir of genes from different bacteria with a high G+C content widely present in soil or water, such as Streptomyces sp., Rhodococcus sp. and pseudomonads. RGM come into contact with many other bacteria in the environment—often as part of a biofilm [65] —and they may exchange genetic material with these other bacteria [66]. Mycobacteriophages—or other bacteriophages with a wide host spectrum—may play a key role in such transfers, as they display extensive mosaicism, combining viral and bacterial genes in a vast gene pool [32]. Such a role in gene transfer is consistent with the presence of a full-length prophage sequence containing non mycobacterial genes in the M. abscessus genome. However, the presence of this prophage sequence does not exclude a role for other genetic vectors, such as plasmids, which are frequently found free or integrated into the genome within RGM [67].
The demonstration that pathogenicity genes of non mycobacterial origin are present in M. abscessus raises questions about the timing of their acquisition. The fact that the closely related species M. chelonae is also pathogenic in humans, an exceptional feature among RGM, strongly suggests that many of these genes were acquired before the separation of these two species. We are currently carrying out a comparative genomics study of M. abscessus and M. chelonae (http://www.genoscope.cns.fr/spip/Mycobacterium-chelonae-and.html), which should make it possible to confirm or refute this hypothesis. We have also recently made use of the genome sequence of M. abscessus to develop a multilocus sequence typing (MLST) approach. Our preliminary analyses on more than a hundred M. abscessus (sensu lato) strains suggest that there are three highly homogeneous groups, corresponding to the three previously described species (M. abscessus sensu stricto, M. massiliense, M. bolletii), with less than 1% divergence within groups and around 2% divergence between groups. The species of M. abscessus sensu lato therefore seem to have emerged relatively recently. However, it should be stressed that most of the strains of M. abscessus available from collections were isolated recently and mostly in a clinical context. Indeed, as stated above, M. abscessus is only very rarely isolated from the environment. There may therefore be a bias in the results, because we cannot rule out the possibility that strains capable of infecting humans constitute an unusual subpopulation.
One of the findings of this study was entirely unexpected: the presence of a mercury resistance plasmid almost identical to the pMM23 plasmid from the M. marinum strain recently sequenced by the team of Stinear (strain ATCC BAA-535) [30]. The pMM23 plasmid discovered in this strain, isolated from a patient in 1992 (Moffett Hospital, San Francisco), is exceptional in M. marinum, as none of the more than 40 other isolates of this species studied by the team of Stinear has been found to carry this plasmid [30]. The presence of this plasmid in M. abscessus is, thus, particularly interesting, as it demonstrates that exchanges may occur between M. marinum, an SGM, and M. abscessus, an RGM, either directly or via another organism, probably a mycobacterium. It also suggests that M. abscessus and M. marinum may live in the same ecosystems and may be transmitted to humans by similar mechanisms. Future work should determine the prevalence of this plasmid in M. abscessus and should assess whether this plasmid constitutes a useful marker (e.g., for epidemicity).
We were also surprised by the very low frequency of IS in the genome of the strain of M. abscessus that we sequenced, much lower than usually found in mycobacteria. Confirmation of this result is required, with a representative panel of isolates. If confirmed, this characteristic would have a major impact on the plasticity of the genome of M. abscessus. As elegantly demonstrated in Escherichia coli, reducing the number of IS elements renders bacterial genomes more stable, with a greater capacity for acquiring foreign DNA [68].
Key factors shared with other major CF pathogens
This study provides new insight into the emergence of M. abscessus as a pathogen in CF patients. We were surprised to discover that the largest transferred regions detected in M. abscessus contained genes involved in the metabolism of aromatic compounds. Such systems are characteristic of pseudomonads in general, and of two major CF pathogens, P. aeruginosa and B. cepacia, in particular [49]. This implies that M. abscessus is able to live in the same ecosystems as P. aeruginosa and B. cepacia, with patients becoming infected from the same microbial reservoir. Another, not necessarily exclusive possibility is that these metabolic characteristics provide a selective advantage in CF patients, due either to their illness or to the treatments increasingly used over recent years, such as aerosolized drug administration [69]. According to this hypothesis, M. abscessus may benefit from factors promoting its extracellular development and its implantation in the bronchial tract, before going on to cause deeper infection of the pulmonary parenchyma and ganglions.
Conversely, several M. abscessus factors typical of intracellular parasites are also present in P. aeruginosa and B. cepacia, the most notable examples being phospholipase C and the MgtC protein [70], [71]. Both P. aeruginosa and B. cepacia produce two phospholipases C and two MgtC proteins [70], [71]. An MgtC-like protein is also found in Aspergillus fumigatus—the main pathogenic fungus in CF patients—but not in closely related nonpathogenic species such as Aspergillus nidulans. Pseudomonads and other related organisms infecting CF patients have previously been considered to be exclusively “extracellular” pathogens. Our data raise questions about the interaction of these organisms with macrophages or other monocyte-derived cells in CF patients. This is consistent with the finding that the production of MgtC is required for the survival of Burkholderia cenocepacia–the main B. cepacia complex pathogen infecting CF patients–within macrophages [72], [73].
A recent analysis of the genomes of various CF and non-CF P. aeruginosa isolates revealed mosaic structures, consisting of a conserved core component interrupted by strain-specific genomic islands acquired by HGT, which seem to provide CF isolates with specific metabolic pathways involved in infection [74]. The identification of multiple episodes of HGT in M. abscessus strongly suggests that a similar evolutionary trend occurs within RGM. Along the same lines as the studies carried out in P. aeruginosa by the team of Lory [74], comparative genomic studies of CF and non-CF M. abscessus isolates could prove particularly fruitful for elucidating the tropism of certain organisms for the respiratory tract of CF patients, opening up promising new possibilities for the control of microbial infections in CF patients.
Materials and Methods
We sequenced M. abscessus (sensu stricto) CIP 104536T (= ATCC 19977T), using a whole-genome shotgun strategy (EMBL accession numbers: CU458896, chromosome; CU458745, plasmid). This strain is of the S phenotype, and can switch in vivo to an R phenotype [22]. Mycobacteria were grown in Middlebrook 7H9 broth supplemented with Tween 80. M. abscessus DNA, prepared using standard methods, was manipulated in the presence of 50 µM thiourea (DNA in solution) or by replacing Tris buffer with HEPES at the same molarity (DNA in plugs), to prevent Tris-dependent DNA degradation [39]. We constructed three genomic libraries (inserts of 3–4, 8–10 and ∼20 kb, respectively) and generated ∼80,000 sequences (50,000, 20,000 and 10,000 sequences, respectively, giving 11-fold coverage). Putative protein-coding sequences were predicted by SHOW (http://migale.jouy.inra.fr/outils/select_mig_outils_zpt), tRNA genes by tRNAscan, and rRNA genes by RNAmmer [75], [76]. Sequences were analyzed with the BIOFACET package and the BLAST software suite [77], [78]. General features, such as G+C content (%), were assessed with ARTEMIS software [79]. The origin of replication was identified with ORILOC [80]. The circular representations of chromosome and plasmid were generated with DNAPlotter (http://www.sanger.ac.uk/Software/Artemis/circular). The M. abscessus full-length prophage was drawn with BugView (http://www.gla.ac.uk/~dpl1n/BugView/index.html). The whole-genome dotplot comparison of M. abscessus versus M. smegmatis was drawn with Gepard (http://mips.gsf.de/services/analysis/gepard). CLUSTER-C was used to cluster genes into paralogous families [81]. Alien Hunter was used to screen the genome for regions with “atypical” sequence content [82]. Transfers of blocks of genes from non mycobacterial organisms were identified as follows. We first identified CDS that were more similar to proteins from non mycobacterial organisms than to mycobacterial proteins (no mycobacterial protein among the 50 best hits).
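The first filtering step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the hit records, the CDS identifiers, the name-based genus test and the function names are hypothetical, and the sketch assumes that similarity hits have already been parsed and that self-hits against M. abscessus have been removed beforehand.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (subject organism name, bit score).
Hit = Tuple[str, float]


def is_mycobacterial(organism: str) -> bool:
    """Crude genus test on the organism name (an assumption of this sketch)."""
    return organism.lower().startswith("mycobacterium")


def candidate_foreign_cds(hits_by_cds: Dict[str, List[Hit]], top_n: int = 50) -> List[str]:
    """Return the CDS that have at least one hit and no mycobacterial
    subject among their top_n best-scoring hits."""
    candidates = []
    for cds_id, hits in hits_by_cds.items():
        # Rank hits from best to worst score and keep the top_n.
        best = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
        if best and not any(is_mycobacterial(org) for org, _ in best):
            candidates.append(cds_id)
    return candidates


# Toy data with made-up identifiers and scores, for illustration only.
example = {
    "cds_A": [("Pseudomonas aeruginosa", 410.0), ("Burkholderia cepacia", 395.0)],
    "cds_B": [("Mycobacterium smegmatis", 520.0), ("Rhodococcus sp.", 300.0)],
    "cds_C": [],
}
print(candidate_foreign_cds(example))  # ['cds_A']
```

In this toy input, only cds_A is retained: cds_B has a mycobacterial protein among its best hits, and cds_C has no hit at all.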
We then used GeneTeam, with a delta value of 3, and visual inspection to search for areas of synteny with relevant non mycobacterial organisms [83]. Only clusters with at least 3 syntenic genes not found in other sequenced mycobacteria were retained. Phylogenetic analyses were carried out with the “Phylogeny.fr” web server (http://www.phylogeny.fr), using Muscle for multiple alignment and GBlocks for alignment curation, and constructing the phylogenetic trees with PhyML [84], [85]. Branch supports were calculated with the approximate likelihood ratio test [86]. Distributions of M. abscessus and M. smegmatis proteins, according to the KEGG classification, were compared using chi-squared tests with continuity correction. To account for multiple testing, p-values were corrected according to Hochberg's method. Differences were considered statistically significant if corrected p-values were <0.05.
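The statistical comparison can be sketched in Python as follows. This is a hedged illustration, not the analysis code of the study. It assumes one 2x2 table per KEGG category (proteins inside versus outside the category, for each of the two species); the layout, the counts and the function names are hypothetical. The correction is the standard Hochberg step-up adjustment.

```python
import math
from typing import List, Tuple


def yates_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-squared test with Yates continuity correction on the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * diff ** 2 / denom
    # With 1 degree of freedom, the survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def hochberg_adjust(pvalues: List[float]) -> List[float]:
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # The largest p-value is multiplied by 1, the next by 2, and so on.
        running_min = min(running_min, (rank + 1) * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts per category:
# (species 1 in category, species 1 elsewhere, species 2 in category, species 2 elsewhere).
tables = {
    "category_1": (120, 4800, 260, 6400),
    "category_2": (300, 4620, 410, 6250),
}
raw = [yates_chi2_2x2(*counts)[1] for counts in tables.values()]
for name, p_adj in zip(tables, hochberg_adjust(raw)):
    print(name, "significant" if p_adj < 0.05 else "not significant")
```

A category is reported as significant only if its adjusted p-value falls below the 0.05 threshold stated above.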