The authors have declared that no competing interests exist.
Conceived and designed the experiments: GZ EP AP FB ARZ. Performed the experiments: EP AP AR AA GC FR EE. Analyzed the data: GZ AL EG MC. Contributed reagents/materials/analysis tools: MG FB ARZ. Wrote the paper: GZ EP AP AL MG FB ARZ MC.
The aim of this study was to reconstruct the evolutionary dynamics of the A(H1N1)pdm09 influenza virus in Italy during two epidemic seasons (2009/2010 and 2010/2011) in the light of the forces driving the evolution of the virus. Nearly six thousand respiratory specimens were collected from patients with influenza-like illness within the framework of the Italian Influenza Surveillance Network, and the A(H1N1)pdm09 hemagglutinin (HA) gene was amplified and directly sequenced from 227 of these. Phylodynamic and phylogeographical analyses were performed using a Bayesian Markov Chain Monte Carlo method, and codon-specific positive selection acting on the HA coding sequence was evaluated. The global and local phylogenetic analyses showed that all of the Italian sequences sampled in the post-pandemic (2010/2011) season grouped into at least four highly significant Italian clades, whereas those of the pandemic season (2009/2010) were interspersed with isolates from other countries at the tree root. The time of the most recent common ancestor of the strains circulating in the pandemic season in Italy was estimated to be between the spring and summer of 2009, whereas the Italian clades of the post-pandemic season originated in the spring of 2010 and showed radiation in the summer/autumn of the same year; this was confirmed by a Bayesian skyline plot showing the biphasic growth of the effective number of infections. The local phylogeographical analysis showed that the first season of infection originated in Northern Italian localities with high population density, whereas the second involved less densely populated localities, in line with a gravity-like model of geographical dispersion. Two HA sites, codons 97 and 222, were under positive selection. In conclusion, the A(H1N1)pdm09 virus was introduced into Italy in the spring of 2009 by means of multiple importations. This was followed by repeated founder effects in the post-pandemic period that gave rise to specific Italian clades.
In March 2009, a novel swine-derived A(H1N1) influenza virus – A(H1N1)pdm09 – emerged in Mexico and started spreading across the globe, prompting the World Health Organisation (WHO) to raise the level of influenza pandemic alert to phase 6 (WHO – available at:
There was considerable heterogeneity in the pattern of A(H1N1)pdm09 spread in Europe. The UK experienced a substantial first wave of transmission in the early summer, followed by a second in the autumn, whereas most European countries (including Italy) experienced only limited transmission before the summer and a single wave in the autumn of 2009
A(H1N1)pdm09 is a novel reassortant virus containing genes from the North American triple reassortant swine viruses and neuraminidase (NA) and matrix (M) genes derived from Eurasian swine viruses. It had probably been circulating undetected among swine during the previous decade, but only recently emerged among humans
The emergence and subsequent rapid global spread of this influenza virus provided a unique opportunity to observe the evolutionary population dynamics of the first pandemic influenza virus in forty years, particularly in regions where virological surveillance is comprehensive and closely matched to both the well-defined chronology of the epidemic waves and the related disease surveillance (available at the Italian Influnet website:
The aim of this study was to reconstruct the evolutionary dynamics of the A(H1N1)pdm09 influenza virus in Italy during two epidemic seasons (2009/2010 and 2010/2011) in the light of the forces driving viral evolution.
According to the Regional Surveillance and Preparedness Plan (DGR IX/1046, 22 Dec. 2010 and DGR 5988, 30 Jun 2011), the diagnostic and clinical management of patients admitted to hospitals in the Lombardy Region with severe or moderate ILI included prospective influenza A detection, subtyping and sequencing. These activities were centralised at the two regional reference laboratories (S.S. Virologia Molecolare, Fondazione IRCCS Policlinico San Matteo, Pavia, and Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Milan). Data were analysed anonymously according to the Regional Surveillance and Preparedness Plan. Specimens from patients with mild respiratory infections (mild ILI) were collected by sentinel practitioners and analysed anonymously at the reference laboratory in Milan within the framework of the National Surveillance Plan (Influnet), following approval by the Ethics Committee of Fondazione IRCCS Policlinico San Matteo, Pavia.
Within the framework of the Italian Influenza Surveillance Network, nasal swabs (NS) or broncho-alveolar lavages (BAL) were collected from outpatients with the symptoms of influenza-like illness (ILI) and hospitalised patients suffering from severe respiratory syndromes.
During the pandemic (May 2009–April 2010) and post-pandemic period (May 2010–April 2011), 5,844 respiratory specimens were collected in Lombardy (Northern Italy) and sent to the regional reference laboratories (S.S. Virologia Molecolare, Fondazione IRCCS Policlinico San Matteo, Pavia, and Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Milan) for the virological diagnosis of A(H1N1)pdm09 infection.
A dataset was constructed that included 227 HA gene sequences (835 nucleotides in length, positions 121–954) obtained from as many A(H1N1)pdm09-positive patients, whose characteristics are shown in
Total RNA was extracted from the respiratory samples using the Nuclisens® easyMAG™ automated extraction kit (BioMerieux, Lyon, France), and a virological diagnosis of A(H1N1)pdm09 infection was made by means of a real-time reverse-transcriptase polymerase chain reaction (RT-PCR) assay
The sequences were deposited with GenBank, National Center for Biotechnology Information (NCBI) (
The sequences were aligned using CLUSTALW (integrated within the Bio-Edit sequence editor by Tom Hall, 2001;
The phylogenetic tree, model parameters, evolutionary rates and population growth were co-estimated using a Bayesian Markov Chain Monte Carlo (MCMC) method implemented in the BEAST v1.5.4 package
Two independent MCMC chains were run for 100 million generations with sampling every 10,000th generation, and were combined using LogCombiner 1.5.4, included in the BEAST package. Convergence was assessed on the basis of the effective sample size (ESS) after a 10% burn-in using Tracer software version 1.5 (
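As a rough illustration of what the ESS measures, the sketch below computes it for a single parameter trace as N / (1 + 2Σρₖ), discarding a 10% burn-in and truncating the autocorrelation sum at the first non-positive lag. The truncation rule, the function name and the simulated traces are assumptions for illustration only; Tracer's own estimator may differ in detail.

```python
import random


def effective_sample_size(trace, burn_in_fraction=0.1):
    """Approximate ESS of one MCMC parameter trace after discarding a burn-in.

    ESS = N / (1 + 2 * sum_k rho_k), where rho_k is the lag-k autocorrelation
    and the sum stops at the first lag whose autocorrelation is <= 0.
    """
    samples = trace[int(len(trace) * burn_in_fraction):]
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0:
        return float(n)  # degenerate constant trace: treated as uncorrelated
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Hypothetical traces: independent draws versus a strongly autocorrelated chain.
random.seed(1)
independent_trace = [random.gauss(0.0, 1.0) for _ in range(1000)]
sticky_trace = [0.0]
for _ in range(999):
    sticky_trace.append(0.9 * sticky_trace[-1] + random.gauss(0.0, 1.0))

print(round(effective_sample_size(independent_trace)))
print(round(effective_sample_size(sticky_trace)))
```

Both traces have the same length, but the autocorrelated one yields a far smaller ESS, which is why long runs and sparse sampling (every 10,000th generation) are used.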
The basic reproductive number (R0), indicating the mean number of secondary cases generated by a single primary case, was estimated on the isolates sampled during the pandemic period. It was calculated on the basis of the exponential growth rate (r) using the equation R0 = rD+1, where D is the average duration of infectiousness
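Because the tree is scaled in months while an infectious period is usually expressed in days, r and D in R0 = rD + 1 must be put on the same time unit before they are multiplied. A minimal sketch, assuming r is a per-month exponential growth rate and D is given in days; the numerical inputs are hypothetical and are not this study's estimates.

```python
DAYS_PER_MONTH = 365.25 / 12  # average calendar month, in days


def basic_reproductive_number(growth_rate_per_month, infectious_period_days):
    """R0 = r * D + 1, with r converted from per-month to per-day units."""
    if infectious_period_days <= 0:
        raise ValueError("the infectious period must be positive")
    growth_rate_per_day = growth_rate_per_month / DAYS_PER_MONTH
    return growth_rate_per_day * infectious_period_days + 1.0


# Hypothetical inputs: r = 3.0 per month, D = 3 days.
print(round(basic_reproductive_number(3.0, 3.0), 3))
```

Omitting the unit conversion would give 3.0 × 3 + 1 = 10, an implausible value for influenza.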
The spatial reconstruction was obtained by means of the same Bayesian framework using a continuous-time Markov chain (CTMC) implemented in BEAST
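In a discrete-location CTMC, a rate matrix Q describes movement between sampling areas, and the probability of a lineage moving from area i to area j along a branch of length t is the (i, j) entry of exp(Qt). The sketch computes exp(Qt) in pure Python by scaling and squaring a truncated Taylor series and checks it against the closed form of an equal-rates model. The three areas, the rate and the function names are hypothetical; how BEAST parameterises and places priors on the rates is not reproduced here.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=20):
    """Approximate P(t) = exp(Q t) by scaling and squaring a Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # exp(A)^(2^s) = exp(Q t)
    return result


def symmetric_closed_form(n_states, rate, t):
    """(stay, move) probabilities of the equal-rates model, for checking."""
    decay = math.exp(-n_states * rate * t)
    return 1 / n_states + (n_states - 1) / n_states * decay, (1 - decay) / n_states


# Hypothetical equal-rates migration model among three areas of Lombardy.
locations = ["Milan", "northern area", "southern Lombardy"]
rate = 0.5  # migrations per unit time between any two areas (hypothetical)
Q = [[-2 * rate if i == j else rate for j in range(3)] for i in range(3)]
P = transition_matrix(Q, 2.0)
for name, row in zip(locations, P):
    print(name, [round(p, 4) for p in row])
```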
The dN/dS ratio (ω) was estimated using the maximum likelihood (ML) approach under a global single-ratio model implemented in the HyPhy program
Site-specific positive and negative selection was estimated using three different algorithms: single likelihood ancestor counting (SLAC), derived from the Suzuki-Gojobori approach
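SLAC rests on counting logic of the Nei-Gojobori/Suzuki-Gojobori type: synonymous and nonsynonymous sites and differences are counted codon by codon and converted into dN and dS. The sketch shows only the simplest pairwise version (Nei-Gojobori with a Jukes-Cantor correction) for two aligned, gap-free, in-frame sequences without internal stop codons. It is not what HyPhy's SLAC runs (SLAC counts along the tree using reconstructed ancestors), and counting changes to stop codons as nonsynonymous is one common convention among several. Function names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites of one codon; changes to stop codons count as nonsynonymous."""
    amino_acid = GENETIC_CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if GENETIC_CODE[mutant] == amino_acid:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over stop-free pathways."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    pathways = []
    for order in permutations(positions):
        current, syn, non = c1, 0.0, 0.0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if GENETIC_CODE[nxt] == "*":
                break  # this pathway passes through a stop codon
            if GENETIC_CODE[nxt] == GENETIC_CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:
            pathways.append((syn, non))
    if not pathways:
        return 0.0, float(len(positions))
    return (sum(s for s, _ in pathways) / len(pathways),
            sum(n for _, n in pathways) / len(pathways))


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS, dN/dS) for two aligned, gap-free, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of three long")
    syn_sites = syn_diffs = non_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s, n = codon_differences(c1, c2)
        syn_diffs += s
        non_diffs += n
    non_sites = len(seq1) - syn_sites
    if syn_sites == 0 or non_sites == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    d_n = jukes_cantor(non_diffs / non_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    return d_n, d_s, (d_n / d_s if d_s > 0 else float("nan"))


print(nei_gojobori("CCCGGGCCCGGGCCCGGG", "CCAGGGCCCGGGCCCGGG"))
```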
Finally, in order to investigate whether the sampled sequences have been subjected to selective pressure at population level (i.e. along internal branches), an internal fixed effects likelihood (IFEL) method
To identify the sites under selective pressure, we adopted a threshold of p ≤ 0.1 or a posterior probability of ≥ 0.9. The likelihood ratio test (LRT) was used to compare the performance of the M0 (one-ratio), M1 (nearly neutral) and M2 (selection) models. HyPhy software was used for all of the analyses, some of which were performed using the web-based Datamonkey interface (
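The M1/M2 comparison reduces to twice the difference between the two maximised log-likelihoods, referred to a χ² distribution. A minimal sketch, assuming two degrees of freedom (the usual difference between a nearly neutral and a positive-selection site model; the text does not state it), for which the χ² upper tail has the closed form exp(-x/2). It also encodes the p ≤ 0.1 / posterior ≥ 0.9 cut-offs given above. Function names are hypothetical.

```python
import math


def chi2_sf_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-max(statistic, 0.0) / 2.0)


def likelihood_ratio_test(lnl_null, lnl_alternative):
    """Return (2 * delta lnL, p) for nested site models differing by two parameters."""
    statistic = 2.0 * (lnl_alternative - lnl_null)
    return statistic, chi2_sf_df2(statistic)


def site_is_selected(p_value=None, posterior=None):
    """Apply this study's thresholds: p <= 0.1 or posterior probability >= 0.9."""
    if p_value is not None and p_value <= 0.1:
        return True
    return posterior is not None and posterior >= 0.9


# Statistic reported in Results for M1 vs M2: 2 * delta lnL = 13.06.
print(round(chi2_sf_df2(13.06), 4))
print(site_is_selected(p_value=0.086), site_is_selected(posterior=0.92))
```

With the 2Δlikelihood of 13.06 reported in Results, this gives p ≈ 0.0015, of the same order as the reported p = 0.001.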
The maximum likelihood and Bayesian analyses of the global data set of 561 A(H1N1)pdm09 isolates (227 from Italy and 334 from all over the world) showed that the Italian isolates clustered into five significant groups (pp>0.9). The clusters included a total of 136 isolates, representing 59.9% of the Italian isolates and 100% of those of the post-pandemic season (2010/2011), whereas the isolates of the pandemic season (2009/2010) were interspersed with sequences from other countries (
The 227 Italian isolates newly characterised in this study are highlighted in red. The specific Italian clades (A–D) including more than two isolates and having a posterior probability of ≥0.9 are shown, as are two main clades including several Italian and non-Italian strains (E and F). The letters indicate the position of the identified clusters, and the scale bar the number of substitutions.
In order to reconstruct the population dynamics of the A(H1N1)pdm09 epidemics in Italy, we separately analysed the Italian isolates with a known month of isolation, and estimated the evolutionary rate.
A strict and a relaxed (log-normal) molecular clock model were implemented under the less stringent Bayesian skyline plot demographic model. The marginal likelihood comparison showed that the relaxed clock did not fit the data significantly better than the strict clock (2lnBF = 9.85). Moreover, the lower 95% HPD limits of the coefficient of variation and of the standard deviation of the evolutionary rate were always very small (1.01×10⁻⁵ and 1.3×10⁻⁴, respectively), thus indicating that the evolutionary rate varied only slightly over the branches of the tree. For these reasons, the strict clock model was selected for all of the subsequent analyses.
Under this model, we estimated a mean evolutionary rate of 4.15×10⁻⁴ substitutions/site/month (95% HPD: 2.9–5.3×10⁻⁴). The relative substitution rate of the 1st + 2nd codon positions (μ1, mean 0.75, 95% HPD: 0.64–0.87) was significantly lower than that of the 3rd codon position (μ2, mean 1.5, 95% HPD: 1.3–1.7).
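Two consistency checks can be run on these figures. First, if the relative rates are normalised so that their site-weighted mean is 1 (an assumption about the parameterisation, although it is how codon-position partitions are commonly set up), then μ1 = 0.75 on two of every three sites and μ2 = 1.5 on the third give exactly 1. Second, multiplying the monthly estimates by 12 gives about 3.5 to 6.4×10⁻³ substitutions/site/year, the range quoted in the Discussion. The helper name is hypothetical.

```python
MONTHS_PER_YEAR = 12


def partition_rates(mean_rate, relative_rates, site_weights):
    """Absolute rate per partition, plus the site-weighted mean of the relative rates."""
    total = sum(site_weights)
    weighted_mean = sum(r * w for r, w in zip(relative_rates, site_weights)) / total
    return [mean_rate * r for r in relative_rates], weighted_mean


# Values from Results: mean rate 4.15e-4 subs/site/month; mu1 = 0.75 (codon
# positions 1+2, two sites per codon), mu2 = 1.5 (position 3, one site per codon).
rates, check = partition_rates(4.15e-4, [0.75, 1.5], [2, 1])
print(check)                                                      # weighted mean
print([r * MONTHS_PER_YEAR for r in rates])                       # subs/site/year
print([x * MONTHS_PER_YEAR for x in (2.9e-4, 4.15e-4, 5.3e-4)])   # HPD limits, mean
```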
The Bayesian time-scaled tree of the 227 A(H1N1)pdm09 HA gene sequences collected during the two A(H1N1)pdm09 epidemics showed that the isolates sampled in the pandemic season were mainly interspersed at the root of the tree, with only two significant clades respectively including 24 (clade E, pp = 0.93) and six isolates (clade F, pp = 0.97) sampled between August 2009 and January 2010. These two clades were also present in the global tree, but included multiple isolates from different countries in the world (
The main significant clades described in the text are highlighted in different colours. The numbers on the internal nodes indicate posterior probabilities. The bar at the bottom of the tree represents the calendar months between the tMRCA of the tree root and the most recent samples (March 2011).
The time of the most recent common ancestor (tMRCA) of the Italian strains (corresponding to the tree root) was estimated to be a mean of 25.3 months before March 2011 (i.e. February 2009), with a 95% credibility interval of 29 to 22 months before March 2011 (i.e. October 2008 to May 2009). The tMRCAs and the most probable months in the calendar time scale of the main clades and sub-clades estimated using the strict molecular clock model are shown in
Node | Months | LHPD | UHPD | Date (mean) | Date (LHPD) | Date (UHPD) |
Root | 25.3 | 22.3 | 29.15 | feb-09 | may-09 | oct-08 |
A | 10.2 | 7.05 | 13.92 | may-10 | aug-10 | jan-10 |
A′ | 6.32 | 4.3 | 8.5 | sep-10 | nov-10 | jul-10 |
A′′ | 8.1 | 4.4 | 11.5 | jul-10 | nov-10 | mar-10 |
B | 11.9 | 7.68 | 16.01 | mar-10 | jul-10 | nov-09 |
B′ | 7.1 | 4.3 | 10.4 | aug-10 | nov-10 | may-10 |
B′′ | 5.3 | 3.6 | 8.1 | oct-10 | nov-10 | jul-10 |
C | 12.3 | 8.04 | 16.21 | mar-10 | jul-10 | nov-09 |
C′ | 6.3 | 4.3 | 9.1 | sep-10 | nov-10 | jun-10 |
C′′ | 6.7 | 3.9 | 10 | aug-10 | nov-10 | may-10 |
D | 12.2 | 8.02 | 16.27 | mar-10 | jul-10 | nov-09 |
D′ | 4.6 | 2.4 | 6.9 | oct-10 | dec-10 | aug-10 |
D′′ | 4.9 | 3.5 | 6.7 | oct-10 | dec-10 | aug-10 |
E | 21.1 | 20.32 | 22.14 | jun-09 | jul-09 | may-09 |
F | 21.2 | 21 | 21.59 | jun-09 | jun-09 | may-09 |
Months: mean tMRCA, in months before March 2011.
LHPD: lower 95% highest posterior density limit.
UHPD: upper 95% highest posterior density limit.
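The calendar dates in the table follow from the "months before March 2011" values. The sketch below performs that conversion, assuming rounding to the nearest whole month with halves rounded up; the table does not state its rounding rule, so a few entries may differ by one month. Function names are hypothetical.

```python
import math

MONTH_ABBREVIATIONS = ["jan", "feb", "mar", "apr", "may", "jun",
                       "jul", "aug", "sep", "oct", "nov", "dec"]


def months_before_to_calendar(months_before, ref_year=2011, ref_month=3):
    """Convert 'months before March 2011' into a (year, month) pair.

    Rounds to the nearest whole month, with halves rounded up (an assumption).
    """
    whole_months = math.floor(months_before + 0.5)
    index = ref_year * 12 + (ref_month - 1) - whole_months
    year, month_zero_based = divmod(index, 12)
    return year, month_zero_based + 1


def table_label(year, month):
    """Format in the style of the table, e.g. (2009, 2) -> 'feb-09'."""
    return f"{MONTH_ABBREVIATIONS[month - 1]}-{year % 100:02d}"


for months in (25.3, 22.3, 29.15):  # root node: mean, LHPD, UHPD
    print(months, table_label(*months_before_to_calendar(months)))
```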
The mean genetic distance was 0.3% (substitutions per 100 sites) (±0.1%) within the group of isolates obtained in the pandemic season, and 1% (±0.2%) within the group of isolates obtained during the post-pandemic season. The mean distance between the isolates of the two seasons was 0.9% (±0.2%).
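The within- and between-season figures are averages over pairs of sequences. A minimal sketch, assuming uncorrected p-distances (the text does not state which distance model was used or how the ± values were obtained) and toy sequences that merely stand in for the two seasons; ambiguous bases and gaps are skipped pairwise.

```python
from itertools import combinations, product

NUCLEOTIDES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring ambiguous bases and gaps."""
    compared = [(a, b) for a, b in zip(seq_a, seq_b)
                if a in NUCLEOTIDES and b in NUCLEOTIDES]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in compared) / len(compared)


def mean_over_pairs(pairs):
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair of sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_within(group):
    return mean_over_pairs(combinations(group, 2))


def mean_between(group_a, group_b):
    return mean_over_pairs(product(group_a, group_b))


# Toy alignments standing in for the two seasons (hypothetical sequences).
season_1 = ["AAAA", "AAAT"]
season_2 = ["CCAA", "CCAT"]
print(100 * mean_within(season_1), 100 * mean_between(season_1, season_2))
```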
A total of 16 codons showed mutations affecting more than 30% of the isolates included in at least one clade.
Codon | %A | %B | %C | %D | %E | %F | %No clade | %Total |
32 | 0 | 0 | 0 | 0 | 0 | 100 | 1.8 | 3.1 |
94 | 0 | 0 | 34.8 | 0 | 0 | 0 | 0 | 3.5 |
97 | 98.6 | 81.6 | 0 | 0 | 0 | 0 | 44.3 | |
125 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 10.1 |
134 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 5.3 |
138 | 94.4 | 0 | 0 | 0 | 0 | 0 | 0 | 29.4 |
141 | 0 | 13.2 | 0 | 83.3 | 0 | 0 | 0 | 6.6 |
172 | 0 | 0 | 34.8 | 0 | 0 | 0 | 0 | 3.5 |
183 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 5.3 |
185 | 0 | 100 | 0 | 0 | 0 | 0 | 3.6 | 17.5 |
205 | 100 | 2.6 | 0 | 0 | 0 | 0 | 0 | 31.6 |
216 | 100 | 5.3 | 0 | 0 | 0 | 0 | 0 | 32 |
222 | 2.8 | 10.5 | 4.3 | 16.7 | 100 | 0 | 0 | 14.5 |
249 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 31.1 |
295 | 0 | 0 | 0 | 83.3 | 12.5 | 0 | 0 | 5.7 |
297 | 0 | 0 | 0 | 0 | 62.5 | 0 | 0 | 6.6 |
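Each cell of the table above is the share of a clade's isolates carrying a change at a given codon. A minimal sketch of that tally, assuming a nucleotide-level comparison of each codon against a reference sequence (the text does not say whether nucleotide or amino acid changes were counted) and a hypothetical `first_codon` offset for numbering codons within the sequenced fragment; the sequences are toy data.

```python
def percent_changed(clade_sequences, reference, codon_number, first_codon=1):
    """Percentage of sequences whose codon differs from the reference codon.

    codon_number is counted from first_codon, the codon at which the aligned
    fragment starts (a hypothetical numbering convention).
    """
    start = 3 * (codon_number - first_codon)
    ref_codon = reference[start:start + 3]
    if start < 0 or len(ref_codon) != 3:
        raise ValueError("codon outside the aligned fragment")
    changed = sum(seq[start:start + 3] != ref_codon for seq in clade_sequences)
    return 100.0 * changed / len(clade_sequences)


# Hypothetical reference and clade members (three codons each).
reference = "ATGAAACCC"
clade = ["ATGAAGCCC", "ATGAAACCC", "ATGGAACCC", "ATGAAACCC"]
share = percent_changed(clade, reference, codon_number=2)
print(share, share > 30.0)  # 50.0 True: above the 30% cut-off used above
```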
The ML estimate of the dN/dS ratio (ω) gave a mean value of 0.43 (95% CI: 0.35–0.52), with no significant difference between lineages (LRT comparing a global and a local model: 2Δlikelihood = 161.7, p > 0.1). Using the Nielsen-Yang approach, we found that a model of evolution assuming site-specific selection fitted our data better than a nearly neutral model (M1 vs M2: 2Δlikelihood = 13.06, p = 0.001 by LRT), and site-by-site analysis of selection pressure revealed that most of the methods used supported two sites under positive selection at a 90% significance level (
Method | Location | Codon | Normalised dN/dS | P value (posterior probability for REL) | Clade specificity |
SLAC | RBS | 222 | 3.95 | 0.086 | |
FEL | | 97 | 5.54 | 0.1 | |
FEL | RBS and AS (Ca2) | 222 | 9.94 | 0.037 | |
REL | | 97 | | 0.99 | |
REL | | 173 | | 0.92 | |
REL | RBS and AS (Ca2) | 222 | | 1 | |
IFEL | | 97 | 17.9 | 0.017 | |
IFEL | AS (Sa) | 125 | 9.3 | 0.08 | B2 |
IFEL | AS (Ca2) | 141 | 8.8 | 0.07 | |
IFEL | AS (Sb) | 185 | 7.8 | 0.09 | B1 |
IFEL | | 249 | 4.9 | 0.1 | A1 |
IFEL | | 297 | 5.1 | 0.1 | |
RBS: receptor binding site;
AS: antigenic site;
SLAC: single likelihood ancestor counting;
FEL: fixed effects likelihood;
REL: random effects likelihood;
IFEL: internal fixed effects likelihood.
Six sites were selected along the internal branches (
Bayes factor comparison of four simple parametric (constant population size, exponential, expansion and logistic growth) and one piecewise demographic model (BSP) showed that the last fitted the data better than the others (
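Model choice here rests on Bayes factors computed from log marginal likelihoods. A minimal sketch of the comparison and of the widely used Kass and Raftery verbal scale for 2 ln BF (below 2: not worth more than a bare mention; 2 to 6: positive; 6 to 10: strong; above 10: very strong). The log marginal likelihoods below are hypothetical, and the estimator used to obtain them in this study is not stated in the text.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """2 ln BF of model A over model B from their log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery_label(value):
    """Verbal strength of evidence for model A on the 2 ln BF scale."""
    if value < 0:
        return "favours model B"
    if value < 2:
        return "not worth more than a bare mention"
    if value < 6:
        return "positive"
    if value <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for the five demographic models.
log_ml = {"constant": -3105.2, "exponential": -3101.8, "expansion": -3100.9,
          "logistic": -3101.1, "BSP": -3093.4}
best = max(log_ml, key=log_ml.get)
for model, value in log_ml.items():
    if model != best:
        bf = two_ln_bayes_factor(log_ml[best], value)
        print(f"{best} vs {model}: 2lnBF = {bf:.1f} ({kass_raftery_label(bf)})")
```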
Analysis of the BSP (
Ordinate: the number of effective infections at time t (Ne(t)); abscissa: calendar months between the mean tMRCA estimate of the tree root and the most recent samples (March 2011). The thick solid line represents the median value, and the grey area the 95% HPD of the Ne(t) estimates. The vertical line indicates the 95% lower HPD tMRCA estimate of the tree root.
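The BSP is a Bayesian, smoothed generalisation of the classic skyline estimator, in which each coalescent interval with i lineages and duration t gives a point estimate of Ne·τ equal to t·i(i-1)/2. The sketch implements only that classic estimator, for a single tree whose tips were all sampled at the same time; the real, serially sampled data do not meet this assumption, and the node heights are hypothetical.

```python
def classic_skyline(node_heights, n_tips):
    """Classic skyline estimates of Ne*tau for an isochronous binary tree.

    node_heights: times before present of the n_tips - 1 coalescent events.
    Returns (interval start, interval end, estimate) tuples, most recent first.
    """
    heights = sorted(node_heights)
    if len(heights) != n_tips - 1:
        raise ValueError("a binary tree with n tips has n - 1 coalescent events")
    estimates = []
    previous, lineages = 0.0, n_tips
    for height in heights:
        duration = height - previous
        estimates.append((previous, height, duration * lineages * (lineages - 1) / 2))
        previous, lineages = height, lineages - 1
    return estimates


# Hypothetical four-tip tree with coalescences 1, 2 and 4 months before present.
for start, end, ne_tau in classic_skyline([1.0, 2.0, 4.0], n_tips=4):
    print(f"{start:.0f}-{end:.0f} months: Ne*tau = {ne_tau:.1f}")
```

In this toy tree the estimates shrink going back in time (6, 3, 2), the signature of a growing epidemic.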
The local phylogeographical analysis was made by grouping the isolates on the basis of their sampling locations and building a spatial scaled phylogeny using the Bayesian framework. The location-annotated tree is shown in
Analysis showed that the first season of the epidemic was characterised by ancestors localised in Milan (the most probable location of the tree root) and in the area north of the city: the MRCA of clade E was most probably located in Milan (pp = 0.76), but the isolates included were from different places in Lombardy, whereas clade F was more restricted to Milan city and the hinterland. The post-pandemic season showed a more dispersed origin of the clades, which were localised in both northern and southern Lombardy: clades A and D most probably originated in the northern area of Milan (pp = 0.37), whereas clades B and C had MRCAs most probably located in southern Lombardy (pp = 0.37 and 0.34). The isolates included in all of the clades were dispersed throughout the region, including the southern part. The first wave of infection originated in localities with a high population density (≥382 inhabitants/km²), whereas the second wave involved less densely populated places (≤250 inhabitants/km²) (
The circle diameter is proportional to the number of isolates sampled in each locality. The lower-panel histogram shows the population density (inhabitants/km²).
We used a sophisticated Bayesian evolutionary framework for the molecular characterisation and reconstruction of the phylodynamics of the A(H1N1)pdm09 influenza virus in northern Italy during two epidemics: the pandemic between summer and autumn 2009, and the post-pandemic period between November 2010 and March 2011.
In order to describe the Italian epidemics in the setting of the widespread diffusion of infection throughout the world, we analysed a total of 227 newly characterised northern Italian strains and a series of reference isolates from different countries retrieved from public databases. The first analysis included all of the patients' and reference isolates (global tree), and the second only the Italian strains, using a time-scaled phylogeny that assumed a strict molecular clock model.
Analysis of the trees suggested that the spread of the virus in Italy was the result of multiple independent introductions from different geographical areas because the Italian isolates sampled during the pandemic season were interspersed with sequences sampled in other countries at the root of the tree, and did not form any significant pure Italian clusters. The only exceptions were two clades in the Italian tree (E and F) that tended to group together in the global tree and included a number of isolates from other countries. Clade E was characterised by the presence of the signature substitution D222E, which has been previously described as circulating in the UK between July and September 2009
By contrast, four or five highly significant, purely Italian clades connected to the tree by long branches were observed during the post-pandemic season. Clade C was split into two different highly significant groups in the global tree, and clades A and D each included a single non-Italian strain (one Turkish and one Tunisian).
These data suggest that multiple initial introductions of A(H1N1)pdm09 in 2009 were followed by founder effects causing the local amplification of the infection in the post-pandemic season. In line with this hypothesis, the isolates of the post-pandemic wave showed greater intra-seasonal genetic divergence than those of the pandemic period, probably because of the typical effect of genetic drift on genetic variability, which tends to be lower within groups but greater between groups.
In order to reconstruct the population dynamics of the H1N1 pandemic in Italy on a calendar time scale, we estimated the evolutionary rate of the Italian isolates using the better-fitting strict molecular clock implemented in the Bayesian framework. Our estimates ranged from 3.5×10⁻³ to 6.4×10⁻³ substitutions/site/year, and were in line with those recently estimated for the HA gene by other authors
The tMRCA estimate of the tree root dating back to February 2009 is in line with the majority of the previous estimations, which place the origin of the pandemic H1N1 strain in January 2009, with intervals of credibility between late 2008 and March 2009
In line with these epidemiological data, the tMRCAs of the four specific Italian clades were dated to spring 2010 (between March and May), and the radiation of the Italian strains (corresponding to the tMRCAs of the nodes inside the clades) was dated to late summer and autumn 2010. Moreover, coalescent-based population dynamics revealed two phases of exponential growth in the effective number of infections, corresponding to the two seasons. The first exponential growth phase was between May and December 2009, and the second between October 2010 and January 2011. The basic reproductive number estimated for the first pandemic period was close to the lower confidence limit estimated by Fraser, confirming the limited potential of A(H1N1)pdm09 to spread in comparison with previous pandemics
In order to reconstruct the geographical dispersion of the infections, the northern Italian isolates were grouped on the basis of their place of isolation, and the genetic flows between the different geographical areas were estimated. The analysis showed that the isolates obtained during the first pandemic season most probably originated in areas with high population densities, such as Milan and its north-western hinterland, where there are important international airports, whereas the isolates of the second season were more dispersed and most probably originated in smaller and less densely populated areas such as southern Lombardy. Given the characteristics of the urbanisation of this area, these localities may represent the most probable geographical areas in which the founder effect occurred, a hypothesis that is supported by the phylogeographical tree. This suggests that the geographical dispersion of A(H1N1)pdm09 was possibly characterised by gravity-like dynamics, in which larger cities act as attractors and drive the spread of infection to their smaller counterparts
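The text does not specify a functional form for the gravity-like dynamics. A common choice, used here purely as an assumption, couples two places in proportion to the product of their populations divided by a power of their distance. The sketch shows how, under that form, a large city can dominate the infections flowing into a small town even from further away; the populations, distances, exponents and function names are all hypothetical.

```python
def gravity_coupling(pop_origin, pop_destination, distance_km,
                     alpha=1.0, beta=1.0, gamma=2.0, k=1.0):
    """Gravity-model coupling k * P_o**alpha * P_d**beta / d**gamma (arbitrary units)."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return k * pop_origin ** alpha * pop_destination ** beta / distance_km ** gamma


def source_shares(destination_pop, sources):
    """Fraction of the total coupling into one destination contributed by each source."""
    weights = {name: gravity_coupling(pop, destination_pop, dist)
               for name, (pop, dist) in sources.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical small town receiving infections from a large city and a peer town.
shares = source_shares(10_000, {"large city": (1_300_000, 30.0),
                                "neighbouring town": (15_000, 10.0)})
print({name: round(s, 3) for name, s in shares.items()})
```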
The application of innovative, advanced phylogenetic methods to the study of emerging viruses at the molecular level will be of fundamental importance for concretely improving the epidemiological surveillance of emerging and re-emerging infections. Our present data shed light on the relationships between the evolutionary and phylogenetic characteristics of the influenza A(H1N1)pdm09 virus and its geo-epidemiology, allowing the estimation of essential parameters such as the transmission potential of the virus or the most probable path of geographical dispersion. Indeed, the opportunity to study emerging pathogens and analyse their ecology, diffusion and evolution will make it possible to generate an early response in case of outbreaks, particularly in a pandemic caused by viral pathogens able to spread quickly in the human population. The rapidity with which the history of the origin and dispersion of A(H1N1)pdm09 has been reconstructed on the basis of the available genomes
We also investigated the positive selection pressures acting on the HA protein. Various algorithms revealed evidence of positive selection at six codons along terminal and internal branches, most of which were fixed in the sequences of viruses belonging to different clades. The codons in the influenza HA gene identified as being positively selected presumably encode amino acid replacements that allow the virus to evade existing population immunity. In particular, two positions were significant at the 90% level. The D97N change was previously observed in influenza strains recovered from fatal cases in England
Particular attention was given to positions 187 and 222, which have recently been extensively studied because of their importance in receptor binding preference and cross-specific shifts
In conclusion, on the basis of all of these observations, we hypothesise that the A(H1N1)pdm09 virus was introduced into Italy as a result of multiple importations by travellers coming from affected foreign areas. The initial transmission networks originated in the more densely populated locations in northern Italy between summer and autumn 2009, after which repeated founder effects occurred in more dispersed populations living in smaller cities, giving rise to new, specifically Italian clades that characterised the second season of the pandemic between November 2010 and March 2011. This suggests a possible gravity-like model of phylogeographical spread.
We would like to thank Giovanni Anselmi for his technical assistance.