Conceived and designed the experiments: MAR SMG DH SS. Performed the experiments: MAR SS AKC PW MY. Analyzed the data: MAR ZJT SS AKC MRG PW SS. Contributed reagents/materials/analysis tools: MRG SMG DH. Wrote the paper: MAR ZJT SS MRG PW SMG MY SS.
The authors have declared that no competing interests exist.
The rapidly expanding availability of de novo sequencing technologies can greatly facilitate efforts to monitor the relatively high mutation rates of influenza A viruses and the detection of quasispecies. Both the mutation rates and the lineages of influenza A viruses are likely to play an important role in the natural history of these viruses and the emergence of phenotypically and antigenically distinct strains.
We evaluated quasispecies and mixed infections by de novo sequencing the whole genomes of 10 virus isolates. These comprised eight avian influenza viruses grown in embryonated chicken eggs (six waterfowl isolates, five H3N2 and one H4N6; an H7N3 turkey isolate; and a bald eagle isolate with an H1N1/H2N1 mixed infection) and two tissue-cultured swine influenza viruses (H1N2 and H3N2). Two waterfowl cloacal swabs were also included in the analysis. Full-length sequences of all segments were obtained with 20 to 787-X coverage for the ten viruses and one cloacal swab. The second cloacal swab yielded 15 influenza reads of ∼230 bases, which was sufficient for bioinformatic inference of mixed infections or quasispecies. Genomic subpopulations, or quasispecies, were identified in four egg-grown avian influenza isolates and one cell-cultured swine virus. The bald eagle isolate and the second cloacal swab showed evidence of mixed infections with two (H1 and H2) and three (H1, H3, and H4) HA subtypes, respectively. Multiple sequence differences were identified between the first cloacal swab and the virus recovered from it in embryonated chicken eggs.
We describe a new approach to comprehensively identify mixed infections and quasispecies in low passage influenza A isolates and cloacal swabs and add to the understanding of the ecology of influenza A virus populations.
Influenza A virus is an enveloped RNA virus belonging to the family Orthomyxoviridae.
Influenza A viruses are zoonotic; as a group, they have a wide host range that includes humans, at least 105 bird species, pigs, horses, dogs, cats, ferrets, mink, and marine mammals. In the United States alone, complications of seasonal influenza in humans cause more than 200,000 hospitalizations and 36,000 deaths annually. Globally, influenza is estimated to cause 300,000 to 500,000 human deaths annually.
Since April 2009, a pandemic caused by a novel H1N1 virus has been ongoing. As of August 2009, more than 182,000 laboratory-confirmed cases of pandemic influenza H1N1, including 1799 deaths, had been reported to WHO from 177 countries and territories (
Influenza viruses have a high error rate during the transcription of their genomes because of the low fidelity of the viral RNA-dependent RNA polymerase, which lacks proofreading activity.
The frequency of infection with multiple virus subtypes in wild bird or swine populations, which may contribute significantly to the emergence of new viruses with altered host specificities, is not known. Complete genome sequencing of influenza A viruses by the current method (RT-PCR followed by classical dye-terminator chemistry) is time- and resource-intensive. For example, in a recent large-scale influenza sequencing project, 95 overlapping one-step RT-PCRs were performed per sample to obtain the complete viral genome sequence.
Twelve samples, including 10 virus isolates (eight avian influenza viruses propagated in embryonated chicken eggs and two swine influenza viruses propagated in MDCK cells with trypsin) and two cloacal swabs, were processed for pyrosequencing. Complete genomes (>99% of each open reading frame) were obtained for all eight segments of each virus isolate and of one cloacal swab (
Sequence coverage varied from 2X to 1300X depending on the region of segment 6; the average redundancy for this segment was 468.1X.
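Average redundancy is the mean of the per-base read depths along a segment, while the quoted range is their minimum and maximum. A minimal sketch, assuming the per-base depths have already been exported from a read mapper as a list of integers (the export step and the toy values are hypothetical, not data from this study):

```python
from statistics import mean


def depth_summary(depths):
    """Return (min, max, mean) read depth for one segment.

    `depths` holds one integer per reference position (hypothetical input,
    e.g. exported from a read mapper); the mean is the average redundancy.
    """
    if not depths:
        raise ValueError("no positions supplied")
    return min(depths), max(depths), mean(depths)


# Toy example (not study data): depths at five positions of a segment.
lo, hi, avg = depth_summary([2, 150, 600, 1300, 290])
# lo == 2, hi == 1300, avg == 468.4
```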
Segment columns (PB2 to NS) give the segment length covered in bases, with depth of coverage (X) in parentheses.
Sample ID | No. of PicoTiterPlate (PTP) regions | Total influenza reads | Avg. read length (bases) | PB2 | PB1 | PA | HA | NP | NA | M | NS | Total bases covered | Comments on sequence coverage |
A/turkey/Minnesota/1138/1980(H7N3) | 2 (A primer) | 19122 | 236 | 2335 (378.9) | 2337 (459.0) | 2227 (374.4) | 1726 (411.4) | 1530 (138.6) | 1443 (287.2) | 999 (273.3) | 883 (301.1) | 13480 | NP lacks 19 nucleotides and M lacks 2 nucleotides of coding sequences at the 3′ end |
A/mallard/South Dakota/Sg-00125/2007(H3N2) | 2 (1 for A and 1 for B primer) | 16300 | 217 | 2335 (206.7) | 2320 (138.1) | 2226 (312.5) | 1753 (289.6) | 1557 (350.5) | 1461 (506.1) | 1017 (108.0) | 880 (236.0) | 13549 | |
A/northern pintail/South Dakota/Sg-00126/2007(H3N2) | 2 (A primer) | 21350 | 243 | 2334 (246.3) | 2320 (114.7) | 2226 (362.5) | 1703 (466.2) | 1557 (593.8) | 1411 (786.9) | 996 (557.6) | 880 (268.6) | 13427 | HA lacks 14 nucleotides and NA lacks 8 nucleotides of coding sequences at the 3′ end |
A/mallard/South Dakota/Sg-00127/2007(H3N2) | 3 (2 for A and 1 for B primer) | 9977 | 239 | 2335 (171.6) | 2320 (119.0) | 2226 (203.0) | 1753 (182.4) | 1557 (205.8) | 1450 (311.7) | 1017 (104.1) | 865 (126.1) | 13523 | |
A/mallard/South Dakota/Sg-00128/2007(H3N2) | 3 (1 each for A, B, A+B primer) | 13607 | 235 | 2335 (237.7) | 2303 (91.7) | 2212 (290.2) | 1753 (163.7) | 1557 (396.7) | 1461 (468.1) | 1017 (128.7) | 883 (132.4) | 13521 | |
A/green-winged teal/Minnesota/Sg-00131/2007(H3N2) | 2 (A primer) | 22127 | 242 | 2305 (416.9) | 2321 (445.4) | 2229 (387.0) | 1761 (457.3) | 1564 (197.8) | 1447 (621.1) | 1007 (262.9) | 880 (472.1) | 13514 | |
cloacal swab of A/green-winged teal/Minnesota/Sg-00131/2007(H3N2) | 1 (A primer) | 5341 | 191 | 2299 (55.9) | 2335 (41.9) | 2224 (20.1) | 1728 (67.8) | 1519 (102.6) | 1460 (200.3) | 1000 (78.6) | 885 (133.4) | 13450 | PB2 lacks 3 nucleotides and NP lacks 12 nucleotides of coding sequences at the 3′ end |
A/mallard/Minnesota/Sg-00133/2007(H4N6) | 2 (A primer) | 11974 | 245 | 2322 (280.5) | 2317 (188.6) | 2203 (169.6) | 1726 (466.9) | 1522 (154.5) | 1451 (229.4) | 985 (119.9) | 871 (115.8) | 13397 | NP lacks 11 nucleotides and M lacks 15 nucleotides of coding sequences at the 3′ end |
cloacal swab of A/mallard/Minnesota/Sg-00133/2007(H4N6) | 2 (A primer) | 15 | - | - | - | - | - | - | - | - | - | - | Only 15 influenza reads were obtained [PB2 (2 reads), PB1 (4 reads), PA (3 reads), HA (3 reads; H1, H3, and H4), NP (1 read), NS (2 reads)] |
A/bald eagle/Virginia/Sg-00154/2008(mixed) (H1N1 and H2N1 mixed isolate) | 2 (1 for A and 1 for B primer) | 18153 | 220 | 2282 (174.6) | 2330 (100.5) | 2229 (130.3) | 1768 (391.9) | 1545 (46.0) | 1388 (64.8) | 1014 (111.7) | 875 (86.8) | 13431 (H1N1 lineage) | PB2 lacks 20 nucleotides, NA lacks 30 nucleotides and M lacks 1 nucleotide of coding sequences at the 3′ end |
A/bald eagle/Virginia/Sg-00154/2008(mixed), H2N1 lineage |  |  |  | 2343 (242.6) | 2309 (114.6) | 2222 (108.0) | 1765 (288.5) | 1539 (137.5) | 1376 (81.9) | 991 (50.1) | 883 (61.8) | 13428 (H2N1 lineage) | NA lacks 41 nucleotides of coding sequences at the 5′ end |
A/swine/Minnesota/SG-00239/2007(H1N2) | 2 (A primer) | 31329 | 246 | 2335 (568.6) | 2335 (364.9) | 2228 (477.7) | 1769 (843.3) | 1567 (528.7) | 1460 (1326.9) | 1025 (389.7) | 830 (110.3) | 13549 | NS lacks 25 nucleotides of coding sequences at the 3′ end |
A/swine/North Carolina/R08-001877-D08-013371/2008(H3N2) | 4 (A primer) | 21738 | 238 | 2357 (294.1) | 2342 (210.3) | 2238 (87.6) | 1759 (366.3) | 1564 (535.3) | 1463 (1029.2) | 1069 (356.2) | 894 (794.6) | 13686 | |
The total genome size of influenza A is 13523–13645 nucleotides (PB2, 2341; PB1, 2341; PA, 2233; HA, 1728–1779; NP, 1565; NA, 1398–1469; M, 1027; NS, 890).
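The genome size range in the footnote is the sum of the per-segment lengths, and the "Total bases covered" column of Table 1 appears to equal the sum of the eight segment columns in each row (checked here for the turkey isolate). A minimal sketch that reproduces both sums from the values printed in the table; the dictionary layout is illustrative only:

```python
# Segment lengths (nucleotides) from the Table 1 footnote; HA and NA vary
# by subtype, so every entry is stored as a (min, max) pair.
SEGMENT_LENGTHS = {
    "PB2": (2341, 2341), "PB1": (2341, 2341), "PA": (2233, 2233),
    "HA": (1728, 1779), "NP": (1565, 1565), "NA": (1398, 1469),
    "M": (1027, 1027), "NS": (890, 890),
}


def genome_size_range(lengths=SEGMENT_LENGTHS):
    """Sum the per-segment minima and maxima to get the genome size range."""
    low = sum(lo for lo, _ in lengths.values())
    high = sum(hi for _, hi in lengths.values())
    return low, high


def total_bases_covered(covered):
    """Sum the bases covered across the eight segments of one sample."""
    return sum(covered.values())


# Bases covered for A/turkey/Minnesota/1138/1980(H7N3), copied from Table 1.
turkey = {"PB2": 2335, "PB1": 2337, "PA": 2227, "HA": 1726,
          "NP": 1530, "NA": 1443, "M": 999, "NS": 883}

print(genome_size_range())          # (13523, 13645)
print(total_bases_covered(turkey))  # 13480
```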
Complete genome sequences were obtained both from the cloacal swab of A/green-winged teal/Minnesota/Sg-00131/2007(H3N2) and from the virus recovered from it in embryonated eggs. Segment-by-segment comparison revealed 80–91% nucleotide identity (PB2, 90%; PB1, 87%; PA, 90%; HA, 80%; NP, 83%; NA, 82%; M, 91%; NS, 86%), suggesting extensive variability or the presence of quasispecies in the cloacal swab. For the second cloacal swab and virus isolate pair, complete sequences were obtained from the virus isolate, whereas only 15 influenza sequence reads were obtained from the cloacal swab. These 15 reads of ∼230 bases included sequences of PB2 (two reads), PB1 (four reads), PA (three reads), HA (three reads; one read each for H1, H3, and H4), NP (one read), and NS (two reads). Four sequences (one each for PB1, PA, H4, and NP) had 100% identity with the virus recovered in embryonated chicken eggs, A/mallard/Minnesota/Sg-00133/2007(H4N6).
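Per-segment identities of this kind are the fraction of matching bases over the compared columns of an alignment. The text does not say how alignment gaps were treated, so the sketch below assumes pre-aligned, equal-length sequences and skips any column containing a gap; the toy sequences are illustrative, not study data:

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two equal-length, pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (an
    assumption), so identity is computed over ungapped columns only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 9 of the 10 ungapped columns match.
print(percent_identity("ACGTACGTAC-", "ACGTACGTAT-"))  # 90.0
```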
Whole-genome sequences generated with standard dye-terminator chemistry were also available for four H3N2 viruses. Comparison of these sequences with the pyrosequencing data revealed eight single-nucleotide mismatches (
Virus | Base substitutions in NP | Base substitutions in M |
A/mallard/South Dakota/Sg-00125/2007(H3N2) | a149g (N50S) t441g (silent) g642a (silent) g1017a (silent) a1321c (silent) | g715a (A239T) |
A/northern pintail/South Dakota/Sg-00126/2007(H3N2) | t441g (silent) | |
A/mallard/South Dakota/Sg-00127/2007(H3N2) | t1191c (silent) | |
A/mallard/South Dakota/Sg-00128/2007(H3N2) | none | none |
No differences were identified in the polymerase genes (PB2, PB1, and PA) or in the HA, NA, or NS segments.
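Each entry above gives the reference base, the nucleotide position, and the substituted base, with the amino acid change in parentheses. Converting a nucleotide position to its codon shows, for example, that a149g falls at the second position of codon 50 (N50S), g715a at the first position of codon 239 (A239T), and t441g at a third (wobble) position, consistent with its being silent. A minimal sketch, assuming numbering starts at the first nucleotide of each ORF as in the footnote to the quasispecies table:

```python
import re

# Matches entries such as "a149g": reference base, ORF position, new base.
SUBSTITUTION = re.compile(r"^([acgt])(\d+)([acgt])$")


def locate_substitution(token):
    """Return (ref, pos, alt, codon_number, codon_position) for e.g. 'g715a'.

    Positions count from the first nucleotide of the ORF, so codon n spans
    nucleotides 3n-2 to 3n.
    """
    m = SUBSTITUTION.match(token)
    if m is None:
        raise ValueError(f"unrecognized substitution: {token!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    codon_number = (pos + 2) // 3
    codon_position = (pos - 1) % 3 + 1
    return ref, pos, alt, codon_number, codon_position


print(locate_substitution("a149g"))  # ('a', 149, 'g', 50, 2)
print(locate_substitution("g715a"))  # ('g', 715, 'a', 239, 1)
```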
Genomic subpopulations, or quasispecies, identified by single nucleotide polymorphisms (SNPs) at specific nucleotide positions, were detected in five virus isolates (four egg-grown avian influenza viruses and one cell-cultured swine influenza virus) and in one cloacal swab. All of these samples are of the H3N2 subtype, and all quasispecies observed in this study arose from mutations in the NP, PB1, PA, M, and NS genes (
(A) Sequence polymorphism in the matrix (M) gene at nucleotide position 715 of isolate A/mallard/South Dakota/Sg-00125/2007(H3N2). The consensus sequence shows
Quasispecies and position, by gene
Sample ID | PB1 | PA | NP | M | NS | Remarks |
A/mallard/South Dakota/Sg-00125/2007(H3N2) | 1725-R | 423-K | 149-R 441-K 642-R 1017-R 1321-M | 715-R | 809-R | |
A/northern pintail/South Dakota/Sg-00126/2007(H3N2) | 1725-R | 419-Y 423-K | 149-R 441-K 642-R 1017-R 1321-M | 715-R | 809-R | |
A/mallard/South Dakota/Sg-00127/2007(H3N2) | 1725-R | 419-Y | 149-R 441-K 642-R 1017-R 1191-Y | |||
A/mallard/South Dakota/Sg-00128/2007(H3N2) | 1725-R | 1191-Y | ||||
cloacal swab of A/green-winged teal/Minnesota/Sg-00131/2007(H3N2) | 174-Y | 1021-Y 1026-R 1029-M 1125-Y 1140-R | ||||
cloacal swab of A/mallard/Minnesota/Sg-00133/2007(H4N6) | Co-infection with three HA subtypes - H1, H3 and H4 | |||||
A/bald eagle/Virginia/Sg-00154/2008(mixed) | Mixed isolate; full-length sequences of two lineages (H1N1 and H2N1) were obtained | |||||
A/swine/North Carolina/R08-001877-D08-013371/2008 (H3N2) | 174-Y | 201-R |
Nucleotide numbering begins at the first nucleotide of each ORF. IUPAC ambiguity codes: R = A/G, Y = C/T, M = A/C, K = G/T.
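Each table entry pairs an ORF position with an IUPAC ambiguity code standing for the two bases observed there. A minimal sketch that decodes the entries, restricted to the four two-base codes that appear in the table (other IUPAC codes such as S or W are deliberately rejected here):

```python
# IUPAC two-base ambiguity codes used in the quasispecies table.
IUPAC = {"R": "AG", "Y": "CT", "M": "AC", "K": "GT"}


def parse_site(entry):
    """Split an entry such as '715-R' into (position, possible bases)."""
    pos_text, code = entry.split("-")
    code = code.upper()
    if code not in IUPAC:
        raise ValueError(f"unsupported ambiguity code: {code!r}")
    return int(pos_text), tuple(IUPAC[code])


def parse_cell(cell):
    """Parse a whitespace-separated table cell, e.g. the NP entries."""
    return [parse_site(e) for e in cell.split()]


print(parse_cell("149-R 441-K 1321-M"))
# [(149, ('A', 'G')), (441, ('G', 'T')), (1321, ('A', 'C'))]
```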
A bald eagle isolate, A/bald eagle/Virginia/Sg-00154/2008(H1N1/H2N1), which had originally been typed as H2N1 by sequencing portions of the HA and NA segments, showed evidence of mixed subtypes on whole-genome analysis. Analysis of the HA sequences from the 454 data revealed that this isolate carried both H1 (
Evolutionary relationships were inferred in MEGA 4.0 using the maximum parsimony algorithm with Kimura-2P correction and 1000 bootstrap replications (branch confidence values are shown at the bifurcations).
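For reference, the Kimura two-parameter (K2P) correction named in the legend converts the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into an estimated number of substitutions per site, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below implements that formula directly; it is not MEGA's code, and skipping gap or ambiguous columns is our assumption:

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A/C/G/T (gaps, ambiguity codes)
    are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Transitions stay within purines (A<->G) or pyrimidines (C<->T).
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One transition (A->G) in 10 sites: d = -0.5 * ln(0.8).
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```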
As described above, the 15 influenza reads (∼230 bases each) obtained from the cloacal swab of A/mallard/Minnesota/Sg-00133/2007(H4N6) showed evidence of mixed infection with H1, H3, and H4 subtypes.
Complete genome sequencing of influenza A viruses is essential for determining the genetic basis of pathogenicity and antiviral resistance and for understanding the evolution of viruses in a variety of hosts and environments. Previous studies on sequence-based detection of antiviral resistance and on diagnostics routinely amplified short portions of the NA or HA genes, followed by pyrosequencing. Hoper et al.
Applying GS De novo Assembler or GS Reference Mapper to our 454 sequence data failed to produce full-length contigs: GS De novo Assembler yielded several short contigs, and GS Reference Mapper produced a few false insertions/deletions (
The presence of mixed infections and quasispecies in influenza viruses has also been demonstrated by others using RT-PCR of a short region of HA from cloacal samples.
This is in agreement with the study of Wang et al.
The relatively few influenza reads (15) obtained from one of the cloacal samples may reflect insufficient RNA in the original sample or RNA losses during processing for pyrosequencing. For the other cloacal swab, from A/green-winged teal/Minnesota/Sg-00131/2007(H3N2), complete sequences were obtained, and they shared only 80–91% nucleotide identity with the virus recovered in embryonated eggs. This result suggests that the cloacal swab contained a mixed virus population and that the H3N2 subtype may have become predominant by out-competing other virus subpopulations in embryonated eggs. Studies with larger numbers of matched sample pairs are needed to fully resolve this phenomenon.
Complete genome sequences of A/bald eagle/Virginia/Sg-00154/2008(H1N1/H2N1) showed two virus lineages (H1N1 and H2N1), whereas RT-PCR-based HA and NA typing had identified this virus as H2N1. In general, unambiguous identification of mixed-subtype infections would require sequential limiting dilution, PCR, cloning, and sequencing of several clones. To our knowledge, this is the first report of full genome sequencing of all eight segments from a mixed infection representing two lineages of the virus.
In our analysis of 12 samples, quasispecies were identified in five virus isolates (four egg-grown waterfowl isolates and one cell-cultured swine influenza virus). All of these viruses were H3N2, and the identified quasispecies originated from mutations in the NP, PB1, PA, M, and NS genes but not in the HA, NA, or PB2 genes. These four waterfowl isolates were recovered at the same study site on the same day. This result concurs with the study of Dugan et al.,
Because the mutation rate of influenza A viruses is estimated at one nucleotide change per 10,000 nucleotides per replication, and most infections are initiated by 10 to 1000 virions that likely carry varying numbers of nucleotide differences in their genomes, one can expect most influenza A infections to involve a quasispecies. However, we identified relatively few quasispecies, probably because currently available sequence analysis software does not support robust quasispecies analysis and extensive manual curation is necessary. We expect that improved bioinformatic tools would reveal more quasispecies populations in our sample sets.
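The per-genome arithmetic behind this expectation can be made explicit. A minimal sketch, assuming the rate of one change per 10,000 nucleotides quoted above, the smallest genome size from the Table 1 footnote (13523 nucleotides), and, as our own modelling assumption rather than the authors', a Poisson-distributed number of new substitutions per replicated genome:

```python
import math

MUTATION_RATE = 1e-4    # substitutions per nucleotide per replication (from the text)
GENOME_LENGTH = 13_523  # smallest genome size in the Table 1 footnote


def expected_mutations(rate=MUTATION_RATE, length=GENOME_LENGTH):
    """Mean number of new substitutions per genome per replication."""
    return rate * length


def prob_at_least_one(lam):
    """P(at least one substitution), assuming a Poisson count with mean lam."""
    return 1 - math.exp(-lam)


lam = expected_mutations()
print(round(lam, 2), round(prob_at_least_one(lam), 2))  # 1.35 0.74
```

Under these assumptions, roughly three in four newly copied genomes would differ from their template at one or more positions.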
The method described in the current study does not require virus propagation or prior sequence information, and it circumvents the need for cloning and library construction prior to sequencing. Thus, the method is simpler and less time-consuming than Sanger sequencing. Despite these advantages, the equipment is costly, and the approach requires extensive bioinformatic expertise to assemble and analyze the contigs.
In conclusion, using an unambiguous genome sequencing approach, we present evidence of quasispecies and mixed infections among influenza A viruses that could help shape our understanding of the ecology and evolution of these viruses. Future studies should be undertaken to 1) strengthen the interpretation of culture and sequence data generated by current influenza A virus surveillance networks; 2) establish novel influenza sequence-based evolutionary analyses; and 3) provide an improved understanding of influenza subtype stability and transmission in a wide array of mammals and birds.
Twelve samples were used, including eight avian influenza viruses grown in embryonated chicken eggs, two swine influenza viruses propagated in Madin-Darby canine kidney (MDCK) cells with trypsin, and two influenza A virus-positive cloacal samples: 1) A/mallard/South Dakota/Sg-00125/2007(H3N2), 2) A/northern pintail/South Dakota/Sg-00126/2007(H3N2), 3) A/mallard/South Dakota/Sg-00127/2007(H3N2), 4) A/mallard/South Dakota/Sg-00128/2007(H3N2), 5) A/green-winged teal/Minnesota/Sg-00131/2007(H3N2), 6) A/mallard/Minnesota/Sg-00133/2007(H4N6), 7) A/bald eagle/Virginia/Sg-00154/2008(H1/H2N1) (mixed isolate), 8) A/swine/Minnesota/Sg-00239/2007(H1N2), 9) A/turkey/Minnesota/1138/1980(H7N3), 10) A/swine/North Carolina/R08-001877-D08-013371/2008 (H3N2), 11) cloacal swab of A/green-winged teal/Minnesota/Sg-00131/2007 (H3N2), and 12) cloacal swab of A/mallard/Minnesota/Sg-00133/2007(H4N6). All isolates were passaged only once or twice.
Total RNA was extracted from allantoic fluid, cell culture, or cloacal swab samples using the QIAamp Viral RNA Mini Kit (Qiagen) according to the manufacturer's instructions. To reduce the host nucleic acid contamination commonly observed in viral RNA preparations, viral RNA molecules were captured and enriched by hybridization to a biotin-labeled oligonucleotide directed to the conserved 5′ end of all eight segments of the influenza A virus genome. Total RNA (50-µL; ∼50-ng/µL) was incubated with 200-µL of 6X SSPE buffer containing 0.1 units/µL of SUPERase-In (Ambion) and 0.5 µM of the 5′-Capture Oligo (
The enriched viral RNA was fragmented to a size range compatible with sequencing on the Genome Sequencer FLX. Five microliters of 5X RNA Fragmentation Buffer (200 mM Tris-acetate, pH 8.1, 500 mM potassium acetate, 150 mM magnesium acetate) were added to 20-µL of enriched viral RNA. The samples were mixed thoroughly by pipetting, incubated for 2 min at 82°C, and then immediately transferred to ice to stop the fragmentation reaction. The reaction volume was brought to 50-µL with RNase-free water, and the RNA was purified with RNAClean (Agencourt) according to the manufacturer's instructions and eluted in 20-µL of RNase-free water.
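Adding 5 µL of the 5X buffer to 20 µL of RNA gives a 25-µL reaction in which the buffer is at 1X. The C1V1 = C2V2 arithmetic below gives the resulting working concentrations of each buffer component; it is a simple illustrative check, not part of the published protocol:

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """C1*V1 = C2*V2: concentration after diluting a stock into a total volume."""
    return stock_conc * stock_volume_ul / total_volume_ul


# 5 uL of 5X fragmentation buffer added to 20 uL of enriched RNA.
total_ul = 5 + 20
buffer_mm = {"Tris-acetate": 200, "potassium acetate": 500, "magnesium acetate": 150}

print(final_concentration(5, 5, total_ul))  # 1.0 (i.e., 1X)
for name, mm in buffer_mm.items():
    # Prints 40.0, 100.0 and 30.0 mM respectively.
    print(name, final_concentration(mm, 5, total_ul), "mM")
```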
The fragmented RNA sample was reverse transcribed in a 20-µL final volume using random hexamers (
For clonal amplification and sequencing on the Genome Sequencer FLX, the sscDNA required the addition of adaptors to each terminus. The adaptors are designed to enforce directional ligation, such that one (sscDNA Adaptor A) is ligated only to the 5′ end and the other (sscDNA Adaptor B) only to the 3′ end of the sscDNA. Each adaptor consists of two complementary oligonucleotides annealed together as described. The 3′-end adaptor consists of “sscDNA Oligo B” (
The final adapted sscDNA was amplified using the Advantage 2 PCR Kit (Clontech) in a total volume of 50-µL containing 5-µL of 10X Advantage 2 buffer, 2-µL of 50X dNTP mix (10 mM each), 10-µL (10 µM) Primer A (
Data analyses were performed on Linux servers or a Windows workstation at the Minnesota Supercomputing Institute. All sequencing reads were searched against influenza genome sequences using NCBI BLAST version 2.2.16. Non-influenza sequences were filtered out, and only influenza reads were assembled with GS De novo Assembler version 2.0.00.20 and mapped with GS Reference Mapper version 2.0.00.20. The influenza contigs obtained with this software were reassembled in Sequencher version 4.8 (Gene Codes).
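The filtering step can be illustrated with BLAST's 12-column tabular report (produced by `-m 8` in legacy blastall or `-outfmt 6` in BLAST+). The paper does not state the cutoff used, so the e-value threshold below is a hypothetical placeholder, and the read and subject IDs are made up for the example:

```python
import csv
import io


def influenza_read_ids(tabular_text, max_evalue=1e-10):
    """Return IDs of reads with at least one hit at or below max_evalue.

    Expects BLAST 12-column tabular output: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The default cutoff is a hypothetical choice, not the study's.
    """
    keep = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[10]) <= max_evalue:
            keep.add(row[0])
    return keep


example = (
    "read1\tinfA_seg4\t98.5\t230\t3\t0\t1\t230\t100\t329\t1e-110\t400\n"
    "read2\tinfA_seg7\t80.0\t40\t8\t0\t1\t40\t10\t49\t0.5\t30.2\n"
)
print(influenza_read_ids(example))  # {'read1'}
```

Reads absent from the returned set would be treated as non-influenza and dropped before assembly.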
All influenza reads were assembled with GS De novo Assembler using three parameter sets: a minimum overlap (MOL) of 40 nucleotides with 90% identity, an MOL of 100 nucleotides with 100% identity, and an MOL of 200 nucleotides with 100% identity. Contigs longer than 500 bases were analyzed by BLAST against NCBI databases, and the most closely related sequence for each segment, referred to as the reference sequence, was downloaded. All influenza reads were then mapped to these reference sequences with GS Reference Mapper. The contigs from GS De novo Assembler and the consensus sequences from GS Reference Mapper were reassembled in Sequencher 4.8. The new contigs were then examined for ambiguous bases (e.g., R, Y, and K), and the corresponding base positions were manually examined in GS Reference Mapper for the presence of more than one base (quasispecies).
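The manual check for more than one base at a position amounts to counting the bases that aligned reads place there. A minimal sketch of that count, assuming the read bases at one reference position have already been extracted as a string; the frequency and count cutoffs are hypothetical, since the study relied on manual review rather than fixed thresholds:

```python
from collections import Counter


def mixed_bases(pileup_bases, min_fraction=0.2, min_count=2):
    """Return the bases at one position that pass (hypothetical) cutoffs.

    `pileup_bases` holds the read bases aligned to one reference position.
    Two or more passing bases flag a candidate mixed site for review.
    """
    counts = Counter(b for b in pileup_bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth == 0:
        return []
    return sorted(
        b for b, n in counts.items()
        if n >= min_count and n / depth >= min_fraction
    )


print(mixed_bases("GGGGGGAAAA"))  # ['A', 'G'], reported as R
print(mixed_bases("GGGGGGGGGA"))  # ['G'], a single base, not flagged
```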
All eight segments of four avian influenza virus (AIV) isolates, A/mallard/South Dakota/Sg-00125/2007(H3N2), A/northern pintail/South Dakota/Sg-00126/2007(H3N2), A/mallard/South Dakota/Sg-00127/2007(H3N2), and A/mallard/South Dakota/Sg-00128/2007(H3N2), were sequenced by the classical Sanger method on an ABI PRISM 3730xl DNA Analyzer (Applied Biosystems), and the results were compared with the pyrosequencing consensus sequences obtained with GS Reference Mapper software.
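Comparing two consensus sequences of the same ORF reduces to listing the columns where they disagree, which produces entries in the same "a149g" style used for the mismatch table. A minimal sketch, assuming the two sequences are already aligned; which sequence is treated as the reference is our assumption (the mismatch table does not state the direction), and gap or ambiguous columns are skipped for manual review:

```python
def list_substitutions(reference, query):
    """List differences between two aligned ORF sequences as e.g. 'a149g'.

    Positions follow the ungapped reference, counted from the first ORF
    nucleotide; columns with a gap or ambiguity code in either sequence
    are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    ref_pos = 0
    for r, q in zip(reference.lower(), query.lower()):
        if r != "-":
            ref_pos += 1  # numbering advances only on reference bases
        if r in "acgt" and q in "acgt" and r != q:
            changes.append(f"{r}{ref_pos}{q}")
    return changes


print(list_substitutions("ATGAAT", "ATGAGT"))  # ['a5g']
```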
We would like to thank Microbial and Plant Genomics Institute, Biomedical Genomics Center and Computational Genetics Laboratory at the University of Minnesota for providing resources and services to perform these studies.