Reader Comments

“Identification” of Nosema species

Posted by psommer on 04 May 2011 at 19:08 GMT

The Authors of this paper report the identification of Nosema species. However, the approach they used allows only for peptide identification. In the next step, based on the amino acid sequences of the identified peptides, a source protein may be identified if, and only if, the protein sequence coverage is 100%. In all other cases one can only infer the protein identity with a probability computed at the protein level, e.g. by using the popular ProteinProphet algorithm. Unfortunately, at this time there is no scientific justification for identifying source organisms based solely on proteomic analysis of complex environmental samples. Consequently, a researcher may only use proteotypic peptides as biomarkers confirming the presence or absence of specific proteins and their potential sources (strains, species, genera). Moreover, any biomarker should be validated before it is applied to answer specific questions. Unfortunately, my evaluation of the proteomic results presented in Appendix B of the US Army Technical Report [ref. 20] indicates that all the peptides used as biomarkers of Nosema species lack the required specificity and that their identities are uncertain.

To assign sequences to the acquired product ion mass spectra, the Authors constructed a database (Appendix A) by selecting proteins derived from: (a) nine RNA viruses (their combined proteomes comprise 22,539 amino acids), (b) one DNA virus, i.e. invertebrate iridescent virus IIV-6 (63,745 amino acids), and (c) 12 Nosema species (approx. 41,807 amino acids). Next, they searched all spectra against this mini-database with SEQUEST. In this way, only sequences from the preselected species above could be assigned to any acquired mass spectrum. Sets of top spectrum-sequence matches from 89 experiments were sorted and are presented in Appendix B of the report.
These results show that the matches are dominated by peptides from IIV-6 and Nosema and that, on average, the Authors “identified” more peptides from the DNA virus. For example, taking the results from tests #6, #11, #86, #98, and #100, I found that spectra were assigned to IIV-6 and Nosema at a ratio of 1.64 ± 0.37. This matters because the mini-database constructed by the Authors is also dominated by IIV-6 and Nosema proteins (82.4% of all residues), at a ratio of 1.52 (IIV-6/Nosema, 63,745 aa/41,807 aa). Therefore, the observed assignments could be random and simply reflect the protein composition of the database.
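The database-composition figures quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the residue counts given in this comment; treating the "± 0.37" as a simple tolerance around the observed mean is an assumption made for illustration, not something stated in the report.

```python
# Residue counts per group, as quoted in the comment (amino acids).
db_sizes = {"RNA viruses": 22_539, "IIV-6": 63_745, "Nosema": 41_807}

total = sum(db_sizes.values())
iiv_nosema_share = (db_sizes["IIV-6"] + db_sizes["Nosema"]) / total
iiv_to_nosema = db_sizes["IIV-6"] / db_sizes["Nosema"]

# Observed spectrum-assignment ratio from the five tests cited above.
# ASSUMPTION: 0.37 is used here as a plain +/- tolerance.
observed_ratio, observed_spread = 1.64, 0.37
consistent_with_db = abs(observed_ratio - iiv_to_nosema) <= observed_spread

print(f"IIV-6 + Nosema share of database: {iiv_nosema_share:.1%}")
print(f"IIV-6 / Nosema size ratio:        {iiv_to_nosema:.2f}")
print(f"Observed ratio within spread of database ratio: {consistent_with_db}")
```

If the assignments were driven purely by database composition, one would expect the observed ratio to sit near the size ratio, which is the comparison this check makes.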

My analysis of these assignments (Appendix B) found 41 peptides matching “Nosema” species and satisfying quite “relaxed” criteria for accepting spectrum-sequence assignments as correct (Xcorr of 1.5, 2.2, and 3.3 for +1, +2, and +3 ions, respectively; delta Cn > 0.1). [For further explanations, see my responses to the comment Accessing 'Technical Report'.] Among them, 28 peptides were found only once and 4 others only twice. Their occurrence therefore seems random, and I will ignore them. The numbers of occurrences of the remaining 9 peptides across all 89 reported tests were as follows: 39; 27; 22; 11; 11; 7; 5; 3; and 3.
A BLAST search against the NCBI nr database with the most frequently occurring peptide (SYELPDGQVIKIGSER, 39 occurrences) confirmed that this sequence could be a segment of the Nosema locustae actin protein. However, it is highly probable that this assignment is incorrect, because a similar peptide (SYELPDGQVITIGNER) could originate from human actin, a ubiquitous protein that forms the cytoskeleton in every human cell. First, the nominal masses of these two peptides are the same and could not be resolved by the low-resolution mass spectrometer used in this work [the monoisotopic mass of T + N from human actin (215.091 u) is practically the same as the mass of K + S from Nosema actin (215.127 u)]. Second, the suggested Nosema variant contains an internal trypsin-specific cleavage site that somehow survived overnight exposure to this enzyme without releasing the SYELPDGQVIK peptide. Third, we know that these bee samples were handled by humans! Of course, including a human database in the searches, or inspecting the mass spectrum for the presence of diagnostic fragment ions (b11, b14, y3, and y6), could resolve this issue. However, this information is missing.
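As a rough illustration of the acceptance criteria just described, the following Python sketch filters spectrum-sequence matches by charge-dependent Xcorr and by delta Cn. The `Match` record and the example values are hypothetical and are not taken from Appendix B; treating the Xcorr thresholds as inclusive minimums (>=) is also an assumption, since the comment lists only the threshold values.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as quoted in the comment.
XCORR_MIN = {1: 1.5, 2: 2.2, 3: 3.3}
DELTA_CN_MIN = 0.1  # delta Cn must exceed this value


@dataclass
class Match:
    """Hypothetical record for one spectrum-sequence match."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


def accept(match: Match) -> bool:
    """Return True if the match passes the relaxed criteria."""
    threshold = XCORR_MIN.get(match.charge)
    if threshold is None:  # charge state not covered by the criteria
        return False
    # ASSUMPTION: Xcorr threshold is inclusive; delta Cn is strict (> 0.1).
    return match.xcorr >= threshold and match.delta_cn > DELTA_CN_MIN


# Illustrative values only; these are NOT results from the report.
examples = [
    Match("PEPTIDEK", charge=2, xcorr=2.5, delta_cn=0.15),
    Match("PEPTIDEK", charge=2, xcorr=2.0, delta_cn=0.30),
    Match("PEPTIDEK", charge=3, xcorr=3.4, delta_cn=0.05),
]
accepted = [accept(m) for m in examples]
print(accepted)
```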

The purported peptide HKGVMVGMGQK could also originate from Nosema actin filaments and was “identified” as such in 22 tests. Nevertheless, I doubt that this identification is correct: (1) A honey bee protein (Apis mellifera, XP_003251465.1) variant of this peptide (HQGVMVGMGQK) has the same nominal mass as the one from Nosema, because the internal lysine residue (K, 128.095 u) is replaced by glutamine (Q, 128.059 u). (2) The monoisotopic mass difference between these two amino acids (0.036 u) cannot be distinguished by a low-resolution instrument. (3) The survival of a peptide with a missed cleavage site is not very probable. (4) Finally, HQGVMVGMGQK should be expected in the analyzed sample because it originates from the organism that was analyzed, whereas the tryptic peptide (GVMVGMGQK) that would be expected from Nosema is absent.
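The two mass arguments above can be checked numerically. The sketch below sums standard monoisotopic residue masses for each pair of peptides and lists the positions at which the sequences differ; the function names are illustrative, and the residue masses are standard textbook values rounded to five decimals rather than figures from the report.

```python
# Standard monoisotopic residue masses (u) for the residues needed here.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "R": 156.10111,
    "Y": 163.06333,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of the neutral peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def substitutions(a: str, b: str):
    """1-based positions where two equal-length sequences differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


pairs = {
    "actin": ("SYELPDGQVIKIGSER", "SYELPDGQVITIGNER"),  # Nosema vs human
    "hkg": ("HKGVMVGMGQK", "HQGVMVGMGQK"),              # Nosema vs honey bee
}

results = {}
for label, (first, second) in pairs.items():
    delta = peptide_mass(first) - peptide_mass(second)
    results[label] = (substitutions(first, second), delta)
    print(f"{label}: differences {results[label][0]}, mass delta {delta:.3f} u")
```

In both cases the difference comes out at roughly 0.036 u, far below what a unit-resolution instrument can separate.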

The following “Nosema” peptides {sequence (frequency)}: VXDIIK (27); LAVNMVPFPR (11); IWHHTFYNELR (11); FPGQLNADLR (5); RIDIAGR (3) also match barley (Hordeum vulgare) and many other species, while IKKELSTR (3) may originate from common fish (e.g. Danio). In addition, VXDIIK has no diagnostic value (it can originate from countless species, from bacteria to humans), while the assignments for RIDIAGR and IKKELSTR are very questionable (note the missed internal cleavages). Finally, IIAQVVSSITASLR (7) is also of low diagnostic value because it can originate from many other species, e.g. from the genera Oikopleura, Encephalitozoon, Edhazardia, Fasciola, Glugea, Trachipleistophora, and Endoreticulatus, in addition to Nosema.
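The missed-cleavage argument used for several of the peptides above can be illustrated with a simple in-silico digest. This is a minimal sketch that assumes the common simplified trypsin rule (cleave C-terminal to K or R, except when the next residue is P); real digestion is less clean, and the function name is illustrative.

```python
def tryptic_digest(seq: str) -> list[str]:
    """Split a sequence after every K or R that is not followed by P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # trailing piece without a C-terminal K/R
        fragments.append(seq[start:])
    return fragments


# Peptides discussed in this comment that contain internal K/R residues,
# plus the honey bee variant for comparison.
peptides = ["SYELPDGQVIKIGSER", "HKGVMVGMGQK", "HQGVMVGMGQK",
            "RIDIAGR", "IKKELSTR"]

missed = {}
for pep in peptides:
    pieces = tryptic_digest(pep)
    missed[pep] = len(pieces) - 1  # number of internal cleavage sites
    print(f"{pep}: {pieces} (missed cleavages: {missed[pep]})")
```

A fully digested sample would be expected to yield the shorter fragments rather than the intact sequences with one or two internal sites.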

In conclusion, flaws in the experimental design and data analysis indicate that the Authors provided insufficient evidence for the identification of IIV-6 (see L. Foster, Mol Cell Proteomics 2011 March; 10(3): M110.006387) and Nosema species in the analyzed honey bee samples.

Paul Sommer, PhD
Brooklyn, NY

No competing interests declared.