Reader Comments


Referee comments: Referee 2

Posted by PLOS_ONE_Group on 18 Mar 2008 at 10:54 GMT

Referee 2's review:

This paper by Santos et al. on "Conservation patterns of HIV-1 RT connection and RNase H domains: identification of new antiretroviral-associated mutations" is very interesting, in that it clarifies some mutational patterns in the poorly understood RT connection and RNase H domains, which seem to have an impact on NRTI resistance. However, the paper could be improved substantially by strengthening the statistical analysis and by finding a way to deal with the unequal distribution of subtypes between treatment-naïve and treated individuals.

Major Remarks

1. The use of a one-tailed Fisher's exact test is not sufficiently well-founded.

A one-tailed test treats all differences in the unexpected direction, large and small, as simply non-significant. However, the authors also report mutations whose frequencies are lower in the treated dataset than in the naïve dataset, which is exactly the direction such a test does not cover.
Second, one-tailed tests make it easier to reject the null hypothesis when the alternative lies in the hypothesized direction, so a statistically significant result is much easier to obtain.
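To make the directionality issue concrete, here is a minimal pure-Python sketch of Fisher's exact test on a 2x2 table. The row/column layout and the counts are hypothetical, not taken from the manuscript; in practice a library routine such as `scipy.stats.fisher_exact`, which accepts an `alternative` argument, would be used. A mutation found in 0/10 treated and 10/10 naïve isolates is called non-significant by the one-tailed test for enrichment in treated isolates, while the two-tailed test flags it clearly.

```python
from math import comb


def hypergeom_pmf(x, row1, col1, total):
    """P(X = x) under the null: x mutant isolates among the row1 treated isolates,
    given col1 mutant isolates in total and all table margins fixed."""
    return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)


def fisher_exact(a, b, c, d, alternative="two-sided"):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = treated, naive; columns = mutant, wild type.
    alternative: 'greater' (mutation enriched in treated), 'less', or 'two-sided'.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    support = range(max(0, row1 - (total - col1)), min(row1, col1) + 1)
    pmf = {x: hypergeom_pmf(x, row1, col1, total) for x in support}
    if alternative == "greater":
        return sum(p for x, p in pmf.items() if x >= a)
    if alternative == "less":
        return sum(p for x, p in pmf.items() if x <= a)
    if alternative != "two-sided":
        raise ValueError(f"unknown alternative: {alternative!r}")
    # Two-sided: add up every table no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in pmf.values() if p <= pmf[a] * (1 + 1e-7)))


# Hypothetical mutation: 0/10 treated isolates vs 10/10 naive isolates carry it.
print(fisher_exact(0, 10, 10, 0, "greater"))    # close to 1: "not significant"
print(fisher_exact(0, 10, 10, 0, "two-sided"))  # about 1.1e-05
```

The one-tailed call can only ever detect enrichment in the treated group, which is why decreases in frequency need a two-tailed (or a separately pre-specified opposite-direction) test.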

Additionally, a correction for multiple testing is required, especially since 263 codons are evaluated.
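As an illustration, both a Bonferroni and a Benjamini-Hochberg (false discovery rate) adjustment can be sketched in a few lines of Python; which correction is most suitable remains the authors' choice. With 263 codons, Bonferroni lowers the per-codon threshold from 0.05 to 0.05/263, roughly 0.00019. The p-values in the usage lines are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-codon p-values
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the expected proportion of false discoveries, which is often preferred for screens over many codons.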

2. The heterogeneous distribution of subtypes in the dataset represents a major problem in this study. To identify potential resistance-associated mutations, the authors restrict the dataset to subtype B, so I have no remark on that part.

However, to identify conserved versus variable regions of the connection and RNase H domains, the authors use the full dataset. They state on page 8 (upper middle) that some sites seemed more variable in the naïve isolates than in the NRTI-treated isolates, and explain this by the different subtype proportions in the two groups. This is only partially true, since treatment itself can also have an effect: a position may tolerate more variability in the "naïve state" than in the "treated state".

For the connection domain dataset, 28% of naïve sequences are subtype B, versus 80% of treated sequences. (The same holds for the RNase H dataset.)
I am therefore not convinced that such an analysis is sufficiently well-founded or that its results can be interpreted correctly. In Figure 2, the authors use subtype B as the reference amino acids, although subtype B makes up only about 20% of the naïve dataset. For example, at position RT519 the subtype B wild type is S, whereas almost every other subtype has N.
The authors should therefore repeat their analysis stratified by subtype, or by other means that take the different subtype distributions into account.
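The confounding can be illustrated with a toy calculation. The sketch below assumes variability is measured by Shannon entropy per alignment position (the manuscript may use a different measure) and uses hypothetical data for a position resembling RT519, where subtype B carries S and subtype C carries N. Pooled over subtypes, the naïve group looks more variable than the treated group purely because of its subtype mix, while the within-subtype entropy is zero in both groups.

```python
from collections import Counter
from math import log2


def shannon_entropy(residues):
    """Shannon entropy (bits) of the amino acids observed at one alignment position."""
    total = len(residues)
    return sum(-(n / total) * log2(n / total) for n in Counter(residues).values())


def within_subtype_entropy(records):
    """Size-weighted mean of per-subtype entropies; records are (subtype, residue) pairs."""
    by_subtype = {}
    for subtype, residue in records:
        by_subtype.setdefault(subtype, []).append(residue)
    return sum(len(r) / len(records) * shannon_entropy(r) for r in by_subtype.values())


# Hypothetical position resembling RT519: S in subtype B, N in subtype C.
naive = [("B", "S")] * 3 + [("C", "N")] * 7    # 30% subtype B
treated = [("B", "S")] * 8 + [("C", "N")] * 2  # 80% subtype B

for label, records in (("naive", naive), ("treated", treated)):
    pooled = shannon_entropy([residue for _, residue in records])
    print(label, round(pooled, 3), round(within_subtype_entropy(records), 3))
# prints: naive 0.881 0.0
#         treated 0.722 0.0
```

Comparing naïve and treated isolates within each subtype (or on weighted within-subtype measures like the one above) separates a genuine treatment effect from a difference in subtype composition.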

Minor Remarks

1. In the Introduction (page 3, middle), the authors state that "To date, however, the influence of connection and RNase H RT domains in NRTI or NNRTI resistance remains unknown". This statement is too strong, since some data have already been published (see the references mentioned).
The term "poorly understood" would be more appropriate.

2. From the article it is not clear to me what the NNRTI experience of the treated patients is.

For example:

• In the Methods section (page 4): "plasma samples from HIV-positive patients, both naïve and HAART-experienced ..."
• In the Methods section (page 6, top): "... which treatment information was available, and which included the use of NRTI were included in the analysis."
• In the Results section (page 9, bottom): "... since the vast majority of treated patients whose viral samples were analyzed have been subjected to multiple, complex HAART drug combinations prior to sample collection."

Nowhere is it explicitly stated that the patients have no NNRTI experience. This is essential for the conclusions, since NNRTIs have the same target (RT) and it has been reported that the NNRTI and NRTI resistance pathways are not completely independent. Some of the mutations found could therefore be related to NNRTI resistance development. The authors should either make sure that the treated population has no NNRTI experience, or report what the NNRTI experience is and tone down their conclusions accordingly.

3. The composition of the dataset with respect to patients and sequences is somewhat unclear. Do the authors consider one isolate per patient? Do the sequences of the thumb and connection domain dataset (510 in total) have patients in common with the RNase H dataset, or do the two datasets concern entirely different patient populations?
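Both questions can be answered directly from patient identifiers, if these are available. A minimal sketch, assuming each isolate is stored as a hypothetical (patient_id, sequence) pair and that keeping the first isolate per patient is an acceptable rule (other choices, such as the earliest sample, are equally possible):

```python
def one_isolate_per_patient(isolates):
    """Keep the first isolate listed for each patient.

    isolates: iterable of (patient_id, sequence) pairs (hypothetical structure).
    """
    kept = {}
    for patient_id, sequence in isolates:
        kept.setdefault(patient_id, sequence)  # later isolates of the same patient are ignored
    return kept


def shared_patients(dataset_a, dataset_b):
    """Patient IDs present in both datasets (dicts keyed by patient ID)."""
    return sorted(dataset_a.keys() & dataset_b.keys())


# Hypothetical example: patient P1 contributes two isolates to the connection dataset.
connection = one_isolate_per_patient([("P1", "seq1"), ("P1", "seq2"), ("P2", "seq3")])
rnase_h = one_isolate_per_patient([("P2", "seq4"), ("P3", "seq5")])
print(len(connection), shared_patients(connection, rnase_h))  # prints: 2 ['P2']
```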

4. Results section (page 7, bottom).
The numbers reported for the RNase H domain sequences are inconsistent. A total of 118 sequences were successfully amplified and sequenced (one sequence per patient?): 59 belonged to subtype C and 59 to subtype B.

A total of 55 patients were naïve (B=27 and C=31), but 27 + 31 = 58.
A total of 63 were on treatment (B=32 and C=28), but 32 + 28 = 60.

This needs clarification.
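The subtype totals are consistent (27 + 32 = 59 and 31 + 28 = 59), so the discrepancy lies in the naïve/treated totals. A check of this kind is easy to automate; the sketch below (hypothetical helper name) compares the reported margins of a contingency table with the sums of its cells, using the numbers reported in the manuscript.

```python
def marginal_mismatches(cells, row_totals, col_totals):
    """Compare reported row/column totals of a contingency table with the sums of its cells.

    Returns (label, reported, computed) for every total that disagrees.
    """
    mismatches = []
    for row, reported in row_totals.items():
        computed = sum(cells[row].values())
        if computed != reported:
            mismatches.append((row, reported, computed))
    for col, reported in col_totals.items():
        computed = sum(row_counts.get(col, 0) for row_counts in cells.values())
        if computed != reported:
            mismatches.append((col, reported, computed))
    return mismatches


# Counts as reported for the RNase H sequences (page 7).
cells = {"naive": {"B": 27, "C": 31}, "treated": {"B": 32, "C": 28}}
print(marginal_mismatches(cells, {"naive": 55, "treated": 63}, {"B": 59, "C": 59}))
# prints: [('naive', 55, 58), ('treated', 63, 60)]
```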

5. The overview of the number of sequences retrieved from HIVdb and Los Alamos, and of their subtypes, given in the Methods section (page 6, middle) would fit better in the Results section (page 8), as is done for the sequences collected in Brazil.

6. The numbers on page 9 of the Results section also do not match.
The authors report 25 amino acid residues with significantly different frequencies in naïve versus treated isolates. It is not clear what is meant by "residues": positions or mutations?

Of those 25, 20 were positively selected in treated sequences. I count 20 positions but 21 different mutations (A360T/V).
Of those 25, 6 showed decreased proportions in treated versus naïve isolates.

The only way I can arrive at a total of 25 is if the authors mean 25 RT positions that are significantly associated: 20 positively and 6 negatively, with position RT491 appearing in both groups, giving 20 + 6 - 1 = 25. If the authors refer to the actual mutations, the total must be changed to 21 + 6 = 27. It would be more appropriate to refer to mutations rather than positions, since position RT491 apparently plays a double role.
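Counting positions and counting mutations can be made mechanical. The sketch below (hypothetical helper names; the regular expression assumes the usual wild type, position, variant notation) expands entries such as A360T/V and counts distinct positions and distinct mutations across any number of groups, so a position that appears in both the increased and the decreased group is counted once as a position but once per mutation. The example list uses three mutations named in the manuscript purely to illustrate the notation; it is not the authors' full list.

```python
import re

# Assumed notation: wild-type residue, RT position, one or more variants ("A360T/V").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def expand(notation):
    """Expand compact notation such as 'A360T/V' into ['A360T', 'A360V']."""
    match = MUTATION.match(notation)
    if match is None:
        raise ValueError(f"unrecognised mutation notation: {notation!r}")
    wild_type, position, variants = match.groups()
    return [f"{wild_type}{position}{variant}" for variant in variants.split("/")]


def count_positions_and_mutations(*groups):
    """Distinct RT positions and distinct mutations across all groups (e.g. increased, decreased)."""
    mutations = {m for group in groups for entry in group for m in expand(entry)}
    positions = {int(MUTATION.match(m).group(2)) for m in mutations}
    return len(positions), len(mutations)


print(count_positions_and_mutations(["A360T/V", "I393M", "Q547K"]))  # prints: (3, 4)
```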

7. A table listing the 25 positions/mutations with a significant difference, together with their frequencies and p-values, is missing. Such a table was provided only for AZT; it would be very easy to add an extra column with the p-values and frequencies of these mutations for the entire dataset.

Related to this table, the authors mention that for the AZT substudy, comparisons were only possible up to RT codon 500, as little sequence information is available beyond that codon.
I see no reason not to assume that the same holds for all drugs, in which case none of the analyses would be valid beyond position 500. Displaying the number of sequences analysed in the added column (see the paragraph above) should help resolve this question.
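The number of sequences available per codon can be computed directly from the alignment. A minimal sketch, assuming equal-length aligned amino acid strings in which a gap ('-') or an unknown residue ('X') means no information at that codon; the minimum-coverage threshold is a hypothetical parameter that the authors would need to choose and justify.

```python
def coverage_by_codon(aligned, first_codon):
    """Number of sequences with an amino acid (not a gap '-' or unknown 'X') at each codon.

    aligned: equal-length aligned amino acid strings whose first column is RT codon
    `first_codon` (a hypothetical layout).
    """
    return {
        first_codon + i: sum(1 for seq in aligned if seq[i] not in "-X")
        for i in range(len(aligned[0]))
    }


def low_coverage(coverage, min_sequences):
    """Codons covered by fewer than `min_sequences` sequences (threshold is the analyst's choice)."""
    return sorted(codon for codon, n in coverage.items() if n < min_sequences)


# Toy alignment of three sequences covering RT codons 499-501.
cov = coverage_by_codon(["SNK", "SN-", "S--"], first_codon=499)
print(cov, low_coverage(cov, min_sequences=2))  # prints: {499: 3, 500: 2, 501: 1} [501]
```

Reporting these counts next to each p-value would show at a glance where the comparisons rest on too few sequences.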

8. In that same section (page 9, middle), the authors list 5 mutations that are completely absent from naïve isolates (A360V, I393M, L486F, S489A and I506L), of which only 2 are statistically significant. I do not understand why mutation Q547K is mentioned in the next sentence rather than simply added to this list of 5, since Q547K is also seen only in treated isolates and not in naïve ones.

Mutation D488E, which is also significantly associated, is not mentioned in this list of mutations absent from naïve isolates, yet on page 14 (top) exactly such a statement is made for D488E. Should the list therefore be expanded to 7?
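Deriving the list from one explicit rule applied to every mutation avoids the inconsistency noted above. A minimal sketch with placeholder mutation names and hypothetical counts:

```python
def only_in_treated(treated_counts, naive_counts):
    """Mutations seen in at least one treated isolate and in no naive isolate."""
    return sorted(
        mutation
        for mutation, n in treated_counts.items()
        if n > 0 and naive_counts.get(mutation, 0) == 0
    )


# Hypothetical counts; "mutA"/"mutB"/"mutC" are placeholders, not mutations from the manuscript.
treated = {"mutA": 4, "mutB": 2, "mutC": 5}
naive = {"mutB": 0, "mutC": 3}
print(only_in_treated(treated, naive))  # prints: ['mutA', 'mutB']
```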

9. The numbers on page 10 (middle) are again unclear, again concerning the number 25: the RNase H domain comprises 14 mutations and the connection domain 11. The answer to point 6 may make this remark redundant, but please resolve whether 25 refers to positions or to mutations (page 13 also refers to "25").

A second remark about this paragraph: one cannot speak of "positively-selected mutations" and "mutations selected by treatment" when mutations with a decreased frequency in treated versus naïve isolates are also included. It would be better to use "treatment-related", as is done in other paragraphs.

10. At the end of the Results section (page 10), the authors discuss the seven amino acid residues that make up the RNase H catalytic site. These amino acids are identified only in the Discussion section (page 12); it would be better to also list them in the Results section.

11. In the middle of page 11, the authors write that regions closer to the hybrid are more conserved than regions on the outer surface of the protein. Does this hold for both naïve and treated isolates, or only for one group?

12. Suggestion: try to relate the 8 mutations significantly associated with AZT monotherapy to the thymidine analogue mutations (TAMs), or at least to the TAM1 versus TAM2 pathways.
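One way to start is to classify each treated sequence by TAM pathway and cross-tabulate that class against each candidate mutation; the resulting counts could then be tested with Fisher's exact test. The sketch below assumes one commonly used grouping (TAM1: M41L, L210W, T215Y; TAM2: D67N, K70R, T215F, K219Q/E), which the authors should replace with whatever definition they adopt. The rule that labels a sequence carrying mutations from both groups as "mixed" is deliberately crude, and the sample data and the placeholder name CANDIDATE are hypothetical.

```python
# Assumed TAM grouping (one common convention); replace with the authors' own definition.
TAM1 = {"M41L", "L210W", "T215Y"}
TAM2 = {"D67N", "K70R", "T215F", "K219Q", "K219E"}


def tam_pathway(mutations):
    """Classify a set of RT mutations as 'TAM1', 'TAM2', 'mixed' or 'none'."""
    has_tam1 = bool(mutations & TAM1)
    has_tam2 = bool(mutations & TAM2)
    if has_tam1 and has_tam2:
        return "mixed"
    if has_tam1:
        return "TAM1"
    return "TAM2" if has_tam2 else "none"


def cross_tabulate(samples, candidate):
    """Count samples per (TAM pathway, candidate present?) cell."""
    table = {}
    for mutations in samples:
        key = (tam_pathway(mutations), candidate in mutations)
        table[key] = table.get(key, 0) + 1
    return table


# Hypothetical treated samples; CANDIDATE stands for any of the 8 AZT-associated mutations.
samples = [
    {"M41L", "T215Y", "CANDIDATE"},
    {"M41L", "L210W", "T215Y"},
    {"D67N", "K70R", "K219Q"},
    {"D67N", "K70R", "CANDIDATE"},
]
print(cross_tabulate(samples, "CANDIDATE"))
```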

Cosmetic Remarks

13. Page 12 (middle) contains a list of 5 mutations in the wrong order: H539 should come before N545.

**********
N.B. These are the comments made by the referee when reviewing an earlier version of this paper. Prior to publication the manuscript has been revised in light of these comments and to address other editorial requirements.