Reader Comments

Re-analysis for multiple testing

Posted by Flandre on 23 Apr 2008 at 09:32 GMT

We read with interest the paper by Santos et al., "Conservation patterns of HIV-1 RT connection and RNase H domains: identification of new antiretroviral-associated mutations". However, as pointed out by referee 2, we have serious concerns regarding the statistical methodology used for such an extensive analysis.
Firstly, in patients harboring subtype B viruses, 263 codons were analyzed without any correction for multiple testing. As pointed out by referee 2, this seems to imply that 263 Fisher’s exact tests were carried out to investigate the difference at each position between treatment-naive and NRTI-treated patients. In fact, we understand that 433 tests were carried out without any correction, because the individual amino acids were taken into account at each position. Thus, a position with 2 residues in addition to the consensus residue leads to 2 Fisher’s exact tests, one for each residue. As indicated in Table 1, there are two tests for position 360: A360T and A360V. In addition, the p-values for A360T and K530R seem to be wrong, since 9.2% (8/87) versus 17.6% (33/187) leads to p=0.0716 (two-sided Fisher’s exact test) rather than p=0.028, and 1.1% (1/93) versus 12.1% (7/58) leads to p=0.0054 rather than p=0.010, respectively. All our analyses were done with SAS (version 9.1).
Concerning Figures 2 and 3, we suspect that 97 positions (50 for RT codons 298-440 and 47 for RT codons 441-560) were completely invariant (no residue shown below the consensus residue), 152 positions had some variability but the corresponding 299 tests were non-significant (residues shown below the consensus residue but not boxed in gray), and for 14 positions, corresponding to 37 tests, a significant test was found for at least one residue (boxed in gray). This gives a total of 433 tests, although for the 97 completely invariant positions a Fisher’s exact test is not computable owing to the lack of any variant residue.
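The two recomputed p-values can be checked without SAS. The following is a minimal sketch, assuming the usual two-sided definition of Fisher’s exact test (summing the probabilities of all tables with the same margins that are no more likely than the observed one); the function name `fisher_exact_two_sided` is ours and is not taken from the paper or from SAS.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Probability that k of the col1 "mutated" sequences fall in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Counts quoted above: (mutated, not mutated) in naive vs NRTI-treated patients.
p_a360t = fisher_exact_two_sided(8, 87 - 8, 33, 187 - 33)
p_k530r = fisher_exact_two_sided(1, 93 - 1, 7, 58 - 7)
print(f"A360T: p = {p_a360t:.4f}")
print(f"K530R: p = {p_k530r:.4f}")
```

The printed values can be compared with the 0.0716 and 0.0054 reported above and with the 0.028 and 0.010 given in Table 1 of the paper.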
For such a study, the authors should have applied a correction for multiple testing, for example one based on the False Discovery Rate (FDR) approach suggested by Benjamini and Hochberg (1995). Suppose that $m$ tests are computed and that the unadjusted p-values are ordered as $p_1 \le p_2 \le p_3 \le \dots \le p_m$. The adjusted p-values, denoted $p^*$, are given by
$$
\begin{aligned}
p^*_m &= p_m \\
p^*_{m-1} &= \min\left(p^*_m,\ \frac{m}{m-1}\, p_{m-1}\right) \\
p^*_{m-2} &= \min\left(p^*_{m-1},\ \frac{m}{m-2}\, p_{m-2}\right) \\
&\ \,\vdots
\end{aligned}
$$
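The step-up recursion above can be sketched in a few lines of Python. This is an illustrative implementation, not the SAS code we used; the function name `bh_adjust` and the four example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # carrying the minimum of the scaled p-values seen so far.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example with m = 4 tests.
example = [0.01, 0.04, 0.03, 0.20]
for raw, adj in zip(example, bh_adjust(example)):
    print(f"raw = {raw:.2f}  adjusted = {adj:.4f}")
```

In this toy example the second-smallest p-value (0.03) is scaled to 0.06 but then lowered to 0.0533, the adjusted value of the next larger p-value, which is exactly the role of the minimum in the recursion.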
Each invariant position is assigned a p-value of 1.00. From the proportions given in Figures 2 and 3, it is easy to recover the data for the 14 positions listed in Table 1. One can then obtain the exact p-values for the 37 corresponding tests, although, as indicated above, the p-value for A360T is 0.0716 instead of 0.028. The 37 p-values found were 0.008, 1.00, 0.0499, 0.0039, 0.0716, 0.002, 0.3187, 0.0058, 0.3187, 0.5537, 4.72 × 10^-6, 0.3187, 0.0429, 0.216, 1.00, 0.5858, 1.867 × 10^-4, 0.512, 0.2771, 0.3444, 0.1196, 0.2734, 0.0075, 1.00, 0.5467, 1.00, 0.0204, 0.3841, 0.3243, 0.0237, 3.164 × 10^-4, 1.00, 0.3841, 0.0054, 0.3841, 0.0204 and 0.3841.
It would be extremely tedious to recover the data for the 152 positions in order to compute the exact p-values of the 299 corresponding tests. However, since all these p-values lie between 0.05 and 1.00, we can simply simulate them: we generated 299 p-values from a uniform distribution on the interval (0.05, 1). The complete list of p-values thus consists of the 97 p-values of 1.00, the 37 p-values given above and the 299 p-values generated uniformly between 0.05 and 1.00. Using the MULTTEST procedure of SAS (version 9.1), we obtained the 433 corrected p-values based on the FDR approach. The results, provided below, differ somewhat from those presented in the paper by Santos et al.

Rank  Test     Raw p-value   Corrected p-value
1     Test_11  4.72 × 10^-6  0.00204
2     Test_17  0.00019       0.04042
3     Test_31  0.00032       0.04567
4     Test_6   0.002         0.2165
5     Test_4   0.0039        0.33774
6     Test_34  0.0054        0.35877
7     Test_8   0.0058        0.35877
8     Test_23  0.0075        0.40594
9     Test_27  0.0204        0.8552
…     …        …             …

Only three corrected p-values are <0.05; most of the others are not only >0.05 but >0.20. Test_11, Test_17 and Test_31 correspond to the comparisons for A371V, A400T and K527N, respectively. We repeated the generation of the 299 p-values 500 times to investigate the impact of this simulation step. It had no impact on the 8 lowest corrected p-values, showing the robustness of our results.
Therefore, a more appropriate analysis of the data indicates that only three mutations are associated with treatment experience at the classical 5% level (A371V, A400T and K527N). Interestingly, only 59 patients harbouring a B subtype (27 naïve and 32 NRTI-treated patients) were Brazilian and thus independent of both the Los Alamos database and the Stanford HIV Drug Resistance Database. This is a very important point when considering whether this work provides results independent of other studies in this field. For example, A371V was found in 2 (2.3%) of the 87 naïve patients and in 42 (22.6%) of the 186 NRTI-treated patients (Table 1 of Santos et al). However, only 27/87 (31%) are ‘new’ naïve patients (Brazilian naïve patients) and only 32/186 (17.2%) are ‘new’ NRTI-treated patients (Brazilian NRTI-treated patients). For A400T these proportions were 36.5% and 18.5% for naïve and NRTI-treated patients, respectively, and for K527N they were 29% and 55%, respectively. Other work in this area might also have used the Los Alamos and Stanford HIV Drug Resistance Databases in addition to local sequences, which increases the power of a study but also the likelihood of reproducing similar results.
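The whole procedure, including the 500 repetitions, can be sketched as follows. This is a Python illustration under stated assumptions rather than our SAS program: the helper names `bh_adjust` and `significant_results` and the random seed are arbitrary, and the sketch only tracks the corrected p-values that fall below 0.05.

```python
import random


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The 37 p-values recovered from Table 1 and Figures 2 and 3 (Test_1 ... Test_37).
observed = [
    0.008, 1.00, 0.0499, 0.0039, 0.0716, 0.002, 0.3187, 0.0058, 0.3187,
    0.5537, 4.72e-6, 0.3187, 0.0429, 0.216, 1.00, 0.5858, 1.867e-4, 0.512,
    0.2771, 0.3444, 0.1196, 0.2734, 0.0075, 1.00, 0.5467, 1.00, 0.0204,
    0.3841, 0.3243, 0.0237, 3.164e-4, 1.00, 0.3841, 0.0054, 0.3841, 0.0204,
    0.3841,
]
invariant = [1.0] * 97  # invariant positions, assigned p = 1.00


def significant_results(rng, n_simulated=299, alpha=0.05):
    """Simulate the unknown p-values, apply the FDR correction and return
    the (test label, corrected p-value) pairs below alpha."""
    simulated = [rng.uniform(0.05, 1.0) for _ in range(n_simulated)]
    adjusted = bh_adjust(observed + invariant + simulated)
    hits = [
        (f"Test_{i + 1}", round(adjusted[i], 5))
        for i in range(len(observed))
        if adjusted[i] < alpha
    ]
    # Simulated and invariant p-values are >= 0.05, so their corrected
    # values can never fall below alpha; only the 37 tests need checking.
    return sorted(hits, key=lambda pair: pair[1])


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
runs = [significant_results(rng) for _ in range(500)]
print(runs[0])
print("identical across 500 runs:", all(run == runs[0] for run in runs))
```

Because every simulated p-value is at least 0.05, its scaled value stays far above the three smallest corrected p-values, which is why the simulation step is not expected to change them.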

Minor comment
It is not clear from the Results section whether the number of subtype B sequences in the RNase H domain is 278 (66+153+27+32) or 288, as indicated at the top of page 3. Nor is it clear why the denominator in Table 1 for positions in the RNase H domain is 58 for NRTI-treated patients, whereas it should be 185 as indicated in the top-left box for position 442 in Figure 3. There are many inconsistencies in the numbers used in this paper, which makes it difficult to understand exactly what was done in this work.

Philippe Flandre
Bénédicte Roquebert
Vincent Calvez
Anne-Geneviève Marcelin