Reader Comments

Referee comments: Referee 1 (Tom Nye)

Posted by PLOS_ONE_Group on 27 Mar 2008 at 18:00 GMT

Referee 1's review (Tom Nye):

Review of the original submission:
This paper describes the detection of periodic expression for genes that would normally be considered 'silent' i.e. which are only expressed at levels comparable to the microarray background.

The method depends critically on the assumption of a particular period of expression for each data set; by imposing this assumption the phase can be estimated for gene expression profiles below the detection threshold.
Although this assumption is questionable, and might be challenged by some readers, it is clearly stated early in the paper and it seems valid to me to put forward an argument in this manner.

The paper is well-written and relatively easy to follow, although there are a number of minor points that need addressing, as listed below.
My main concern is that the question of multiple testing is not raised -- and it must surely affect the results.

Labelling every gene with a p-value below 0.1 under some test as periodic is mistaken and will lead to an excessive false positive rate.
A Bonferroni correction or, better, a false discovery rate (FDR) analysis could be used to address this.
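For readers unfamiliar with the two corrections named above, the following is a minimal sketch of each, written in plain Python. The function names and the default level of 0.1 (taken from the threshold quoted in this review) are illustrative assumptions; this is not code from the manuscript under review.

```python
def bonferroni(pvalues, alpha=0.1):
    """Mark a test as significant only if p <= alpha / m (m = number of tests)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.1):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values, leave the rest.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Bonferroni compares every p-value with the same strict threshold, whereas the step-up procedure lets the threshold grow with the rank of the p-value, so it is never more conservative than Bonferroni at the same level.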

In a similar vein, the identification of genes that are periodically expressed below the background threshold relies critically on the assumption of circadian periodicity.
If a different (random) period were assumed, how would the results change?
In other words, by imposing the circadian period the method described is able to identify the phase of gene expression for 'silent' genes, but is this also the case if we pick some period arbitrarily and repeat the same analysis?
I think this is probably a less important question than the overall problem of multiple testing.
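One hedged way to probe the question of an arbitrary period is to compute the same periodicity score at the assumed period and at a few alternative periods, and compare. The sketch below uses a classical periodogram power as a stand-in score; it is not the test used in the paper, and the names `fit_at_period` and `scan_periods` are hypothetical.

```python
import math


def fit_at_period(times, values, period):
    """Return (power, peak_time) of a mean-centred profile at one trial period."""
    n = len(values)
    mean = sum(values) / n
    w = 2.0 * math.pi / period  # angular frequency of the trial period
    # Project the centred profile onto a cosine and a sine of that period.
    c = sum((v - mean) * math.cos(w * t) for t, v in zip(times, values))
    s = sum((v - mean) * math.sin(w * t) for t, v in zip(times, values))
    power = (c * c + s * s) / n
    # Time of peak expression implied by the projection, in the units of `times`.
    peak_time = (math.atan2(s, c) / w) % period
    return power, peak_time


def scan_periods(times, values, periods):
    """Score one expression profile at each trial period."""
    return {p: fit_at_period(times, values, p) for p in periods}


# Synthetic example: sampled every 4 h for 48 h, peaking 6 h after time zero.
times = [4 * i for i in range(12)]
values = [math.cos(2 * math.pi * (t - 6) / 24) for t in times]
scores = scan_periods(times, values, [24, 16, 12])
```

If a profile scored comparably at 16 h or 12 h as at 24 h, the estimated phase would say little about circadian timing in particular.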

Minor Points
------------

1. From the abstract: "We report the periodic pattern detected in genes not called 'Absent' by traditional analysis." Shouldn't this be "not called 'Present'"?

2. What is "pilot light expression"? The term isn't explained anywhere in the paper.

3. Bottom of page 2: "This effect is not specific to a particular tissue and observed in every analysed data set". It's not entirely clear which data sets you mean here. Better would be: "in the mouse and yeast data sets we consider in this paper."

4. "White adipose tissue is know to respond to the insulin signal." Is there a reference for this?

5. Page 3: typo, "Why do the absent genes, which expression pattern" -> "with expression pattern".

6. Page 3: reference to supplementary figure 2. There are two figures labelled as supplementary figure 1. This needs correcting.

7. I don't understand the sentence "but in all cases reading the signal... forms a detector", page 3.

8. While I can follow the individual steps described in the methodology, I remain slightly unsure as to how they fit together to produce the results stated previously. In particular, how is the hypothesis test described on page 4 actually carried out -- the methods are not described in the same notation / context.

9. Page 6: "tabulates" -> "tabulated"?

10. Page 6: "particular order of the observation is the Y" -> "observations Y"?

11. Page 7: "superior to the assigning a phase" -> "superior to assigning a phase"

12. Figure 1: it would be helpful to mark on the time axis.

13. Supplemental figure 2: in the caption "tree" -> "three". Also the axes should be labelled on the graph.

14. Further work. Would it be possible to perform image analysis on a "between spot" location on a microarray? That way the analysis could be repeated for this false spot that comes from the background. If it too was found to have periodic expression, it would suggest that the "pilot light" expression was really an artefact from the experimental set-up.

Review of the first revised manuscript:
The revised manuscript and reply go a long way towards addressing the issues I raised in my original review.

All the minor points have been addressed, although the supplementary figures require a small amount of attention.

Supplementary figures 1 and 3 do not have labelled axes, while the caption of supplementary figure 2 refers to supplementary figure 2 as a separate figure -- I'm not sure what's gone wrong here.

The main issue I raised was multiple testing.

I found your reply to this very interesting (and indeed entertaining -- I enjoyed the analogies!) and I now have a clearer idea of why multiple testing has not been compensated for.

Essentially you assert that the hypotheses being tested are not independent so that correction for multiple testing is inappropriate.

I continue to think that this is a difficult issue, and I'd like to see a more formal argument eventually (though not in the current manuscript).

While I think your argument is sufficient to justify not using FDR, I continue to feel that the paper would benefit from at least some mention of multiple testing without going into great detail: I suggest adding a sentence saying that a correction for multiple testing has not been used due to the obvious lack of independence between hypotheses.
I do not think it's necessary to go into the same depth in the paper as you kindly provided in your response to my query.

**********
N.B. These are the comments made by the referee when reviewing an earlier version of this paper. Prior to publication the manuscript has been revised in light of these comments and to address other editorial requirements.

RE: Referee comments: Referee 1 (Tom Nye)

ptitsyn replied to PLOS_ONE_Group on 28 Mar 2008 at 16:55 GMT

Thanks for posting this.
I can complement this with my response to the concern about multiple testing adjustment in the analysis of periodicity in gene expression. It is indeed a serious issue to be concerned with.


Multiple testing is a serious question which was not brought up for a reason. While Bonferroni correction is obviously inapplicable, I have implemented Benjamini-Hochberg FDR correction and we even applied this procedure in our early publications about circadian rhythms. Then, after careful consideration we stopped using FDR. This decision made it harder to publish, but this way we are sure we are not doing wrong just because everybody keeps asking. I understand that FDR has been made a standard procedure, but it would do no good if performed blindly, just because everybody does. So, let me apply some reasoning and explain why I didn’t bring up the issue of multiple testing in this particular paper.
First of all, the argument about multiple testing actually relates to the question of whether we can assume that a circadian rhythm is present in the genes expressed below the Affy presence call. These “silent” genes make up about 30% of all probesets. The assumption is based on previous publications demonstrating oscillation in 99% of all probesets without any regard to presence/absence calls. If that number is corrected down by FDR, the correction will also subtract from the number of genes/probesets for which a rhythm can be assumed. This would not affect the main message of the paper I want to communicate; it would only adjust the numbers used to illustrate the concept.
There are a few levels of reasoning behind our decision to exclude FDR in the previous publications. For the standard approach (testing profiles one by one), estimating the number of false positives is relatively easy. In one of the computational simulations published in our first PLoS paper we wiped out all oscillation by permuting the timepoints in every profile, yet a small fraction of genes still passed the Pt-test. This number is approximately equal to the number of false positives estimated by the Benjamini-Hochberg method. The phase continuum approach presented in the most recent PLoS paper makes things much more complicated, since each gene is tested (dependently!) a few times in combination with its nearest neighbors (in the same phase, with the same amplitude, concatenated in order of descending autocorrelation). However, even in the simplest case it’s not all that simple.
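The permutation experiment described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Pt-test itself is not reproduced here, so a generic permutation test on periodogram power stands in for it, sampling is assumed to be evenly spaced, and all function names are hypothetical.

```python
import math
import random


def power_at_period(values, period, step):
    """Periodogram power of an evenly sampled, mean-centred profile."""
    n = len(values)
    mean = sum(values) / n
    w = 2.0 * math.pi * step / period  # phase advance per sample
    c = sum((v - mean) * math.cos(w * i) for i, v in enumerate(values))
    s = sum((v - mean) * math.sin(w * i) for i, v in enumerate(values))
    return (c * c + s * s) / n


def permutation_pvalue(values, period, step, n_perm, rng):
    """Share of time-shuffled copies whose power matches or beats the original."""
    observed = power_at_period(values, period, step)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.sample(values, len(values))
        if power_at_period(shuffled, period, step) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def fraction_passing(profiles, period, step, alpha, n_perm, rng):
    """Fraction of profiles called periodic at significance level alpha."""
    passed = sum(
        permutation_pvalue(p, period, step, n_perm, rng) <= alpha
        for p in profiles
    )
    return passed / len(profiles)


def scramble_timepoints(profiles, rng):
    """Destroy the temporal order within every profile, keeping its values."""
    return [rng.sample(p, len(p)) for p in profiles]


# Synthetic data: noisy 24 h profiles sampled every 4 h, with varied phases.
rng = random.Random(0)
profiles = [
    [math.cos(2 * math.pi * (4 * i - shift) / 24) + rng.gauss(0, 0.3)
     for i in range(12)]
    for shift in range(0, 24, 2)
]
observed = fraction_passing(profiles, 24, 4, 0.1, 200, rng)
baseline = fraction_passing(scramble_timepoints(profiles, rng), 24, 4, 0.1, 200, rng)
```

Here `baseline` plays the role of the small fraction that still passes after scrambling, which can be set against `observed` and against a Benjamini-Hochberg estimate.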
First of all, at the very root is the intuitive but unfounded assumption made while formulating the null hypothesis in traditional tests. It implies a default “steady line” gene expression pattern, much as somebody would naturally assume that the Sun orbits the Earth from seeing it move across the sky while everything around stays still. In fact, there is no reason to prefer this assumption over the opposite one, that all genes oscillate, which would make the traditional null and alternative hypotheses swap places. Both the visible color pattern and the computational experiment with permuted timepoints indicate that this is exactly what should be done (along with inverting the FDR correction).
"Labelling every gene with a p-value below 0.1 under some test as periodic is mistaken and will lead to an excessive false positive rate."
Let’s consider another celestial example: Colorado has about 300 sunny days in a good year. Thinking along the lines of FDR as it is routinely applied, this number should be adjusted using the Bonferroni or, better, the Benjamini-Hochberg correction. For similar data acquired in Scotland the number of sunny days may not even exceed the expected number of false positives and should be reduced to 0. This approach might be OK for planning next year’s vacation, but it could be misleading in astronomy if you are trying to prove the existence and uniqueness of the Sun shining on all parts of the Earth. By this logic Scotland could not possibly be located on the same planet as Colorado. Then, say, Fairbanks, Alaska, has no visible Sun for a good part of the year and no visible stars for another part. The question is: can we reasonably assume the existence of a dominating 24h rhythm, as required for the stochastic resonance approach, and detect circadian variation in temperature, humidity and lighting without the Sun crossing the horizon even once? Apparently not, since after FDR we are left in doubt about the very existence of the Sun, and without FDR labeling any day as sunny would be mistaken and would lead to an excessive false positive rate. Is it correct to apply FDR correction to the number of observations of the daily 24h rhythm (rather than to the bad weather forecast obscuring this rhythm)? The story of rhythmic gene expression is the same, just far less obvious, since in our understanding of biology we have probably come about as far as we had in astronomy while gazing out from a cave and munching on a mammoth steak. However, after the analysis of over a dozen circadian and metabolic oscillation data sets, it becomes increasingly clear that what we have at hand are multiple observations of the same phenomenon, which are not subject to the FDR adjustment of multiple independent tests.
A propos independent testing: this is actually a necessary assumption. We also need to assume that a change in expression in response to experimental conditions affects only a fraction of the genes, while the absolute majority is unchanged. But can we actually assume independence for these tests? I know that, according to Storey et al., FDR analysis can tolerate mild dependencies. How they got these mild dependencies out of a heavily interconnected transcriptome, and whether these dependencies adequately reflect the real biology of the experiments, is a separate story. There is no need to discuss it in connection with this manuscript. But they surely didn’t mean the detection of a single frequency dominating the timeline expression of all genes. Just one look at the heatmap is enough to state that all expression profiles are heavily dependent, and correlation analysis only confirms the obvious. Thus, application of either the Bonferroni or the FDR correction has no grounds.
Last but not least, let’s look at the very reason we are conducting these studies: the biology. Genes are expressed, regulated and function not one by one but within the context of biological pathways, in coordination with scores of other genes. Out of all the known pathways in public and private databases, we could not point to a single pathway that does not have at least a few key elements oscillating beyond a reasonable doubt. We have demonstrated and validated circadian oscillation in the key regulators of oxidative phosphorylation, glycolysis/gluconeogenesis, lipid metabolism, all signal transduction cascades, nuclear receptors and even the most basic components of transcription. I think any biologist would agree that if TATA-binding protein oscillates, quite a few other genes would be compelled to follow the rhythm. Biology simply leaves no space for something non-oscillating. Quite the opposite: in a recent publication we reported a molecular mechanism that compensates for constant oscillation, creating steady transcript abundance (so that a signal transduction pathway is ready or suppressed at any time of the day). This mechanism is based on alternative transcription variants with different turnover rates expressed in counter-phase to each other. Once some critical mass of genes has been proven to oscillate, labeling some of the rest as non-oscillating (or as false positives) makes no sense; it’s a useless operation based on a false assumption and aggravated by tradition, poorly understood but rarely questioned.
Consider yet another example: an American scientist on vacation finds out that his Remington electric shaver doesn’t work in a hotel room in Europe. True to the proven methodology, he knocks on the next door to check whether the Remington makes a pleasant buzz or stinky smoke there. Out of a hundred rooms in the hotel, half turn out to be locked, and every second time the person opening the door doesn’t understand a polite request in American English. However, the researcher gathers enough data to conclude that out of 100 tests 25 gave smoke. Here comes the question: if our statistics-savvy researcher applies a multiple testing adjustment (Bonferroni or, better, the Benjamini-Hochberg FDR correction) to control the number of false-positive tests, would it bring the estimated number of rooms with a 220V power supply in a European hotel closer to or further from reality, and how would it affect his chances of shaving in the familiar way?
In my opinion there is nothing wrong with the theory of FDR analysis; it’s true and correct. However, its application in biology should not be reduced to a routine detached from common-sense reasoning. I may not have convinced you in this short improvised debate, but the point I want to make is that it was a thoughtful decision not to bring up the issue of multiple testing in this manuscript: a decision, rather than unawareness or a slip in methodology. Since this issue is almost entirely out of the scope of this manuscript, I can promise to write a separate paper dedicated entirely to discussing the issues of multiple testing and FDR in systems biology in general and in time series analysis in particular. This is not going to be an easily acceptable paper, but somebody has to write it, and the sooner the better.