Reader Comments


Referee Comments: Referee 1

Posted by PLOS_ONE_Group on 20 May 2008 at 10:44 GMT

Referee 1's review:

**********
N.B. These are the comments made by the referee when reviewing an earlier version of this paper. Prior to publication, the manuscript has been revised in light of these comments and to address other editorial requirements.
*********

Review of Jaing et al., "A functional gene array for detection of bacterial virulence elements."

Jaing et al. describe the design of a DNA microarray for the detection of bacterial virulence elements. By focusing on orthologous genes in their design, they enable the microarray to detect related bacterial species, on the premise that conserved virulence genes and antibiotic resistance genes are important functional markers that should be targeted. The authors present a number of algorithmic optimizations, such as an "adjusted deltaG" value that they found empirically to be the best predictor of observed hybridization intensity for perfectly matched cognate sequences. In addition, they perform a series of tests on the effects of mismatches, as a means of mimicking the natural divergence that may be present in orthologous sequences. This work presents a broad platform for targeting conserved sequences and validates its application to conserved bacterial gene families.

There are a number of ways in which the manuscript could be improved:

Major Points:

1) Insufficient detail is included in the description of the probe design. The crux of the paper is the claim that the array can detect conserved elements in bacteria related to the original genome. However, the criteria for selecting the conserved elements are not described, other than the statement that "high conservation in orthologous genes of distantly related organisms" was part of the iterative ranking algorithm. What was the basis for selecting a particular probe? Was it the "adjusted deltaG" of the probe against the genomes to be detected? I did not see any description of such a calculation. Or were the criteria based on the presence of fewer than 3 mismatches between the probe and the orthologous target sequences? Did the number of orthologous genomes showing "high conservation" factor into the design? As I understand it, the probe selection process begins with calculations of deltaG relative to the original (self) genome, but no information is provided on how the initial candidates were then refined.
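Purely to make the question concrete, the sketch below shows one possible mismatch-based criterion of the kind asked about: each candidate probe is scored by how many orthologous target segments it matches with fewer than 3 mismatches, and candidates are ranked by that count. This is not the authors' method. The function names are hypothetical, and the code assumes the ortholog segments have already been aligned to each probe, are on the same strand, and have the same length as the probe.

```python
from typing import Dict, List


def mismatches(probe: str, segment: str) -> int:
    """Count position-wise mismatches between a probe and an aligned, equal-length target segment."""
    if len(probe) != len(segment):
        raise ValueError("probe and target segment must have the same length")
    return sum(1 for a, b in zip(probe.upper(), segment.upper()) if a != b)


def ortholog_coverage(probe: str, ortholog_segments: List[str], max_mismatches: int = 2) -> int:
    """Number of ortholog segments matched with at most max_mismatches (fewer than 3 by default)."""
    return sum(1 for seg in ortholog_segments if mismatches(probe, seg) <= max_mismatches)


def rank_probes(candidates: Dict[str, List[str]], max_mismatches: int = 2) -> List[str]:
    """Order candidate probes by how many orthologs they would cover; ties keep their input order."""
    return sorted(
        candidates,
        key=lambda probe: ortholog_coverage(probe, candidates[probe], max_mismatches),
        reverse=True,
    )


# Hypothetical example: each candidate probe maps to its aligned segments in three orthologs.
candidates = {
    "ACGTACGT": ["ACGTACGT", "ACGTACGA", "TTTTACGT"],
    "GGGGCCCC": ["GGGGCCCC", "GGGGCCCA", "GGGGCCGG"],
}
print(rank_probes(candidates))  # ['GGGGCCCC', 'ACGTACGT']
```

A criterion like this would also answer whether the number of conserved orthologs factored into the design, since coverage is the ranking key.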

Furthermore, the presentation of the "equivalence group" is confusing and needs clarification. I do not understand what defines an equivalence group. The authors state "Equivalence groups were defined so that all probes in a group were complementary to a known set of target sequences, with each target sequence represented by at least one equivalence group". Does this mean that the set of probes designed to detect all orthologs of a given family is an equivalence group? If so, then given N gene families that one wants to detect, are there simply N equivalence groups? In what situations would a target sequence be represented by more than one equivalence group? What is the total number of equivalence groups selected for the 299 gene families? Is this essentially a means of normalizing probe coverage on the microarray?

2) In the abstract, the authors claim that they achieve "an average target detection rate of 95% in the presence of three mismatches." It is not clear to me which data in the paper specifically support this claim. Is it based on the mismatch permutation data, or on the experimental data examining the orthologous genomes? This claim needs to be substantiated in the text or else removed from the abstract.

Minor Points:
1) In the introduction, paragraphs 4 and 5 read a little too much like an advertisement for Nimblegen. While the flexibility of the Nimblegen platform is indeed a virtue, other microarray formats, such as ink-jet synthesis (Agilent) or electrochemical synthesis (Combimatrix), also offer the ability to custom-synthesize probe sets, and similar experiments are certainly feasible on other platforms.

2) In a series of experiments examining the ability of "conserved" probes to detect orthologous sequences in related organisms, the authors find that their probes perform reasonably well, with a low false positive rate. In fact, the authors are able to attribute most of the cases of strong signal to domains shared with other gene families. Given that these cross-hybridizing sequences can be identified by BLAST analysis, it should be possible to add a filtering step that checks the "conserved" probes against sequences outside the family, and thereby further reduce the false positive rate.
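As an illustration of the suggested filtering step, the sketch below discards any probe that matches an out-of-family sequence, on either strand, with at most a given number of mismatches. It uses a simple ungapped sliding-window Hamming distance as a stand-in for BLAST, so it ignores gapped alignments and is only a rough proxy for cross-hybridization. The function names, the mismatch threshold of 2, and the example sequences are all hypothetical.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_window_mismatches(probe: str, sequence: str) -> int:
    """Smallest Hamming distance between the probe and any same-length window of the sequence."""
    probe, sequence = probe.upper(), sequence.upper()
    k = len(probe)
    best = k  # sequences shorter than the probe are treated as non-matching
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        best = min(best, sum(1 for a, b in zip(probe, window) if a != b))
        if best == 0:
            break
    return best


def filter_cross_family(probes: Iterable[str], out_of_family: List[str],
                        max_mismatches: int = 2) -> List[str]:
    """Keep probes that match no out-of-family sequence, on either strand, within max_mismatches."""
    kept = []
    for probe in probes:
        hits = (
            min(min_window_mismatches(probe, s),
                min_window_mismatches(probe, reverse_complement(s))) <= max_mismatches
            for s in out_of_family
        )
        if not any(hits):
            kept.append(probe)
    return kept


# Hypothetical example: the first probe occurs verbatim in an unrelated gene and is removed.
print(filter_cross_family(["ACGTACGT", "GGGCCCAA"], ["TTTTACGTACGTTTTT"]))  # ['GGGCCCAA']
```

In practice a BLAST search against a database of non-family sequences would replace the brute-force scan, but the accept/reject logic would be the same.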

3) There does not appear to be any indication of the number of probes designed for any of the experiments. Primary microarray data and probe sequences should be deposited in a public repository, such as NCBI GEO, in order for the paper to be published.

4) In the analysis of single-mismatch effects, it appears that probes ranging in length from 30 to 66 nucleotides were used and the overall average was reported. I would expect a relationship between the MM/PM signal ratio and the length (or deltaG) of the probe. Was this the case? How does this affect the design parameters?
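One simple way to check for such a relationship is to compute the MM/PM ratio for each probe and its Pearson correlation with probe length (deltaG could be substituted for length). The sketch below assumes, hypothetically, that per-probe lengths and PM and MM intensities are available as parallel lists; the numbers in the example are invented placeholders, not data from the manuscript.

```python
import math
from typing import List, Sequence


def mm_pm_ratios(pm: Sequence[float], mm: Sequence[float]) -> List[float]:
    """Per-probe ratio of mismatch-target signal to perfect-match signal."""
    return [m / p for p, m in zip(pm, mm)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder inputs: probe lengths (nt) and PM/MM intensities for the same probes.
lengths = [30, 42, 54, 66]
pm = [1000.0, 1200.0, 1500.0, 1600.0]
mm = [300.0, 500.0, 800.0, 1000.0]
print(round(pearson_r(lengths, mm_pm_ratios(pm, mm)), 3))
```

A strong positive correlation would suggest that longer probes tolerate a single mismatch better, which would argue for reporting the mismatch effect by length bin rather than as a single overall average.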

5) Some of the graphs are relatively uninformative (e.g., Figures 5 and 6). A summary of the results in table format would be sufficient.