Research Article

Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations

  • Taras K. Oleksyk,

    *E-mail: oleksyk@ncifcrf.gov

    Affiliations: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America; Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Kai Zhao,

    Affiliation: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Francisco M. De La Vega,

    Affiliation: Applied Biosystems, Foster City, California, United States of America

  • Dennis A. Gilbert,

    Affiliation: Applied Biosystems, Foster City, California, United States of America

  • Stephen J. O'Brien,

    Affiliation: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Michael W. Smith

    Affiliations: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America; Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Published: March 05, 2008
  • DOI: 10.1371/journal.pone.0001712

Reader Comments (16)


Referee comments: Referee 1

Posted by PLoS_ONE_Group on 07 Mar 2008 at 17:47 GMT

Referee 1's review:

Review of the first revised manuscript:
Oleksyk et al. have addressed most of my previous comments. In particular they have performed simulations to assess the operating characteristics of their method. This is important and helps to assuage my previous concerns.

However, I do think it is important that the authors clarify some of the details of the simulations. In particular, they used SelSim to simulate data under the null (s = 0) and alternative hypotheses (s = 0.03, 0.3). I have two comments here. First, I do not believe that SelSim allows demographic models with population structure. If I am wrong, can the authors provide the version number of SelSim they are using as well as more details about the model of structure employed (i.e., when did the population diverge, etc.)? If I am correct, how was SelSim used to evaluate their statistic based on population structure? The second issue is that a selective value of 0.3 is extraordinarily high - even 0.03, which the authors consider to be "low selective coefficients", is pretty high for human populations (see Bersaglieri et al. 2004, AJHG). Thus, it is unclear whether the authors' conclusions would be valid under weaker, and much more realistic, parameter values.

Review of the second revised manuscript:
The authors have addressed my questions and I have no further requests for clarifications. I would like to note, however, that I am skeptical about the validity of the simulation results using SelSim. In particular, the scheme outlined by Pollinger et al. that is used in this paper only approximates the genealogy of interest. Indeed, as noted by Pollinger et al., "We believe this problem is not substantial because most dog breeds were formed by the selective breeding of small numbers of individuals who shared a particular phenotype of interest to the exclusion of other potential founders." If the assumptions of this simulation scheme are approximately true for dogs, will they reasonably be met for human populations?

**********
N.B. These are the comments made by the referee when reviewing an earlier version of this paper. Prior to publication the manuscript has been revised in light of these comments and to address other editorial requirements.


RE: Referee comments: Referee 1

oleksyk replied to PLoS_ONE_Group on 11 Mar 2008 at 16:31 GMT

Comment #1:
“Oleksyk et al. have addressed most of my previous comments. In particular they have performed simulations to assess the operating characteristics of their method. This is important and helps to assuage my previous concerns.”

Answer:
Thank you. The comments we have addressed in both revisions have greatly improved the paper. We are thankful to the reviewer for pointing us towards using SelSim for simulations. We were not aware of this possibility when we first started working on this project (since this program didn’t exist a few years ago). Since then, SelSim has proven to be an excellent tool for modeling selection processes in the genomic context, both for this paper and for others in the literature.


RE: Referee comments: Referee 1

oleksyk replied to PLoS_ONE_Group on 11 Mar 2008 at 16:35 GMT

Comment #2:
“First, I do not believe that SelSim allows demographic models with population structure. If I am wrong, can the authors provide the version number of SelSim they are using as well as more details about the model of structure employed (i.e., when did the population diverge, etc.). If I am correct, how was SelSim used to evaluate their statistic based on population structure?”

Answer:
The reviewer is absolutely correct in stating that the current version of SelSim is not designed to allow for demographic models with population structure. The program was designed to simulate, within a single population, the trajectories of ancestral and derived alleles and of the haplotypes that carry them in a coalescent simulation. Because we simulated two separate populations, we used an approach previously validated in the literature (Pollinger et al. 2005) to model a selective sweep in dog breeds. Our simulation closely emulates their solution (as detailed below). We agree that we need to do a better job specifying exactly how we used SelSim to address population structure and have now detailed our simulation scheme in the manuscript.

Action:
We now explain our approach and the assumptions made in the SelSim simulation of genomic selection. We added the following paragraph into the Materials and Methods section:
“The simulation scheme emulates one previously utilized to model population substructure of dog breeds, where Pollinger et al. (2005) showed significant heterozygosity and FST effects after a selective sweep. The SelSim model assumes the initial mutation to be rare (Spencer and Coop 2004), and may overestimate the selection signal if selection started on a mutation that reached significant frequency (Pollinger et al. 2005). Our study used SelSim for “proof of principle” purposes to demonstrate the patterns of our measurements: a decrease in heterozygosity, and particularly the increase in the S2FST measure we described. The exact strength and the extent of the selection signatures under different conditions are beyond the scope of the SelSim program and this report. In the simulation presently employed, 48 chromosomes carried the selected mutation (derived) and 48 chromosomes did not (ancestral), approximating the numbers of individuals we examined in each population. If one of the populations was recently separated from the other and selection acted on a mutation that arose in an ancestral population, the partial selective sweep is an approximation of the true process with the assumption that most of the neutral variation appeared before the introduction of the selected mutation. Therefore, the set of haplotypes that carry the selected mutation contains a subsample of neutral variation in the ancestral population, and can be contrasted with the set carrying the ancestral allele.”
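
To make the contrast between the two sets of 48 chromosomes concrete, here is a minimal Python sketch. It is not the SelSim pipeline and not the S2FST statistic from the manuscript: it uses plain expected heterozygosity and a simple two-group FST as stand-ins, and the allele counts are made-up numbers chosen only for illustration.

```python
# Illustrative only: plain expected heterozygosity and a two-group FST,
# not the S2FST statistic defined in the paper.

def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst(p1, p2):
    """FST = (HT - HS) / HT for two equally weighted groups."""
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


N_CHROMOSOMES = 48  # per group, as in the simulation scheme quoted above

# Hypothetical counts of one allele at a linked neutral site (made-up numbers).
count_derived_group = 48    # fixed on the haplotypes carrying the selected mutation
count_ancestral_group = 12  # still segregating on the ancestral haplotypes

p_derived = count_derived_group / N_CHROMOSOMES
p_ancestral = count_ancestral_group / N_CHROMOSOMES

print("H (derived group):  ", expected_heterozygosity(p_derived))
print("H (ancestral group):", expected_heterozygosity(p_ancestral))
print("FST between groups: ", round(fst(p_derived, p_ancestral), 3))
```

With these assumed counts, heterozygosity is 0 in the group carrying the selected mutation and 0.375 in the other, and FST between the groups is 0.6, which is the qualitative pattern (reduced heterozygosity, elevated differentiation) the quoted paragraph describes.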


RE: Referee comments: Referee 1

oleksyk replied to PLoS_ONE_Group on 11 Mar 2008 at 16:36 GMT

Comment #3:
“The second issue is that a selective value of 0.3 is extraordinarily high - even 0.03, which the authors consider to be "low selective coefficients", is pretty high for human populations (see Bersaglieri et al. 2004). Thus, it is unclear whether the authors' conclusions would be valid under weaker, and much more realistic, parameter values.”

Answer:
The reviewer is, embarrassingly, right that we misreported the s values tested, and we apologize for this error. We simply described the values of s evaluated in our simulations incorrectly. We actually ran simulations with s=0.003 and 0.03 rather than the 0.3 and 0.03 we previously wrote. We don't know how the mistake crept into the figure and text, but it is now fixed. The reviewer’s comment also led us to justify the levels of selection used in our simulation (see below), and we believe that the values we used are more than reasonable.

Action:
We corrected the relevant information in Figure 2 (selection coefficient ranges from s=0.003 to s=0.03) and added the following paragraph to the Results and Discussion section.
“The choice of selection coefficients in the SelSim simulations was based on previously reported estimates. For example, the selection coefficient for the lactase-persistence allele was predicted to be between 0.014 and 0.15 in CEPH, and between 0.09 and 0.19 in the Scandinavian population (Bersaglieri et al. 2004). Furthermore, the selection coefficient has been set between 0.02 and 0.05 for G6PD deficiency, which gives a survival advantage in malarial regions (Tishkoff et al. 2001).”
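
For a rough sense of what the two corrected coefficients (s=0.003 and s=0.03) imply, the following Python sketch computes how long a deterministic sweep would take under each. It is a textbook approximation and not part of the SelSim simulations: it assumes genic (multiplicative) selection, ignores drift, and the starting and ending frequencies are hypothetical values chosen only for illustration.

```python
import math


def generations_to_reach(p_start, p_end, s):
    """Generations for a deterministic sweep under genic selection.

    Each generation multiplies the odds p / (1 - p) by (1 + s),
    so the time is a ratio of logarithms. Drift is ignored.
    """
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return math.log(odds_end / odds_start) / math.log(1.0 + s)


P_START = 1.0 / 20000.0  # assumed: one new copy among 20,000 chromosomes
P_END = 0.5              # assumed target frequency

for s in (0.003, 0.03):
    t = generations_to_reach(P_START, P_END, s)
    print(f"s = {s}: about {t:.0f} generations")
```

Under these assumptions, a tenfold weaker coefficient lengthens the sweep roughly tenfold, which is why the choice between s=0.003 and s=0.03 matters for how strong a signature remains in present-day samples.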