Research Article

Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations

  • Taras K. Oleksyk,

    *E-mail: oleksyk@ncifcrf.gov

    Affiliations: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America, Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Kai Zhao,

    Affiliation: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Francisco M. De La Vega,

    Affiliation: Applied Biosystems, Foster City, California, United States of America

  • Dennis A. Gilbert,

    Affiliation: Applied Biosystems, Foster City, California, United States of America

  • Stephen J. O'Brien,

    Affiliation: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Michael W. Smith

    Affiliations: Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, Maryland, United States of America, Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland, United States of America

  • Published: March 05, 2008
  • DOI: 10.1371/journal.pone.0001712

Reader Comments (16)


Referee comments: Referee 2

Posted by PLoS_ONE_Group on 07 Mar 2008 at 17:50 GMT

Review of the first revised manuscript:
I had previously suggested that identifying ancestral and derived SNP alleles could add a lot of power to this test for local directional selection. If the authors are correct, and divergent selection typically does increase the variance among site-specific Fst values for a contiguous set of SNPs, then it seems like it would be critically important to know whether that effect is attributable to an increase in the frequency of derived variants in just one population or the other.

The authors seem to agree that this is a reasonable suggestion. Their reasons for not polarizing their SNP data are not very satisfying. They state: "...one of our goals was to develop a method that would work even when the information about the ancestral allele was not known. This makes our approach applicable to the majority of species where knowledge of the ancestral allele is not available." That would be fine if the primary objective of the study was to provide a general method that is applicable to light-coverage population genomic data sets (for example, based on whole genome shotgun sequencing of multiple individuals). And of course, the authors do mention this benefit of the approach that they have developed. However, the title of this manuscript is: "Footprints of historic natural selection in the human genome". If the stated purpose of the study is to make inferences about the history of selection on variation in the human genome, then the study should be evaluated on the basis of whether it does in fact reveal something interesting and noteworthy about the history of selection on variation in the human genome.

Also, it seems like the most direct way to evaluate the efficacy of the authors' approach would be to apply it to the exact same HapMap data that were used in the other studies that they cite. I don't agree with the authors' assertion that doing this would "divert the reader's attention from the generality of the approach". I think it would be the most direct way to evaluate the utility and generality of the approach.

Review of the second revised manuscript:
This revised version of the manuscript is an improvement.

**********
N.B. These are the comments made by the referee when reviewing an earlier version of this paper. Prior to publication the manuscript has been revised in light of these comments and to address other editorial requirements.


RE: Referee comments: Referee 2

oleksyk replied to PLoS_ONE_Group on 11 Mar 2008 at 16:21 GMT

Comment#1:
“I had previously suggested that identifying ancestral and derived SNP alleles could add a lot of power to this test for local directional selection. If the authors are correct, and divergent selection typically does increase the variance among site-specific Fst values for a contiguous set of SNPs, then it seems like it would be critically important to know whether that effect is attributable to an increase in the frequency of derived variants in just one population or the other.”

Answer:
It is a valid argument that identifying ancestral alleles can enhance the power to detect selection. This is precisely why many of the previous selection scans have been confined to statistics that use ancestral allele information. Undoubtedly, this works well in human populations. However, we initially did not have these data and wanted to develop a method that can also be applied to diploid population data from other species. Currently, many of those species have genome sequences, and some of them will soon have dense genotypic panels. Yet few of these species have a sequenced related species, which makes inference of the ancestral states problematic. The minimal data requirements will make our approach attractive for comparing populations of any diploid species, and can lead to applications in environmental and evolutionary genomics, as well as genomic epidemiology.
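
To illustrate the point about ancestral states, the following is a minimal sketch of a site-specific Fst and its variance over contiguous SNPs, computed from unpolarized allele frequencies. It is not the implementation used in the paper: the function names, the simple unweighted two-population Fst estimator, and the window size are all assumptions made for illustration. The sketch shows that the statistic is unchanged when the two alleles at a site are swapped, so no knowledge of the ancestral allele is required.

```python
from statistics import pvariance


def site_fst(p1, p2):
    """Fst at one biallelic SNP from the frequency of either allele
    in two populations (assumed estimator: (Ht - Hs) / Ht, unweighted)."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:                           # monomorphic site: no divergence
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def windowed_fst_variance(freqs_pop1, freqs_pop2, window=5):
    """Variance of site-specific Fst in each sliding window of
    `window` contiguous SNPs (hypothetical window size)."""
    fst = [site_fst(a, b) for a, b in zip(freqs_pop1, freqs_pop2)]
    return [pvariance(fst[i:i + window])
            for i in range(len(fst) - window + 1)]


if __name__ == "__main__":
    # Made-up frequencies for seven contiguous SNPs in two populations.
    pop1 = [0.10, 0.15, 0.90, 0.20, 0.85, 0.12, 0.18]
    pop2 = [0.12, 0.14, 0.10, 0.22, 0.15, 0.10, 0.20]
    print(windowed_fst_variance(pop1, pop2, window=5))
```

Because `site_fst(p1, p2)` equals `site_fst(1 - p1, 1 - p2)`, relabeling the alleles at any site leaves the window variances unchanged; polarizing the alleles would add information about which population changed, as the referee notes, but is not needed to compute the statistic.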


RE: Referee comments: Referee 2

oleksyk replied to PLoS_ONE_Group on 11 Mar 2008 at 16:24 GMT

Comment #2:
“The authors seem to agree that this is a reasonable suggestion. Their reasons for not polarizing their SNP data are not very satisfying. They state: "...one of our goals was to develop a method that would work even when the information about the ancestral allele was not known. This makes our approach applicable to the majority of species where knowledge of the ancestral allele is not available." That would be fine if the primary objective of the study was to provide a general method that is applicable to light-coverage population genomic data sets (for example, based on whole genome shotgun sequencing of multiple individuals). And of course, the authors do mention this benefit of the approach that they have developed. However, the title of this manuscript is: "Footprints of historic natural selection in the human genome". If the stated purpose of the study is to make inferences about the history of selection on variation in the human genome, then the study should be evaluated on the basis of whether it does in fact reveal something interesting and noteworthy about the history of selection on variation in the human genome.”

Answer:
The reviewer is right to suggest that the objective of our paper was indeed to provide a general method applicable to light-coverage population genomic datasets. However, the title we used for the previous submission set a global goal of detecting and evaluating regions of historic selection, and may have been too general and ambitious. While we did provide locations of selected regions and listed the genes present within these locations, we did not look at any of these regions in depth. We agree that the title of the manuscript needs to be changed.

Action:
As suggested by the reviewer, we changed the title of our manuscript to:
“Identifying selected regions from heterozygosity and divergence using a light-coverage genomic dataset from two human populations”


RE: Referee comments: Referee 2

oleksyk replied to PLoS_ONE_Group on 11 Mar 2008 at 16:27 GMT

Comment #3:
“Also, it seems like the most direct way to evaluate the efficacy of the authors' approach would be to apply it to the exact same HapMap data that were used in the other studies that they cite. I don't agree with the authors' assertion that doing this would "divert the reader's attention from the generality of the approach". I think it would be the most direct way to evaluate the utility and generality of the approach.”

Answer:
We agree that evaluating the HapMap data with our current method is an interesting and useful step. Furthermore, we believe that such a project can lead to an evaluation of the history of selection in the human genome, as suggested by the reviewer in the previous comment. We hope to address these questions in our future research. However, we still believe that including the 3.1 million SNP HapMap data in the submitted paper would emphasize its “human” dimension, stress the reliance on dense genotyping data, and abandon analysis of the valuable and independent dataset that we report on. A HapMap II analysis would likely discourage many researchers who are interested in using our approach to screen for selected regions in species other than humans, where HapMap-like data will likely not be available in the next few years. So while we agree and are pursuing a HapMap II analysis, we hope that you understand our reluctance to include the super-dense HapMap database in the current analysis. In this paper, we report a solid finding that is put in perspective with other selection scans. This allows us to concentrate instead on proof of principle with the simulated dataset, the candidate selection regions, and the genome-wide scan in a small independent dataset, as the manuscript is written.