Reader Comments
Equations of MSP and MSG
Posted by donthu on 17 Mar 2008 at 19:22 GMT
Congratulations on the wonderful work applying different measures to identify regions under selection. I would like an explanation of the terms in the equations for MSP and MSG that are used in the Fst equation.
In the equation for MSP, what does the term pbar A mean? In the equation for MSG there is a term n1; I am not sure what it refers to. Please clarify.
Thank you,
Kiran
RE: Equations of MSP and MSG
oleksyk replied to donthu on 18 Mar 2008 at 17:43 GMT
Thank you for the kind words. The MSP and MSG terms come from the article by Akey et al. (Akey et al., Interrogating a high-density SNP map for signatures of natural selection, Genome Res. 12 (2002), pp. 1805–1814), which in turn takes them from the Weir and Cockerham paper (Weir BS and Cockerham CC, Estimating F-statistics for the analysis of population structure. Evolution 38: 1358-1370).
Let me explain what each term means in context of the two samples I worked with, so it would be easier to follow if you want to apply it to your own example:
MSP is the observed mean square error for loci between populations:
MSP = (count of European alleles) * (frequency of European allele - (frequency of European allele + frequency of African allele)/2) squared + (count of African alleles) * (frequency of African allele - (frequency of African allele + frequency of European allele)/2) squared
MSG is the observed mean square error for loci within populations:
MSG = 1/(count of European alleles + count of African alleles - 2) * ((count of European alleles * frequency of European allele * (1 - frequency of European allele)) + (count of African alleles * frequency of African allele * (1 - frequency of African allele)))
nc is an average sample size across samples that also incorporates the variance in sample sizes over the populations:
nc=(count of European alleles + count of African alleles) - ((( count of European alleles) squared + (count of African alleles) squared)/ (count of European alleles + count of African alleles))
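As a quick check of how nc behaves, here is a minimal Python sketch of the formula above. The function name `average_sample_size` and the allele counts are made up for illustration and are not taken from the data. With equal counts, nc equals the common count; it shrinks as the counts become unequal.

```python
def average_sample_size(n_eur: float, n_afr: float) -> float:
    """nc for two samples: total count minus (sum of squared counts / total count)."""
    total = n_eur + n_afr
    return total - (n_eur ** 2 + n_afr ** 2) / total


# Hypothetical allele counts, chosen only to show the effect of unequal samples.
print(average_sample_size(100, 100))  # equal counts   -> 100.0
print(average_sample_size(150, 50))   # unequal counts -> 75.0
```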
Fst then is calculated as :
Fst=(MSP-MSG)/(MSP+(nc-1)*MSG)
This is a point estimate of Fst at each SNP. It should be noted, however, that this estimate may result in negative values, which are usually set to zero. In the above example, the allele frequency was assumed to be the frequency of the major allele in Europeans.
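For readers who want to try the calculation on their own numbers, the steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `fst_point_estimate`, its argument names, and the example counts and frequencies are hypothetical. It uses the simple average of the two frequencies as written above, sets Fst to 0 when MSP or MSG is 0 (the same rule as the SAS step below), and zeroes negative estimates.

```python
def fst_point_estimate(n_eur: float, n_afr: float,
                       p_eur: float, p_afr: float) -> float:
    """Point estimate of Fst at one SNP for two samples.

    n_eur, n_afr: allele counts in each sample (hypothetical argument names).
    p_eur, p_afr: frequency of the same allele in each sample.
    """
    p_mean = (p_eur + p_afr) / 2  # simple average of the two frequencies

    # Mean square between populations.
    msp = n_eur * (p_eur - p_mean) ** 2 + n_afr * (p_afr - p_mean) ** 2

    # Mean square within populations.
    msg = (n_eur * p_eur * (1 - p_eur)
           + n_afr * p_afr * (1 - p_afr)) / (n_eur + n_afr - 2)

    # Average sample size, adjusted for unequal sample sizes.
    total = n_eur + n_afr
    nc = total - (n_eur ** 2 + n_afr ** 2) / total

    # Assumption mirroring the SAS step: Fst is 0 when either mean square is 0.
    if msp == 0 or msg == 0:
        return 0.0

    fst = (msp - msg) / (msp + (nc - 1) * msg)
    return max(0.0, fst)  # negative estimates are zeroed


# Made-up example: 100 alleles per sample, frequencies 0.8 and 0.4.
print(round(fst_point_estimate(100, 100, 0.8, 0.4), 4))  # 0.2785
# Nearly identical frequencies give a negative raw estimate, reported as 0.
print(fst_point_estimate(100, 100, 0.50, 0.51))  # 0.0
```

In the first example MSP = 8, MSG = 40/198 and nc = 100, so Fst = (8 - 40/198)/(8 + 99 * 40/198) = 772/2772, or about 0.2785.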
P.S. The above equations can be coded in the SAS DATA step language as follows:
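/* nc: average sample size, adjusted for the variance in sample sizes */
nc=(cs_count+aa_count)-(((cs_count)**2+(aa_count)**2)/(cs_count+aa_count));
/* msp: mean square between populations */
msp=cs_count*(csfreq-(csfreq+aafreq)/2)**2 + aa_count*(aafreq-(csfreq+aafreq)/2)**2;
/* msg: mean square within populations */
msg=1/(cs_count+aa_count-2)*((cs_count*csfreq*(1-csfreq))+(aa_count*aafreq*(1-aafreq)));
/* fst is set to 0 when msp or msg is 0 */
if msp=0 then fst=0; else if msg=0 then fst=0; else fst=(msp-msg)/(msp+(nc-1)*msg);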