Reader Comments

A couple questions

Posted by dahinds on 24 Jan 2012 at 08:22 GMT

- the methods and Table 3 indicate that 3613 iControlDB controls were used, but the numbers used in each stage in Table 2 add to 3877. Can you confirm that no controls were used more than once?

- what criteria were used for excluding related samples? Were these applied to controls as well as cases, and were the NECS replication samples tested for relatedness with the NECS discovery samples?

- in the training/testing analysis of the risk model in the discovery data, did you repeat the entire GWAS and SNP selection for each of the 1000 training replicates? Or did you reuse the set of (400?) SNPs selected from the full GWAS, in all of the replicates?

Thanks for any clarifications!

No competing interests declared.

RE: A couple questions

sebastiani replied to dahinds on 24 Jan 2012 at 16:16 GMT

Hi David

1) No control was used more than once across the discovery set and the two replication sets. Note that, in addition to the Illumina controls, we also used controls enrolled in the NECS, so the overall number of controls is larger than the number of controls from the Illumina iControlDB. These numbers are shown in Table 2.

2) We checked all subjects for relatedness as a step prior to the stratification analysis with Eigenstrat, and removed related people. In selecting unrelated subjects in the NECS, we chose the older of two siblings to be used as the case, and the younger of two siblings to be used as the control.

3) In the training/testing step we decided not to repeat the whole analysis, in order to speed up computations. The purpose of this step was really to decide which of the top associated SNPs were sufficient for prediction. This strategy seems to work fine in this particular application, but we are currently exploring improvements to the SNP selection strategy. We discuss this in the limitations section of the manuscript.

Thank you again for these questions!

Paola Sebastiani

No competing interests declared.

RE: RE: A couple questions

dahinds replied to sebastiani on 24 Jan 2012 at 16:59 GMT

Thanks:

I do see that some controls came from NECS, but was looking specifically at just the Illumina numbers. The text says that 3613 iControlDB controls were used, and the sum of row 3 of Table 3 is 3613. The text also says that 673+341+2863 iControlDB controls were used in the three stages, which agrees with Table 2, but that adds to 3877.

No competing interests declared.
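
The count discrepancy raised in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative and not taken from the article, and the numbers are the ones quoted in the comment.

```python
# Per-stage iControlDB control counts as quoted in the comment (Table 2).
stage_counts = [673, 341, 2863]

# Total number of iControlDB controls quoted from the text and Table 3.
table3_total = 3613

table2_total = sum(stage_counts)
gap = table2_total - table3_total

print(f"Sum of per-stage counts: {table2_total}")
print(f"Total quoted from Table 3: {table3_total}")
print(f"Unexplained difference: {gap}")
```

The per-stage counts sum to 3877, which exceeds the quoted total of 3613 by 264 controls.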

RE: RE: RE: A couple questions

sebastiani replied to dahinds on 24 Jan 2012 at 23:26 GMT

Hi David

The difference is between the number of controls that were used for genetic matching (Table 3) and the overall number of controls used in the various analyses (Table 2). Not all controls were used for genetic matching.

Paola

No competing interests declared.

RE: RE: A couple questions

dahinds replied to sebastiani on 25 Jan 2012 at 08:56 GMT

Hi Paola,

Given that you did not repeat the whole analysis in the training/test iterations, have you considered that this might result in overfitting, as discussed in your ref. 41?

- Dave

No competing interests declared.
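
The concern in the comment above can be illustrated on simulated data. The sketch below is not the authors' analysis: it assumes pure-noise features, a simple difference-in-means ranking, and a nearest-centroid classifier, and all function names and parameter values are hypothetical. It compares reusing features selected once from the full data set against reselecting features within each training split.

```python
import numpy as np

def top_k_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(np.abs(diff))[-k:]

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-centroid classifier on the test split."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_test).mean())

def compare_selection_strategies(n=100, p=2000, k=20, n_splits=50, seed=0):
    """Mean test accuracy when features are reused vs. reselected per split."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))       # pure noise: no feature is truly predictive
    y = np.repeat([0, 1], n // 2)     # balanced, arbitrary labels
    global_idx = top_k_features(X, y, k)  # selected once, using ALL samples
    reused, reselected = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        # Strategy A: reuse the features chosen from the full data set.
        reused.append(centroid_accuracy(
            X[train][:, global_idx], y[train], X[test][:, global_idx], y[test]))
        # Strategy B: repeat the selection using the training samples only.
        idx = top_k_features(X[train], y[train], k)
        reselected.append(centroid_accuracy(
            X[train][:, idx], y[train], X[test][:, idx], y[test]))
    return float(np.mean(reused)), float(np.mean(reselected))

reused_acc, reselected_acc = compare_selection_strategies()
print(f"features reused from full data: {reused_acc:.2f}")
print(f"features reselected per split:  {reselected_acc:.2f}")
```

Because the simulated features carry no real signal, accuracy well above chance under the first strategy reflects only the fact that the test samples already influenced the feature selection.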

RE: RE: RE: A couple questions

sebastiani replied to dahinds on 25 Jan 2012 at 14:40 GMT

Hi Dave

I agree that some overfitting is inevitable. To try to limit this we chose, for example, to stop including SNPs when the sensitivity/specificity did not increase substantially. But the risk of overfitting remains, and that is why the overall accuracy of the model in the discovery set should not be given too much importance. The result in the two independent sets is what matters.

Paola

No competing interests declared.
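
The stopping rule described in the reply above (stop adding SNPs once sensitivity/specificity no longer increases substantially) might be sketched as follows. The function name, the threshold `min_gain`, and the example values are all hypothetical; the thread does not state what counted as a substantial increase.

```python
def snps_to_keep(metric_by_count, min_gain=0.01):
    """Number of top-ranked SNPs to keep.

    metric_by_count[i] is a performance metric (e.g. sensitivity) for a
    model built from the top i + 1 SNPs. SNPs are added one at a time until
    the next one improves the metric by less than min_gain.
    """
    if not metric_by_count:
        raise ValueError("metric_by_count must not be empty")
    keep = 1
    for i in range(1, len(metric_by_count)):
        if metric_by_count[i] - metric_by_count[i - 1] < min_gain:
            break
        keep = i + 1
    return keep

# Made-up metric values for models with 1, 2, 3, 4 and 5 SNPs.
example_metric = [0.60, 0.66, 0.70, 0.705, 0.71]
n_keep = snps_to_keep(example_metric)
print(n_keep)
```

A rule of this kind limits model size, but as the reply notes, it does not remove the risk of overfitting when the metric is computed on the same data used to rank the SNPs.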