Reader Comments

Major problem in the estimates of the rate of gene family extinction

Posted by LaurentDuret on 27 Jul 2007 at 13:06 GMT

This paper addresses an important question: what are the rate and pattern of evolution of the gene repertoire in mammals? Indeed, whereas the evolutionary forces shaping the rate of sequence evolution have been well studied, little is known about the frequency of gene losses or gene gains.

The problem with the paper is that the identification of gene losses and gene creations (or duplications) relies exclusively on the analysis of the content of Ensembl gene families. An Ensembl gene family that includes only human genes is counted as a gene family "creation" in the human branch. Conversely, a gene family that is present in chimpanzee and dog but does not include any human gene is counted as "extinct" in human.

The absence of a given gene from an Ensembl family may, however, be an artefact arising from several different situations:

a- the gene exists but is located in a region that has not been sequenced (or correctly assembled) yet
b- the gene exists but has not been identified (annotated) yet
c- the gene exists but was not classified in the gene family because the clustering criteria (sequence similarity, length of the alignment) that were used to define Ensembl gene families were too stringent


Although the authors discuss these possible artefacts in their paper, I am not convinced by their claim that these artefacts should have little impact on their conclusions.

As a control for the reliability of their analyses, I looked at the 49 gene families that were considered to have been lost in the human lineage ("extinctions" in their Table 2). From supplementary Table S2, I retrieved all the gene families that contain at least one chimp sequence and one non-primate sequence but no human sequence. These 49 gene families are all represented by a single gene in chimp:

FID chimp human mouse rat dog
ENSF00000002436 1 0 24 9 0
ENSF00000002900 1 0 2 2 2
ENSF00000003534 1 0 1 1 1
ENSF00000003743 1 0 2 2 1
ENSF00000004000 1 0 1 1 1
ENSF00000004811 1 0 1 1 0
ENSF00000004836 1 0 1 1 1
ENSF00000004840 1 0 1 2 1
ENSF00000005029 1 0 1 1 1
ENSF00000005367 1 0 1 1 1
ENSF00000005368 1 0 1 1 1
ENSF00000005776 1 0 1 1 1
ENSF00000006245 1 0 1 1 1
ENSF00000006438 1 0 1 1 1
ENSF00000006709 1 0 1 1 0
ENSF00000006835 1 0 1 1 1
ENSF00000007144 1 0 1 1 1
ENSF00000007553 1 0 1 1 1
ENSF00000007676 1 0 2 2 1
ENSF00000007697 1 0 1 1 1
ENSF00000007845 1 0 1 2 1
ENSF00000007989 1 0 2 1 1
ENSF00000008030 1 0 1 1 1
ENSF00000008484 1 0 0 1 1
ENSF00000008589 1 0 1 1 1
ENSF00000008702 1 0 1 1 1
ENSF00000009151 1 0 1 1 1
ENSF00000009267 1 0 1 1 1
ENSF00000009414 1 0 1 1 1
ENSF00000009416 1 0 1 1 1
ENSF00000009499 1 0 1 0 0
ENSF00000009609 1 0 0 1 1
ENSF00000009610 1 0 1 1 0
ENSF00000009800 1 0 1 1 2
ENSF00000009884 1 0 1 1 1
ENSF00000009934 1 0 1 1 1
ENSF00000010085 1 0 1 1 1
ENSF00000010169 1 0 1 1 0
ENSF00000010256 1 0 1 1 1
ENSF00000010448 1 0 1 1 1
ENSF00000010502 1 0 1 1 0
ENSF00000010519 1 0 1 1 1
ENSF00000010549 1 0 1 1 1
ENSF00000010665 1 0 1 1 1
ENSF00000010678 1 0 1 1 2
ENSF00000011177 1 0 0 0 1
ENSF00000011186 1 0 0 1 1
ENSF00000011190 1 0 0 1 1
ENSF00000011513 1 0 1 1 1
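
The selection criterion described above (at least one chimp sequence, at least one non-primate sequence, no human sequence) can be sketched in Python as follows. This is only an illustration, not the procedure actually used for this comment: the function names are made up, the input is assumed to be whitespace-delimited with the same column order as the table above, and the two rows labelled HYPOTHETICAL are invented to show families that the filter rejects.

```python
from typing import Dict, List

# Species treated as non-primates in the table above.
NON_PRIMATES = ("mouse", "rat", "dog")


def parse_counts(text: str) -> List[Dict[str, object]]:
    """Parse a whitespace-delimited table whose first column is the family ID."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    rows = []
    for fields in body:
        row: Dict[str, object] = {"FID": fields[0]}
        for name, value in zip(header[1:], fields[1:]):
            row[name] = int(value)
        rows.append(row)
    return rows


def apparent_human_losses(rows: List[Dict[str, object]]) -> List[str]:
    """Return IDs of families with chimp >= 1, human == 0, and at least
    one member in a non-primate species."""
    return [
        r["FID"]
        for r in rows
        if r["chimp"] >= 1
        and r["human"] == 0
        and any(r[species] >= 1 for species in NON_PRIMATES)
    ]


# Two rows copied from the table above, plus two invented rows
# (HYPOTHETICAL_*) that should NOT pass the filter.
SAMPLE = """
FID chimp human mouse rat dog
ENSF00000002436 1 0 24 9 0
ENSF00000011177 1 0 0 0 1
HYPOTHETICAL_A 1 1 1 1 1
HYPOTHETICAL_B 1 0 0 0 0
"""

losses = apparent_human_losses(parse_counts(SAMPLE))
print(losses)
```

Note that such a filter only reports absence from the family as annotated; it cannot tell a true loss apart from the artefacts (a) to (c) listed earlier, which is why an independent sequence-similarity check is needed.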


Then I extracted the corresponding chimp proteins from Ensembl release 41 using BioMart:

http://oct2006.archive.en...

The 49 chimp genes correspond to 77 proteins (some genes encode alternative splice variants).

Then I downloaded all human proteins annotated in Ensembl release 41:

ftp://ftp.ensembl.org/pub...


Finally, I BLASTed the 77 chimp proteins against the human proteome (Ensembl release 41). Each of these chimp proteins has a very strong match in human: average identity (at the protein level) = 99%; minimum = 86%. Thus, none of these 49 gene families has been lost in the human lineage.
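
For readers who wish to repeat this kind of check, the summary of the BLAST comparison described above (best human match per chimp protein, then average and minimum identity) could be computed along the following lines. This is a minimal sketch under stated assumptions: it supposes the BLAST results were saved in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the protein identifiers and values in the example are invented placeholders, not the actual results.

```python
from statistics import mean
from typing import Dict, Tuple


def best_hit_identities(tabular_text: str) -> Dict[str, float]:
    """For each query, keep the percent identity of its highest-scoring hit.

    Assumes 12 tab-separated columns per line, with percent identity in
    column 3 and bit score in column 12.
    """
    best: Dict[str, Tuple[float, float]] = {}  # query -> (bit score, % identity)
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query = fields[0]
        pident = float(fields[2])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, pident)
    return {query: ident for query, (_, ident) in best.items()}


# Invented example hits (placeholder IDs and numbers, for illustration only).
EXAMPLE_ROWS = [
    ("chimp_prot_1", "human_prot_A", "99.5", "400", "2", "0",
     "1", "400", "1", "400", "0.0", "800"),
    ("chimp_prot_1", "human_prot_B", "60.0", "380", "152", "0",
     "1", "380", "1", "380", "1e-50", "300"),
    ("chimp_prot_2", "human_prot_C", "86.0", "200", "28", "0",
     "1", "200", "1", "200", "1e-100", "350"),
]
EXAMPLE = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

identities = best_hit_identities(EXAMPLE)
print("mean identity:", mean(identities.values()))
print("min identity:", min(identities.values()))
```

Any query protein absent from the resulting dictionary had no reported hit at all; those would be the only candidates worth examining further as possible genuine losses.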

In conclusion, the rate of gene family extinction in the human lineage (Table 2) appears to be overestimated: all 49 of the reported extinctions (100%) have a clear human counterpart. It is likely that similar problems also affect the numbers given for other species.

Note that I am not saying that gene losses and gene gains are not important for species evolution. Demuth and colleagues may well be correct when they say that the rates of evolution of the gene repertoire are high (or higher than had been appreciated). However, they have not performed all the controls that would have been necessary to assess the reliability of their results.

Laurent Duret