Reader Comments
Background on this paper
Posted by robwilliams on 15 Feb 2014 at 14:33 GMT
We thought this was going to be a two-page note to Frontiers in Neurogenomics about 4 years ago, but the analysis got more and more interesting and puzzling. We did not rediscover the lovely work of Robert Hoffmann (now head of WikiGene) until the paper had been submitted in succession to six higher-profile journals (Nature Comm, Genome Biology, and three other PLoS series). Hoffmann and colleagues showed that social factors account for much of the annotation imbalance for genes. We definitely agree, and in this paper we provide a good way to compute literature imbalance. I think everyone assumes that the imbalance will be eliminated by more work, but my guess is that the imbalance will only get worse. The paper got better with each iteration due to more analysis by Ashutosh Pandey and comments by experts like Paul Pavlidis and Megan Mulligan.
RE: Background on this paper
agshearer replied to robwilliams on 19 Feb 2014 at 07:54 GMT
We have definitely seen a "rich get richer" phenomenon at work in identification of sequences for orphan enzymes (sort of the mirror to the challenge of identifying function for uncharacterized genes). Researchers tend to (very naturally) focus their efforts on known functional assignments.
Systematic attempts to add more data to unknowns (as you did in this paper) can help to alleviate that problem by giving researchers more "handles" they can work with when an otherwise uncharacterized gene appears in the "significant" set from a high-throughput analysis.