Reader Comments

A comment on the statistical analysis in Kasumovic and Kuznekoff (2015)

Posted by melted_snowball on 04 Aug 2015 at 13:33 GMT

Daniel G. Brown and Cecilia A. Cotton
August 4, 2015

We wish to highlight some significant concerns we have with the statistical analyses presented in this paper.

The authors used Poisson generalized linear models (GLMs) to model the associations between a variety of focal player characteristics, the gender of the treatment player and the number of positive, negative, and neutral statements made by the focal player toward the treatment player. However, a residual analysis shows evidence of overdispersion in all six fitted Poisson models. Overdispersion occurs when there is more variation in the data than can be accounted for through the model. Poisson models are particularly susceptible to problems with overdispersion since the mean and variance terms are assumed to be equal. The danger of not accounting for all the dispersion in the data is that standard errors will tend to be underestimated, leading to inappropriately small p-values. To correct for the overdispersion we refit the models using Negative Binomial GLMs and found them to provide a superior fit. We did not remove outliers from the analysis of the negative comments, as the Negative Binomial model should better handle the variability caused by these observations.
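
To make the idea of overdispersion concrete, here is a minimal simulation sketch in Python (the analyses in this thread were run in R). It uses made-up parameters, not the study's data: the mean of 3 and the shape value of 1 are assumptions chosen only for illustration. Counts drawn from a pure Poisson distribution should have a variance-to-mean ratio near 1, while counts whose rate itself varies (a gamma mixture, which yields negative binomial counts) should show a clearly larger ratio.

```python
import math
import random
import statistics


def poisson_draw(rate, rng):
    """Draw one Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(2015)  # fixed seed so the sketch is repeatable
mean_count = 3.0           # hypothetical mean, not taken from the paper
theta = 1.0                # hypothetical shape, not an estimate from the data
n_draws = 20000

# Every observation shares the same rate: variance should match the mean.
pure_poisson = [poisson_draw(mean_count, rng) for _ in range(n_draws)]

# Each observation gets its own gamma-distributed rate (shape theta, mean mean_count),
# which produces negative binomial counts with extra variance.
mixed = [
    poisson_draw(rng.gammavariate(theta, mean_count / theta), rng)
    for _ in range(n_draws)
]

print("Poisson variance/mean:", round(dispersion_ratio(pure_poisson), 2))
print("Mixture variance/mean:", round(dispersion_ratio(mixed), 2))
```

A Poisson model fitted to data like the second sample would report standard errors as if the ratio were 1, which is the mechanism behind the understated p-values described above.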

The authors’ main conclusions are visually summarized in Figures 1 and 2 of the paper. We have recreated these figures based on Negative Binomial models and included confidence bands. The shape of the curves is generally unchanged from the Poisson analysis but the curves predominantly lie within each other’s confidence bands. In our refitted versions of the six models, neither the experimental manipulation (gender of the treatment player), nor interactions with that term, were ever statistically significant. We also offer updated versions of the main tables of the paper, including the exponentiated coefficients of the estimated regression models which have relative rate interpretations.

As a result of our analysis we find that the authors’ conclusions are not supported by the data. The data do support the conclusion that high-skill players make more positive and fewer negative comments, perhaps consistent with good team-building behaviour.

We would be pleased to provide a more thorough commentary on the data, analysis and conclusions. While we do not wish to minimize the harms of sexist behaviour, both online and in broader society, we still believe that society is better served by good scientific analysis.
-----
Because of the inability to directly embed figures and tables into this comment field, I am linking them here:

Update to Figure 1: https://cs.uwaterloo.ca/~...

Update to Figure 2: https://cs.uwaterloo.ca/~...

Update to Tables 1 and 2, with detailed captions: https://cs.uwaterloo.ca/~...

No competing interests declared.

RE: A comment on the statistical analysis in Kasumovic and Kuznekoff (2015)

mkasumovic replied to melted_snowball on 20 Aug 2015 at 04:10 GMT

I thank Drs Brown and Cotton for taking the time to explore our data and we take this opportunity to respond to the comments made regarding our statistical models.

I would like to start by saying that I am concerned by the lack of transparency on the side of Drs Brown and Cotton. The comment states that the residuals are overdispersed and that their models provide a “superior fit” without providing any evidence for these statements, despite the particular ease of doing so. This is particularly disconcerting given their statement that they “still believe that society is better served by good scientific analysis”, which is not demonstrated here since they provide neither the code nor the parameters for their analyses. This is in contrast to us, who have been completely open by consistently providing code and data.

Thus, to determine the validity of their statements, we first performed the analyses they suggest and then explored these analyses to determine whether their conclusion that a negative binomial model is necessary is correct. We provide the code (https://github.com/latrod...) and plots (http://www.michaelkasumov...) associated with each of the analyses below to demonstrate our transparency in our exploration of the data.



To first examine their statement, we explored the residuals in our models and the models associated with the data and provide the plots demonstrating the distribution of the residuals.

As can be seen in our plots, the distribution of the residuals look very similar and there is no evidence for dispersion in our model (left) compared to the negative binomial model (right). Additionally, when examining the theta value for the negative binomial analysis (modelAGGPos.nb in our R code), the theta is 1.108 which provides no support to the conclusion that a negative binomial model is necessary.

Because we are not clear of the methodology used by Drs Brown and Cotton, we explored the data further by estimating theta in two different ways and exploring the data using Dunn-Smyth residuals to provide a more accurate examination of the residuals and their dispersion.

As can be seen below, the distributions of the residuals of our model (top) do not visually differ from the two negative binomial models determined by two different theta estimation techniques (middle: phi, bottom: maximum likelihood). Again, there is absolutely no evidence for overdispersion of our data.

Based on the analyses above, I conclude that there is no a priori reason to use a more complex model that fits extra parameters simply because “poisson models are particularly susceptible to problems with overdispersion”. Although this may be true in some data, there is no such evidence of overdispersion in our data, and as a result, no need to use a negative binomial model simply to make analyses more complex.

The analyses that we provide thus support the statistics used in our study, and therefore, our conclusions.

Without proper support for the use of a particular statistical technique, it seems that post hoc analytical changes are being suggested in an attempt to discredit our study. Although science benefits from discussion, it also benefits from complete openness, something that we have demonstrated from the beginning.

No competing interests declared.

RE: RE: A comment on the statistical analysis in Kasumovic and Kuznekoff (2015)

melted_snowball replied to mkasumovic on 24 Aug 2015 at 21:03 GMT

Response to comment by Kasumovic:

In their response to our comment, the authors conclude that while
overdispersion is possible with Poisson models "there is no such
evidence of overdispersion in our data." We respectfully disagree
with this conclusion and present the following three pieces of
evidence for overdispersion in their data. We confine our discussion
to the models for positive statements, though the effect is also
present for their other analyses.


1) Residual Analysis.

The authors provide residual plots for the Poisson and Negative
Binomial models and conclude that the "distribution of the residuals
look very similar." We have recreated these plots using the same
y-axis scale. The plots are not similar when viewed on the same scale.

(Figure here:
http://www.cs.uwaterloo.c...)

If the Poisson model provided an adequate fit to the data we would
expect that the deviance residuals would approximate a standard normal
distribution with mean zero and variance one. In particular,
approximately 95% of the residuals would be in the range
(-1.96,1.96). In fact, just under 70% of the residuals are within this
range, and there are extreme outliers. The poor fit could be due to a
number of causes including missing covariates, an improper link
function, or overdispersion. Certainly, it draws into question any
conclusions based on that model.

By contrast, almost all deviance residuals for the negative binomial
model do fall in this range, indicating a far better fit.
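
The coverage check described in this point can be sketched as follows. This is an illustration in Python, not the code used for the comment: the function name `share_within` is hypothetical, and the residuals are simulated standard normal draws rather than the study's deviance residuals, which are not reproduced here.

```python
import random


def share_within(residuals, bound=1.96):
    """Fraction of residuals that fall strictly inside (-bound, bound)."""
    return sum(1 for r in residuals if abs(r) < bound) / len(residuals)


# Simulated stand-in for residuals from a well-fitting model (assumption:
# such residuals behave roughly like standard normal draws).
rng = random.Random(0)
simulated = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# For a well-fitting model this share should be close to 0.95.
print("share within +/-1.96:", round(share_within(simulated), 3))
```

Applying the same function to the Poisson model's deviance residuals is what gives the figure of just under 70% quoted above.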

2) Poisson Model Residual Deviance

One common, informal way to detect under- or overdispersion in
generalized linear models is to compare the estimated deviance
statistic (i.e., the sum of squared deviance residuals), D, to its
expected value of n - p. In this case, D = 367.67, which is much greater than
n-p = 126-9 = 117, which suggests overdispersion in the data. The
statistic D should be distributed approximately as a chi-squared random
variable with n - p degrees of freedom; such variables have variance
2(n - p).

Alternatively, one may consider the sum of squared Pearson
residuals. It is 416.43, also far more than expected under adequate
fit.
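
The arithmetic behind this comparison can be laid out in a few lines. The sketch below uses only the figures quoted in this comment (D = 367.67, Pearson sum 416.43, n = 126, p = 9). The z-style distance is a rough normal approximation to the chi-squared distribution, added here for illustration; it is not a formal test.

```python
import math

deviance = 367.67          # residual deviance D quoted above
pearson = 416.43           # sum of squared Pearson residuals quoted above
n_obs, n_params = 126, 9   # n and p quoted above

df = n_obs - n_params                 # expected value of D under adequate fit
chi2_sd = math.sqrt(2 * df)           # standard deviation of a chi-squared with df degrees of freedom

deviance_ratio = deviance / df        # values near 1 indicate no overdispersion
pearson_ratio = pearson / df
z_score = (deviance - df) / chi2_sd   # rough normal-approximation distance from expectation

print("degrees of freedom:", df)
print("deviance / df:", round(deviance_ratio, 2))
print("Pearson / df:", round(pearson_ratio, 2))
print("approx. standard deviations above expectation:", round(z_score, 1))
```

Both ratios come out above 3, where a ratio near 1 would be expected if the Poisson variance assumption held.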

For more formal tests of overdispersion, see section 7.4 of [1]. For
example, using a score test originally suggested by [2] and appearing
in Section 7.4.1 of [1], we obtain a p-value of 0.005 and therefore
reject the null hypothesis of no overdispersion.

3) Negative Binomial Model Theta Estimate. In their response the
authors state that "the theta is 1.108 which provides no support to
the conclusion that a negative binomial model is necessary." This is
the maximum likelihood value of theta, but we do not agree with their
assertion.

The variance function of the negative binomial distribution for the
parameterization used in the glm.nb function from the MASS package in
R (see section 7.4 of [3]) is:

V (mu) = mu + mu^2 / theta

Note that the variance function of the Poisson distribution is
V (mu) = mu. If a negative binomial model was fit to Poisson data
without overdispersion, we would expect a very large estimate of
theta. In this case the estimate of theta, 1.108 (standard error
0.227), actually suggests that the data are overdispersed and that the
variance is quadratic in the mean.
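
A small numerical sketch of the two variance functions may help. It is written in Python for illustration; the means 1, 2 and 5 are arbitrary values, not fitted values from the models, and only theta = 1.108 comes from the discussion above.

```python
def poisson_variance(mu):
    """Poisson variance function: V(mu) = mu."""
    return mu


def nb_variance(mu, theta):
    """Negative binomial variance function in the glm.nb parameterization: V(mu) = mu + mu^2 / theta."""
    return mu + mu ** 2 / theta


theta_hat = 1.108  # maximum likelihood estimate quoted above

for mu in (1.0, 2.0, 5.0):  # illustrative means, not fitted values
    print(mu, poisson_variance(mu), round(nb_variance(mu, theta_hat), 2))

# With a very large theta the extra term vanishes and the Poisson variance is recovered.
print(round(nb_variance(5.0, 1e6), 4))
```

With theta near 1 the implied variance is already several times the mean at modest counts, whereas a theta in the thousands or more would leave the two variance functions practically indistinguishable.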


We hope that the above three points explain why we believe the data
are overdispersed. We have made our code available
(http://www.cs.uwaterloo.c...). Any
conclusions drawn from the analysis of this data should be based on
models that account for overdispersion. In particular, the negative
binomial models that we generated for our previous comment do better
represent the data, and in these models, none of the major results of
the original paper are statistically significant.

References:

[1] Hilbe, J. M. (2011). Negative binomial regression. Cambridge
University Press.

[2] Dean, C. and Lawless, J. F. (1989). Tests for detecting
overdispersion in Poisson regression models. Journal of the
American Statistical Association, 84(406), 467-472.

[3] Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics
with S. Fourth edition. Springer.

(Full version of comment, as .pdf:
http://www.cs.uwaterloo.c...)

-Daniel G. Brown (*) and Cecilia A. Cotton (**)

(*) Cheriton School of Computer Science, University of Waterloo
(**) Department of Statistics and Actuarial Science, University of
Waterloo

No competing interests declared.

RE: RE: RE: A comment on the statistical analysis in Kasumovic and Kuznekoff (2015)

mkasumovic replied to melted_snowball on 31 Aug 2015 at 02:28 GMT

I see that my above response was unsatisfactory, and I am now of the mind that this is becoming a little bit of a statistical “witch hunt” as it is progressing past acceptable statistical analyses and into the realm of “my statistics are better”.

Before responding to the above comment, I initially want to state that there is no one statistical technique that is best. These two posts discuss this topic well and I suggest that anyone reading these comments have a look. These posts discuss the strengths and weaknesses of statistics and also the way that scientists go about thinking about and using statistics.

1. https://cesess.wordpress....
2. http://fivethirtyeight.co...

To respond to the comment, rather than argue about which technique is “more correct” (and yes, I noticed that you had to put your credentials on the bottom in order to highlight your apparently superior qualifications in this matter), I am going to use a Bayesian approach so that we don't get bogged down in searching for p-values and assumptions over distributions and over-dispersion.

The figures can once again be found here (http://www.michaelkasumov...) and the code can be found appended to the original code.

Rather than searching for p-values, this approach provides us with an estimate of how likely our individual factors are to have an effect. From looking at the plots of the analyses, we see that the posterior distribution is much further from 0 in the factors that were originally significant (denoted by the red asterisks in the figure).

This secondary approach thus provides similar results to our initial approach. Interestingly, in this Bayesian approach, there is evidence that the outcome of the game also has interesting effects on behaviour, something our initial model did not demonstrate, and something to explore further.

I hope that this now demonstrates two things:

1) the robust nature of our data, and
2) the fact that there is no 'one best' statistical method to analyze data.

How data are analyzed depends on assumptions, individual perceptions, and experiences dealing with different types of data. Arguing over the best approach and the extent to which our over-dispersion is a problem is not really progressing science. If you don’t “believe” our results, no problem, go ahead and replicate our experiment.

Effort into replication would be much better served as we need more data examining these ideas and theories. In fact, I encourage others to explore these ideas further as we are doing the same. Hopefully, after many studies, we can come to a general consensus.

I will also say that we are not taking our results as gospel truth (no science should ever be). What we are saying is that we have interesting effects of several factors on male behaviour towards women. What our results suggest is that these factors need to be explored further. Arguing over statistical approaches is not going to change the fact that further studies need to be done to explore these ideas.

At this point, I will consider this matter closed and will not discuss the validity of our statistical models as I see no benefit to doing so as it seems you will simply attempt to discredit us using another statistical approach you find to be superior.

I hope others find these comments and the discussion useful.

No competing interests declared.