A paper in PNAS got some attention on Twitter recently. It’s called “Childhood trauma history is linked to abnormal brain connectivity in major depression”.
Now, I think that this talk of dramatic scarring is overblown, but in this case there’s also a wider issue with the use of a statistical method which easily lends itself to misleading interpretations – canonical correlation analysis (CCA).
CCA is a method for extracting statistical associations between two sets of variables. Here one set was the 55 brain connectivity measures, and the other was the 4 clinical clusters. Yu et al.’s CCA revealed a single, strong association (or ‘mode of variation’) between the two variable sets.
A correlation coefficient of 0.68 is very large for a study of a brain-behaviour relationship. Normally, this kind of result would certainly justify the term “dramatic association”.
But the result isn’t as impressive as it seems, because it’s a CCA result. CCA is guaranteed to find the best possible correlation between two sets of variables in the data it is given, essentially by combining the variables within each set (via a weighted sum) in whatever way maximizes the correlation coefficient. In other words, it is guaranteed to over-fit and to over-estimate the association in the sample it was fitted to.
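To make the over-fitting point concrete, here is a minimal sketch in Python (NumPy only) that runs CCA on pure random noise. The variable counts (55 and 4) match the ones described above; the sample size of 150, the random seed, and the function name `fit_first_mode` are assumptions chosen purely for illustration, and nothing here is taken from Yu et al.'s data or code. The weights are fitted on one noise sample and then applied to a second, independent one.

```python
import numpy as np


def fit_first_mode(X, Y):
    """Fit the first CCA mode; return weights, training means and in-sample r.

    Uses the QR/SVD formulation: the canonical correlations are the singular
    values of Qx.T @ Qy, where Qx and Qy are orthonormal bases for the
    centred variable sets. Assumes both sets have full column rank.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X - x_mean)
    Qy, Ry = np.linalg.qr(Y - y_mean)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    a = np.linalg.solve(Rx, U[:, 0])  # weights for the first variable set
    b = np.linalg.solve(Ry, Vt[0])    # weights for the second variable set
    return a, b, x_mean, y_mean, float(s[0])


rng = np.random.default_rng(0)
n = 150  # hypothetical sample size, NOT the paper's

# Training data: two sets of variables with no true relationship at all.
X_train = rng.standard_normal((n, 55))
Y_train = rng.standard_normal((n, 4))
a, b, x_mean, y_mean, r_in = fit_first_mode(X_train, Y_train)

# Fresh data from the same (null) process, scored with the fitted weights.
X_test = rng.standard_normal((n, 55))
Y_test = rng.standard_normal((n, 4))
score_x = (X_test - x_mean) @ a
score_y = (Y_test - y_mean) @ b
r_out = float(np.corrcoef(score_x, score_y)[0, 1])

print(f"in-sample canonical correlation:  {r_in:.2f}")
print(f"held-out correlation (same weights): {r_out:.2f}")
```

With this many variables relative to the number of subjects, the in-sample correlation should come out large even though the data are noise, while the held-out correlation should sit near zero. That gap is the over-estimation described above, and it is why a CCA coefficient needs a permutation test or out-of-sample validation before it is read as an effect size.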
Read full, original post: Scarred Brains or Shiny Statistics: The Perils of CCA