Does success in college involve having the right genes?


Genetic mining is a huge scientific endeavor, involving complicated statistics and potentially life-saving or life-altering rewards. Unfortunately, this complexity means misinterpretation is a constant risk, one that the American Association for the Advancement of Science’s journal Science fell prey to with a recent online article claiming to identify genetic markers linked to educational attainment.

The most recent study found that genetic variation could predict at most about 2% of the variation in educational attainment, and it offered no evidence that the genetic variants caused the slightly better educational outcomes. In the context of the current debate over the government’s role in regulating the financial side of American education, the potential influence of genetics on educational achievement is more relevant than ever. Students are graduating from college with increasing debt and facing long timelines for repaying it. Some for-profit colleges make an industry of promising what they cannot deliver about the value of their certificates or credits, or of enrolling poorly prepared students only to saddle them with debt as they fail out. How does a student’s genetic makeup influence how they navigate this financial minefield? Though many media sources imply that there is a strong genetic component to academic success (last year an article in Medical News Today declared quite plainly, “Academic Success Could Be Determined By Genetics”), the evidence is actually quite thin.

The most recent study collected data on educational achievement and genetic variation for more than 125,000 people and found that three particular SNPs (single-nucleotide polymorphisms), each a change at a single “letter” of our genetic code, correlate with increased educational achievement. Roughly ten million (10,000,000) of these variations are known in the human genome. Though a few have been linked to health outcomes, such as an individual’s risk of specific diseases or someone’s response to a therapeutic drug, most have no known significance. Most occur in stretches of DNA between genes rather than within them, meaning that these genetic markers are not part of the sequences known to be translated into proteins, though they are still part of our genetic code.

Despite this, media sources have used this study to leap to stunning conclusions about how closely our genes are linked to our educational success. The Huffington Post led with “Educational Achievement Could Be Linked to Three Genes, Study Finds.” Science Daily announced “Genetic Variants Linked to Educational Attainment.” With such an emphasis on the genetic correlate (the “link”), a typical reader may well conclude that genetics gives someone an advantage, but even a strong link may not be causal. It is widely known that educational success correlates quite highly with income and other social factors, yet only The Chronicle of Higher Education responded that a college degree is not earned through genetic predisposition.

The Science study did find that the presence of specific SNPs correlated with more educational success. The questions are how strong this link is and what its practical implications are. The SNP with the highest correlation with years of education produced an impressively weak statistic: its coefficient of determination (R²) was 0.022%. A statistician would say that this SNP can “explain 0.022% of the variance” in years of education. That is about one forty-fifth of one percent, a trivial impact, even if we assume that this variation caused the educational difference. All of the measured SNPs together, not just the strongest three, managed to predict at most about 2% of the variation in educational achievement.
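Because R² can be written either as a fraction or as a percentage, it is easy to slip by a factor of 100. A minimal Python sketch of the conversion, using only the 0.022% figure quoted above (the variable names are illustrative):

```python
# R^2 for the strongest single SNP, as quoted in the text (in percent).
r2_percent = 0.022

# The same quantity written as a fraction of the total variance.
r2_fraction = r2_percent / 100

# How many of these would it take to make up one percent?
parts_per_percent = 1 / r2_percent

print(f"R^2 = {r2_fraction:.5f} as a fraction = {r2_percent}% of the variance")
print(f"One percent is about {parts_per_percent:.0f} times larger")
```

The second line of output is where “one forty-fifth of one percent” comes from.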

But what does “explaining the variance” mean? The variance of educational achievement measures how spread out the data are around the average: it is the average squared difference between each person’s years of education and the mean number of years. The authors are saying that variation in our genetic code is related to variance in our educational achievement.
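A minimal Python sketch of that definition, assuming the population form of the variance (dividing by the number of people rather than by one less) and using five made-up values for years of schooling:

```python
def variance(values):
    """Population variance: the average squared distance from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical years of schooling for five people (invented numbers).
years = [12, 12, 14, 16, 16]

print(sum(years) / len(years))  # the mean
print(variance(years))          # the spread around that mean
```

Here the mean is 14 years; each person is 0 or 2 years from it, so the squared distances are 4, 4, 0, 4, 4 and their average is 3.2.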

But a statistical “explanation” is not the sort of explanation we mean in everyday speech. The result is not causal. Even if genetic markers explained (in statistical terms) 80% of educational variance, we could not conclude that the genes cause the difference. What the figure means is that if you assume genetic differences cause differences in educational outcome, then 0.02% is the impact of the strongest single SNP. R² is also called the fraction of the variance explained by the model. Again, the implication is that if the model is correct, then you could predict about 0.02% of the educational variance from the presence of the most highly significant SNP. All of the measured SNPs together, under the assumptions of the model, could predict about 2% of the educational variance.
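To make “fraction of the variance explained” concrete, here is a minimal Python sketch of R² for a one-predictor linear model fitted by ordinary least squares. The data are invented for illustration, and carries_variant is a hypothetical 0/1 indicator standing in for a SNP; this is not the model or the data from the Science study:

```python
def r_squared(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2,
    the fraction of the variance of y accounted for by the fitted line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # leftover scatter
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total scatter
    return 1 - ss_res / ss_tot

# Hypothetical data: 1 if a person carries the variant, else 0.
carries_variant = [0, 0, 0, 1, 1, 1]
years_schooling = [12, 16, 13, 13, 17, 14]

r2 = r_squared(carries_variant, years_schooling)
print(round(r2, 3))
```

On these made-up numbers R² works out to 9/113, about 0.08: the fitted line removes only a small share of the total scatter. Nothing in the calculation says whether the variant causes the difference.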

Here is a hypothetical example that illustrates why even a high R² value does not imply a causal relationship. Suppose that eating a high-sugar diet makes people gain weight and also improves their concentration. If you collect data on these people but do not ask about their food intake, you might conclude that being overweight is highly correlated with excellent concentration. Suppose you run an analysis and find R² = 80%. That means that 80% of the variance in concentration is explained (in the statistical sense) by variance in weight. But this explanation is not the same as the reason (in the sense of what causes what), which in this hypothetical is actually the sugar diet.
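The sugar scenario is easy to simulate. A minimal Python sketch under made-up assumptions: sugar intake is standard normal, and weight and concentration each equal sugar plus independent noise with standard deviation 0.34, a value chosen so that R² lands near the hypothetical 80%. Concentration is generated from sugar alone, yet weight “explains” most of its variance; adjusting both variables for sugar removes the association.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation; equals R^2 for a one-predictor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def residuals(x, y):
    """What is left of y after removing its least-squares linear fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

random.seed(1)
n = 10_000
# Hypothetical model: sugar intake drives BOTH weight and concentration.
sugar = [random.gauss(0, 1) for _ in range(n)]
weight = [s + random.gauss(0, 0.34) for s in sugar]
concentration = [s + random.gauss(0, 0.34) for s in sugar]
# Weight never enters the formula for concentration.

r2_raw = r_squared(weight, concentration)
# Adjust both variables for sugar, then compare what remains.
r2_adjusted = r_squared(residuals(sugar, weight), residuals(sugar, concentration))

print(f"R^2, concentration vs weight:  {r2_raw:.2f}")
print(f"R^2 after adjusting for sugar: {r2_adjusted:.4f}")
```

With these settings the first value should land near 0.8 and the second near zero; the exact digits depend on the random seed. If sugar had not been recorded, the adjustment would be impossible and only the misleading first number would be available.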

Similarly, the Science authors are not able to prove that these genetic variations are the reason behind academic success, only that genetic factors can “explain” a small part of the variation in educational outcomes. Essentially, this means that the two track together, not that one causes the other. Because so many factors contribute to academic success, it’s no wonder that there are some genetic correlates. After all, educated parents put significant resources (not just financial but also intellectual) into educating their children, who are usually biologically related to them.

And as for that measly 2%, you might ask whether it would actually make a difference in real terms. The effect, even if it is real, is vanishingly small—we would be better served simply educating people.
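One way to see how small 2% is: for an ordinary least-squares fit with an intercept, the variance left unexplained is (1 − R²) times the total, so the typical prediction error shrinks only by a factor of √(1 − R²). A minimal Python sketch of that arithmetic, taking R² = 0.02 from the figure discussed above:

```python
import math

r2 = 0.02  # fraction of the variance explained, per the figure in the text

# Residual variance = (1 - R^2) * total variance, so the typical
# (root-mean-square) prediction error scales by sqrt(1 - R^2).
error_ratio = math.sqrt(1 - r2)

# For a single combined predictor, R^2 is the squared correlation.
correlation = math.sqrt(r2)

print(f"Correlation implied by R^2 = {r2}: about {correlation:.2f}")
print(f"Typical prediction error drops by only {100 * (1 - error_ratio):.1f}%")
```

Under these assumptions, using the genetic markers to guess someone’s years of education would cut the typical error by only about one percent, and that is before asking whether the association is causal at all.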

Rebecca Goldin is Research Director for the Genetic Literacy Project, Director of Research for the Statistical Assessment Service (STATS), and Professor of Mathematical Sciences at George Mason University. She holds a Ph.D. in mathematics from the Massachusetts Institute of Technology and a B.A. cum laude from Harvard University. Dr. Goldin was supported in part by National Science Foundation Grant #202726.

  • Mike Lawrence

    Please replace all cases of “.022%” to either “.022” or “2.2%”. (Ditto “.02%”) While a small typo, it represents an error of two orders of magnitude and should be fixed.

  • Rebecca Goldin

    The R^2 values are correct (the orders of magnitude look initially surprising). You are correct that the R^2 was .000222 for the most predictive individual SNP. However, all measured SNPs together, not just the top 3, resulted in an explanation of about 2% of the data. The authors did use a linear model to obtain that polygenic effect, and described it in a supplementary material on Science’s website. Thanks for catching the three SNPs problem.