Genetic expression in the human brain: The challenge of large numbers

Obama’s recently announced effort to map the human brain and the parallel European effort (the Human Brain Project) are taking direct aim at neurological disorders and diseases. Schizophrenia, Alzheimer’s disease, Parkinson’s disease, post-traumatic stress disorder, and epilepsy are all, at their worst, debilitating brain malfunctions; the hope is that new money injected into basic research on brain function will improve recognition and treatment. The ultimate goal may be drug development, drug delivery, novel treatments and even cures. But a first step is carrying out the basic research required to understand what goes on in our brains, perhaps the most complex system known to humankind.

The Magnitude of the Brain Mapping Problem

The first challenge of the human brain lies in the sheer volume of information needed to describe how neurons are connected. The numbers are astounding. The human brain is estimated to have approximately 86 billion neurons (8.6 x 10^10), each with possibly tens of thousands of synaptic connections; these little conversation sites are where neurons exchange information. In total, there are likely to be more than a hundred trillion neuronal synapses – so a computer recording a simple binary piece of information about each synapse, such as whether it fired in a time window or not, would require on the order of 100 terabytes. The amount of storage needed to record even this very simple information every second over the course of one day for one person would be far more than 100,000 terabytes, or 100 petabytes. Supercomputers these days hold about 10 petabytes. And this quick calculation doesn’t account for the changes in connectivity and positioning of these synapses occurring over time. Counting the ways these connections could change just after a good night’s sleep or a class in mathematics amounts to a whopping figure (and many more bytes than the estimated 10^80 atoms in the universe). The wiring problem seems intractable in its magnitude.
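The storage estimate can be checked with a few lines of arithmetic. This is a minimal sketch rather than the author's own calculation: it assumes one byte of storage per synapse per snapshot (a single tightly packed bit would need an eighth of that) and decimal units, and all names are illustrative.

```python
# Back-of-the-envelope storage for a "fired / did not fire" record per synapse.
# Assumptions (not from the article): one byte per synapse, decimal units.
SYNAPSES = 100 * 10**12        # about a hundred trillion synapses (from the text)
BYTES_PER_SYNAPSE = 1          # assumption; one packed bit would be 1/8 of this
TERABYTE = 10**12
PETABYTE = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

snapshot_bytes = SYNAPSES * BYTES_PER_SYNAPSE    # one brain-wide snapshot
day_bytes = snapshot_bytes * SECONDS_PER_DAY     # one snapshot every second

snapshot_tb = snapshot_bytes / TERABYTE
day_pb = day_bytes / PETABYTE

print(f"one snapshot: {snapshot_tb:,.0f} TB")
print(f"one day, one snapshot per second: {day_pb:,.0f} PB")
```

Under these assumptions a single day of recordings runs to thousands of petabytes, comfortably past the 100-petabyte mark and far beyond the roughly 10 petabytes the text attributes to a supercomputer.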

There is some good news when it comes to the “wiring” problem. The many constraints on how a brain can wire itself – for example, neurons from the hippocampus do not generally connect directly with those in the cerebellum – reduce the possible ways in which a brain can be connected. These constraints buoy hopes that scientists will one day map the brain in all its complexity. They speak not to the mathematical computation of wiring possibilities but to the biology and physics. Neurons have to be packed tightly to allow for energy minimization and efficiency in communication. The function of specific cell types constrains which cells they communicate with. Cells have to be nourished, a biological constraint that forces the wiring to allow for blood flow, for example. Even with all these constraints, though, the numbers are shockingly large.

The neural connectivity of a human brain has an intimate relation to its genetic make-up. While all human cells have (approximately) the same DNA, genes are expressed (or not) in different parts of the brain. Even the genetic story is complicated, because gene expression is not well understood or predictable. With the understanding that gene expression is itself linked to the physical networking among neurons, scientists have set out to map out human genetic expression in the brain. But even this genetic problem is intractable with today’s technology if we insist on knowing the gene expression of every cell and for every gene.

To get a sense of the magnitude of the problem of genetic expression in brain regions, let’s compare it to functional magnetic resonance imaging (fMRI). Typical fMRI machines are able to record brain activity (in the form of oxygen use) in small brain volumes called voxels, each about 1 cubic millimeter in size, depending on the strength of the fMRI machine. The average size of a human brain is about 1200 cubic centimeters (a little higher for males, a little lower for females), or 1.2 million cubic millimeters – in other words, more than a million voxels of this size.

Suppose there existed technology that could tell us whether or not the cells in each voxel were expressing a particular gene. With an estimate of 20,000-30,000 human genes (such estimates are hotly debated), that would amount to as many as 36 billion pairs of genes and locations in the brain (1.2 million voxels times 30,000 genes), a problem much smaller than the one posed by cellular wiring. However, such a process is itself labor intensive – and herein lies the next aspect of the large numbers. Even if the amount of data were sufficiently small, the time required to collect the data may be astronomically long. Even if one could (rather absurdly optimistically) record the expression of one gene in one voxel every second, it would take more than 1,000 years to collect the data.
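How long the collection takes depends heavily on what one second of work is assumed to buy. The sketch below uses the figures in the text (1.2 million voxels, the upper estimate of 30,000 genes) and compares two hypothetical recording rates; neither rate is a real instrument specification.

```python
# Collection time for voxel-level gene expression under two hypothetical rates.
VOXELS = 1_200_000                   # ~1200 cm^3 of brain at 1 mm^3 per voxel
GENES = 30_000                       # upper end of the 20,000-30,000 estimate
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

pairs = VOXELS * GENES               # gene-location pairs to record

# Rate A (assumption): one second per gene-voxel pair.
years_per_pair = pairs / SECONDS_PER_YEAR

# Rate B (assumption): one second for all genes in a voxel at once.
days_per_voxel = VOXELS / SECONDS_PER_DAY

print(f"gene-voxel pairs: {pairs:,}")
print(f"1 s per pair:  {years_per_pair:,.0f} years")
print(f"1 s per voxel: {days_per_voxel:,.1f} days")
```

At one second per gene-voxel pair the job runs to more than a thousand years, while one second per whole voxel would take about two weeks. The gap between the two is a factor of 30,000, the number of genes.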

And these data would still be considered “low resolution”; they would tell us something about the little brain volumes, not the specific brain cells. With 86 billion neurons spread over roughly 1.2 million voxels, there are, on average, tens of thousands of neurons per 1 cubic mm voxel. And even if we were to increase the resolution to the level of individual cells, assigning a simple 0 or 1 to each cell and each gene (0 for “not expressed” and 1 for “expressed”) only tells a partial tale; it doesn’t carry the information of how much expression there is, nor whether the protein that the gene codes for is actually produced.

Gene Expression in the Brain

Scientists widely agree that genes control many aspects of our brain function and connectivity. Genetic expression – whether a specific gene is actually being expressed by a cell – may be essential in predicting who will be vulnerable to specific diseases, as well as what sort of drug therapies they may be receptive to. Gene expression may be the most fundamental level at which a genotype gives rise to a phenotype. For example, many people carry a mutation in one of the BRCA genes that is associated with various kinds of breast and ovarian cancers, but only a percentage of carriers get cancer. Environment and gene-gene interactions influence if, how, and when genes are expressed.

The challenge of mapping genetic expression in human brains was taken on by the Allen Institute for Brain Science in Seattle. Scientists there began by industrializing the process, introducing robots working around the clock with systematic precision. In two major pieces of work on the human brain (by Michael Hawrylycz and Ed Lein, et al., and by Hongkui Zeng and Elaine Shen, et al.), the Allen scientists:

  • Analyzed an estimated 30,000 genes in about 1000 different brain regions
  • Analyzed at a cellular level two functionally distinct cortical regions for about 1000 genes.

To do this more grandly for the entire 1,000,000 voxels and 30,000-or-so genes, we need a leap in technology: obtaining gene expression in every voxel is a task a full 1,000 times larger than covering the roughly 1,000 regions analyzed so far. Considering it took about five years to collect the data on humans after the technique was successfully developed on, and applied to, the mouse, it could take thousands of years to do the same for the entire brain with current techniques. Then there would still be the question of what to do with the data. The desire for more detailed information obtained by using smaller voxels increases the computational needs by additional orders of magnitude. The human brain will require yet another quantum leap.
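The "thousands of years" figure follows from simple scaling. The sketch below assumes, purely for illustration, that collection time grows linearly with the number of regions sampled; the speed-up factors are hypothetical and only show how large a technological leap would be needed.

```python
# Linear scaling from ~1,000 regions to ~1,000,000 voxels (figures from the text).
REGIONS_DONE = 1_000
VOXELS_TARGET = 1_000_000
YEARS_FOR_REGIONS = 5                # about five years for the regional data

scale = VOXELS_TARGET // REGIONS_DONE
years_linear = YEARS_FOR_REGIONS * scale   # assumption: time grows linearly


def years_with_speedup(speedup: float) -> float:
    """Years needed if the whole pipeline ran `speedup` times faster."""
    return years_linear / speedup


for factor in (1, 10, 100, 1000):          # hypothetical speed-ups
    print(f"{factor:>5}x faster: {years_with_speedup(factor):,.0f} years")
```

Even a hundredfold improvement in throughput would leave a fifty-year project under this assumption, which is why the text calls for a leap rather than incremental gains.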

Identifying Gene Expression

The technique to identify whether specific genes are expressed in specific cells is simple enough. Genes are coded by a sequence of bases in the DNA. Scientists engineer a sequence complementary to these bases and attach a marker such as a fluorescent probe, or an enzyme that makes a chemical turn from clear to black. The relevant cells are bathed with this complementary sequence, allowing it to bind to the RNA transcripts of the genes the scientists are looking for. The probes then show their colors: fluorescent probes are viewed with a confocal microscope, and color-changing chemicals with a wide-field microscope. The relevant cells show that these genes are expressed by lighting up (or blackening) under the microscope.

The problem is that for humans, there is simply too much brain to analyze. With 86 billion neurons and 30,000 genes, there are about 2,600,000,000,000,000 (2.6 x 10^15) gene-cell pairs to check. Even at 1 second per gene-cell pair, it would take roughly 80 million years (on the order of 10^8). And that would be without considering glial cells, which make up about half of the cells in the brain and are currently suspected to play a more important role than previously thought. Massively parallel processing has to be an essential part of mapping the brain, and the problem has to be simplified so that it is ambitious but achievable.
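The same arithmetic shows why parallelism is unavoidable. This sketch takes the neuron and gene counts from the text, assumes one second per gene-cell pair, and asks how many probes would have to run side by side to finish within a chosen deadline; the ten-year deadline is a hypothetical target, not a figure from any project plan.

```python
import math

# Gene-cell pairs for the whole brain and the time to check them one by one.
NEURONS = 86 * 10**9                 # from the text
GENES = 30_000                       # from the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600

pairs = NEURONS * GENES
serial_years = pairs / SECONDS_PER_YEAR      # assumption: 1 second per pair


def parallel_probes_needed(deadline_years: float) -> int:
    """Probes that must run simultaneously to finish within the deadline."""
    return math.ceil(serial_years / deadline_years)


print(f"gene-cell pairs: {pairs:.2e}")
print(f"serial time: {serial_years:,.0f} years")
print(f"probes for a 10-year deadline: {parallel_probes_needed(10):,}")
```

Under these assumptions the serial estimate is a little over 80 million years, and a ten-year deadline would require roughly eight million measurements running at once.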

The first step was to look at a much smaller animal: the mouse. The mouse’s brain is about one-thousandth the size of a human brain and has fewer than one-hundredth the neurons of a human brain (a mouse has about 300 million neurons, and mouse neurons are smaller than their human counterparts). The mouse brain consequently exhibits far less complexity in each of its regions. It also has fewer genes, about 20,000. Scientists only finished mapping the genetic expression of the mouse brain in 2006.

Mapping the human brain, however, is orders of magnitude more difficult. The question by necessity became how to collect (as efficiently as possible) partial data on the human brain. On the one hand, scientists can look at different brain regions, sliced up in a fairly gross (no pun intended) way, and identify, for each region, the genes that are expressed. We will call this the “genes per region” viewpoint; the corresponding method uses microarrays. On the other hand, they could identify specific genes and ask which regions are expressing those genes – what we will call the “regions per gene” tactic.

The corresponding method for collecting these data is in situ hybridization. Allen scientists used in situ hybridization to get a rather complete story for the mouse; this was a stunning effort because it involved 20,000 in situ hybridizations rather than just a few (normally, one in situ hybridization could take 2-3 days). Human brains are just too big to collect such rich data; thus was born the idea of using microarrays to complement what in situ hybridization could do. These two techniques form the basis for the Allen strategy toward human brain gene expression – each obtains partial results about how the human brain expresses itself genetically, but together they provide a glimmer of insight into the human brain.
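To see why 20,000 hybridizations counted as a stunning effort, consider what they would cost if run one after another. The sketch below uses the 2-3 days per hybridization quoted in the text; the idea of identical parallel "stations" and the station count are hypothetical simplifications, not a description of the Allen Institute's actual setup.

```python
# Time for 20,000 in situ hybridizations at 2-3 days each (figures from the text).
HYBRIDIZATIONS = 20_000
DAYS_EACH_LOW, DAYS_EACH_HIGH = 2, 3
DAYS_PER_YEAR = 365.25


def years_needed(days_each: float, stations: int = 1) -> float:
    """Years to finish when `stations` identical setups run side by side."""
    return HYBRIDIZATIONS * days_each / stations / DAYS_PER_YEAR


serial_low = years_needed(DAYS_EACH_LOW)
serial_high = years_needed(DAYS_EACH_HIGH)
print(f"one at a time: {serial_low:.0f} to {serial_high:.0f} years")

# Hypothetical: 100 stations working in parallel.
print(f"100 stations: {years_needed(DAYS_EACH_LOW, 100):.1f} to "
      f"{years_needed(DAYS_EACH_HIGH, 100):.1f} years")
```

Run serially, the work would take well over a century, so a project on this scale only becomes feasible once many hybridizations proceed at the same time.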

The Mouse and In Situ Hybridization

In Situ Hybridization (ISH) begins with a single complementary strand of RNA or DNA. This genetic code is fed to a targeted cell or tissue; the expression or lack of expression of the corresponding gene is then detected by various (microscopic) techniques. One looks for the expression of an individual, specific gene in many regions.

Until recently, ISH was performed in single cells or in a collection of nearby cells in small regions of the (generally, rodent) brain with at most a few genetic markers. It continues to be used when a specific gene is suspected of playing a role in a specific cell or cellular mechanism, or when a specific gene is suspected of a common mutation. For example, ISH is used to identify some chromosomal abnormalities in fetuses. The results are exciting, but until about ten years ago the technique’s snail’s pace made it far too slow to apply to mapping out human genetic expression.

In 2003 the Allen Institute set about industrializing the process. Scientists close to the process realized that with an investment in robotic processes to collect data, and complex computer programs to analyze it, they could do massively parallel computation and create a map of the mouse brain. It took about two years to set the process up and another year for the data to finally come in. For each of about 20,000 genes and each of about 250 “slices” of mouse brain, gene expression was mapped out. The data were recorded in a searchable database used as a starting point by hundreds of neuroscientists interested in rodent genetics. The method was a steadfast application of finding the “regions per gene” (i.e., documenting which regions were expressing each individual gene or responding to each genetic probe).

One key aspect of the mouse program is that hundreds of mice have been used, rather than just a few. This means that the data are largely generalizable and the variation of gene expression among these laboratory mouse populations has been documented. A paper documenting the mouse findings was published in Nature in 2007.

An Anatomically Comprehensive Atlas: Genes and Architecture are interrelated

While mapping the mouse brain was a huge project, it was only a small stepping stone for mapping human gene expression. The goal of the Allen Institute’s Human Brain Atlas is to provide information on as many genes as possible over as many regions as possible, forming an online, searchable “atlas” of the human brain.

Certainly scientists were interested in using ISH to dissect the story for human brains, but a new technique also came into play: microarrays.

Imagine a mini-bead box, with separators for each bead type, of size 250 x 250. Each such box has over 62,000 separate “wells,” each containing genetic probes looking for a specific gene. Together these genetic code wells form a microchip. Each microchip is then doused with a drop of homogenized brain from one of the separate parts of the brain under consideration.

Each well is a microcosm of activity – provided the piece of brain expresses the gene associated with the relevant well. When the genetic probe is linked to a fluorescent marker and light is shined on it, the microarray lights up in exactly those wells whose genes are expressed in this location in the brain. In other words, the genetic expression of the brain region is recorded in a microchip of roughly 62,000 bits. One microarray is used per brain region, so the entire genetic expression can be mapped out for each region.

The Allen group’s attempt to map out which genes each region of the brain expresses is documented in a groundbreaking paper in Nature. The authors use the “genes-per-region” microarray approach, having run approximately 900 regions against a complete list of human genes in a microarray. It was such a labor-intensive task that they could only do it for two adult male humans (who had donated their brains to science). Although 900 regions seems like many, it is actually a coarse way to divide the human brain – to get even voxel-level information on the whole human brain, we would need to slice it into approximately 1,000,000 pieces.

The microarrays also record other data, such as how much of a gene is expressed; the more information they record, of course, the more analysis is required. Unfortunately the microarray data do not directly translate into genetic data; a collection of approximately 62,000 genetic probes is used to obtain data on the roughly 30,000 genes.
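Because there are about twice as many probes as genes, several wells can report on the same gene, and the per-well readings have to be collapsed into per-gene calls. The sketch below illustrates one simple rule, calling a gene expressed if any of its probes lights up. That rule, the function name, and the toy probe and gene labels are all assumptions made for illustration; they are not the Allen Institute's analysis method or data.

```python
from collections import defaultdict

ARRAY_SIDE = 250
WELLS = ARRAY_SIDE * ARRAY_SIDE      # 62,500 wells, "over 62,000" in the text


def genes_expressed(probe_to_gene: dict, probe_lit: dict) -> dict:
    """Collapse per-probe on/off readings into per-gene calls.

    Assumed rule: a gene counts as expressed if any of its probes lit up.
    """
    by_gene = defaultdict(bool)      # genes default to "not expressed"
    for probe, gene in probe_to_gene.items():
        by_gene[gene] = by_gene[gene] or probe_lit.get(probe, False)
    return dict(by_gene)


# Toy example with made-up labels: two probes for GENE_A, one for GENE_B.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
probe_lit = {"p1": False, "p2": True, "p3": False}

calls = genes_expressed(probe_to_gene, probe_lit)
print(calls)
```

A real analysis would work with graded intensities rather than on/off bits and would need a more careful rule for probes that disagree, which is part of the extra analysis the text alludes to.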

Many observations result from a close look at these two brains. The authors plotted gene expression profiles of genes associated with dopamine signaling across 170 brain structures in both brains. They found remarkable similarity between the two brains. They found no significant differences in genetic expression between the left and the right hemispheres. They found that the anatomical structure of the brain (which cells are close to which others) dictated to some extent the genetic structure – nearby cells had more similar genetic expression. On the other hand, they also looked at some 740 genes identified in the human excitatory postsynaptic density (PSD), a region of a neuron receptive to a connection. About a third of these genes’ expressions varied significantly among different regions of the brain. The authors surmise that these regional differences in synaptic gene expression may well underlie different functions of different regions.

Yet, despite these advances, a back-of-the-envelope calculation suggests we won’t know what’s going on cell by cell in humans for an absurdly long time unless new scientific developments speed the process significantly.

Gene Profiling in Human Neocortex

While the microarray technique has the advantage of revealing all of the gene expression for a small number of homogenized pieces of brain, a parallel effort using in situ hybridization was able to characterize the expression of about 1000 genes important for neural functions at a cellular resolution in specific regions of an adult human brain.

Hongkui Zeng of the Allen Institute recently led a team that looked at two different regions of the cortex: the visual cortex and the midtemporal cortex. The visual cortex has a mouse analogue, making it a good candidate for comparison with other species, while the midtemporal cortex is functionally distinct from the visual cortex – the midtemporal cortex is involved in language comprehension as well as memory. The comparison between these two regions may offer some insight into the relationship between gene expression and neural function.

The results painted a picture of our genetic diversity, both among our individual brain cells and between species. There was “remarkable conservation of each individual gene’s expression among individuals (95%), cortical areas (84%), and between human and mouse (79%)”. Different genes had different gene expression patterns, suggesting that genes play diverse roles in our neurons. About 15% of genes changed their expression in the different regions of the brain, and only about 5% of genes expressed themselves differently among different individuals (though tissue quality issues may make this figure unreliable).

While the 995 genes considered were certainly not randomly chosen, it is notable that cross-species differences are not so much greater than cross-cortical differences. This conservation perhaps reflects the importance of the chosen genes. As the authors comment, “the identification of genes with unique expression patterns that are conserved in mouse and human, especially the cortical cell-type markers, provides a form of validation for the functional relevance of these marker-defined cell types and offers a means to track these cell types and investigate variations in different cortical regions or under various mutant or disease conditions.”

A breakthrough in data sharing: the atlas is freely accessible

Data sharing is rife with contention – bringing up issues involving privacy, politics, scientific integrity, and even individual scientists’ egos. But the Allen Institute’s decision to put its work in an online database is an enormously powerful scientific step. As data become more readily accessible and minable, scientists have the opportunity to test hypotheses against real data, with real results coming out of such work. The atlas for the mouse brain has been cited over 900 times according to the Web of Science citation tracking system – and no doubt the results for humans will achieve a similar level of success in influencing scientific research. The point is that any genetic question may well begin by asking where a gene is expressed. An online database is a way for the scientific community to spur scientific growth through open data sharing.

What does it mean for you and me? It’s not entirely clear what the consequences will be from a practical standpoint because it takes so many years and so much effort to develop any specific medical application. The mechanism of genetic expression, however, is suggestive. Genes code for proteins. If there is no gene expression, there is no hope that the corresponding protein can be produced. If a gene is expressed, it might make the corresponding protein. This is one reason that gene expression is so intimately related to cellular function; proteins are involved with most cellular activity.

Some speculation about what might come is in order. The identification of gene expression suggests drugs could be paired with genetic probes so that they act on brain regions expressing the corresponding genes, perhaps on the cells using a specific protein. These probes could ensure that the drug is delivered to the cells in greatest need.

Gene expression is also known to play a role in cell differentiation (i.e. how cells grow to play different roles in our bodies). Gene expression may regulate how brain cells are differentiated as well, in function and form, and could lead to discoveries about the mechanisms behind specific degenerative diseases. Perhaps this can lead to therapies involving biotechnology, again, targeting cells expressing those specific genes.

Finally, open-sourced data on both the mouse and the human allows comparisons across species. If the genetic expression of a specific human gene has a correlate in the mouse, one can do experiments on the mice that one could never do on humans, and perhaps generate many more new ideas for medical therapies than without the animal research. Better yet, we could even engineer mice that exhibit the same genetic expression as humans in a particular region of their brains, allowing experimentation to occur on animals designed to be similar to humans (at least in that targeted brain region).

As we invest in the efforts to combat neurological diseases in an aging population, to identify risks in genetic disposition to psychological disorders or even just to understand the functioning of the human brain, the sharing of such data will lead to unexpected discoveries, for which the Brain Atlases will have been undeniably of fundamental importance.

Rebecca Goldin is Research Director for the Genetic Literacy Project, Director of Research for the Statistical Assessment Service (STATS) and Professor of Mathematical Sciences at George Mason University. Dr. Goldin was supported in part by National Science Foundation Grant #202726.
