The Search for the Null

by Alex Taylor

Schools of brightly colored fish swirl around you as you set up your video monitoring equipment. As a marine ecologist, you’re hoping to get a reasonable census of the relative abundance of different species in this coral reef. With this data, you can begin to ask what might be driving these patterns. Why are parrotfish so common, and triggerfish so rare?

Some possible explanations might include competition between species, or the rate at which one species branches into two. Some might include multiple reefs, with fish migrating between them, whereas others might treat each reef as its own isolated island. These explanations are all formalized into mathematical “models” that predict what the distribution of species should look like. When you compare the distribution of species you collected with these model predictions, which matches most closely? Which does the best job of explaining your reef?

Compared to what?

The problem is that the data you collect at the reef is going to be very messy, and no model will predict it exactly. Often, a number of different explanations (models) could all account for what you see, more or less. You need to have an idea of what you’d expect to see if you’re right, so you can tell if the data roughly bears out your predictions. But just as importantly, you must know what to expect if you’re wrong – that is, you need a “null model.” A null model allows you to tell if your explanation does better than random chance in explaining your chaotic, ill-fitting data.

Source: XKCD

The problem of how to choose a null hypothesis or model is widespread across science. Ideally, the null hypothesis should just be that the effect you hypothesized doesn’t exist, and that your data is explained by random chance, since this is the simplest explanation. However, your null hypothesis must speak the same language as the system you study. The distribution of numbers you get from rolling dice is random and simple, but it is also entirely irrelevant to how species occupy a coral reef. The null hypothesis needs to be realistic enough that rejecting it is meaningful.

Ideally, you can set up a control group that is identical except for the factor that you want to test. If you want to test whether a fertilizer would make a tulip grow taller, you plant genetically identical tulips in identical soil. You give some (the test group) the fertilizer, and refrain from fertilizing the others (the control group). Your test hypothesis is that the fertilized tulips will be significantly taller, and your null hypothesis is that any differences in height are purely due to chance – that there is no effect of fertilizer.
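One way to make “purely due to chance” concrete is a permutation test: if the fertilizer does nothing, the labels “fertilized” and “unfertilized” are arbitrary, so shuffling them should produce differences as large as the real one fairly often. The sketch below is only an illustration of that idea. The tulip heights are made-up numbers, and the function name `permutation_test` and its parameters are assumptions, not part of any experiment described here.

```python
import random

def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(treated) > mean(control)."""
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(control)
    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels carry no information,
        # so any reshuffling of the heights is as plausible as the real one.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_treated]) / n_treated - sum(pooled[n_treated:]) / n_control
        # Small tolerance so floating-point rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            count += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (count + 1) / (n_permutations + 1)

# Hypothetical tulip heights in centimetres (illustrative values only).
fertilized = [41.2, 43.5, 40.8, 44.1, 42.7, 43.9]
unfertilized = [38.9, 40.1, 39.5, 41.0, 38.2, 40.4]

observed_diff, p_value = permutation_test(fertilized, unfertilized)
print(f"observed difference: {observed_diff:.2f} cm, p = {p_value:.4f}")
```

A small p-value means that label shuffling rarely produces a gap as large as the observed one, which is the sense in which the null hypothesis would be rejected.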

Models for the Uncontrollable

This is all well and good, but what happens when you try to explain patterns in something large and chaotic, like a city, the internet, a rainforest or a coral reef? Beyond the problem of not having any measurable control group, what would you even expect if your hypothesized effect doesn’t exist, and your data is explainable by chance? For example, say you think that feeding competition is a driving factor in determining which species are the most common on your reef. What exactly should your data look like if that’s not true?

Dr. Stephen Hubbell’s Neutral Theory of Biodiversity (NTB), also called Ecological Neutral Theory, attempts to provide a very simplified model of how species come to occupy space in an ecosystem. There are now many variations of NTB, but at its core the theory rests on two basic assumptions. The first (the “zero-sum assumption”) is that a given habitat only has room (or sunlight, or water, or nutrients) for a certain number of individuals. The second (the “equivalency assumption”) is that every individual, regardless of its species, has an equal chance of taking one of those available spots (species are “ecologically equivalent”) and of giving rise to a new species. Almost everything else in the model arises from random chance.
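The two assumptions can be turned into a toy simulation. The sketch below is a deliberately stripped-down version, not Hubbell’s full model: it has a single isolated community with no migration, and the function name, parameter names and default values are all assumptions chosen for illustration. The community size never changes (zero-sum), and the replacement for each death is drawn without regard to species (equivalence).

```python
import random
from collections import Counter

def simulate_neutral_community(n_individuals=200, speciation_rate=0.01,
                               n_steps=20_000, seed=1):
    """Toy zero-sum neutral drift in one isolated community."""
    rng = random.Random(seed)
    community = [0] * n_individuals  # start with every spot held by species 0
    next_species_id = 1
    for _ in range(n_steps):
        # Zero-sum: one individual dies and its spot is refilled at once,
        # so the total number of individuals stays fixed.
        dead = rng.randrange(n_individuals)
        if rng.random() < speciation_rate:
            # With a small probability the newcomer founds a new species.
            community[dead] = next_species_id
            next_species_id += 1
        else:
            # Equivalence: every individual is equally likely to be the
            # parent, whatever its species. (For simplicity the individual
            # that just died may also be drawn as the parent.)
            parent = rng.randrange(n_individuals)
            community[dead] = community[parent]
    return Counter(community)

abundances = simulate_neutral_community()
ranked = sorted(abundances.values(), reverse=True)
print(f"{len(ranked)} species; abundances from most to least common: {ranked}")
```

Even though no species has any advantage, runs of this kind typically end with a few common species and many rare ones, which is the pattern the theory is asked to reproduce.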

Clearly, these assumptions are not realistic, as everyone agrees, and NTB has been the subject of much heated argument amongst ecologists. But NTB does what null hypotheses are supposed to do pretty well. It is simple – to many ecologists dealing with real ecosystems, painfully so. And it does a reasonable job of matching real data; the species distributions produced by NTB models match real species distributions pretty closely. NTB uses a process that everyone agrees is not realistic to produce a pattern that most would agree is “close enough” to reality.

Why have a null model at all? Why not just compare leading models and see which model fits the data best? Though some advocate for this approach, others argue that the null model has an advantage in that comparing data to an intentionally simple model forces us to justify each factor we include. Before a factor like feeding competition earns a place in your model, it must explain the species distribution better than random chance does.
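One hedged way to picture this comparison is a Monte Carlo test against a neutral null: simulate many communities in which only chance operates, summarize each with a single number, and ask how often chance alone matches the reef. In the sketch below the census counts are invented, the summary statistic (the share held by the most common species) is just one possible choice, and the speciation rate and step count are arbitrary assumptions rather than fitted values.

```python
import random
from collections import Counter

def neutral_dominance(n_individuals, speciation_rate, n_steps, rng):
    """Simulate one toy neutral community; return the top species' share."""
    community = [0] * n_individuals
    next_species_id = 1
    for _ in range(n_steps):
        dead = rng.randrange(n_individuals)
        if rng.random() < speciation_rate:
            community[dead] = next_species_id  # a new species appears
            next_species_id += 1
        else:
            # Replacement drawn from the community without regard to species.
            community[dead] = community[rng.randrange(n_individuals)]
    return max(Counter(community).values()) / n_individuals

# Hypothetical reef census: individuals counted per species (made-up values).
observed_counts = [60, 15, 10, 8, 4, 2, 1]
n_observed = sum(observed_counts)
observed_dominance = max(observed_counts) / n_observed

# Build the null distribution of the statistic from repeated neutral runs.
rng = random.Random(42)
null_dominances = [
    neutral_dominance(n_observed, speciation_rate=0.05, n_steps=10_000, rng=rng)
    for _ in range(100)
]

# Fraction of neutral runs at least as dominated by one species as the reef.
n_as_extreme = sum(d >= observed_dominance for d in null_dominances)
p_value = (n_as_extreme + 1) / (len(null_dominances) + 1)
print(f"observed dominance: {observed_dominance:.2f}, Monte Carlo p = {p_value:.3f}")
```

If neutral runs often produce a species as dominant as the most common one on the reef, there is little left for feeding competition to explain; if they rarely do, the extra factor has something to account for. The answer depends on the assumed parameters, which in a real study would be estimated from data rather than picked by hand.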

As more and more fields try to account for complex systems through mathematical models, the arguments over neutral and null models are likely to only heat up. If Starbucks wants to build a mathematical model to determine where the next up-and-coming neighborhood is, they will want to know if adding a factor for proximity to a bus station predicts the neighborhood better than chance.

To build or not to build? Photo: Wikimedia user Hintha

Although the arguments back and forth about null hypotheses can seem very pedantic, they are critical to conducting scientific studies properly. It may seem strange to spend so much time developing explanations that we know are not really reflective of reality. But because data about the world is destined to be messy and will never quite fit predictions, scientists need a yardstick for when their hypotheses are good enough. NTB is a straw man of an argument, but if your explanation can’t beat a straw man, then what good is it?

References:

Hubbell, Stephen P. “Neutral theory and the evolution of ecological equivalence.” Ecology 87.6 (2006): 1387-1398.

Rosindell, James, et al. “The case for ecological neutral theory.” Trends in Ecology & Evolution 27.4 (2012): 203-208.

Hubbell, Stephen P. “Neutral theory in community ecology and the hypothesis of functional equivalence.” Functional Ecology 19.1 (2005): 166-172.

Gotelli, Nicholas J., and Brian J. McGill. “Null versus neutral models: what’s the difference?” Ecography 29.5 (2006): 793-800.

Anderson, David R., Kenneth P. Burnham, and William L. Thompson. “Null hypothesis testing: problems, prevalence, and an alternative.” The Journal of Wildlife Management (2000): 912-923.

Volkov, Igor, et al. “Neutral theory and relative species abundance in ecology.” Nature 424.6952 (2003): 1035-1037.

Connolly, Sean R., et al. “Commonness and rarity in the marine biosphere.” Proceedings of the National Academy of Sciences 111.23 (2014): 8524-8529.

Mallet, Delphine, and Dominique Pelletier. “Underwater video techniques for observing coastal marine biodiversity: A review of sixty years of publications (1952–2012).” Fisheries Research 154 (2014): 44-62.