In ecology, co-occurrence networks can help us identify relationships between species using repeated measurements of the species’ presence or absence. When evaluating potential relationships, we might ask: Given presence-absence data, are two species co-occurring at a frequency higher or lower than expected by chance? Somewhat surprisingly, although co-occurrence analysis has been around since the ’70s, there’s no universally agreed upon method for measuring co-occurrence and testing its statistical significance (Veech, 2012). In this post, we’re going to examine the probabilistic model as seen in Veech’s *A probabilistic model for analysing species co‐occurrence** *(2012)*.* We’ll start by defining the model before moving into calculating co-occurrence probabilities in R.

## Defining the probabilistic model of co-occurrence

#### Overview

In order to understand whether two species co-occur at a frequency greater than or less than expected, we first need to know the probability of two species co-occurring at a given number of sites. This will depend on the number of sites sampled (*N*) and the number of sites each species inhabits (*N _{1}* and

*N*). Using this information, we can determine

_{2}*p*, the probability that 2 species co-occur at exactly

_{j}*j*sites, for

*j*= 0…

*N*. To calculate

*p*, we’ll count the number of ways species 1 and 2 can be arranged among

_{j}*N*sites while co-occurring at

*j*sites and divide that by the total number of ways species 1 and 2 can be arranged among

*N*sites (Eq. 1).

#### The math behind *p*_{j}

_{j}

The numerator of *p _{j}* can be calculated by multiplying 1) the number of ways

*j*sites can be arranged among N sites, by 2) the number of ways species 2 can be arranged in the remaining sites that don’t have both species, by 3) the number of ways species 1 can be arranged among sites that don’t have species 2. The denominator can be calculated by multiplying the number of ways species 2 can be arranged by the number of ways species 1 can be arranged (Eq. 2).

There are limitations to the number of sites two species can co-occur at. Let’s say we sample 10 sites. Species 1 is found in 7 sites and species 2 is found in 5 sites. If you were to randomly place species 1 in 7 sites, you’d have 3 sites empty sites leftover. Since species 2 is found in 5 sites, the two species have to co-occur at a minimum of 2 sites. Thus, max{0, *N _{1} + N_{2} – N* } ≤

*j*. Additionally,

*j*can’t exceed the number of sites the species with the lowest presence inhabits. For instance, species 1 and 2 can’t co-occur at 5 sites if species 1 is only present at 2 sites. Therefore, max{0,

*N*} ≤

_{1}+ N_{2}– N*j*≤ min{

*N*}. If

_{1}, N_{2}*j*doesn’t meet these criteria, then

*p*= 0.

_{j}## Calculating *p*_{j} : an example

_{j}

Let’s say we’re interested in the co-occurrence patterns of two different bird species across 4 different sampling sites. Both species 1 and species 2 are present at exactly 2 sites. What’s the probability species 1 and 2 are found together at exactly one site? In other words, what’s *p _{1}*?

#### Breaking down the numerator

We’ll start by looking at the numerator of *p _{j}*. We can see from the Fig. 1 there are 4 different ways the single co-occurrence could be arranged among the 4 sites. For each unique way of placing the co-occurrence, there are three sites where species 1 and 2 don’t co-occur. That means, there are three sites (

*N – j*) where we can arrange species 2. Since species 2 is only found in two sites, we only need to place species 2 in one more site (

*N*). That gives us three different ways of placing species 2 in one of the three remaining sites.

_{2}– jNow, we have two sites leftover that don’t have species 2 (*N – N _{2}*). Again, we only need to place species 1 at one site (

*N*) and there are two ways to place species 1 among two sites. Multiplying these all together, we get 4 * 3 * 2 = 24 ways species 1 and 2 can co-occur at 1 of the 4 sites given they are each found in two sites.

_{1}– j#### Breaking down the denominator

The denominator is a bit more straightforward (Fig. 2). There are six different ways of arranging species 2 across 4 sites (see picture below). Since this is the same for species 1, this gives us 6 * 6 for the denominator. Altogether, *p _{1}* = 24/36 ≈ 0.67.

## Calculating *p*_{1} in R:

_{1}

Once we’ve defined *N, N _{1}, N_{2}* and

*j*, we can use the

**choose()**function to evaluate Eq. 2 in R.

```
# Define the number of sites.
N = 4
# Define the number of sites occupied by species 1.
n1 = 2
# Define the number of sites occupied by species 2.
n2 = 2
# Number of sites species 1 and 2 co-occur at.
j = 1
# Probability that species 1 and 2 occur at exactly 1 site.
choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j)/
(choose(N, n2) * choose(N, n1))
```

## Using *p*_{j} to assess significance

_{j}

Assessing the statistical significance of an observed co-occurrence relies on the fact that ∑*p _{j}* = 1 for

*j*= max {0,

*N*} to min{

_{1}+ N_{2}– N*N*}. Let’s say

_{1}, N_{2}*Q*represents the observed co-occurrence. To assess whether or not two species co-occur less than expected, we’ll want to know the probability of seeing them co-occur

_{obs}*at least*

*Q*times, ∑

_{obs}*p*for

_{j}*j*= max {0,

*N*} to

_{1}+ N_{2}– N*Q*. If this probability is less than our significance level, say 0.05, then the two species co-occur significantly less than expected by chance.

_{obs}On the other hand, if two species co-occur at a frequency greater than expected, then the probability of seeing them co-occur *Q _{obs}* times or more will be less than the significance level, ∑

*p*for

_{j}*j*=

*Q*to min{

_{obs}*N*}. To find the expected co-occurrence, we can take the weighted sum of each

_{1}, N_{2}*j*with

*p*as the weights. Mathematically, this is ∑(

_{j}*p*×

_{j}*j*) for

*j*= max {0,

*N*} to min{

_{1}+ N_{2}– N*N*}.

_{1}, N_{2}## Assessing species co-occurrence significance: an example

Imagine we’ve sampled 30 sites and found two lizard species co-occur at 6 sites. Species 1 is present at 10 sites and species 2 at 25 sites. Do these species occur more or less frequently than expected by chance?

To answer this question, we can use our code from above with a few modifications:

```
# Define the number of sites.
N = 30
# Define the number of sites occupied by species 1.
n1 = 10
# Define the number of sites occupied by species 2.
n2 = 25
# Number of sites species 1 and 2 co-occur at.
j = max(0, n1 + n2 - N):min(n1, n2)
# Probability that species 1 and 2 occur at exactly j sites.
pj = choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j)/
(choose(N, n2) * choose(N, n1))
# Show table for j, pj, and the cumulative distribution.
round(data.frame(j, pj, sumPj = cumsum(pj)), 4)
```

The probability of the two lizard species randomly co-occurring at 6 sites or less is 0.0312 (*p _{5} + p_{6}*). Assuming a significance level of 0.05, we can conclude the two lizard species occur less frequently than expected by chance. On the other hand, the probability of the two lizard species co-occurring at 6 sites or more is 0.9982 (

*p*or 1 – p

_{6}+ p_{7}+ p_{8}+ p_{9}+ p_{10}_{5}). Additionally, the expected co-occurrence is 8 sites.

```
# Expected number of co-occurrence.
sum(pj * j)
```

## Using the probabilistic model of co-occurrence in practice

Now that we’ve worked through understanding the probabilistic model of co-occurrence for two species, how can we extend this to multiple pairs of species? Luckily, the R package ‘*cooccur*‘ can do this for us. Check out our blog post to see how to create co-occurrence networks using ‘*cooccur*‘ and ‘*visNetwork*‘. Hopefully, you’ll now have a good understanding of how they calculate probabilities of species co-occurrence and can replicate their results if you desire. As always, happy networking!

### Citation

Veech, J. A. (2012). A probabilistic model for analysing species co-occurrence. Global Ecology and Biogeography, 22(2), 252-260. doi:10.1111/j.1466-8238.2012.00789.x

## Leave a reply, but don't be rude!