In our last post we introduced the approach to the Multiple Comparisons Problem based on control of the false discovery rate (FDR). As we discussed there, the FDR is, roughly speaking, equal to the proportion of false positives among the null hypotheses that we decide to reject when testing multiple hypotheses at the same time. In that post we introduced two different methods for FDR control. The first was the Benjamini-Hochberg1 method, which holds for independent hypotheses and for hypotheses with a certain kind of positive dependence/correlation with each other. The second was the Benjamini-Yekutieli2 method, which holds for any form of dependence among the hypotheses.
That post focused on explaining how to use these methods in practice (and provided Python code that implements both of them). In this post we present full proofs that these methods control the FDR under the stated assumptions. In particular, we prove that the Benjamini-Hochberg method works for independent hypotheses (for the proof in the positively dependent case, see the paper by Benjamini and Yekutieli cited in the footnotes below). We also prove that the Benjamini-Yekutieli method works for any form of dependence among the hypotheses.
This is a more technical post, but we hope that this material will be helpful to readers who want to understand how these methods work but may have struggled to work all the way through the proofs in the original papers. The proofs that we present here mostly follow the proofs in the paper by Benjamini and Yekutieli, but we include more detail and use a slightly different notation that will hopefully make the proofs easier to follow.
Review of notation and setup
We’ll now do a quick review of the notation we’ve been using for multiple hypothesis testing scenarios. We consider a situation where we are testing \(m > 1\) null hypotheses \(H_1,\dots,H_m\). We test each hypothesis \(H_i\) (with \(i\in\{1,\dots,m\}\)) by measuring the value of a test statistic \(X_i\), which we will assume to be a continuous random variable taking values in the real numbers. Based on the measured values of the test statistics \(X_1,\dots,X_m\), we then compute p-values \(p_1,\dots,p_m\) for all of the hypotheses. These p-values will then be used, together with our chosen testing method, to decide which null hypotheses to reject (i.e., which results to declare significant).
We assume that a certain number \(m_0 \geq 1\) of the null hypotheses are actually true. Of course, we don’t know which specific hypotheses these are (if we did, then we wouldn’t need to perform the experiment). We’ll use \(\mathcal{J}_0\) to denote the (unknown) set of indices \(j\) that label the true null hypotheses \(H_j\), and we have \(|\mathcal{J}_0| = m_0\).
The false discovery rate (FDR) is the expectation value of a random variable \(Q\) that is defined as follows. Let \(R\) be the number of null hypotheses that we decide to reject when carrying out the experiment, and let \(V\) be the number of those rejected hypotheses that are actually true. If \(R = 0\) then we define \(Q = 0\), otherwise we define
\[Q = \frac{V}{R}\ ,\]
and then the FDR is given by
\[\text{FDR} = E[Q]\ .\]
We can see that \(Q\) measures the proportion of rejections, or “discoveries”, that are false discoveries, i.e., rejections of true null hypotheses.
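To make the definition concrete, here is a minimal simulation sketch that estimates the FDR for a naive, uncorrected rule that simply rejects \(H_i\) whenever \(p_i \leq \alpha\). The specific setup (standard normal test statistics, a mean shift of 3 for the false nulls, and the values of \(m\), \(m_0\), and \(\alpha\)) is purely illustrative and not part of the discussion above.

```python
# A minimal simulation sketch of the definition FDR = E[Q].  Everything
# specific here (standard normal test statistics, a mean shift of 3 for the
# false nulls, the naive "reject if p_i <= alpha" rule, and the values of
# m, m0, alpha) is an illustrative assumption, not part of the setup above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, m0, alpha = 100, 80, 0.05
mu = np.where(np.arange(m) < m0, 0.0, 3.0)   # first m0 hypotheses are true nulls
true_null = np.arange(m) < m0

q_values = []
for _ in range(20_000):
    x = rng.normal(mu, 1.0)                  # one test statistic per hypothesis
    p = norm.sf(x)                           # one-sided p-values
    rejected = p <= alpha                    # naive, uncorrected rule
    R = rejected.sum()
    V = (rejected & true_null).sum()         # rejected hypotheses that are true nulls
    q_values.append(V / R if R > 0 else 0.0) # Q = V/R, with Q = 0 when R = 0
print(np.mean(q_values))                     # Monte Carlo estimate of the FDR
```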
Both the Benjamini-Hochberg and Benjamini-Yekutieli methods guarantee that the FDR will be less than or equal to a significance level \(\alpha\) that is specified by the user. In fact, both of these methods guarantee that
\[ \text{FDR} \leq \frac{m_0}{m}\alpha\ , \tag{1}\]
so the FDR is actually controlled at a level that is smaller than \(\alpha\) by the (unknown) factor of \(m_0/m\).
For the purposes of this post we will assume that all of our hypotheses are tested with a one-sided test, and that the p-value for each hypothesis is equal to the probability, under the null hypothesis, of observing a value of the test statistic \(X_i\) that is greater than or equal to the value we observe in the experiment. More concretely, if \(x_i\) is the value of \(X_i\) that we observe in the experiment, then the p-value \(p_i\) that we compute for the hypothesis \(H_i\) is given by
\[p_i = P_0(X_i \geq x_i)\ ,\]
where the notation \(P_0(\cdots)\) with the “0” subscript refers to probabilities computed under the null hypothesis. The generalization of the proofs below to two-sided tests is not a major difficulty, but the assumption of one-sided tests is convenient because it allows us to keep the notation in the proofs from getting too complicated.
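As a concrete illustration, if the null distribution of \(X_i\) happens to be standard normal (an assumption made here purely for the sake of the example), the p-value is just the normal survival function evaluated at the observed statistic:

```python
# A minimal sketch of the one-sided p-value p_i = P_0(X_i >= x_i), under the
# illustrative assumption that the null distribution of X_i is standard normal.
from scipy.stats import norm

x_i = 2.1           # hypothetical observed value of the test statistic
p_i = norm.sf(x_i)  # survival function gives P_0(X_i >= x_i)
print(p_i)          # roughly 0.018
```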
The hypotheses \(H_i\) are independent if the test statistics \(X_i\) are independent random variables, and otherwise they are dependent. As we discussed in the previous post, the Benjamini-Hochberg method holds for independent hypotheses and under a certain kind of positive correlation between the hypotheses. The Benjamini-Yekutieli method, on the other hand, holds for any kind of dependence among the hypotheses.
Finally, we denote by \(p_{(i)}\) the p-values sorted into ascending order, so that \(p_{(1)}\leq\cdots\leq p_{(m)}\). We also use \(H_{(i)}\) to denote the hypothesis whose p-value is \(p_{(i)}\).
Proof for the Benjamini-Hochberg method
We start by reminding the reader that the Benjamini-Hochberg (BH) method works as follows.
- Let \(i_{*}\) be the largest \(i\) that satisfies \(p_{(i)} \leq \frac{i}{m}\alpha\). If such an \(i_{*}\) exists, then reject all hypotheses \(H_{(k)}\) with \(k\leq i_{*}\).
- If there is no \(i\) that satisfies \(p_{(i)} \leq \frac{i}{m}\alpha\), then do not reject any hypotheses.
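For concreteness, here is a minimal Python sketch of this step-up procedure (a bare-bones illustration rather than the fuller implementation that accompanied the previous post; the p-values in the usage line are made up).

```python
# A minimal sketch of the BH step-up procedure described above (a bare-bones
# illustration, not the fuller implementation from the previous post).
import numpy as np

def benjamini_hochberg(p, alpha):
    """Return a boolean array marking which hypotheses are rejected."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)                      # indices that sort the p-values
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds             # p_(i) <= (i/m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        i_star = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= (i/m)*alpha
        reject[order[: i_star + 1]] = True     # reject H_(1), ..., H_(i*)
    return reject

# Usage with some made-up p-values:
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205], 0.05))
```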
We now prove that this method controls the FDR as in Eq. (1) when the hypotheses are independent of each other. To start, let \(A_{r,v}\) be the event that, in carrying out the BH method, we reject \(r\) total hypotheses, \(v\) of which are true. Then, by definition of the FDR, we have
\[\text{FDR} = \sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}\frac{v}{r}P(A_{r,v})\ .\tag{2}\]
(Outcomes with \(R = 0\) or \(V = 0\) have \(Q = 0\) and therefore contribute nothing to the expectation, which is why they are omitted from the sum.)
Now let \(R_j\), for any \(j\in\mathcal{J}_0\), be the event that we reject the true null hypothesis \(H_j\). We will need to use the following identity, which we prove in the next section,
\[P(A_{r,v}) = \frac{1}{v}\sum_{j\in\mathcal{J}_0}P(R_j\cap A_{r,v})\ . \tag{3}\]
If we plug this into Eq. (2) then we find that
\[\begin{align} \text{FDR} &= \sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}\frac{1}{r}\sum_{j\in\mathcal{J}_0}P(R_j\cap A_{r,v}) \\ &= \sum_{j\in\mathcal{J}_0}\sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}\frac{1}{r}P(R_j\cap A_{r,v})\ .\end{align}\]
To proceed, it will be useful to define critical values \(c_{i,i'}\) for each test statistic \(X_i\) by the relation
\[P_0(X_i \geq c_{i,i'}) = \frac{i'}{m}\alpha\ .\]
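If, purely for illustration, we assume that each \(X_i\) has a standard normal null distribution, then these critical values are simply inverse survival function evaluations:

```python
# A small sketch of the critical values c_{i,i'}, under the illustrative
# assumption that each X_i has a standard normal null distribution; in that
# case c_{i,i'} is the inverse survival function evaluated at i'*alpha/m
# (and is the same for every i, since the nulls are identical here).
from scipy.stats import norm

m, alpha = 20, 0.05
for i_prime in (1, 5, 20):
    c = norm.isf(i_prime * alpha / m)   # solves P_0(X >= c) = i'*alpha/m
    print(i_prime, c)
```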
We now write \(P(R_j\cap A_{r,v}) = P(R_j|A_{r,v})P(A_{r,v})\) and consider the conditional probability \(P(R_j|A_{r,v})\). Note that the BH procedure has the property that, if \(r\) hypotheses are rejected, then the p-values for all rejected hypotheses satisfy \(p_i \leq r \alpha/m\), while the p-values for all non-rejected hypotheses have \(p_i > r \alpha /m\).
Now \(P(R_j|A_{r,v})\) is the probability that we reject the true null hypothesis \(H_j\) given that we have rejected a total of \(r\) hypotheses, \(v\) of which are true. If we use our assumption that all of the hypotheses are independent of each other, and then use the property of the BH method that we reviewed in the last paragraph, then we find that this probability is simply equal to the probability that \(X_j\) is greater than or equal to the critical value \(c_{j,r}\),
\[P(R_j|A_{r,v}) = P_0(X_j \geq c_{j,r}) = \frac{r}{m}\alpha\ .\]
In other words, it is equal to the probability that the p-value for \(H_j\) winds up being less than or equal to \(r\alpha/m\).
Using this then leads to
\[\text{FDR} = \frac{\alpha}{m}\sum_{j\in\mathcal{J}_0}\sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}P(A_{r,v})\ .\]
Then, since
\[\sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}P(A_{r,v}) \leq 1\]
and \(\sum_{j\in\mathcal{J}_0}1 = m_0\), we immediately find that the inequality (1) holds for the BH method applied to independent hypotheses.
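The bound can also be checked numerically. The sketch below simulates independent normal test statistics (the means and the values of \(m\), \(m_0\), and \(\alpha\) are illustrative assumptions), applies the BH step-up rule, and compares the Monte Carlo estimate of the FDR with \(\frac{m_0}{m}\alpha\):

```python
# A minimal simulation sketch of the bound FDR <= (m0/m)*alpha for the BH
# method with independent test statistics.  The normal means and the values
# of m, m0, alpha are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, m0, alpha = 100, 80, 0.1
mu = np.where(np.arange(m) < m0, 0.0, 3.0)   # first m0 hypotheses are true nulls
true_null = np.arange(m) < m0
thresholds = alpha * np.arange(1, m + 1) / m

q_values = []
for _ in range(20_000):
    p = norm.sf(rng.normal(mu, 1.0))         # independent one-sided p-values
    order = np.argsort(p)
    below = p[order] <= thresholds           # BH step-up condition
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    R = reject.sum()
    V = (reject & true_null).sum()
    q_values.append(V / R if R > 0 else 0.0)

print(np.mean(q_values), m0 / m * alpha)     # estimated FDR vs. the bound (m0/m)*alpha
```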
Proof of Equation (3)
We’ll now present the proof of Eq. (3), which was the key to our proof of the BH method in the previous section. For any subset \(\omega \subset \mathcal{J}_0\) of the true null hypotheses, let \(A_{r,\omega}\) be the event that we reject \(r\) null hypotheses, and where these rejected hypotheses consist of the true hypotheses in the set \(\omega\) plus an additional \(r-|\omega|\) false hypotheses. Then we have
\[\sum_{j\in\mathcal{J}_0}P(R_j\cap A_{r,v}) = \sum_{j\in\mathcal{J}_0}\sum_{\substack{\omega \subset \mathcal{J}_0 \\ |\omega| = v}}P(R_j\cap A_{r,\omega})\ ,\]
since the event \(A_{r,v}\) is the disjoint union of the events \(A_{r,\omega}\) over all subsets \(\omega\subset\mathcal{J}_0\) with \(|\omega| = v\).
Now we can write \(P(R_j\cap A_{r,\omega}) = P(R_j|A_{r,\omega})P(A_{r,\omega})\), and we have
\[P(R_j|A_{r,\omega}) = \mathbb{I}[j\in\omega]\ ,\]
where \(\mathbb{I}[\cdots]\) is the indicator function of the event in square brackets. In other words, \(P(R_j|A_{r,\omega})\) is equal to one if \(j\) is in \(\omega\) and zero otherwise. Then we have
\[\begin{align}\sum_{j\in\mathcal{J}_0}P(R_j\cap A_{r,v}) &= \sum_{j\in\mathcal{J}_0}\sum_{\substack{\omega \subset \mathcal{J}_0 \\ |\omega| = v}}\mathbb{I}[j\in\omega]P(A_{r,\omega}) \\ &= \sum_{\substack{\omega \subset \mathcal{J}_0 \\ |\omega| = v}}P(A_{r,\omega})\sum_{j\in\mathcal{J}_0}\mathbb{I}[j\in\omega] \\ &= v \sum_{\substack{\omega \subset \mathcal{J}_0 \\ |\omega| = v}}P(A_{r,\omega}) \\ &= vP(A_{r,v})\ , \end{align} \]
which proves Eq. (3).
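Since Eq. (3) is the key identity, it may be reassuring to check it numerically. The sketch below estimates both sides for one particular pair \((r, v)\) under an illustrative data-generating setup (normal test statistics, small \(m\), and a fairly large \(\alpha\), chosen only so that the event \(A_{r,v}\) occurs reasonably often):

```python
# A small Monte Carlo sanity check of Eq. (3) for one particular pair (r, v).
# The data-generating setup (normal statistics, m = 8, m0 = 5, a mean shift
# of 2.5 for false nulls, alpha = 0.2) is an illustrative assumption chosen
# only so that the event A_{r,v} occurs reasonably often.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, m0, alpha = 8, 5, 0.2
mu = np.where(np.arange(m) < m0, 0.0, 2.5)   # first m0 hypotheses are true nulls
thresholds = alpha * np.arange(1, m + 1) / m
r_target, v_target = 3, 1                    # check the identity for A_{3,1}

n_sims = 200_000
count_A = 0                                  # occurrences of A_{r,v}
count_RjA = np.zeros(m0)                     # occurrences of R_j together with A_{r,v}

for _ in range(n_sims):
    p = norm.sf(rng.normal(mu, 1.0))
    order = np.argsort(p)
    below = p[order] <= thresholds           # BH step-up condition
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    r, v = reject.sum(), reject[:m0].sum()
    if r == r_target and v == v_target:
        count_A += 1
        count_RjA += reject[:m0]

lhs = count_A / n_sims                       # estimate of P(A_{r,v})
rhs = count_RjA.sum() / (v_target * n_sims)  # (1/v) * sum_j estimate of P(R_j and A_{r,v})
print(lhs, rhs)                              # the two should agree up to Monte Carlo error
```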
Proof for the Benjamini-Yekutieli method
Finally, in this section we present the more difficult proof for the Benjamini-Yekutieli (BY) method. We first review the definition of the BY method, and for that we will need the definition of the \(m\)th harmonic number \(h_m\),
\[h_m = \sum_{i=1}^m\frac{1}{i}\ .\]
With this definition, the BY method works as follows.
- Let \(i_{*}\) be the largest \(i\) that satisfies \(p_{(i)} \leq \frac{i}{m\cdot h_m}\alpha\). If such an \(i_{*}\) exists, then reject all hypotheses \(H_{(k)}\) for \(k\leq i_{*}\).
- If there is no \(i\) that satisfies \(p_{(i)} \leq \frac{i}{m\cdot h_m}\alpha\), then do not reject any hypotheses.
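In code, the BY procedure is just the BH step-up rule with \(\alpha\) replaced by \(\alpha/h_m\). Here is a minimal sketch (again an illustration, not the implementation from the previous post):

```python
# A minimal sketch of the BY procedure described above: it is the BH step-up
# rule with alpha replaced by alpha / h_m (again an illustration, not the
# implementation from the previous post).
import numpy as np

def benjamini_yekutieli(p, alpha):
    """Return a boolean array marking which hypotheses are rejected."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    h_m = np.sum(1.0 / np.arange(1, m + 1))    # m-th harmonic number
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / (m * h_m)
    below = p[order] <= thresholds             # p_(i) <= i*alpha/(m*h_m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

# The same made-up p-values as in the BH sketch above now give fewer rejections:
print(benjamini_yekutieli([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205], 0.05))
```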
We’ll now prove that this method controls the FDR as shown in Eq. (1) for any form of dependence among the hypotheses. We will again use the concrete expression for the FDR from Eq. (2), where now \(A_{r,v}\) is the event that, in carrying out the BY method, we reject \(r\) total hypotheses, \(v\) of which are true.
To start, we’ll define new critical values \(d_{i,i'}\) for each test statistic \(X_i\) by the relation
\[P_0(X_i \geq d_{i,i'}) = \frac{i'}{m\cdot h_m}\alpha\ .\]
It will also be convenient to define \(d_{i,0} = \infty\), so that \(P_0(X_i \geq d_{i,0}) = 0\) (this is consistent with the equation above when choosing \(i' = 0\)). Note also that \(d_{i,i_1} < d_{i,i_2}\) if \(i_1 > i_2\).
For this proof we will use the following identity,
\[P(A_{r,v}) = \frac{1}{v}\sum_{j\in\mathcal{J}_0}P(\{X_j \geq d_{j,r}\}\cap A_{r,v})\ . \tag{4}\]
We can see that this identity is very similar to Eq. (3) except that the event \(R_j\) has been replaced with the event \(X_j \geq d_{j,r}\).
To prove Eq. (4) we follow the same steps that we used to prove Eq. (3) in the previous section. When we come to the point where we need to evaluate the conditional probability \(P(X_j \geq d_{j,r}|A_{r,\omega})\), we again find that it is equal to \(\mathbb{I}[j\in\omega]\) by virtue of the fact that, if we reject \(r\) null hypotheses using the BY method, then all rejected hypotheses have \(p_i \leq r\alpha/(m\cdot h_m)\) (equivalently, \(X_i \geq d_{i,r}\)), while all non-rejected hypotheses have \(p_i > r\alpha/(m\cdot h_m)\) (equivalently, \(X_i < d_{i,r}\)).
Using Eq. (4), and interchanging the order of the sums over \(r,v\), and \(j\) as before, we find that
\[\text{FDR} = \sum_{j\in\mathcal{J}_0}\sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}\frac{1}{r}P(\{X_j \geq d_{j,r}\}\cap A_{r,v})\ .\]
Next, we consider the more refined set of events of the form
\[d_{j,\ell} \leq X_j < d_{j,\ell-1}\]
for \(\ell \in \{1,\dots,r\}\). The event with index \(\ell\) is precisely the event that the p-value \(p_j\) falls in the interval \(\left(\frac{(\ell-1)\alpha}{m\cdot h_m},\ \frac{\ell\alpha}{m\cdot h_m}\right]\). These events are disjoint and have the property that
\[\bigcup_{\ell =1}^r \{d_{j,\ell} \leq X_j < d_{j,\ell-1}\} = \{X_j \geq d_{j,r}\}\ ,\]
and so we have
\[P(\{X_j \geq d_{j,r}\}\cap A_{r,v}) = \sum_{\ell = 1}^rP(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_{r,v})\ .\]
Using this identity, we then find that
\[\text{FDR} = \sum_{j\in\mathcal{J}_0}\sum_{r = 1}^m\sum_{v = 1}^{\min(r,m_0)}\frac{1}{r}\sum_{\ell = 1}^r P(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_{r,v})\ .\]
At this point it is convenient to define
\[A_r = \bigcup_{v=0}^{\min(r,m_0)}A_{r,v}\]
to be the event that we reject \(r\) hypotheses in total when carrying out the BY method (here \(A_{r,0}\) denotes the event that we reject \(r\) hypotheses, none of which is true). If we now bring the sum over \(v\) inside the sum over \(\ell\), and use the fact that \(A_{r,v}\cap A_{r,v'} = \emptyset\) (the empty set) if \(v\neq v'\), together with the fact that the union of the \(A_{r,v}\) over \(v\geq 1\) is contained in \(A_r\), we find that
\[\sum_{v = 1}^{\min(r,m_0)}P(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_{r,v}) \leq P(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_r)\ .\]
At this point we have
\[\text{FDR} \leq \sum_{j\in\mathcal{J}_0}\sum_{r = 1}^m\sum_{\ell = 1}^r \frac{1}{r} P(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_r)\ .\]
The next step is to note that \(1/r \leq 1/\ell\) (since \(\ell\) is always less than or equal to \(r\) over its range of summation), and that the order of the double sum over \(r\) and \(\ell\) can be interchanged as follows:
\[\sum_{r=1}^m \sum_{\ell = 1}^r \leftrightarrow \sum_{\ell=1}^m \sum_{r = \ell}^m\ .\]
Then we have
\[\text{FDR} \leq \sum_{j\in\mathcal{J}_0}\sum_{\ell=1}^m \frac{1}{\ell}\sum_{r = \ell}^m P(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_r)\ .\]
If we now focus on the inner sum over \(r\), we can bound this sum from above by extending the range of summation all the way from \(0\) to \(m\) (the added terms are nonnegative). Then, since the full set of events \(A_r\), for \(r\in\{0,1,\dots,m\}\), has the property that exactly one of these events must always occur (here \(A_0\) is the event that we reject no hypotheses at all), the law of total probability gives
\[\begin{align}\sum_{r = 0}^m P(\{d_{j,\ell} \leq X_j < d_{j,\ell-1}\}\cap A_r) &=P(d_{j,\ell} \leq X_j < d_{j,\ell-1}) \\ &= \frac{\ell}{m\cdot h_m}\alpha \ -\ \frac{(\ell-1)}{m\cdot h_m}\alpha \\ &= \frac{\alpha}{m\cdot h_m}\ ,\end{align}\]
where we used the fact that \(j\in\mathcal{J}_0\), so that \(X_j\) follows its null distribution and the defining relation for the critical values \(d_{j,\ell}\) can be applied directly.
We now have
\[\text{FDR} \leq \sum_{j\in\mathcal{J}_0}\sum_{\ell=1}^m \frac{1}{\ell}\frac{\alpha}{m\cdot h_m}\ .\]
Then, since \(\sum_{\ell=1}^m 1/\ell = h_m\) and \(\sum_{j\in\mathcal{J}_0} 1 = m_0\), we finally conclude that the FDR satisfies the inequality (1) when the BY method is used, and that this holds for any pattern of dependence among the hypotheses.
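As a closing sanity check, the sketch below simulates one particular form of dependence, equicorrelated normal test statistics, applies the BY rule, and compares the estimated FDR with \(\frac{m_0}{m}\alpha\). All of the numerical choices (the correlation \(\rho\), the means, \(m\), \(m_0\), \(\alpha\)) are illustrative assumptions, and a single simulation illustrates rather than proves the general result:

```python
# A minimal simulation sketch of the BY bound under one particular form of
# dependence: equicorrelated normal test statistics with correlation rho.
# All numerical choices (rho, the means, m, m0, alpha) are illustrative
# assumptions, and a single simulation illustrates rather than proves the
# general result above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
m, m0, alpha, rho = 50, 40, 0.1, 0.5
mu = np.where(np.arange(m) < m0, 0.0, 3.0)     # first m0 hypotheses are true nulls
true_null = np.arange(m) < m0
h_m = np.sum(1.0 / np.arange(1, m + 1))
thresholds = alpha * np.arange(1, m + 1) / (m * h_m)

q_values = []
for _ in range(20_000):
    z = rng.standard_normal()                  # shared component inducing correlation
    x = mu + np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.standard_normal(m)
    p = norm.sf(x)                             # marginally valid one-sided p-values
    order = np.argsort(p)
    below = p[order] <= thresholds             # BY step-up condition
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    R = reject.sum()
    V = (reject & true_null).sum()
    q_values.append(V / R if R > 0 else 0.0)

print(np.mean(q_values), m0 / m * alpha)       # estimated FDR vs. the bound (m0/m)*alpha
```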
Footnotes
- Yoav Benjamini and Yosef Hochberg, “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Series B (Methodological), 57:1, 289–300 (1995). https://www.jstor.org/stable/2346101
- Yoav Benjamini and Daniel Yekutieli, “The Control of the False Discovery Rate in Multiple Testing under Dependency,” Annals of Statistics, 29:4, 1165–1188 (2001). https://doi.org/10.1214/aos/1013699998