Category: Statistics

  • Sequential Probability Ratio Test Part 2

    How to prove that the stopping time is finite. In the first post in this series we introduced the concept of sequential hypothesis testing and then gave an introduction to the simplest sequential test, the Sequential Probability Ratio Test (SPRT). That post was intended to provide the basic information that you would need to start…

  • The Sequential Probability Ratio Test

    This is the first post in a series on sequential hypothesis testing. Specifically, these posts will focus on the Sequential Probability Ratio Test (SPRT), which is one of the simplest and best-known examples of a sequential test. When conducting a standard statistical test, we first need to decide…

  • The Markov chain approach to CUSUM

    Note: the code used to do the calculations in this post can be found in the “change-point-detection” repository on my GitHub page. In the last post we introduced the problem of online change point detection and the CUSUM method for solving that problem. In this post we’ll dive deeper into the math you need…

  • Online change point detection and CUSUM

    Note: the code used to generate the figures in this post can be found in the “change-point-detection” repository on my GitHub page. In this post we’ll start to look at change point detection, which is the problem of detecting a sudden change in a parameter that characterizes some ongoing process. There are actually two…

  • Multiple hypothesis testing part 3: how to prove that the Benjamini-Hochberg method works

    In our last post we introduced the approach to the Multiple Comparisons Problem based on control of the false discovery rate (FDR). As we discussed there, the FDR is, roughly speaking, equal to the proportion of false positives among the null hypotheses that we decide to reject when testing multiple hypotheses at the same time.…

  • Multiple hypothesis testing part 2: the false discovery rate and the Benjamini-Hochberg method

    In the first post on this blog we introduced the Multiple Comparisons Problem (MCP), which is the increased risk of false positives (a.k.a. type 1 errors) that we face when testing multiple hypotheses at the same time. In that post we mainly discussed the idea of solving this problem by controlling the family-wise error rate…

  • Using data to bound the probability of a rare event

    Note: the code used to generate the numbers and figures in this post can be found in the “rare-event-probability” repository on my GitHub page. In this post we’ll continue exploring the theme of the last post, which was about the probability that a new data point will differ significantly from a set of data…

  • Chebyshev’s inequality with sample mean and sample variance

    In this post we’ll look at a very interesting fundamental result in statistics that deals with the following situation. Suppose we are studying a system that is producing data according to some unknown process, and we have already observed \(n\) data points. We are about to observe the next data point, and we’d like to…

  • A/B testing with small samples

    Note: the code used to generate the figures in this post can be found in the “AB-testing-small-samples” repository on my GitHub page. In this post we look at the problem of A/B testing with small sample sizes. This is a tricky situation for several reasons. First, the statistical test that is commonly used to…

  • Multiple hypothesis testing and the Holm-Bonferroni method

    When testing multiple hypotheses at the same time, we must be careful when analyzing the results. A naive analysis of the data will greatly increase our risk of making a type 1 error (i.e., reaching a false positive conclusion on one of the hypotheses). This issue is known as the Multiple Comparisons Problem (MCP) and…