Training Systems Using Python Statistical Modeling

上QQ阅读APP看书，第一时间看更新

Bayesian hypothesis testing for proportions

Unlike classical statistics, where we say a hypothesis is either right or wrong, Bayesian statistics holds that every hypothesis is true, with some probability. We don't reject hypotheses, but simply ignore them if they are unlikely to be true. For one sample, computing the probability of a hypothesis can be done by considering what region of possible values of θ correspond to the hypothesis being true, and using the posterior distribution of θ to compute the probability that θ is in that region.

In this case, we need to use what's known as the cumulative distribution function (CDF) of the posterior distribution. This is the probability that a random variable is less than or equal to a quantity, x. So, what we want is the probability that θ is greater than 0.3 when D is given, that is, if we are testing the website administrator's claim that there are at least 30% of visitors to the site clicking on the ad.

So, we will use the CDF function and evaluate it at 0.3. This is going to correspond to the administrator's claim. This will give us the probability that more than 30% of visitors clicked on the ad. The following screenshot shows how we define the CDF function:

What we end up with is a very small probability, therefore, it's likely that the administrator is incorrect.

Now, while there's a small probability, I would like to point out that this is not the same thing as a p value. A p value says something completely different; a p value should not be interpreted as the probability that the null hypothesis is true, whereas, in this case, this can be interpreted as a probability that the hypothesis we asked is true. This is the probability that data is greater than 0.3, given the data that we saw.