The simplest possible Bayesian model

In Chapter 5, Kruschke goes through the process of using Bayes’ rule to update belief in the ‘lopsidedness’ of a coin. Instead of using R, I decided to try implementing the model in Excel.

The Model

Coin tosses are IID with a \mathrm{Bernoulli}(q) distribution, so the probability of observing a particular sequence of N tosses of which z are heads is

p(z|q, N) = q^z (1-q)^{N-z}

Using Bayes’ rule we get

\displaystyle p(q|z,N) = \frac{q^z (1-q)^{N-z} p(q)}{p(z|N)}
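Here the denominator is just the normalizing constant, obtained by integrating the numerator over q:

\displaystyle p(z|N) = \int_0^1 q^z (1-q)^{N-z} p(q) \, dq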

Note that if p(q) is already of the form

\displaystyle f(q;a,b) = \frac{q^{a-1} (1-q)^{b-1}}{B(a,b)}

where the denominator B(a,b) is a normalizing constant (i.e. p(q) is a \mathrm{Beta}(a,b) density), then p(q|z,N) = f(q; a+z, b+N-z), so the posterior stays in the same family.
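As a quick sanity check of this conjugate update, here is a minimal sketch (mine, not the book’s R code) using SciPy, with an arbitrary Beta(2, 2) prior and 7 heads out of 10 tosses:

    from scipy import stats

    # Prior Beta(a, b); data: z heads out of N tosses (arbitrary example values).
    a, b = 2.0, 2.0
    N, z = 10, 7

    # Conjugate update: the posterior is Beta(a + z, b + N - z).
    posterior = stats.beta(a + z, b + N - z)

    print(posterior.mean())            # posterior mean of q
    print(posterior.interval(0.95))    # central 95% credible interval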

Excel implementation

This is very straightforward using the BETADIST function, which returns the cumulative distribution: we divide the unit interval into equally spaced subintervals, calculate the probability of q falling in each subinterval as a difference of cumulative values, and update on the basis of the new observations.

Here’s the result.

[image]
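The same grid calculation is easy to reproduce outside Excel. Here’s a minimal sketch (mine, using SciPy’s beta CDF in place of BETADIST, with arbitrary example values) that assigns each subinterval the prior and posterior mass falling inside it:

    import numpy as np
    from scipy import stats

    a, b = 2.0, 2.0        # prior Beta parameters (arbitrary example values)
    N, z = 10, 7           # observed data: z heads out of N tosses
    bins = 50              # number of equally spaced subintervals of [0, 1]

    edges = np.linspace(0.0, 1.0, bins + 1)

    # Probability of q lying in each subinterval, before and after updating;
    # this mirrors taking differences of BETADIST values in Excel.
    prior_mass = np.diff(stats.beta.cdf(edges, a, b))
    posterior_mass = np.diff(stats.beta.cdf(edges, a + z, b + N - z))

    print(prior_mass.sum(), posterior_mass.sum())   # both sum to 1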

Doing Bayesian Data Analysis

Assuming I can keep at it, I’ll be making my way through Kruschke’s Doing Bayesian Data Analysis. Here are a few concepts he goes through in Chapter 4.

The Bayes factor

This is a ratio which allows you to compare which of two models better fits the data. By introducing a binary parameter that determines the choice of model, Bayes’ rule

\displaystyle p(M_i|D)=\frac{p(D|M_i)p(M_i)}{p(D)}

gives us the Bayes factor

\displaystyle \frac{p(M_1|D)}{p(M_2|D)}=\frac{p(D|M_1)}{p(D|M_2)}\frac{p(M_1)}{p(M_2)}

or, equivalently,

\displaystyle \frac{p(D|M_1)}{p(D|M_2)}=\frac{p(M_1|D)}{p(M_2|D)}\frac{p(M_2)}{p(M_1)}

Note that if we have no prior preference between the two models, these ratios are equal.

The Bayes factor is useful because, on their own, the marginal likelihoods p(D|M_i) scale on the basis of the model – the more complex the model, the lower the value tends to be – but this is actually a good thing: by considering the ratio we automatically trade off complexity (which penalizes the more flexible model when there’s little data) against descriptive value (which penalizes the simpler model when it doesn’t fit the data).
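To make the trade-off concrete, here’s a small sketch (mine, not from the book) comparing two coin models on the same data: M_1 fixes q = 0.5, while M_2 puts a uniform \mathrm{Beta}(1,1) prior on q, so that p(D|M_2) = B(1+z, 1+N-z)/B(1,1):

    import numpy as np
    from scipy.special import betaln

    def log_marginal_fair(N, z):
        # M1: q fixed at 0.5, so p(D|M1) = 0.5^N for any particular sequence.
        return N * np.log(0.5)

    def log_marginal_uniform(N, z):
        # M2: uniform Beta(1, 1) prior on q; integrating q^z (1-q)^(N-z)
        # over the prior gives B(1 + z, 1 + N - z) / B(1, 1).
        return betaln(1 + z, 1 + N - z) - betaln(1, 1)

    for N, z in [(10, 7), (100, 70)]:
        log_bf = log_marginal_fair(N, z) - log_marginal_uniform(N, z)
        print(N, z, np.exp(log_bf))   # Bayes factor p(D|M1)/p(D|M2)

With 7 heads out of 10 tosses the simpler fair-coin model actually comes out slightly ahead (a Bayes factor of roughly 1.3), but with 70 heads out of 100 the factor swings decisively toward the more flexible model.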

Data order invariance

I don’t get this part – it’s always true that the posterior p(\theta|D', D) is invariant to the order in which Bayes’ rule is applied, by definition. The factorized rule cited at the end is just the result of applying the assumption that the data are conditionally independent given the parameters.
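Indeed, for the conjugate coin model above it’s easy to check numerically that batch and one-toss-at-a-time updating land on the same posterior, in whatever order the tosses are processed (a minimal sketch of mine, not from the book):

    # Batch update on all tosses at once vs. sequential single-toss updates:
    # both end at the same Beta posterior parameters.
    tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # 1 = heads, 0 = tails (example data)

    a, b = 2.0, 2.0                            # prior Beta parameters
    a_batch = a + sum(tosses)
    b_batch = b + len(tosses) - sum(tosses)

    a_seq, b_seq = a, b
    for toss in reversed(tosses):              # reversed order, same result
        a_seq += toss
        b_seq += 1 - toss

    assert (a_batch, b_batch) == (a_seq, b_seq)
    print(a_batch, b_batch)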

The problem with Bayesian mathematics

Computing the posterior given some new data usually means performing an integral (the normalizing constant), which, even using analytical approximations, can be computationally intensive (especially since the posterior will be fed into an optimizer). Numerical integration, on the other hand, would impose a limit on the dimensionality of the model. Thus the winning method is Markov chain Monte Carlo (MCMC).
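For the coin example the conjugate update above makes all of this unnecessary, but just to illustrate the idea, here’s a minimal Metropolis sampler for the same posterior (a sketch of mine, not the book’s code); it only ever evaluates the unnormalized posterior q^z (1-q)^{N-z} p(q), so no integral is needed:

    import numpy as np

    rng = np.random.default_rng(0)
    N, z = 10, 7                # example data: z heads out of N tosses

    def log_unnorm_posterior(q):
        # Likelihood times a uniform prior on [0, 1], up to a constant.
        if q <= 0.0 or q >= 1.0:
            return -np.inf
        return z * np.log(q) + (N - z) * np.log(1 - q)

    q = 0.5
    samples = []
    for _ in range(20000):
        proposal = q + rng.normal(scale=0.1)            # random-walk proposal
        log_accept = log_unnorm_posterior(proposal) - log_unnorm_posterior(q)
        if np.log(rng.uniform()) < log_accept:
            q = proposal
        samples.append(q)

    print(np.mean(samples[5000:]))   # close to the exact posterior mean (z+1)/(N+2)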