Bayesian Analysis

Last updated on Jul 31, 2022 2 min read Statistics, R

Bayesian approach becomes more and more popular because of the improvement of the mordern computing ability for machine learning and big data. Bayesian analysis is a completely different approach compared to frequentist approach. Yet, it’s more challenging. To be able to understand and learn the Bayesian approach, first, we will need to have the knowledge of conditional probability. Second, to have the knowledge of different distributions such as normal, bernoulli, binomial, gamma, beta, cauchy, possion, etc. Third, to be familiar with calculus. We need to calculate derivatives and integration of different distributions. Last but not at least, to be familiar with simulation sampling techniques such as Grid sampling, Variational Bayes and Monte Carlo Markov Chian (MCMC) including popular Gibbs sampling, Metropolis Hastings Sampling, Hamiltonian Monte Carlo (HMC),etc. Luckily, there are a few software tools Jags, Stan .etc can be used for Gibbs sampling and HMC. ‘Credit: https://www2.isye.gatech.edu/~brani/isyebayes/jokes.html’

The concept is simple. According to Bayes Theorem $p (θ | y) = \frac{p (y | θ) p (θ)}{p (y)}$ or without normalization $p (θ | y) \sim p (y | θ) p (θ)$ , we want to gain the posterior distribution by using prior information and likelihood of the data information. Most of the time, we use log likelihood instead for easier calculation since $l o g (a * b) = l o g a + l o g b$ . Bayesian approach involves much more math than frequentist approach and more complicated. Frequentist focus on one point estimate (hypothesis and Confidence Interval), and bayesian focuses on a looking forward range (Credible Interval). Credit: https://365datascience.com/bayesian-vs-frequentist-approach/

If friquentist and bayesian approach are just two different ways to look into things, Why and when is recommended to use Bayesian approach instead of frequentist approach?

Here are a few examples to use Bayesian approach over frequentist approach:

Clear prior information: the example in the book BD3 of Andrew Gelman’s in Chapter 1, problem 6. The prior information is that approximately 1/125 of all births are fraternal twins and 1/300 of births are identical twins.
Seperation problems. For example logistic regression couldn’t converge due to high dimension and small sample size.
Estimate multiple outcomes with credible interval. For example, family doctors try to diagnosis diseases (such as cold, flu) based on multiple symptoms (such as headache, sore throat, high temprature) and the probabilities of all symptoms sum up to limited possible diseases.
Small sample size with multiple experiments and limited budget.It’s a preferred meta analysis than tranditional meta analysis that has high heterogeneity with different recources since it provides a credible interval instead of confident interval.

Reference:

Bayesian Statistical Modeling Machine Learning MCMC

Bayesian Analysis

Yuan Du

Senior Data Scientist