Introduction to Bayesian Linear Regression, Posterior Inference, and MCMC
Peter Sørensen
Center for Quantitative Genetics and Genomics
Aarhus University
Overview
Classical Linear Regression
Model, inference, and limitations
Bayesian Linear Regression
Motivation, priors, and posteriors
Conditional posteriors and inference
Computation and Applications
MCMC and Gibbs sampling
Diagnostics and R implementation
Introduction
Bayesian Linear Regression (BLR) extends classical regression by incorporating prior information and producing posterior distributions over model parameters.
Advantages:
Handles high-dimensional and small-sample problems.
Provides full uncertainty quantification.
Enables regularization and integration of prior biological knowledge.
Applications in Genomics
Bayesian Linear Regression (BLR) is widely applied in quantitative genetics and genomics.
Common use cases:
Genome-Wide Association Studies (GWAS) and fine-mapping of causal variants.
Genetic prediction and heritability estimation.
Pathway and gene-set enrichment analyses.
Classical Linear Regression
In the classical linear model, inference is based on standard errors and \(t\)-tests, confidence intervals, and prediction intervals.
Limitations
No explicit control over effect size distribution
Sensitive when collinearity is high
Not identifiable when \(p>n\)
Uncertainty quantification is largely asymptotic unless normality assumptions hold
Why Bayesian Linear Regression?
Combines likelihood and prior to form the posterior.
Priors express beliefs about effect sizes:
Normal → many small effects
Spike-and-slab → sparse effects
Acts as a regularizer:
Shrinks small/noisy effects toward \(0\)
Preserves large, important effects
Stable when \(p > n\) due to prior information.
Provides full posterior distributions for \(\beta\) and \(\sigma^2\).
Overview: Bayesian Linear Regression
Combines data and prior knowledge using Bayes’ rule.
Uses conjugate priors to yield closed-form full conditionals.
Employs Gibbs sampling to approximate the posterior distribution.
Estimates parameters, uncertainty, and predictions from posterior draws.
Bayesian Linear Regression with Gaussian Priors
Bayesian linear regression starts with the same model structure as classical linear regression.
\[
y = X\beta + e, \quad e \sim \mathcal{N}(0, \sigma^2 I_n)
\]
\(y\): \(n \times 1\) vector of observed outcomes
\(X\): \(n \times p\) design matrix of predictors
\(\beta\): \(p \times 1\) vector of unknown coefficients
\(e\): Gaussian noise with mean \(0\) and variance \(\sigma^2\)
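As a concrete, purely illustrative starting point, the model can be simulated in base R; the dimensions, true effects, and seed below are arbitrary choices, not values from the slides.

```r
# Simulate data from y = X beta + e with Gaussian noise
set.seed(1)
n <- 100                                    # number of observations
p <- 10                                     # number of predictors
X <- matrix(rnorm(n * p), nrow = n)         # design matrix
beta_true   <- rnorm(p, mean = 0, sd = 0.5) # true coefficients
sigma2_true <- 1                            # true residual variance
y <- X %*% beta_true + rnorm(n, sd = sqrt(sigma2_true))
```

The simulated X and y are reused in the code sketches that follow.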
Likelihood in Bayesian Linear Regression
Because the residuals are Gaussian,
\[
e \sim \mathcal{N}(0, \sigma^2 I_n),
\]
the distribution of \(y\) given the parameters is
\[
y \mid \beta, \sigma^2 \sim \mathcal{N}(X\beta, \sigma^2 I_n)
\]
This defines the likelihood: the probability of the observed data given the parameters \(\beta\) and \(\sigma^2\),
\[
p(y \mid \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2}
\exp\!\left\{ -\frac{(y - X\beta)^\top (y - X\beta)}{2\sigma^2} \right\}
\]
Priors and the Posterior
The effects receive a Gaussian prior, \(\beta \mid \sigma_b^2 \sim \mathcal{N}(0, \sigma_b^2 I_p)\), and the variance components receive scaled inverse-\(\chi^2\) priors, \(\sigma_b^2 \sim S_b \, \chi^{-2}(v_b)\) and \(\sigma^2 \sim S \, \chi^{-2}(v)\).
By Bayes' rule, the joint posterior is proportional to the likelihood times the priors:
\[
p(\beta, \sigma_b^2, \sigma^2 \mid y) \propto
p(y \mid \beta, \sigma^2)\, p(\beta \mid \sigma_b^2)\, p(\sigma_b^2)\, p(\sigma^2)
\]
This posterior captures all updated knowledge about the unknown parameters after observing the data.
It forms the basis for computing posterior means, credible intervals, and predictions.
Conjugacy and Gibbs Sampling
With conjugate priors, each parameter’s full conditional distribution has a closed-form solution.
This makes Gibbs sampling a natural and efficient inference method.
Parameters are updated one at a time, each from its conditional posterior.
The resulting Markov chain explores the joint posterior of \((\beta, \sigma_b^2, \sigma^2)\).
Gibbs sampling thus provides an easy way to approximate the full posterior in Bayesian linear regression.
Full Conditional for \(\beta\)
Given \(\sigma^2\), \(\sigma_b^2\), and the data \(y\), the regression coefficients have a multivariate normal conditional posterior:
\[
\beta \mid \sigma^2, \sigma_b^2, y \sim
\mathcal{N}(\mu_\beta, \Sigma_\beta)
\]
where
\[
\Sigma_\beta = \left( \frac{X^\top X}{\sigma^2} + \frac{I}{\sigma_b^2} \right)^{-1},
\qquad
\mu_\beta = \Sigma_\beta \, \frac{X^\top y}{\sigma^2}
\]
The term \(\frac{I}{\sigma_b^2}\) introduces shrinkage, regularizing estimates and stabilizing inference, especially when \(p > n\) or predictors are highly correlated.
Thus, the Bayesian posterior mean is a regularized, uncertainty-aware generalization of OLS.
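A minimal R sketch of this joint update, assuming the simulated X and y from above and current values of sigma2 and sigma2_b (the placeholder values here are arbitrary):

```r
# One draw of beta from its multivariate normal full conditional
sigma2   <- 1      # current residual-variance draw (placeholder)
sigma2_b <- 0.5    # current prior-variance draw (placeholder)
p <- ncol(X)

prec  <- crossprod(X) / sigma2 + diag(p) / sigma2_b  # posterior precision
Sigma <- solve(prec)                                 # Sigma_beta
mu    <- Sigma %*% crossprod(X, y) / sigma2          # mu_beta
beta  <- as.vector(mu + t(chol(Sigma)) %*% rnorm(p)) # MVN draw via Cholesky
```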
Full Conditional for \(\beta_j\)
Instead of sampling \(\beta\) jointly, we can update each coefficient \(\beta_j\) one at a time, holding all others fixed. This is efficient for large \(p\) and for spike-and-slab models.
Let \(X_j\) be the \(j\)th column of \(X\) and define the partial residual:
\[
r_j = y - X_{-j} \beta_{-j}
\]
Then the conditional posterior for \(\beta_j\) is univariate normal:
\[
\beta_j \mid \cdot \;\sim\; \mathcal{N}\!\left(
\frac{X_j^\top r_j / \sigma^2}{X_j^\top X_j / \sigma^2 + 1/\sigma_b^2},\;
\left( \frac{X_j^\top X_j}{\sigma^2} + \frac{1}{\sigma_b^2} \right)^{-1}
\right)
\]
This corresponds to a regularized least-squares update. Residual updates avoid matrix inversion, scale to high dimensions, and extend naturally to sparse (spike-and-slab) models.
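A sketch of the single-site update in R, assuming X, y, and current values of beta, sigma2, and sigma2_b from the sketches above; the residual vector is updated in place, so no matrix inversion is needed:

```r
# Single-site Gibbs updates for beta_1, ..., beta_p
e <- as.vector(y - X %*% beta)               # current full residual
for (j in seq_len(ncol(X))) {
  xj    <- X[, j]
  rj    <- e + xj * beta[j]                  # partial residual r_j = y - X_{-j} beta_{-j}
  prec  <- sum(xj^2) / sigma2 + 1 / sigma2_b # conditional precision
  mu_j  <- (sum(xj * rj) / sigma2) / prec    # conditional mean
  b_new <- rnorm(1, mean = mu_j, sd = sqrt(1 / prec))
  e <- e - xj * (b_new - beta[j])            # keep residual consistent with new beta_j
  beta[j] <- b_new
}
```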
Full Conditional for \(\sigma_b^2\)
The conditional distribution of the prior variance \(\sigma_b^2\), given \(\beta\) and the hyperparameters, is a scaled inverse-chi-squared:
\[
\sigma_b^2 \mid \beta \sim \tilde{S}_b \, \chi^{-2}(\tilde{v}_b), \quad
\tilde{v}_b = v_b + p, \quad
\tilde{S}_b = \frac{\beta^\top \beta + v_b S_b}{\tilde{v}_b}
\]
At each Gibbs iteration, \(\sigma_b^2\) is sampled directly given \(\beta\). This update reflects our revised belief about the variability of effect sizes after observing the current posterior draw of \(\beta\).
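In R, a draw from this scaled inverse-chi-squared can be obtained by dividing the scale sum by a chi-squared draw; the hyperparameter values below are placeholders, not values from the slides.

```r
# Draw sigma2_b from its scaled inverse-chi-squared full conditional
nu_b <- 4; S_b <- 0.1                        # prior hyperparameters (placeholders)
df_b <- nu_b + length(beta)                  # v_b + p
sigma2_b <- (sum(beta^2) + nu_b * S_b) / rchisq(1, df = df_b)
```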
Full Conditional for \(\sigma^2\)
The conditional distribution of the residual variance \(\sigma^2\),
given \(\beta\) and the data, is also scaled inverse-chi-squared:
\[
\sigma^2 \mid \beta, y \sim \tilde{S} \, \chi^{-2}(\tilde{v})
\]
where
\[
\tilde{v} = v + n, \quad
\tilde{S} = \frac{(y - X\beta)^\top (y - X\beta) + v S}{\tilde{v}}
\]
At each Gibbs iteration, \(\sigma^2\) is sampled directly given \(\beta\).
This captures our updated belief about the residual variability
after accounting for the current linear predictor \(X\beta\).
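The analogous R update for the residual variance, again with placeholder hyperparameters and the objects from the sketches above:

```r
# Draw sigma2 from its scaled inverse-chi-squared full conditional
nu <- 4; S <- 1                              # prior hyperparameters (placeholders)
e  <- as.vector(y - X %*% beta)              # residuals at the current beta
sigma2 <- (sum(e^2) + nu * S) / rchisq(1, df = nu + nrow(X))
```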
Gibbs Sampling: Motivation
Bayesian inference often involves complex posteriors that lack closed-form solutions. To approximate these, we use Markov Chain Monte Carlo (MCMC) methods.
MCMC builds a Markov chain whose stationary distribution is the target posterior. Once the chain has converged, its samples can be used to estimate:
Posterior means, variances, and credible intervals
Predictive distributions
Other functions of interest
Among MCMC algorithms, the Gibbs sampler is especially useful when all full conditional distributions are available in closed form.
Gibbs Sampling: The Algorithm
For Bayesian linear regression with conjugate priors, the joint posterior is:
\[
p(\beta, \sigma_b^2, \sigma^2 \mid y) \propto
p(y \mid \beta, \sigma^2)\, p(\beta \mid \sigma_b^2)\, p(\sigma_b^2)\, p(\sigma^2)
\]
We iteratively draw from the following full conditionals:
Sample \(\beta \mid \sigma_b^2, \sigma^2, y\)
Sample \(\sigma_b^2 \mid \beta\)
Sample \(\sigma^2 \mid \beta, y\)
Each step updates one parameter given the latest values of the others. Repeating this sequence yields samples from the joint posterior \(p(\beta, \sigma_b^2, \sigma^2 \mid y)\).
Because each conditional is standard (Normal or scaled inverse-\(\chi^2\)), Gibbs sampling is both efficient and easy to implement.
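Putting the three conditional updates together gives a compact Gibbs sampler. The function below is an illustrative sketch reusing the simulated X and y from above; the function name, default hyperparameter values, and storage format are my own choices, not taken from the slides.

```r
# Gibbs sampler for Bayesian linear regression with Gaussian priors
gibbs_blr <- function(y, X, n_iter = 5000, nu_b = 4, S_b = 0.1, nu = 4, S = 1) {
  n <- nrow(X); p <- ncol(X)
  beta <- rep(0, p); sigma2 <- var(as.vector(y)); sigma2_b <- 0.5
  draws <- list(beta     = matrix(NA, n_iter, p),
                sigma2   = numeric(n_iter),
                sigma2_b = numeric(n_iter))
  for (it in seq_len(n_iter)) {
    ## 1. beta | sigma2_b, sigma2, y : multivariate normal
    prec  <- crossprod(X) / sigma2 + diag(p) / sigma2_b
    Sigma <- solve(prec)
    mu    <- Sigma %*% crossprod(X, y) / sigma2
    beta  <- as.vector(mu + t(chol(Sigma)) %*% rnorm(p))
    ## 2. sigma2_b | beta : scaled inverse-chi-squared
    sigma2_b <- (sum(beta^2) + nu_b * S_b) / rchisq(1, df = nu_b + p)
    ## 3. sigma2 | beta, y : scaled inverse-chi-squared
    e <- as.vector(y - X %*% beta)
    sigma2 <- (sum(e^2) + nu * S) / rchisq(1, df = nu + n)
    draws$beta[it, ]   <- beta
    draws$sigma2[it]   <- sigma2
    draws$sigma2_b[it] <- sigma2_b
  }
  draws
}

fit <- gibbs_blr(y, X)   # uses the X and y simulated earlier
```

The returned draws are used in the posterior-summary, prediction, and diagnostic sketches below.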
Posterior Summaries
After running the Gibbs sampler, we obtain posterior draws \(\{\theta^{(t)}\}_{t=1}^T\) for parameters such as \(\beta_j\), \(\sigma^2\), or \(\sigma_b^2\).
Posterior means are estimated by averaging the draws, e.g. \(\hat{\beta}_j = \frac{1}{T}\sum_{t=1}^{T} \beta_j^{(t)}\), and a 95% credible interval is obtained from the 2.5% and 97.5% quantiles of the draws.
The width of the credible interval reflects posterior uncertainty: parameters with broader posteriors are estimated with less precision, and the degree of uncertainty depends on both the data and the prior.
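For example, using the draws from the gibbs_blr sketch above (the burn-in length is an arbitrary choice):

```r
# Posterior means and 95% credible intervals after discarding burn-in
keep   <- 1001:nrow(fit$beta)
b_mean <- colMeans(fit$beta[keep, ])                 # posterior means of beta
b_ci   <- apply(fit$beta[keep, ], 2, quantile,
                probs = c(0.025, 0.975))             # 95% credible intervals
quantile(fit$sigma2[keep], c(0.025, 0.5, 0.975))     # summary of sigma^2
```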
Posterior Prediction
Given a new observation \(x_{\text{new}}\), we can predict using posterior draws:
Compute predicted means for each sample: \[
\hat{y}_{\text{new}}^{(t)} = x_{\text{new}}^\top \beta^{(t)}
\]
Then add residual noise using the sampled variance:
\[
y_{\text{new}}^{(t)} = \hat{y}_{\text{new}}^{(t)} + e^{(t)}, \quad
e^{(t)} \sim \mathcal{N}(0, \sigma^{2(t)})
\]
The resulting samples \(\{y_{\text{new}}^{(t)}\}\) form a posterior predictive distribution, from which we can derive predictive intervals and evaluate predictive accuracy.
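A sketch of posterior prediction in R, reusing the fit object from above; x_new here is just a randomly generated predictor vector for illustration.

```r
# Posterior predictive distribution for a new observation
keep   <- 1001:nrow(fit$beta)
x_new  <- rnorm(ncol(X))                         # hypothetical new predictor vector
mu_new <- as.vector(fit$beta[keep, ] %*% x_new)  # x_new' beta^(t) for each draw
y_new  <- mu_new + rnorm(length(keep), sd = sqrt(fit$sigma2[keep]))  # add residual noise
quantile(y_new, c(0.025, 0.975))                 # 95% predictive interval
```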
Model Checking and Hypothesis Testing
Posterior samples enable rich model diagnostics and hypothesis testing:
Posterior probability of an event \[
\Pr(\beta_j \ne 0 \mid y)
\approx \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\!\left(\beta_j^{(t)} \ne 0\right)
\]
(This is meaningful under sparsity-inducing priors such as spike-and-slab, where posterior draws of \(\beta_j\) can be exactly zero; under a continuous prior one instead reports probabilities such as \(\Pr(\beta_j > 0 \mid y)\).)
Posterior predictive checks
Simulate new datasets using posterior draws and compare them to the observed data to assess model fit (see the sketch after this list).
Model comparison
Bayes factors and marginal likelihoods can be approximated to formally test or compare competing models.
These tools extend Bayesian inference beyond estimation to model validation, uncertainty quantification, and decision-making.
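As an illustration of a posterior predictive check using the fit object from above (the choice of test statistic, the sample variance of y, is arbitrary):

```r
# Posterior predictive check: compare var(y) to its replicated distribution
keep  <- 1001:nrow(fit$beta)
T_rep <- sapply(keep, function(t) {
  y_rep <- X %*% fit$beta[t, ] + rnorm(nrow(X), sd = sqrt(fit$sigma2[t]))
  var(as.vector(y_rep))                    # test statistic on replicated data
})
mean(T_rep >= var(as.vector(y)))           # posterior predictive p-value
```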
Convergence Diagnostics
Before interpreting MCMC results, we must check that the Gibbs sampler has converged to the target posterior distribution.
Convergence diagnostics assess whether the Markov chain has reached its stationary distribution and is producing valid samples.
Two basic strategies are:
Burn-in – Discard early iterations (e.g., first 1000) to remove dependence on starting values.
Thinning – Keep every \(k\)-th sample to reduce autocorrelation.
These steps improve sample quality and ensure reliable posterior summaries.
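In R, burn-in and thinning amount to simple indexing of the stored chains; the burn-in length and thinning interval below are arbitrary examples.

```r
# Apply burn-in and thinning to the sigma^2 chain
chain   <- fit$sigma2
kept    <- chain[-(1:1000)]                     # discard the first 1000 iterations
thinned <- kept[seq(1, length(kept), by = 10)]  # keep every 10th draw
```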
Trace Plots
A simple yet powerful diagnostic is the trace plot,
showing sampled parameter values \(\theta^{(t)}\) over iterations \(t\).
A converged chain fluctuates around a stable mean — no trend or drift.
Multiple chains from different starting points should overlap and mix well.
Trace plots help detect:
Lack of stationarity (upward/downward trends)
Poor mixing or multimodality
Burn-in issues
Visual inspection is often the first step in assessing convergence.
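A basic trace plot in base R, using the sigma^2 chain from the sketch above:

```r
# Trace plot of the residual-variance chain
plot(fit$sigma2, type = "l",
     xlab = "Iteration", ylab = expression(sigma^2),
     main = "Trace plot of sigma^2")
abline(v = 1000, lty = 2)   # mark the end of the (example) burn-in period
```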
Autocorrelation
Samples from a Gibbs sampler are correlated, especially for tightly coupled parameters.
The autocorrelation function (ACF) quantifies dependence across lags \(k\):
\[
\rho_k = \operatorname{Corr}\!\left(\theta^{(t)}, \theta^{(t+k)}\right)
\]
High autocorrelation means each draw carries less independent information, which motivates thinning and longer runs.
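In R, the ACF of a stored chain can be inspected with acf(); the effective-sample-size line assumes the coda package is available.

```r
# Autocorrelation of the sigma^2 chain after burn-in
acf(fit$sigma2[-(1:1000)], lag.max = 50, main = "ACF of sigma^2 draws")
# Effective sample size (optional, requires the 'coda' package):
# coda::effectiveSize(fit$sigma2[-(1:1000)])
```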
Priors for Spike-and-Slab BLR
In the spike-and-slab extension, each coefficient is modeled as \(\beta_j = \delta_j \alpha_j\), where \(\alpha_j \mid \sigma_b^2 \sim \mathcal{N}(0, \sigma_b^2)\) is the effect size, \(\delta_j \mid \pi \sim \text{Bernoulli}(\pi)\) is an inclusion indicator, \(\pi\) has a Beta prior, and \(\sigma_b^2\) and \(\sigma^2\) have scaled inverse-\(\chi^2\) priors.
These are conjugate, providing closed-form conditional updates. Hyperparameters \((S_b, v_b)\) and \((S, v)\) encode prior beliefs about effect size variability and residual noise.
Joint Posterior Structure
Combining the likelihood and priors, the joint posterior is:
\[
p(\alpha, \delta, \pi, \sigma_b^2, \sigma^2 \mid y) \propto
p(y \mid \alpha, \delta, \sigma^2)\,
p(\alpha \mid \sigma_b^2)\,
p(\delta \mid \pi)\,
p(\pi)\,
p(\sigma_b^2)\,
p(\sigma^2)
\]
This captures our updated beliefs about effects, inclusion indicators, and variance components.
Gibbs Sampling for Spike-and-Slab BLR
Inference proceeds via Gibbs sampling, cycling through these conditional updates:
\(\alpha \mid D\)
\(\delta \mid D\)
\(\pi \mid D\)
\(\sigma_b^2 \mid D\)
\(\sigma^2 \mid D\)
Here, \(D\) denotes the data and all other current parameter values.
Each conditional follows a standard distribution (Normal, Bernoulli, Beta, scaled-\(\chi^{-2}\)).
Iterating these updates generates samples from the joint posterior.
Posterior Inclusion Probabilities
The posterior inclusion probability (PIP) measures how likely it is that each predictor is truly associated with \(y\):
\[
\text{PIP}_j = \Pr(\delta_j = 1 \mid y)
\approx \frac{1}{T} \sum_{t=1}^{T} \delta_j^{(t)}
\]
PIPs summarize variable relevance and drive Bayesian feature selection.
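Given a matrix of sampled inclusion indicators from a spike-and-slab Gibbs sampler (delta_draws below, a hypothetical T x p matrix of 0/1 draws; such a sampler is not implemented in the sketches above), PIPs are simply column means:

```r
# Posterior inclusion probabilities from sampled 0/1 indicators
# delta_draws: assumed T x p matrix of delta^(t) draws (hypothetical object)
pip <- colMeans(delta_draws)
sort(pip, decreasing = TRUE)   # rank predictors by inclusion probability
```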
Summary of Bayesian Linear Regression
Bayesian Linear Regression combines likelihood and prior to form the posterior, enabling principled modeling, regularization, and uncertainty quantification.
Inference via MCMC (often Gibbs sampling) with posterior draws for means, credible intervals, and predictions.
Spike-and-slab priors enable sparsity and variable selection, assigning exact zeros to irrelevant predictors and identifying key variables via posterior inclusion probabilities (PIPs).
Conjugate and mixture priors yield efficient and robust inference, even when \(p > n\).
With proper convergence checks, Bayesian models provide stable and reliable inference across a wide range of data settings.
Applications in Genomics
We have now seen the basic framework of Bayesian Linear Regression (BLR)
and will illustrate how it provides a unified approach for analyzing genetic and genomic data.
Genome-Wide Association Studies (GWAS) and fine-mapping of causal variants.
Genetic prediction and heritability estimation.
Pathway and gene-set enrichment analyses.
These examples show how BLR connects statistical modeling with biological interpretation in quantitative genetics.