Center for Quantitative Genetics and Genomics, Aarhus University, Denmark
Bayesian Linear Regression Models provide a flexible statistical framework for modeling complex biological and healthcare data and support a broad range of applications.
The Bayesian Linear Regression (BLR) model builds on the standard linear model:
\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^2 \mathbf{I}) \]
In the Bayesian formulation, each \(\beta_j\) is assigned a prior distribution reflecting beliefs about effect-size magnitude or sparsity, which determines how information is shared across features or biological layers.
Regression effects can be estimated in many ways, but we focus on a Bayesian hierarchical framework. Through their hierarchical structure, BLR models naturally integrate multiple biological layers, linking genomic, transcriptomic, and other molecular data.
The BLR model extends the linear framework by introducing hierarchical priors on parameters. Suppose we model regression coefficients \(b_j\) with a normal prior:
\[ b_j \sim \mathcal{N}(0, \sigma_b^2) \]
Here, \(\sigma_b^2\) (the variance of the prior) controls how large the effects \(b_j\) are expected to be.
Instead of fixing \(\sigma_b^2\), we treat it as unknown and assign it its own prior, known as the hyperprior:
\[ \sigma_b^2 \sim \text{Inv-}\chi^2(\nu, S^2) \]
In practice, this parameter is learned from the data during estimation rather than fixed in advance.
The three levels in the model:
| Level | Description | Example |
|---|---|---|
| 1 (likelihood) | Describes how data are generated given parameters | \(\mathbf{y} \sim \mathcal{N}(\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{I})\) |
| 2 (prior) | Describes our beliefs about the parameters before seeing data | \(b_j \sim \mathcal{N}(0, \sigma_b^2)\) |
| 3 (hyperprior) | Describes uncertainty about the prior's parameters | \(\sigma_b^2 \sim \text{Inv-}\chi^2(\nu, S^2)\) |
This hierarchical structure allows the model to learn how strongly to shrink effect estimates from the data, while accounting for uncertainty in prior parameters and automatically regularizing effect sizes.
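To make the hyperprior level concrete, the sketch below (Python/NumPy; the function name and hyperparameter values are illustrative assumptions, not part of any particular package) draws \(\sigma_b^2\) from its full conditional: with a scaled \(\text{Inv-}\chi^2(\nu, S^2)\) prior and \(p\) effects \(b_j \sim \mathcal{N}(0, \sigma_b^2)\), the full conditional is again a scaled inverse-chi-squared distribution.

```python
import numpy as np

def sample_sigma_b2(b, nu, S2, rng):
    """Draw sigma_b^2 from its scaled inverse-chi-squared full conditional.

    With prior sigma_b^2 ~ Inv-chi2(nu, S2) and b_j ~ N(0, sigma_b^2),
    the full conditional is Inv-chi2(nu + p, (nu*S2 + sum(b^2)) / (nu + p)).
    A scaled Inv-chi2(df, s2) draw equals df * s2 / chi2(df).
    """
    p = b.size
    df = nu + p
    scale = (nu * S2 + b @ b) / df
    return df * scale / rng.chisquare(df)

# illustrative usage with assumed values
rng = np.random.default_rng(0)
b = rng.normal(0.0, 0.1, size=1000)      # current effect draws
print(sample_sigma_b2(b, nu=4.0, S2=0.01, rng=rng))
```

This conjugate update is the building block a Gibbs sampler uses to re-estimate the shrinkage level at every iteration.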
This single shared variance is simple and robust, but it may not capture diverse effect-size distributions.
Complex traits arise from heterogeneous effect-size distributions — some features have large effects, many have small, and others are likely null. To capture this diversity, the BLR framework can be extended in two ways:
1. Data-driven grouping of molecular features: the model learns effect-size classes from the data using a mixture of variances \(\{\tau_k^2\}\) with probabilities \(\{\pi_k\}\).
2. Biologically informed grouping of molecular features: the model uses prior biological knowledge to assign features to groups a priori, each with its own variance \(\tau_g^2\) capturing within-group variability.
Both approaches enable the model to adapt to complex genetic and molecular architectures and to share information across related features or omic layers.
A more flexible formulation assumes that effects come from a mixture of normal distributions:
\[ \beta_j \mid d_j = k \sim \mathcal{N}(0, \tau_k^2), \qquad P(d_j = k) = \pi_k \]
This structure allows the model to capture both large and small effects, including potential nulls.
In practice, both \(\pi_k\) and \(\tau_k^2\) are learned from the data, making the model data-driven and adaptive, though not necessarily biologically informed.
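As a minimal illustration of how \(d_j\) is updated, the posterior probability that effect \(j\) belongs to component \(k\) is proportional to \(\pi_k\,\mathcal{N}(\beta_j; 0, \tau_k^2)\). The Python sketch below assumes three components (near-null, small, large) with made-up values of \(\pi_k\) and \(\tau_k^2\); a hard-zero null class is approximated here by a near-zero variance.

```python
import numpy as np
from scipy.stats import norm

def component_posterior(beta_j, pi, tau2):
    """P(d_j = k | beta_j): posterior component membership under the
    normal-mixture prior, proportional to pi_k * N(beta_j; 0, tau_k^2)."""
    pi, tau2 = np.asarray(pi), np.asarray(tau2)
    dens = pi * norm.pdf(beta_j, loc=0.0, scale=np.sqrt(tau2))
    return dens / dens.sum()

# assumed three-component architecture: near-null, small, large effects
pi = [0.90, 0.09, 0.01]
tau2 = [1e-6, 1e-3, 1e-2]   # near-zero variance approximates the null class
print(component_posterior(0.05, pi, tau2))
```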
In this biologically informed extension, we assign features to predefined groups (e.g., genes, pathways, or protein complexes) a priori and model group-specific variances:
\[ \beta_j \sim \mathcal{N}\!\big(0, \tau_{g(j)}^2\big), \qquad \tau_g^2 \sim p(\tau_g^2) \]
This approach allows the model to share information among related features and test enrichment across biological groups.
In practice, the group-level variances \(\tau_g^2\) are learned from the data, enabling the model to adapt shrinkage across biological structures.
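A hedged sketch of the corresponding update, reusing the scaled inverse-chi-squared hyperprior from above: each \(\tau_g^2\) is drawn from its full conditional using only the effects assigned to group \(g\). The `groups` array, the function name, and the hyperparameter values are assumptions for illustration.

```python
import numpy as np

def sample_group_variances(beta, groups, nu, S2, rng):
    """Draw each tau_g^2 from its scaled inverse-chi-squared full
    conditional, using only the effects assigned to group g."""
    tau2 = {}
    for g in np.unique(groups):
        b_g = beta[groups == g]
        df = nu + b_g.size
        scale = (nu * S2 + b_g @ b_g) / df
        tau2[g] = df * scale / rng.chisquare(df)
    return tau2

# illustrative usage: 5 predefined sets of 20 features each
rng = np.random.default_rng(0)
beta = rng.normal(0.0, 0.05, 100)
groups = np.repeat(np.arange(5), 20)
print(sample_group_variances(beta, groups, nu=4.0, S2=0.01, rng=rng))
```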
In Bayesian variable selection, each feature \(j\) is assigned an indicator variable:
\[ \delta_j = \begin{cases} 1, & \text{if feature $j$ has a non-zero effect} \\ 0, & \text{if feature $j$ has no effect.} \end{cases} \]
The model can be written as: \[ \beta_j = \alpha_j \, \delta_j, \qquad \alpha_j \sim \mathcal{N}(0, \tau^2), \quad \delta_j \sim \text{Bernoulli}(\pi) \]
where \(\pi\) is the prior inclusion probability.
After inference, we estimate \(\text{PIP}_j = P(\delta_j = 1 \mid \text{data})\), the posterior inclusion probability for feature \(j\).
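In an MCMC implementation, the PIP is simply the posterior frequency with which \(\delta_j = 1\) across saved iterations. A minimal sketch, using a simulated stand-in for the sampler output:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for saved indicator draws from the sampler: (n_iter, p) 0/1 array
delta_samples = rng.binomial(1, 0.1, size=(2000, 500))

pip = delta_samples.mean(axis=0)    # PIP_j = Pr(delta_j = 1 | data)
ranked = np.argsort(pip)[::-1]      # features ranked by inclusion evidence
```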
The BLR model produces posterior summaries that describe the evidence, magnitude, and uncertainty of feature effects.
Key outputs are the posterior effect estimates \(\hat{\beta}_j\), the posterior inclusion probabilities \(\text{PIP}_j\), and the variance components \(\tau^2\) and \(\sigma^2\).
Each plays a distinct and complementary role in interpretation.
Posterior inclusion probabilities (PIPs) represent the probability that feature \(j\) has a nonzero effect: \[ \text{PIP}_j = \Pr(\beta_j \neq 0 \mid \mathbf{Y}, \mathbf{X}) \]
The PIP quantifies the evidence that feature \(j\) should be included in the model: values near 1 indicate strong evidence for a nonzero effect, while values near 0 indicate little or no support.
In pathway analyses (e.g., Bayesian MAGMA), PIPs correspond to gene- or set-level importance scores.
Variance components describe uncertainty and heterogeneity in effect sizes:
\[ \beta_j \sim \mathcal{N}(0, \tau^2) \qquad \text{and} \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \]
\(\tau^2\) is the variance component for the effect sizes:
- Small \(\tau^2\): most effects are close to zero
- Large \(\tau^2\): more large-effect features expected

\(\sigma^2\) is the variance component for the residuals:
- Captures unexplained variation after accounting for \(\mathbf{X}\)
Together, these components describe the genetic architecture, enabling heritability estimation, uncertainty quantification, and enrichment analysis across sets or traits.
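For example, with standardized features the expected signal variance is approximately \(p\,\tau^2\), so posterior draws of \(\tau^2\) and \(\sigma^2\) translate directly into a posterior for the proportion of variance explained, a heritability-like summary. The sketch below uses simulated stand-ins for the posterior draws (all values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 500                                     # number of features
# stand-ins for posterior draws of the variance components
tau2_draws = rng.gamma(2.0, 1e-4, size=2000)
sigma2_draws = rng.gamma(50.0, 0.02, size=2000)

# with standardized columns of X, expected signal variance ~ p * tau2
h2_draws = p * tau2_draws / (p * tau2_draws + sigma2_draws)
print(h2_draws.mean(), np.quantile(h2_draws, [0.025, 0.975]))
```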
| Quantity | Interpretation | Typical use |
|---|---|---|
| \(\hat{\beta}_j\) | Direction and magnitude of effect | Effect estimation, prediction |
| \(\text{PIP}_j\) | Probability feature is truly associated | Fine-mapping, feature ranking |
| \(\tau^2\), \(\sigma^2\) | Variance in effect sizes and residuals | Genetic architecture, model fit |
These summaries are synergistic: \(\hat{\beta}_j\) gives the direction and magnitude of an effect, \(\text{PIP}_j\) gives the strength of evidence that the effect is real, and the variance components describe the overall architecture.
| Model Type | Prior Structure | Biological Interpretation |
|---|---|---|
| Single-component BLR | One global variance \(\tau^2\) | All features (across layers) share the same level of shrinkage — equal contribution assumption |
| Multiple-component BLR | Mixture of variances \(\{\tau_k^2\}\) | Features belong to different effect-size classes (e.g., large, small, null); grouping learned from data |
| Hierarchical (Biologically informed) BLR | Group-specific mixtures of variances \(\{\tau_{gk}^2\}\) | Features grouped a priori (e.g., by genes, pathways, or omic layers); within each group, effects can vary in size and sparsity |
These models form a hierarchy of increasing flexibility and biological realism: from global shrinkage, to data-driven heterogeneity, to biologically structured mixtures that model variation within and between feature sets.
Many traits and molecular layers are correlated — they share genetic architecture and biological pathways.
To capture these dependencies, we extend BLR to the multivariate setting, modeling multiple correlated outcomes jointly:
\[ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} \]
Each row of \(\mathbf{Y}\) corresponds to an observation or gene, and each column to a trait, phenotype, or molecular layer.
We extend the univariate priors to the multivariate setting, where \(\mathbf{b}_j\) (row \(j\) of \(\mathbf{B}\)) collects the effects of feature \(j\) on all \(T\) outcomes:
\[ \mathbf{e}_{i\cdot} \sim \mathcal{N}_T(\mathbf{0}, \boldsymbol{\Sigma}_e) \] \[ \mathbf{b}_j \sim \mathcal{N}_T(\mathbf{0}, \boldsymbol{\Sigma}_b) \]
This allows information sharing across correlated traits or omic layers and can be used to identify pleiotropic effects and cross-trait genetic architectures.
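A small sketch of what this prior implies: drawing feature effects jointly for \(T = 2\) traits from \(\mathcal{N}_T(\mathbf{0}, \boldsymbol{\Sigma}_b)\), with an assumed covariance whose off-diagonal term induces correlated (pleiotropic) effects.

```python
import numpy as np

rng = np.random.default_rng(0)
# assumed effect covariance for T = 2 traits; the off-diagonal term
# makes effects on the two traits correlated (pleiotropy)
Sigma_b = np.array([[0.010, 0.006],
                    [0.006, 0.020]])
b = rng.multivariate_normal(np.zeros(2), Sigma_b, size=1000)  # one row per feature
print(np.corrcoef(b.T))   # empirical cross-trait correlation of effects
```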
The hierarchical structure can be extended to model multiple traits while preserving biological grouping of features:
\[ \mathbf{b}_j \sim \mathcal{N}_T\!\big(\mathbf{0}, \boldsymbol{\Sigma}_{b, g(j)}\big), \qquad \boldsymbol{\Sigma}_{b, g} \sim p(\boldsymbol{\Sigma}_{b, g}) \]
This enables information sharing both within biological sets and across correlated traits.
In the multivariate BLR, each feature \(j\) may affect multiple outcomes (traits).
We extend the indicator variable to capture cross-trait activity patterns:
\[ \boldsymbol{\delta}_j = \begin{bmatrix} \delta_{j1} \\ \delta_{j2} \\ \vdots \\ \delta_{jT} \end{bmatrix}, \qquad \delta_{jt} = \begin{cases} 1, & \text{if feature $j$ affects trait $t$} \\ 0, & \text{otherwise.} \end{cases} \]
After inference, we estimate \(\text{PIP}_{jt} = P(\delta_{jt} = 1 \mid \text{data})\), the posterior inclusion probability that feature \(j\) affects trait \(t\).
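Computationally this mirrors the univariate case: \(\text{PIP}_{jt}\) is the posterior frequency of \(\delta_{jt} = 1\), and the per-feature summary in the table below averages the event that any entry of \(\boldsymbol{\delta}_j\) equals 1. A sketch with a simulated stand-in for the sampler output:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for saved indicator draws: (n_iter, p, T) 0/1 array
delta_samples = rng.binomial(1, 0.1, size=(2000, 500, 3))

pip_jt = delta_samples.mean(axis=0)               # Pr(delta_jt = 1 | data)
pip_any = delta_samples.any(axis=2).mean(axis=0)  # Pr(feature j affects >= 1 trait)
```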
In the multivariate setting, we generalize each posterior quantity:
| Parameter | Interpretation |
|---|---|
| \(\mathbf{B} = [\beta_{jt}]\) | Effect matrix across traits (\(j\): feature, \(t\): trait) |
| \(\text{PIP}_j\) | Probability that feature \(j\) affects ≥1 trait |
| \(\boldsymbol{\Sigma}_b\) | Covariance of effects across traits |
| \(\boldsymbol{\Sigma}_e\) | Residual covariance among traits |
These allow us to identify pleiotropic features, trait-specific effects, and the degree to which genetic or molecular architecture is shared across traits and layers.
| Model Type | Feature Integration | Grouping Basis | Prior Structure | What It Captures |
|---|---|---|---|---|
| Single-component BLR | Combines all biological features in one model | None | One global variance (\(\tau^2\)) | All features contribute equally; uniform shrinkage |
| Multiple-component BLR | Integrates all layers but allows heterogeneous contributions | Learned from data | Mixture of variances (\(\{\tau_k^2\}\)) | Large, small, and null effect classes |
| Hierarchical BLR | Groups features by biological structure (e.g., genes, pathways) | Defined a priori | Group-specific mixture of variances (\(\{\tau_{gk}^2\}\)) | Within-group heterogeneity; enrichment and structured shrinkage |
| Multivariate BLR | Jointly models multiple correlated traits or outcomes | None or by trait | Shared covariance (\(\boldsymbol{\Sigma}_b\)) across traits | Genetic/molecular correlations; pleiotropy |
| Hierarchical MV-BLR | Combines biological grouping and multiple outcomes | Defined a priori | Group- and trait-specific covariance mixtures (\(\{\boldsymbol{\Sigma}_{b,gk}\}\)) | Shared biological mechanisms across traits and layers |
| Model Level | Key Parameters Learned | What They Represent | How They Are Learned | What We Learn Biologically |
|---|---|---|---|---|
| Effect sizes | \(\boldsymbol{\beta}\) | Strength and direction of association for each feature | Posterior mean/median given priors and data | Which features drive the outcome |
| Indicator variables | \(\delta_j\) (single trait), \(\boldsymbol{\delta}_j\) (multi-trait) | Whether feature \(j\) is active (and for which traits) | Estimated as posterior inclusion probabilities (PIPs) | Which features are relevant, and whether effects are shared or trait-specific |
| Variance components | \(\tau^2\), \(\{\tau_k^2\}\), \(\{\tau_{gk}^2\}\) | Magnitude of expected effect sizes; heterogeneity across layers or groups | Inferred hierarchically from the data (via MCMC or EM) | How strongly different groups or omic layers contribute |
| Covariance components | \(\boldsymbol{\Sigma}_b\), \(\{\boldsymbol{\Sigma}_{b,g}\}\) | Correlation of effects across traits or molecular layers | Estimated from joint posterior | Shared pathways, pleiotropy, and cross-layer architecture |
In Bayesian Linear Regression, the variance components are not fixed — they are estimated jointly with the effect sizes \(\boldsymbol{\beta}\).
The model defines hierarchical priors: \[ \beta_j \sim \mathcal{N}(0, \tau^2) \qquad \text{and} \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \] where \(\tau^2\) and \(\sigma^2\) are unknown parameters.
These variances are updated iteratively during MCMC or EM optimization: inference alternates between sampling (or updating) \(\boldsymbol{\beta}\) and re-estimating the variance components given the data and current effects.
Each variance component has its own prior, enabling uncertainty propagation:
\[ \begin{aligned} \mathbf{Y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, & \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^2\mathbf{I}) \\ \beta_j &\sim \mathcal{N}(0, \tau^2) & \tau^2 \sim \text{Inv-}\chi^2(\nu_\tau, S_\tau^2) \\ \sigma^2 &\sim \text{Inv-}\chi^2(\nu_\sigma, S_\sigma^2) \end{aligned} \]
This full hierarchical treatment improves stability and interpretability compared to fixing variance components a priori.
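Putting the pieces together, here is a minimal single-site Gibbs sampler for this full hierarchy in Python/NumPy. It is a sketch under the priors above, with illustrative hyperparameter defaults, and not a reference implementation of any published package (e.g., qgg):

```python
import numpy as np

def gibbs_blr(y, X, n_iter=2000, nu=4.0, S2_tau=0.01, S2_sigma=1.0, seed=0):
    """Single-site Gibbs sampler for y = X beta + e with
    beta_j ~ N(0, tau2),  tau2   ~ Inv-chi2(nu, S2_tau),
    e ~ N(0, sigma2 I),   sigma2 ~ Inv-chi2(nu, S2_sigma)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    tau2, sigma2 = S2_tau, S2_sigma
    xtx = np.einsum("ij,ij->j", X, X)        # x_j' x_j for every feature
    resid = y.copy()                         # residuals given beta = 0
    draws = {"beta": [], "tau2": [], "sigma2": []}
    for _ in range(n_iter):
        # (1) normal full conditional of each beta_j
        for j in range(p):
            resid += X[:, j] * beta[j]       # remove feature j's contribution
            prec = xtx[j] / sigma2 + 1.0 / tau2
            mean = (X[:, j] @ resid) / (sigma2 * prec)
            beta[j] = rng.normal(mean, np.sqrt(1.0 / prec))
            resid -= X[:, j] * beta[j]       # restore with the new draw
        # (2) scaled-inv-chi2 full conditionals of the variance components
        tau2 = (nu * S2_tau + beta @ beta) / rng.chisquare(nu + p)
        sigma2 = (nu * S2_sigma + resid @ resid) / rng.chisquare(nu + n)
        draws["beta"].append(beta.copy())
        draws["tau2"].append(tau2)
        draws["sigma2"].append(sigma2)
    return {k: np.array(v) for k, v in draws.items()}

# illustrative usage on simulated data
rng = np.random.default_rng(42)
n, p = 500, 50
X = rng.standard_normal((n, p))
y = X @ rng.normal(0.0, 0.1, p) + rng.standard_normal(n)
out = gibbs_blr(y, X, n_iter=500)
print(out["tau2"].mean(), out["sigma2"].mean())
```

Each iteration performs exactly the alternation described above: effect updates given the current variances, then conjugate scaled inverse-chi-squared updates of \(\tau^2\) and \(\sigma^2\) given the effects and residuals.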
Variance components can be set-specific or mixture-specific, and all of them are estimated from the data:
\[ \beta_j \sim \mathcal{N}\!\big(0, \tau_{g(j)}^2\big), \qquad \tau_g^2 \sim p(\tau_g^2) \]
or
\[ \beta_j \sim \sum_{k=1}^{K} \pi_k\,\mathcal{N}(0, \tau_k^2), \qquad \tau_k^2 \sim p(\tau_k^2) \]
Variance components act as adaptive shrinkage parameters, controlling model complexity based on the data.
Variance components generalize to covariance matrices:
\[ \mathbf{b}_j \sim \mathcal{N}_T(\mathbf{0}, \boldsymbol{\Sigma}_b), \qquad \mathbf{E}_{i\cdot} \sim \mathcal{N}_T(\mathbf{0}, \boldsymbol{\Sigma}_e) \]
Estimating \(\boldsymbol{\Sigma}_b\) and \(\boldsymbol{\Sigma}_e\) enables discovery of pleiotropy and cross-trait genetic structure.
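For instance, posterior draws of \(\boldsymbol{\Sigma}_b\) yield a full posterior for the effect (genetic) correlation between any two traits. The sketch below fabricates Wishart-distributed stand-ins for the \(\boldsymbol{\Sigma}_b\) draws purely for illustration:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(0)
Sigma_b = np.array([[0.010, 0.006],
                    [0.006, 0.020]])
# stand-in for posterior draws of Sigma_b: Wishart noise around an assumed truth
draws = wishart.rvs(df=50, scale=Sigma_b / 50, size=2000, random_state=rng)

# posterior of the effect correlation between traits 1 and 2
r = draws[:, 0, 1] / np.sqrt(draws[:, 0, 0] * draws[:, 1, 1])
print(r.mean(), np.quantile(r, [0.025, 0.975]))
```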
| Parameter | Role | Interpretation |
|---|---|---|
| \(\tau^2\) | Effect-size variance | How much true genetic signal exists |
| \(\sigma^2\) | Residual variance | How much variation remains unexplained |
| \(\tau_g^2\) / \(\tau_k^2\) | Group or component variance | Which sets/components are enriched |
| \(\boldsymbol{\Sigma}_b\) | Cross-trait covariance | Pleiotropy or shared mechanisms |
These variance components are not tuning parameters — they are learned quantities that describe the underlying biology. Their estimation is central to the interpretability of BLR.
In short, variance components are learned descriptors of architecture rather than mere technical parameters, and they are a cornerstone of the BLR framework.
Sørensen P, Rohde PD. A Versatile Data Repository for GWAS Summary Statistics-Based Downstream Genomic Analysis of Human Complex Traits. medRxiv (2025). https://doi.org/10.1101/2025.10.01.25337099
Sørensen IF, Sørensen P. Privacy-Preserving Multivariate Bayesian Regression Models for Overcoming Data Sharing Barriers in Health and Genomics. medRxiv (2025). https://doi.org/10.1101/2025.07.30.25332448
Hjelholt AJ, Gholipourshahraki T, Bai Z, Shrestha M, Kjølby M, Sørensen P, Rohde P. Leveraging Genetic Correlations to Prioritize Drug Groups for Repurposing in Type 2 Diabetes. medRxiv (2025). https://doi.org/10.1101/2025.06.13.25329590
Gholipourshahraki T, Bai Z, Shrestha M, Hjelholt A, Rohde P, Fuglsang MK, Sørensen P. Evaluation of Bayesian Linear Regression Models for Gene Set Prioritization in Complex Diseases. PLOS Genetics 20(11): e1011463 (2024). https://doi.org/10.1371/journal.pgen.1011463
Bai Z, Gholipourshahraki T, Shrestha M, Hjelholt A, Rohde P, Fuglsang MK, Sørensen P. Evaluation of Bayesian Linear Regression Derived Gene Set Test Methods. BMC Genomics 25(1): 1236 (2024). https://doi.org/10.1186/s12864-024-11026-2
Shrestha M, Bai Z, Gholipourshahraki T, Hjelholt A, Rohde P, Fuglsang MK, Sørensen P. Enhanced Genetic Fine Mapping Accuracy with Bayesian Linear Regression Models in Diverse Genetic Architectures. PLOS Genetics 21(7): e1011783 (2025). https://doi.org/10.1371/journal.pgen.1011783
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving Polygenic Prediction from Summary Data by Learning Patterns of Effect Sharing Across Multiple Phenotypes. PLOS Genetics 21(1): e1011519 (2025). https://doi.org/10.1371/journal.pgen.1011519
Rohde P, Sørensen IF, Sørensen P. Expanded Utility of the R Package qgg with Applications within Genomic Medicine. Bioinformatics 39(11): btad656 (2023). https://doi.org/10.1093/bioinformatics/btad656
Rohde P, Sørensen IF, Sørensen P. qgg: An R Package for Large-Scale Quantitative Genetic Analyses. Bioinformatics 36(8): 2614–2615 (2020). https://doi.org/10.1093/bioinformatics/btz955