Bayesian Linear Regression

Authors and Affiliations

Palle Duun Rohde, Genomic Medicine, Department of Health Science and Technology, Aalborg University, Denmark

Peter Sørensen, Center for Quantitative Genetics and Genomics, Aarhus University, Denmark

The following materials include theoretical notes, slides, and practical R examples for exploring Bayesian linear regression. They introduce both classical and Bayesian regression methods and show how to estimate parameters, define priors, perform posterior inference via Gibbs sampling, and assess convergence, all through practical R code.
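
As a rough orientation, the Gaussian-prior model that these materials build on can be sketched as follows. The notation is an assumption made for illustration and may differ from the notation used in the notes: y is the vector of n observations, X the n × m design matrix with columns x_j, b the vector of m regression effects, and ν and S the degrees of freedom and scale of the scaled inverse chi-square priors on the variances. The full conditionals shown are the standard ones for this conjugate setup; a single-site Gibbs sampler draws from each of them in turn.

```latex
\begin{aligned}
y &= Xb + e, \qquad e \sim N(0,\, I\sigma_e^2), \qquad b_j \sim N(0,\, \sigma_b^2), \quad j = 1, \dots, m \\
\sigma_b^2 &\sim \nu_b S_b\, \chi^{-2}_{\nu_b}, \qquad \sigma_e^2 \sim \nu_e S_e\, \chi^{-2}_{\nu_e} \\
b_j \mid \cdot &\sim N\!\left( \frac{x_j^\top r_j}{x_j^\top x_j + \lambda},\; \frac{\sigma_e^2}{x_j^\top x_j + \lambda} \right),
  \qquad r_j = y - \sum_{k \neq j} x_k b_k, \qquad \lambda = \frac{\sigma_e^2}{\sigma_b^2} \\
\sigma_b^2 \mid \cdot &\sim \left( b^\top b + \nu_b S_b \right) \chi^{-2}_{m + \nu_b} \\
\sigma_e^2 \mid \cdot &\sim \left( e^\top e + \nu_e S_e \right) \chi^{-2}_{n + \nu_e}, \qquad e = y - Xb
\end{aligned}
```

With a fixed ratio λ, the conditional mean of b_j has the same form as a ridge regression estimate, which is one way to see how the Gaussian prior shrinks the least squares solution.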

Explore the sections below to find the corresponding materials.


Overview of Materials

Section | Description
BLR notes | Theoretical notes on Bayesian linear regression, Gibbs sampling, and convergence diagnostics.
BLR slides | Lecture slides summarizing key theoretical concepts and derivations in Bayesian linear regression analyses.
BLR-GSEA slides | Lecture slides introducing Bayesian linear regression models used in gene set analyses.
Bayesian MAGMA slides | Lecture slides introducing gene set analyses using Bayesian MAGMA models.
Classical Regression tutorial | Simulation and estimation using ordinary least squares (OLS) in R.
Bayesian (Gaussian Prior) tutorial | Bayesian regression with conjugate Gaussian priors and Gibbs sampling from closed-form full conditionals in R.
Bayesian (Spike & Slab) tutorial | Bayesian regression with spike-and-slab priors for variable selection and sparsity in R.
Bayesian MAGMA tutorial | Bayesian gene set analysis in R.

Download Notes (PDF)
Download Slides (PDF)



Further Reading

Further details on the theory and computation behind Bayesian linear regression, Gibbs sampling, and hierarchical modeling can be found in:
Sorensen, D. (2025). Statistical Learning in Genetics: An Introduction Using R. Springer.

This book provides a rigorous and accessible introduction to Bayesian modeling, hierarchical inference, and statistical learning methods in quantitative genetics and genomics.

The qgg R Package

qgg provides tools for statistical modeling and analysis of large-scale genomic data, including:

  • Fine-mapping of genomic regions using Bayesian Linear Regression (BLR) models
  • Polygenic scoring using Bayesian Linear Regression (BLR) models
  • Gene set enrichment analysis using Bayesian Linear Regression (BLR) models

qgg handles large-scale genomic data through efficient algorithms and sparse matrix techniques, combined with multi-core processing using OpenMP, multithreaded matrix operations via BLAS libraries (e.g., OpenBLAS, ATLAS, or MKL), and fast, memory-efficient batch processing of genotype data stored in binary formats such as PLINK .bed files.

The gact R Package

gact provides an infrastructure for efficient processing of large-scale genomic association data, with core functions for:

  • Establishing and populating a database of genomic associations
  • Downloading and processing biological databases
  • Handling and processing GWAS summary statistics
  • Linking genetic markers to genes, proteins, metabolites, and biological pathways
  • Integrating with statistical machine learning tools in the qgg R package

gact is intended to serve as a practical implementation of integrative genomics, bridging statistical modeling and biological interpretation, and supporting reproducible and extensible workflows.

Slides introducing the qgg and gact R packages

Tutorials using the qgg and gact R packages

References

Sørensen P, Rohde PD. A Versatile Data Repository for GWAS Summary Statistics-Based Downstream Genomic Analysis of Human Complex Traits. medRxiv (2025). https://doi.org/10.1101/2025.10.01.25337099

Sørensen IF, Sørensen P. Privacy-Preserving Multivariate Bayesian Regression Models for Overcoming Data Sharing Barriers in Health and Genomics. medRxiv (2025). https://doi.org/10.1101/2025.07.30.25332448

Hjelholt AJ, Gholipourshahraki T, Bai Z, Shrestha M, Kjølby M, Sørensen P, Rohde P. Leveraging Genetic Correlations to Prioritize Drug Groups for Repurposing in Type 2 Diabetes. medRxiv (2025). https://doi.org/10.1101/2025.06.13.25329590

Gholipourshahraki T, Bai Z, Shrestha M, Hjelholt A, Rohde P, Fuglsang MK, Sørensen P. Evaluation of Bayesian Linear Regression Models for Gene Set Prioritization in Complex Diseases. PLOS Genetics 20(11): e1011463 (2024). https://doi.org/10.1371/journal.pgen.1011463

Bai Z, Gholipourshahraki T, Shrestha M, Hjelholt A, Rohde P, Fuglsang MK, Sørensen P. Evaluation of Bayesian Linear Regression Derived Gene Set Test Methods. BMC Genomics 25(1): 1236 (2024). https://doi.org/10.1186/s12864-024-11026-2

Shrestha M, Bai Z, Gholipourshahraki T, Hjelholt A, Rohde P, Fuglsang MK, Sørensen P. Enhanced Genetic Fine Mapping Accuracy with Bayesian Linear Regression Models in Diverse Genetic Architectures. PLOS Genetics 21(7): e1011783 (2025). https://doi.org/10.1371/journal.pgen.1011783

Kunkel D, Sørensen P, Shankar V, Morgante F. Improving Polygenic Prediction from Summary Data by Learning Patterns of Effect Sharing Across Multiple Phenotypes. PLOS Genetics 21(1): e1011519 (2025). https://doi.org/10.1371/journal.pgen.1011519

Rohde P, Sørensen IF, Sørensen P. Expanded Utility of the R Package qgg with Applications within Genomic Medicine. Bioinformatics 39(11): btad656 (2023). https://doi.org/10.1093/bioinformatics/btad656

Rohde P, Sørensen IF, Sørensen P. qgg: An R Package for Large-Scale Quantitative Genetic Analyses. Bioinformatics 36(8): 2614–2615 (2020). https://doi.org/10.1093/bioinformatics/btz955