Center for Quantitative Genetics and Genomics, Aarhus University, Denmark

gact provides an infrastructure for efficient processing of large-scale genomic association data, with core functions for:
gact is intended to serve as a practical implementation of integrative genomics, bridging statistical modeling and biological interpretation, and supporting reproducible and extensible workflows.
The gact() function is a single R command that creates and populates the Genomic Association of Complex Traits (GACT) database.
It automates three main tasks:
glist, gstat, gsets, marker, gtex, download, etc.)
gact constructs gene and marker sets from a wide range of curated biological databases:
We plan to add further biological resources to gact.
The gact R package includes utility functions to extract and structure data from the GACT database into analysis-ready inputs — \(\mathbf{Y}\) (e.g., summary statistic outcomes) and \(\mathbf{X}\) (genomic or biological features).
getMarkerStat(): retrieve GWAS summary statistics (Y’s)
getFeatureStat(): extract gene-, protein-, or pathway-level results (Y’s)
getMarkerSets(): define biological groupings (basis for X’s)
designMatrix(): build feature matrices (X) linking variants or genes to biological feature sets
Together, these functions form a reproducible workflow for generating standardized input data for Bayesian Hierarchical Models and other machine learning approaches.
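To make the idea of a feature matrix concrete, the sketch below builds a 0/1 indicator matrix X whose rows are genes and whose columns are feature sets. It is written in plain Python for illustration only and does not call gact; the function name design_matrix, the gene labels, and the pathway names are hypothetical, and the actual designMatrix() arguments and output may differ.

```python
def design_matrix(genes, gene_sets):
    """Return (X, set_names): X[i][j] is 1 if genes[i] belongs to set_names[j]."""
    set_names = sorted(gene_sets)  # fixed column order
    row_of = {gene: i for i, gene in enumerate(genes)}
    X = [[0] * len(set_names) for _ in genes]
    for j, name in enumerate(set_names):
        for gene in gene_sets[name]:
            i = row_of.get(gene)
            if i is not None:  # skip set members that are not in the gene list
                X[i][j] = 1
    return X, set_names


# Hypothetical toy input: three genes and two feature sets.
genes = ["GENE_A", "GENE_B", "GENE_C"]
gene_sets = {
    "pathway_1": ["GENE_A", "GENE_B"],
    "pathway_2": ["GENE_B", "GENE_X"],  # GENE_X is not in the gene list
}
X, set_names = design_matrix(genes, gene_sets)
```

Rows of X line up with the gene-level outcomes in Y, so each column can be used as a predictor in a downstream regression model.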
qgg provides tools for statistical modeling and analysis of large-scale genomic data, including:
qgg handles large-scale genomic data through efficient algorithms and sparse matrix techniques, combined with multi-core processing using OpenMP, multithreaded matrix operations via BLAS libraries (e.g., OpenBLAS, ATLAS, or MKL), and fast, memory-efficient batch processing of genotype data stored in binary formats such as PLINK .bed files.
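As an illustration of why the .bed format lends itself to batch processing, the sketch below decodes variant-major PLINK 1 .bed bytes in fixed-size batches. It is plain Python and not how qgg implements this (qgg's own code is not shown in the source); the function names and the batch size are hypothetical. The layout it assumes is the documented PLINK 1 one: three header bytes, then for each variant ceil(n/4) bytes holding two bits per sample, lowest bits first.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 header, variant-major mode

# Two-bit code -> count of the first (A1) allele; None marks a missing call.
CODE_TO_DOSAGE = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_variant(block, n_samples):
    """Decode one variant's byte block into a list of A1 allele counts."""
    dosages = []
    for byte in block:
        for shift in (0, 2, 4, 6):  # lowest two bits belong to the first sample
            if len(dosages) == n_samples:
                return dosages  # remaining bits are padding
            dosages.append(CODE_TO_DOSAGE[(byte >> shift) & 0b11])
    return dosages


def iter_batches(bed_bytes, n_samples, batch_size):
    """Yield lists of decoded variants, at most batch_size variants per list."""
    if bed_bytes[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed payload")
    stride = (n_samples + 3) // 4  # bytes per variant
    batch = []
    for start in range(3, len(bed_bytes), stride):
        batch.append(decode_variant(bed_bytes[start:start + stride], n_samples))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Toy payload: header plus two variants for four samples (one byte each).
payload = BED_MAGIC + bytes([0xDC, 0x0B])
batches = list(iter_batches(payload, n_samples=4, batch_size=1))
```

Because every variant occupies the same number of bytes, a batch of variants can be located by offset alone, which is what keeps memory use bounded when only part of the file is loaded at a time.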
Gene analysis using VEGAS: Gene-level association analysis with the VEGAS (Versatile Gene-based Association Study) approach, using the 1000G LD reference data processed above.
Gene set analysis using Bayesian MAGMA: Pathway prioritization using single- and multiple-trait Bayesian MAGMA models and gene-level statistics derived from VEGAS (Gholipourshahraki et al. 2024).
Gene ranking using PoPS: Polygenic Priority Score (PoPS) gene ranking using Bayesian Linear Regression (BLR) models and gene-level statistics derived from VEGAS (work in progress).
Fine-mapping using BLR models: Fine-mapping of gene and LD regions using single-trait Bayesian Linear Regression models (Shrestha et al. 2025).
Polygenic scoring using BLR models: Polygenic scoring (PGS) using Bayesian Linear Regression models and biological pathway information (work in progress).
Polygenic scoring using PGS Catalog: Polygenic scoring (PGS) using summary statistics from the PGS Catalog and biological pathway information.
LD score regression: LD score regression for estimating genomic heritability and correlations.
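For the last item in the list above, the core of LD score regression can be sketched in a few lines. The sketch assumes the standard model E[chi2_j] = intercept + (N * h2 / M) * l_j, where l_j is the LD score of marker j, N the GWAS sample size, and M the number of markers behind the LD scores. It uses an unweighted least-squares fit on made-up, noise-free numbers; practical implementations add regression weights and block-jackknife standard errors, and nothing here reflects the qgg function interface. The name ldsc_h2 is hypothetical.

```python
def ldsc_h2(chi2, ld_scores, n_samples, n_markers):
    """Fit chi2 ~ intercept + slope * ld_score by ordinary least squares.

    Returns (h2, intercept), with h2 = slope * n_markers / n_samples.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_markers / n_samples, intercept


# Made-up toy data generated from h2 = 0.5, N = 10,000, M = 1,000,000,
# so the true slope is N * h2 / M = 0.005 and the true intercept is 1.
ld_scores = [10.0, 50.0, 100.0, 200.0]
chi2 = [1.0 + 0.005 * l for l in ld_scores]
h2, intercept = ldsc_h2(chi2, ld_scores, n_samples=10_000, n_markers=1_000_000)
```

On this toy input the fit recovers a heritability of about 0.5 and an intercept of about 1, since the data contain no noise.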
The gact and qgg R packages bridge data integration, statistical modeling, and biological interpretation, enabling reproducible and extensible workflows.
Next Steps
💡 We are open to collaboration!
If you’re interested in applying BLR methods or contributing to the gact framework, please reach out.
Further Reading