Center for Quantitative Genetics and Genomics, Aarhus University, Denmark

gact provides an infrastructure for efficient processing of large-scale genomic association data, with core functions for:
gact is intended to serve as a practical implementation of integrative genomics, bridging statistical modeling and biological interpretation, and supporting reproducible and extensible workflows.
The gact() function is a single R command that creates and populates the Genomic Association of Complex Traits (GACT) database.
It automates three main tasks:
glist, gstat, gsets, marker, gtex, download, etc.)

gact constructs gene and marker sets from a wide range of curated biological databases:
We plan to add more biological resources to gact.
The gact R package includes utility functions that extract data from the GACT database and structure it into analysis-ready inputs: \(\mathbf{Y}\) (e.g., summary statistic outcomes) and \(\mathbf{X}\) (genomic or biological features).
- getMarkerStat(): retrieve GWAS summary statistics (Y’s)
- getFeatureStat(): extract gene-, protein-, or pathway-level results (Y’s)
- getMarkerSets(): define biological groupings (basis for X’s)
- designMatrix(): build feature matrices (X) linking variants or genes to biological feature sets

Together, these functions form a reproducible workflow for generating standardized input data for Bayesian hierarchical models and other machine learning approaches.
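The exact arguments of designMatrix() are not described here, so the Python snippet below is not the gact API. It is a conceptual sketch, assuming a design matrix is a binary genes-by-sets incidence matrix, of how a feature matrix \(\mathbf{X}\) can link genes to biological sets. The function name build_design_matrix and all gene and pathway names are hypothetical.

```python
import numpy as np


def build_design_matrix(genes, gene_sets):
    """Return a binary genes-by-sets matrix X and the set (column) names.

    genes:     list of gene identifiers; defines the row order of X.
    gene_sets: dict mapping a set name to an iterable of member genes.
    Members that are not in `genes` are ignored.
    """
    row_index = {g: i for i, g in enumerate(genes)}
    set_names = sorted(gene_sets)
    X = np.zeros((len(genes), len(set_names)), dtype=np.int8)
    for j, name in enumerate(set_names):
        for g in gene_sets[name]:
            i = row_index.get(g)
            if i is not None:
                X[i, j] = 1  # gene i belongs to set j
    return X, set_names


# Hypothetical example: four genes and two pathways.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
sets = {"pathwayA": ["GENE1", "GENE3"], "pathwayB": ["GENE3", "GENE4", "GENE9"]}
X, cols = build_design_matrix(genes, sets)
print(cols)  # ['pathwayA', 'pathwayB']
print(X)
```

Because most genes belong to only a few sets, a sparse matrix format would be the natural choice at genome scale; a dense array keeps the sketch short.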
qgg provides tools for statistical modeling and analysis of large-scale genomic data, including:
qgg handles large-scale genomic data through efficient algorithms and sparse matrix techniques, combined with multi-core processing using OpenMP, multithreaded matrix operations via BLAS libraries (e.g., OpenBLAS, ATLAS, or MKL), and fast, memory-efficient batch processing of genotype data stored in binary formats such as PLINK .bed files.
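qgg's own reader is not shown here. The Python sketch below only illustrates the publicly documented PLINK 1 .bed layout that makes batch processing cheap: after a 3-byte header, each variant occupies a fixed number of bytes (four 2-bit genotypes per byte), so any block of variants can be located with a single seek. The function names decode_bed and read_variant_batch are hypothetical. The sketch counts copies of the first allele listed in the .bim file; which allele is counted is a convention, and other tools (possibly including qgg) may count the other one.

```python
import numpy as np

MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header (SNP-major mode)

# Map each 2-bit code to the number of copies of the first .bim allele:
# 0b00 -> 2 (homozygous), 0b01 -> missing (NaN), 0b10 -> 1, 0b11 -> 0.
CODE_TO_COUNT = np.array([2.0, np.nan, 1.0, 0.0])


def decode_bed(raw, n_samples, n_variants):
    """Decode SNP-major .bed bytes into an (n_variants, n_samples) float array."""
    if bytes(raw[:3]) != MAGIC:
        raise ValueError("not a SNP-major PLINK 1 .bed file")
    bytes_per_variant = (n_samples + 3) // 4  # 4 genotypes per byte
    body = np.frombuffer(raw, dtype=np.uint8, offset=3,
                         count=n_variants * bytes_per_variant)
    body = body.reshape(n_variants, bytes_per_variant)
    # Unpack four 2-bit codes per byte, lowest-order bit pair first.
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (body[:, :, None] >> shifts) & 0b11
    codes = codes.reshape(n_variants, -1)[:, :n_samples]  # drop padding bits
    return CODE_TO_COUNT[codes]


def read_variant_batch(path, n_samples, start, count):
    """Decode `count` variants starting at index `start`, reading only those bytes."""
    bytes_per_variant = (n_samples + 3) // 4
    with open(path, "rb") as fh:
        header = fh.read(3)
        fh.seek(3 + start * bytes_per_variant)  # jump straight to the batch
        chunk = fh.read(count * bytes_per_variant)
    return decode_bed(header + chunk, n_samples, count)
```

Reading variants in fixed-size batches keeps memory use proportional to the batch rather than to the whole genotype file.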
The tutorials listed below are external resources and are not included in this repository.
Gene analysis using VEGAS: Gene-level association analysis using the VEGAS (Versatile Gene-based Association Study) approach and the 1000G LD reference data processed above.
Gene set analysis using Bayesian MAGMA: Pathway prioritization using single- and multiple-trait Bayesian MAGMA models and gene-level statistics derived from VEGAS (Gholipourshahraki et al. 2024).
Gene ranking using PoPS: Polygenic Priority Score (PoPS) using BLR models and gene-level statistics derived from VEGAS (work in progress).
Fine-mapping using BLR models: Fine-mapping of gene and LD regions using single-trait Bayesian Linear Regression models (Shrestha et al. 2025).
Polygenic scoring using BLR models: Polygenic scoring (PGS) using Bayesian Linear Regression models and biological pathway information (work in progress).
Polygenic scoring using PGS Catalog: Polygenic scoring (PGS) using summary statistics from the PGS Catalog and biological pathway information.
LD score regression: LD score regression for estimating genomic heritability and genetic correlations.
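As a rough illustration of the idea behind the VEGAS tutorial listed above, the Python sketch below sums the 1-df chi-square statistics of the SNPs assigned to a gene and computes an empirical p-value by simulating null z-scores from a multivariate normal distribution whose correlation matrix is the LD matrix (e.g., estimated from a 1000G reference panel). It is a simplified sketch, not the code used in the tutorial: the function name vegas_gene_test is hypothetical, the number of simulations is fixed, and SNP-to-gene assignment is assumed to have been done already.

```python
import numpy as np


def vegas_gene_test(z, ld, n_sim=100_000, seed=1):
    """Sum-of-chi-squares gene test with an LD-aware simulated null.

    z:  SNP z-scores for the SNPs assigned to one gene.
    ld: SNP-by-SNP LD (correlation) matrix for the same SNPs.
    Returns (observed statistic, empirical p-value).
    """
    z = np.asarray(z, dtype=float)
    t_obs = float(np.sum(z ** 2))  # sum of 1-df chi-square statistics
    rng = np.random.default_rng(seed)
    # Null z-scores share the LD structure: z_null ~ MVN(0, ld).
    z_null = rng.multivariate_normal(np.zeros(len(z)), ld, size=n_sim, method="eigh")
    t_null = np.sum(z_null ** 2, axis=1)
    # Add-one correction keeps the empirical p-value strictly positive.
    p = (1 + np.sum(t_null >= t_obs)) / (n_sim + 1)
    return t_obs, float(p)


# Hypothetical gene with three SNPs; the first two are in strong LD.
ld = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]])
t, p = vegas_gene_test([2.5, 2.2, 0.4], ld)
```

Simulating under the LD matrix matters because correlated SNPs inflate the sum of chi-squares; treating them as independent would make p-values for genes with many linked SNPs too small.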
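The Bayesian MAGMA tutorial listed above fits Bayesian linear regression models of gene-level statistics on gene-set membership. The Python sketch below is a simpler, frequentist stand-in, not the Bayesian MAGMA implementation: it shows the linear model that MAGMA uses for a competitive gene-set test, in which gene z-scores are regressed on a set-membership indicator and a positive coefficient means set genes are more strongly associated than other genes. The function name gene_set_test is hypothetical, the p-value uses a large-sample normal approximation, and the gene-gene correlation correction that MAGMA applies is ignored.

```python
import math
from statistics import NormalDist

import numpy as np


def gene_set_test(gene_p, in_set):
    """Competitive gene-set test: regress gene z-scores on set membership.

    gene_p: gene-level p-values, strictly between 0 and 1
            (e.g., from a VEGAS-style gene analysis).
    in_set: 0/1 indicator per gene marking membership in the set.
    Returns (beta, t statistic, one-sided p-value via a normal approximation).
    """
    # Transform gene p-values to z-scores: z = Phi^{-1}(1 - p).
    z = np.array([NormalDist().inv_cdf(1.0 - p) for p in gene_p])
    s = np.asarray(in_set, dtype=float)
    X = np.column_stack([np.ones_like(s), s])  # intercept + membership
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    df = len(z) - X.shape[1]
    sigma2 = resid @ resid / df  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS covariance of coefficients
    t_stat = coef[1] / math.sqrt(cov[1, 1])
    # One-sided test of beta > 0.
    p_one_sided = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return float(coef[1]), float(t_stat), p_one_sided
```

A Bayesian version would replace the single least-squares fit with a prior on the set coefficients, which allows many sets (and traits) to be modeled jointly.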
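Both polygenic scoring tutorials listed above end in the same basic computation: for individual \(i\), a weighted allele count \(\mathrm{PGS}_i = \sum_j w_j x_{ij}\). The Python sketch below shows that step, including the allele alignment needed when the weights (for example, from a PGS Catalog scoring file) refer to a different allele than the one counted in the genotype data. It is an illustration only: the function polygenic_score and its arguments are hypothetical, variants are assumed to be biallelic, and strand flips and ambiguous A/T or C/G variants are not handled.

```python
import numpy as np


def polygenic_score(dosages, counted_allele, effect_allele, effect_weight):
    """Compute PGS_i = sum_j w_j * x_ij after aligning alleles.

    dosages:        (n_individuals, n_variants) counts (0-2) of counted_allele.
    counted_allele: allele counted in `dosages`, one per variant.
    effect_allele:  allele each weight refers to, one per variant.
    effect_weight:  per-variant weights w_j.
    """
    X = np.asarray(dosages, dtype=float)
    # For biallelic variants, counting the other allele means x -> 2 - x.
    flip = np.array([c != e for c, e in zip(counted_allele, effect_allele)])
    X = np.where(flip, 2.0 - X, X)
    return X @ np.asarray(effect_weight, dtype=float)


# Hypothetical data: two individuals, three variants; the second variant is flipped.
scores = polygenic_score(
    dosages=[[0, 1, 2], [2, 2, 0]],
    counted_allele=["A", "C", "G"],
    effect_allele=["A", "T", "G"],
    effect_weight=[0.5, 1.0, -0.2],
)
```

One way to bring in pathway information, assumed here rather than taken from the tutorials, is to pass only the variants mapped to a given pathway, which yields a pathway-specific partial score.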
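The LD score regression tutorial listed above rests on a linear model: for SNP \(j\) with LD score \(\ell_j\), \(\mathrm{E}[\chi^2_j] = \frac{N h^2}{M}\ell_j + N a + 1\), where \(N\) is the GWAS sample size, \(M\) the number of SNPs, and \(a\) captures confounding such as population stratification. The Python sketch below fits this model by ordinary least squares and rescales the slope to \(h^2\). It is a stripped-down illustration, not the tutorial's implementation: the function name ldsc_h2 is hypothetical, and the regression weights and block-jackknife standard errors used in practice are omitted.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n, m):
    """Unweighted LD score regression of chi-square statistics on LD scores.

    Model: E[chi2_j] = n * h2 / m * l_j + n * a + 1, so the fitted slope
    equals n * h2 / m and h2 = slope * m / n.
    Returns (h2 estimate, fitted intercept).
    """
    L = np.column_stack([np.ones(len(ld_scores)), np.asarray(ld_scores, dtype=float)])
    (intercept, slope), *_ = np.linalg.lstsq(L, np.asarray(chi2, dtype=float), rcond=None)
    return float(slope * m / n), float(intercept)
```

An intercept close to 1 suggests little confounding. The cross-trait version used for genetic correlations regresses products of z-scores \(z_{1j} z_{2j}\) on \(\ell_j\) instead of \(\chi^2_j\).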
The gact and qgg R packages bridge data integration, statistical modeling, and biological interpretation, enabling reproducible and extensible workflows.
Next Steps
💡 We are open to collaboration!
If you’re interested in applying BLR methods or contributing to the gact framework, please reach out.
Further Reading