gact

An R Package for Creating a Database of Genomic Association of Complex Traits

The R package gact is designed for establishing and populating a comprehensive database of genomic associations with complex traits. The package serves two primary functions: infrastructure creation and data acquisition. It facilitates the assembly of a structured repository of single-marker associations, curated to ensure high data quality. Beyond individual genetic markers, the package integrates a broad spectrum of genomic entities, including genes, proteins, biological complexes (chemical and protein), and biological pathways. This integration is designed to aid the biological interpretation of genomic associations and to clarify how markers, genes, and pathways relate to one another in the context of complex traits.



gact provides an infrastructure for efficient processing of large-scale genomic association data, including core functions for:


gact constructs gene and genetic marker sets from a range of biological databases including:


Installation of the gact package

To install the most recent version of the gact package from GitHub, use the following commands in R:

# install.packages("devtools")  # run once if devtools is not yet installed
library(devtools)
devtools::install_github("psoerensen/gact")

Tutorials for downloading and installing the gact database

The following tutorials cover downloading, installing, and populating the gact database:

Download and set up the gact database, which is focused on genomic associations for complex traits:
Download and install gact database

Download and process genome-wide association study (GWAS) summary statistics and ingest them into the database:
Download and process new GWAS summary statistics

Download and process genotype data from the 1000 Genomes Project (1000G) for different ancestries (European, East Asian, South Asian), used as reference data in several genomic analyses:
Download and process 1000G data

Compute sparse linkage disequilibrium (LD) matrices for 1000 Genomes Project (1000G) data across different ancestries and explore the LD data, which are used in a number of genomic analyses (LD score regression, VEGAS gene analysis, Bayesian Linear Regression models):
Compute sparse LD matrices for 1000G data
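
To illustrate what a sparse LD matrix is, the following is a minimal base R sketch on simulated genotypes. It is not the gact implementation: the object names (W, sparse_ld, ld_scores), the window size, and the simulated data are all assumptions made for illustration. The idea is to store, for each marker, only the correlations with markers inside a fixed window instead of the full marker-by-marker matrix.

```r
# Minimal sketch of a banded ("sparse") LD representation.
# All names and settings are hypothetical, not part of the gact API.
set.seed(1)
n <- 200        # number of individuals
m <- 50         # number of markers
window <- 5     # keep LD only for markers within +/- 5 positions

# Simulate allele counts (0/1/2) from random allele frequencies
maf <- runif(m, 0.1, 0.5)
W <- sapply(maf, function(p) rbinom(n, 2, p))

# Centre and scale each marker so that cross-products give correlations
W <- scale(W)

# For each marker, keep only correlations with markers inside the window
sparse_ld <- lapply(seq_len(m), function(j) {
  idx <- max(1, j - window):min(m, j + window)
  r <- as.vector(crossprod(W[, j], W[, idx])) / (n - 1)
  names(r) <- idx
  r
})

# LD score per marker: sum of squared correlations within the window
ld_scores <- sapply(sparse_ld, function(r) sum(r^2))
```

Each element of sparse_ld holds at most 2 * window + 1 values, so storage grows linearly with the number of markers rather than quadratically.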

Tutorials for various types of genomic analysis using the gact database

Gene analysis with the VEGAS (Versatile Gene-based Association Study) approach, using the 1000G LD reference data processed above:
Gene analysis using VEGAS

Gene set enrichment analysis (GSEA) based on gene-level statistics derived from Bayesian Linear Regression (BLR) models and MAGMA (Multi-marker Analysis of GenoMic Annotation) (Bai et al. 2024).
Gene set analysis using BLR-MAGMA

Pathway prioritization using a BLR-MAGMA model and gene-level statistics derived from VEGAS (Gholipourshahraki et al. 2024).
Pathway prioritization using BLR-MAGMA

Fine-mapping of gene regions using single-trait Bayesian Linear Regression models (Shrestha et al. 2023).
Fine-mapping of gene regions using BLR models

Fine-mapping of LD regions using single-trait Bayesian Linear Regression models (Shrestha et al. 2023).
Fine-mapping of LD regions using BLR models

LD score regression for estimating genomic heritability and correlations.
LD score regression
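
As a rough illustration of the idea behind LD score regression, the sketch below simulates marker test statistics and recovers the heritability from the regression slope. It is not the gact or qgg implementation: it uses simulated data, hypothetical variable names, and an unweighted lm() fit, whereas practical implementations typically use regression weights and a block jackknife for standard errors. The assumed model is that the expected chi-square statistic of marker j is 1 + N * h2 * l_j / M, where l_j is the marker's LD score, N the sample size, and M the number of markers.

```r
# Minimal sketch of LD score regression on simulated data.
# All names and values are hypothetical, not part of the gact API.
set.seed(42)
M <- 5000         # number of markers
N <- 50000        # GWAS sample size
h2_true <- 0.4    # heritability used to simulate the data

# Hypothetical LD scores (at least 1, since each marker is in LD with itself)
ld_score <- 1 + rexp(M, rate = 1 / 20)

# Expected chi-square statistic under the assumed model
expected_chi2 <- 1 + N * h2_true * ld_score / M

# A noncentral chi-square with 1 df and ncp = lambda has mean 1 + lambda
chi2 <- rchisq(M, df = 1, ncp = expected_chi2 - 1)

# Unweighted regression of chi-square statistics on LD scores
fit <- lm(chi2 ~ ld_score)
slope <- unname(coef(fit)["ld_score"])
intercept <- unname(coef(fit)["(Intercept)"])

# The slope estimates N * h2 / M, so rescale it to obtain h2
h2_hat <- slope * M / N
```

In this simulation h2_hat should land close to h2_true and the intercept close to 1; an intercept well above 1 in real data is usually read as a sign of confounding such as population stratification.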

Funding

These notes and scripts were prepared as part of the BALDER project, funded by the ODIN platform. ODIN is sponsored by the Novo Nordisk Foundation (grant number NNF20SA0061466).

References

  1. Rohde PD, Sørensen IF, Sørensen P. 2020. qgg: an R package for large-scale quantitative genetic analyses. Bioinformatics 36:8. doi.org/10.1093/bioinformatics/btz955

  2. Rohde PD, Sørensen IF, Sørensen P. 2023. Expanded utility of the R package, qgg, with applications within genomic medicine. Bioinformatics 39:11. doi.org/10.1093/bioinformatics/btad656

  3. Shrestha et al. 2023. Evaluation of Bayesian Linear Regression Models as a Fine Mapping Tool. Submitted. doi.org/10.1101/2023.09.01.555889

  4. Bai et al. 2024. Evaluation of multiple marker mapping methods using single trait Bayesian Linear Regression models. In preparation.

  5. Gholipourshahraki et al. 2024. Evaluation of Bayesian Linear Regression Models for Pathway Prioritization. In preparation.