Bayesian linear regression models have been proposed as a unified framework for gene mapping, genomic risk scoring, and estimation of genetic parameters and effect size distribution. These methods use an iterative algorithm to estimate joint marker effects that account for linkage disequilibrium (LD) and allow for differential shrinkage of marker effects. Estimation of the joint marker effects depends on additional model parameters such as the probability of a marker being causal (\(\pi\)), an overall marker variance (\(\sigma_{b}^2\)), and the residual variance (\(\sigma_e^2\)). These model parameters can be estimated using Markov chain Monte Carlo (MCMC) techniques by sampling from fully conditional posterior distributions.
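To make the notation concrete, the model and one of its fully conditional distributions can be written out. This is a sketch of the standard formulation, not a description of the exact sampler used by any particular software: \(y\) (phenotypes), \(X\) (genotype matrix with columns \(x_j\)), \(r_j\) and \(\lambda\) are symbols introduced here for illustration, and the conditional shown assumes that marker \(j\) currently has a normal prior with variance \(\sigma_b^2\).

```latex
\[
y = Xb + e, \qquad e \sim N(0, I\sigma_e^2)
\]
\[
b_j \mid \text{rest} \sim
N\!\left( \frac{x_j^\top r_j}{x_j^\top x_j + \lambda},\;
          \frac{\sigma_e^2}{x_j^\top x_j + \lambda} \right),
\qquad
r_j = y - \sum_{k \neq j} x_k b_k,
\qquad
\lambda = \frac{\sigma_e^2}{\sigma_b^2}
\]
```

The residual \(r_j\) removes the contribution of all other markers, which is how the joint effect of marker \(j\) accounts for LD, and \(\lambda\) controls how strongly the effect is shrunk towards zero.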
Genomic risk scoring using Bayesian Linear Regression (BLR) models is a flexible way to account for the underlying genetic architecture of the trait. It can be implemented using GWAS summary statistics and a reference linkage disequilibrium (LD) correlation matrix. Ideally, the summary statistics should come from the most powerful GWAS available for the phenotype under study.
Different BLR models can be fitted with the gbayes() function in the qgg package, where the argument method= specifies which prior marker variance should be used. BLR models can be fitted using individual-level genotype and phenotype data or based on GWAS summary statistics and a reference linkage disequilibrium (LD) correlation matrix.
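The following base R sketch illustrates why summary statistics plus an LD correlation matrix carry enough information to recover joint marker effects. It uses simulated toy data and ordinary least squares without any prior or shrinkage, so it is not the algorithm used by gbayes(); all object names (X, W, b_true, b_marg, R, b_joint, b_ols) are hypothetical and chosen for this illustration only.

```r
# Toy illustration with simulated data (not part of qgg or gact)
set.seed(1)
n <- 500   # number of individuals
m <- 5     # number of markers

# Simulate marker columns; make marker 2 correlated with marker 1 to mimic LD
X <- matrix(rnorm(n * m), nrow = n, ncol = m)
X[, 2] <- X[, 1] + 0.5 * rnorm(n)

# Centre and scale each marker, then simulate and centre the phenotype
W <- scale(X)
b_true <- c(0.5, 0, 0, -0.3, 0)
y <- as.vector(W %*% b_true + rnorm(n))
y <- y - mean(y)

# "Summary statistics": marginal effect of each marker fitted on its own
b_marg <- as.vector(crossprod(W, y)) / (n - 1)

# "Reference LD": correlation matrix between markers
R <- crossprod(W) / (n - 1)

# Joint effects recovered from marginal effects and LD alone
b_joint <- solve(R, b_marg)

# Same joint effects from individual-level data (multiple regression)
b_ols <- unname(coef(lm(y ~ W))[-1])

print(round(cbind(b_true, b_marg, b_joint, b_ols), 3))
```

In this toy setting the joint effects computed from the marginal effects and the LD matrix agree with those from the individual-level regression. A BLR model replaces the plain solve() step with prior-driven shrinkage and MCMC sampling.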
# Load libraries
library(qgg)
library(gact)
# Load GAlist
GAlist <- readRDS(file="C:/Users/gact/hsa.0.0.1/GAlist_hsa.0.0.1.rds")
# Check studies in gact database
GAlist$studies
# List BLR results files already in gact database
list.files(GAlist$dirs["gbayes"])
# Load Glist with information on 1000G
Glist <- readRDS(file=file.path(GAlist$dirs["marker"],"Glist_1000G.rds"))
# Select study
studyID <- "GWAS1"
# Extract GWAS summary statistics for GWAS1
stat <- getMarkerStatDB(GAlist=GAlist, studyID=studyID)
# Check and align summary statistics based on marker information in Glist
stat <- checkStat(Glist=Glist, stat=stat,
                  excludeMAF=0.05,
                  excludeMAFDIFF=0.05,
                  excludeINFO=0.8,
                  excludeCGAT=TRUE,
                  excludeINDEL=TRUE,
                  excludeDUPS=TRUE,
                  excludeMHC=FALSE,
                  excludeMISS=0.05,
                  excludeHWE=1e-12)
# Extract marker sets derived from KEGG pathways
sets <- getMarkerSets(GAlist = GAlist, feature = "KEGG")
In the Bayes C approach the marker effects, b, are a priori assumed to be sampled from a mixture of a point mass at zero and a univariate normal distribution conditional on a common marker effect variance:
fit <- gbayes(stat=stat, Glist=Glist, method="bayesC", nit=1000)
pgs <- gscore(Glist=Glist, stat=fit$stat)
head(pgs)
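The Bayes C prior described above can be written compactly as follows. This is the standard textbook form, given here as a sketch: \(\delta_0\) denotes a point mass at zero and is notation introduced for this illustration, while \(\pi\) and \(\sigma_b^2\) are the probability of being causal and the common marker effect variance defined earlier.

```latex
\[
b_j \mid \pi, \sigma_b^2 \;\sim\; \pi \, N(0, \sigma_b^2) \;+\; (1 - \pi)\, \delta_0
\]
```

A small \(\pi\) therefore corresponds to a sparse genetic architecture in which most markers have no effect.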
In the Bayes R approach the marker effects, b, are a priori assumed to be sampled from a mixture of a point mass at zero and several univariate normal distributions, each conditional on a common marker effect variance:
fit <- gbayes(Glist=Glist, stat=stat, method="bayesR", nit=5000, nburn=1000)
pgs <- gscore(Glist=Glist, stat=fit$stat)
head(pgs)
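The Bayes R prior described above is commonly written as a finite mixture in which each normal component scales the common marker variance by a fixed factor. The form below is a sketch of that common formulation: the number of classes \(K\), the class probabilities \(\pi_k\), the scaling factors \(\gamma_k\) and the point mass \(\delta_0\) are notation assumed here, and the values actually used by gbayes() should be checked in the qgg documentation.

```latex
\[
b_j \mid \pi, \sigma_b^2 \;\sim\; \pi_1 \, \delta_0 \;+\; \sum_{k=2}^{K} \pi_k \, N(0, \gamma_k \sigma_b^2),
\qquad \sum_{k=1}^{K} \pi_k = 1
\]
```

Compared with Bayes C, the additional components allow small, moderate and large effects to be shrunk by different amounts.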
Polygenic scores can also be computed for specific marker sets. Here the Bayes R model is fitted as before and the scores are computed for the marker sets derived from KEGG pathways:
fit <- gbayes(Glist=Glist, stat=stat, method="bayesR", nit=5000, nburn=1000)
pgs <- gscore(Glist=Glist, stat=fit$stat, sets=sets)
head(pgs)
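The scores computed in the examples above follow the usual definition of a polygenic score as a weighted sum of genotypes. The expressions below are a sketch under that assumption: \(x_{ij}\) (the genotype of individual \(i\) at marker \(j\), possibly centred and scaled), \(\hat{b}_j\) (the estimated joint marker effect), \(m\) (the number of markers) and \(S_s\) (the markers belonging to set \(s\)) are notation introduced here, and the exact scaling applied by gscore() is not specified in this text.

```latex
\[
\widehat{g}_i = \sum_{j=1}^{m} x_{ij} \, \hat{b}_j,
\qquad
\widehat{g}_i^{(s)} = \sum_{j \in S_s} x_{ij} \, \hat{b}_j
\]
```

The set-specific score restricts the sum to the markers in one set, so it yields one score per individual and set rather than a single genome-wide score.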