Fine-mapping using Bayesian Linear Regression Models

In the Bayesian multiple regression model, the posterior density of the model parameters depends on the likelihood of the data given the parameters and a prior probability for the model parameters. The choice of the prior for marker effects can influence the type and extent of shrinkage induced in the model.
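As a rough illustration of how the prior controls shrinkage, the base R sketch below considers a single marker with a marginal estimate `b_hat` and standard error `se`. All values and variable names are hypothetical. It assumes a normal likelihood for `b_hat` and compares a normal prior with a point-mass mixture (spike-and-slab) prior of the kind used in BayesC. It is not the sampler used by gmap.

```r
# Hypothetical marginal estimate and standard error for one marker
b_hat    <- 0.08
se       <- 0.02
vb       <- 0.001   # assumed prior variance of a non-zero marker effect
pi_prior <- 0.001   # assumed prior probability that the effect is non-zero

# Normal prior b ~ N(0, vb): the posterior mean shrinks b_hat towards zero
w_normal  <- vb / (vb + se^2)
pm_normal <- w_normal * b_hat

# Spike-and-slab prior: b = 0 with probability 1 - pi_prior, else b ~ N(0, vb)
# Marginal densities of b_hat under the slab and under the spike
d_slab  <- dnorm(b_hat, mean = 0, sd = sqrt(vb + se^2))
d_spike <- dnorm(b_hat, mean = 0, sd = se)

# Posterior inclusion probability and the resulting posterior mean
pip    <- pi_prior * d_slab / (pi_prior * d_slab + (1 - pi_prior) * d_spike)
pm_mix <- pip * pm_normal

print(c(normal = pm_normal, mixture = pm_mix, pip = pip))
```

Under these assumptions the mixture prior shrinks the estimate more strongly than the normal prior, because the effect is additionally weighted by its posterior inclusion probability.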

Usage

gmap(
  y = NULL,
  X = NULL,
  W = NULL,
  stat = NULL,
  trait = NULL,
  sets = NULL,
  fit = NULL,
  Glist = NULL,
  chr = NULL,
  rsids = NULL,
  ids = NULL,
  b = NULL,
  bm = NULL,
  seb = NULL,
  mask = NULL,
  LD = NULL,
  n = NULL,
  vg = NULL,
  vb = NULL,
  ve = NULL,
  ssg_prior = NULL,
  ssb_prior = NULL,
  sse_prior = NULL,
  lambda = NULL,
  scaleY = TRUE,
  shrinkLD = FALSE,
  shrinkCor = FALSE,
  formatLD = "dense",
  pruneLD = TRUE,
  r2 = 0.05,
  checkLD = TRUE,
  h2 = NULL,
  pi = 0.001,
  updateB = TRUE,
  updateG = TRUE,
  updateE = TRUE,
  updatePi = TRUE,
  adjustE = TRUE,
  models = NULL,
  checkConvergence = FALSE,
  critVe = 3,
  critVg = 5,
  critVb = 5,
  critPi = 3,
  ntrial = 1,
  nug = 4,
  nub = 4,
  nue = 4,
  verbose = FALSE,
  msize = 100,
  threshold = NULL,
  ve_prior = NULL,
  vg_prior = NULL,
  tol = 0.001,
  nit = 100,
  nburn = 50,
  nit_local = NULL,
  nit_global = NULL,
  method = "bayesC",
  algorithm = "mcmc"
)

Arguments

y: A vector or matrix of phenotypes.
X: A matrix of covariates.
W: A matrix of centered and scaled genotypes.
stat: Dataframe with marker summary statistics.
trait: Integer used for selecting traits in the covs object.
sets: A list of character vectors where each vector represents a set of items. If the names of the sets are not provided, they are named as "Set1", "Set2", etc.
fit: List of results from gbayes.
Glist: List of information about genotype matrix stored on disk.
chr: Chromosome for which to fit BLR models.
rsids: Character vector of rsids.
ids: Vector of individuals used in the study.
b: Vector or matrix of marginal marker effects.
bm: Vector or matrix of adjusted marker effects for the BLR model.
seb: Vector or matrix of standard error of marginal effects.
mask: Vector or matrix specifying if marker should be ignored.
LD: List with sparse LD matrices.
n: Scalar or vector of number of observations for each trait.
vg: Scalar or matrix of genetic (co)variances.
vb: Scalar or matrix of marker (co)variances.
ve: Scalar or matrix of residual (co)variances.
ssg_prior: Scalar or matrix of prior genetic (co)variances.
ssb_prior: Scalar or matrix of prior marker (co)variances.
sse_prior: Scalar or matrix of prior residual (co)variances.
lambda: Vector or matrix of lambda values.
scaleY: Logical indicating if y should be scaled.
shrinkLD: Logical indicating if LD should be shrunk.
shrinkCor: Logical indicating if cor should be shrunk.
formatLD: Character specifying LD format (default is "dense").
pruneLD: Logical indicating if LD pruning should be applied.
r2: Scalar providing value for r2 threshold used in pruning.
checkLD: Logical indicating if the match between LD and summary statistics should be checked.
h2: Trait heritability.
pi: Proportion of markers in each marker variance class.
updateB: Logical indicating if marker (co)variances should be updated.
updateG: Logical indicating if genetic (co)variances should be updated.
updateE: Logical indicating if residual (co)variances should be updated.
updatePi: Logical indicating if pi should be updated.
adjustE: Logical indicating if residual variance should be adjusted.
models: List structure with models evaluated in bayesC.
checkConvergence: Logical indicating if convergence should be checked.
critVe: Scalar providing value for z-score threshold used in checking convergence for Ve.
critVg: Scalar providing value for z-score threshold used in checking convergence for Vg.
critVb: Scalar providing value for z-score threshold used in checking convergence for Vb.
critPi: Scalar providing value for z-score threshold used in checking convergence for Pi.
ntrial: Integer providing number of trials used if convergence is not obtained.
nug: Scalar or vector of prior degrees of freedom for genetic (co)variances.
nub: Scalar or vector of prior degrees of freedom for marker (co)variances.
nue: Scalar or vector of prior degrees of freedom for residual (co)variances.
verbose: Logical; if TRUE, it prints more details during iteration.
msize: Integer providing number of markers used in computation of sparseld.
threshold: Scalar providing value for threshold used in adjustment of B.
ve_prior: Scalar or matrix of prior residual (co)variances.
vg_prior: Scalar or matrix of prior genetic (co)variances.
tol: Convergence criteria used in gbayes.
nit: Number of iterations.
nburn: Number of burnin iterations.
nit_local: Number of local iterations.
nit_global: Number of global iterations.
method: Method used (e.g. "bayesN","bayesA","bayesL","bayesC","bayesR").
algorithm: Specifies the algorithm.
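The structure expected for sets can be sketched in base R as follows. The rsids are hypothetical, and the naming step only mimics the documented default ("Set1", "Set2", etc.); how gmap applies it internally is an assumption.

```r
# Hypothetical fine-mapping regions, each a character vector of rsids
sets <- list(c("rs0001", "rs0002", "rs0003"),
             c("rs0104", "rs0105"))

# Mimic the documented default naming for unnamed sets
if (is.null(names(sets))) {
  names(sets) <- paste0("Set", seq_along(sets))
}

str(sets)
```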

Value

A list containing:

bm: Vector or matrix of posterior means for marker effects.
dm: Vector or matrix of posterior means for marker inclusion probabilities.
vb: Scalar or vector of posterior means for marker variances.
vg: Scalar or vector of posterior means for genomic variances.
ve: Scalar or vector of posterior means for residual variances.
rb: Matrix of posterior means for marker correlations.
rg: Matrix of posterior means for genomic correlations.
re: Matrix of posterior means for residual correlations.
pi: Vector of posterior probabilities for models.
h2: Vector of posterior means for model probability.
param: List of current parameters used for restarting the analysis.
stat: Matrix of marker information and effects used for genomic risk scoring.
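One way to inspect the returned inclusion probabilities is to rank markers by dm within a region. The sketch below uses a hypothetical data frame in place of real output; the column layout, the values, and the 0.5 cut-off are assumptions made for illustration only.

```r
# Hypothetical posterior summaries for five markers in one region
res <- data.frame(
  rsids = c("rs0001", "rs0002", "rs0003", "rs0004", "rs0005"),
  bm    = c(0.001, 0.045, 0.000, 0.012, 0.002),
  dm    = c(0.02, 0.91, 0.01, 0.35, 0.04),
  stringsAsFactors = FALSE
)

# Rank markers by posterior inclusion probability, highest first
res <- res[order(res$dm, decreasing = TRUE), ]

# Keep markers whose inclusion probability exceeds an arbitrary cut-off
top <- res[res$dm > 0.5, ]
print(top)
```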

Details

This function implements Bayesian linear regression models to provide unified mapping of genetic variants, estimate genetic parameters (e.g. heritability), and predict disease risk. It is designed to handle various genetic architectures and scale efficiently with large datasets.

Author

Peter Sørensen

Examples


# Plink bed/bim/fam files
bedfiles <- system.file("extdata", paste0("sample_chr",1:2,".bed"), package = "qgg")
bimfiles <- system.file("extdata", paste0("sample_chr",1:2,".bim"), package = "qgg")
famfiles <- system.file("extdata", paste0("sample_chr",1:2,".fam"), package = "qgg")

# Prepare Glist
Glist <- gprep(study="Example", bedfiles=bedfiles, bimfiles=bimfiles, famfiles=famfiles)

# Simulate phenotype
sim <- gsim(Glist=Glist, chr=1, nt=1)

# Compute single marker summary statistics
stat <- glma(y=sim$y, Glist=Glist, scale=FALSE)
str(stat)

# Define fine-mapping regions (here, one set of rsids per chromosome)
sets <- Glist$rsids
# Relabel chromosomes 21 and 22 as 1 and 2
Glist$chr[[1]] <- gsub("21","1",Glist$chr[[1]])
Glist$chr[[2]] <- gsub("22","2",Glist$chr[[2]])

# Fine map
fit <- gmap(Glist=Glist, stat=stat, sets=sets, verbose=FALSE, 
            method="bayesC", nit=1500, nburn=500, pi=0.001)
            
fit$post  # Posterior inference for every fine-mapped region
fit$conv  # Convergence statistics for every fine-mapped region

# Posterior inference for marker effect
head(fit$stat)