Bayesian linear regression models

Bayesian linear regression (BLR) models:

- unified mapping of genetic variants, estimation of genetic parameters (e.g. heritability) and prediction of disease risk)

- handles different genetic architectures (few large, many small effects)

- scale to large data (e.g. sparse LD)

In the Bayesian multiple regression model the posterior density of the model parameters depend on the likelihood of the data given the parameters and a prior probability for the model parameters

The prior density of marker effects defines whether the model will induce variable selection and shrinkage or shrinkage only. Also, the choice of prior will define the extent and type of shrinkage induced. Ideally the choice of prior for the marker effect should reflect the genetic architecture of the trait, and will vary (perhaps a lot) across traits.

The following prior distributions are provided:

Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (same as Ridge Regression).

Bayes L: Assigning a double-exponential or Laplace prior is the density used in the Bayesian LASSO

Bayes A: similar to ridge regression but t-distribution prior (rather than Gaussian) for the marker effects ; variance comes from an inverse-chi-square distribution instead of being fixed. Estimation via Gibbs sampling.

Bayes C: uses a “rounded spike” (low-variance Gaussian) at origin many small effects can contribute to polygenic component, reduces the dimensionality of the model (makes Gibbs sampling feasible).

Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001 , 0.001 , and 0.01 .

Usage

gbayes(
  y = NULL,
  X = NULL,
  W = NULL,
  stat = NULL,
  covs = NULL,
  trait = NULL,
  fit = NULL,
  Glist = NULL,
  chr = NULL,
  rsids = NULL,
  b = NULL,
  bm = NULL,
  seb = NULL,
  LD = NULL,
  n = NULL,
  formatLD = "dense",
  vg = NULL,
  vb = NULL,
  ve = NULL,
  ssg_prior = NULL,
  ssb_prior = NULL,
  sse_prior = NULL,
  lambda = NULL,
  scaleY = TRUE,
  h2 = NULL,
  pi = 0.001,
  updateB = TRUE,
  updateG = TRUE,
  updateE = TRUE,
  updatePi = TRUE,
  adjustE = TRUE,
  models = NULL,
  nug = 4,
  nub = 4,
  nue = 4,
  verbose = FALSE,
  msize = 100,
  mask = NULL,
  GRMlist = NULL,
  ve_prior = NULL,
  vg_prior = NULL,
  tol = 0.001,
  nit = 100,
  nburn = 0,
  nit_local = NULL,
  nit_global = NULL,
  method = "mixed",
  algorithm = "mcmc"
)

Arguments

y: is a vector or matrix of phenotypes
X: is a matrix of covariates
W: is a matrix of centered and scaled genotypes
stat: dataframe with marker summary statistics
covs: is a list of summary statistics (output from internal cvs function)
trait: is an integer used for selection traits in covs object
fit: is a list of results from gbayes
Glist: list of information about genotype matrix stored on disk
chr: is the chromosome for which to fit BLR models
rsids: is a character vector of rsids
b: is a vector or matrix of marginal marker effects
bm: is a vector or matrix of adjusted marker effects for the BLR model
seb: is a vector or matrix of standard error of marginal effects
LD: is a list with sparse LD matrices
n: is a scalar or vector of number of observations for each trait
formatLD: is a character specifying LD format (formatLD="dense" is default)
vg: is a scalar or matrix of genetic (co)variances
vb: is a scalar or matrix of marker (co)variances
ve: is a scalar or matrix of residual (co)variances
ssg_prior: is a scalar or matrix of prior genetic (co)variances
ssb_prior: is a scalar or matrix of prior marker (co)variances
sse_prior: is a scalar or matrix of prior residual (co)variances
lambda: is a vector or matrix of lambda values
scaleY: is a logical; if TRUE y is centered and scaled
h2: is the trait heritability
pi: is the proportion of markers in each marker variance class (e.g. pi=c(0.999,0.001),used if method="ssvs")
updateB: is a logical for updating marker (co)variances
updateG: is a logical for updating genetic (co)variances
updateE: is a logical for updating residual (co)variances
updatePi: is a logical for updating pi
adjustE: is a logical for adjusting residual variance
models: is a list structure with models evaluated in bayesC
nug: is a scalar or vector of prior degrees of freedom for prior genetic (co)variances
nub: is a scalar or vector of prior degrees of freedom for marker (co)variances
nue: is a scalar or vector of prior degrees of freedom for prior residual (co)variances
verbose: is a logical; if TRUE it prints more details during iteration
msize: number of markers used in compuation of sparseld
mask: is a vector or matrix of TRUE/FALSE specifying if marker should be ignored
GRMlist: is a list providing information about GRM matrix stored in binary files on disk
ve_prior: is a scalar or matrix of prior residual (co)variances
vg_prior: is a scalar or matrix of prior genetic (co)variances
tol: is tolerance, i.e. convergence criteria used in gbayes
nit: is the number of iterations
nburn: is the number of burnin iterations
nit_local: is the number of local iterations
nit_global: is the number of global iterations
method: specifies the methods used (method="bayesN","bayesA","bayesL","bayesC","bayesR")
algorithm: specifies the algorithm

Value

Returns a list structure including

b: vector or matrix (mxt) of posterior means for marker effects
d: vector or matrix (mxt) of posterior means for marker inclusion probabilities
vb: scalar or vector (t) of posterior means for marker variances
vg: scalar or vector (t) of posterior means for genomic variances
ve: scalar or vector (t) of posterior means for residual variances
rb: matrix (txt) of posterior means for marker correlations
rg: matrix (txt) of posterior means for genomic correlations
re: matrix (txt) of posterior means for residual correlations
pi: vector (1xnmodels) of posterior probabilities for models
h2: vector (1xt) of posterior means for model probability
param: a list current parameters (same information as item listed above) used for restart of the analysis
stat: matrix (mxt) of marker information and effects used for genomic risk scoring

Author

Peter Sørensen

Examples



# Simulate data and test functions

W <- matrix(rnorm(100000),nrow=1000)
set1 <- sample(1:ncol(W),5)
set2 <- sample(1:ncol(W),5)
sets <- list(set1,set2)
g <- rowSums(W[,c(set1,set2)])
e <- rnorm(nrow(W),mean=0,sd=1)
y <- g + e


fitM <- gbayes(y=y, W=W, method="bayesN")
fitA <- gbayes(y=y, W=W, method="bayesA")
fitL <- gbayes(y=y, W=W, method="bayesL")
fitC <- gbayes(y=y, W=W, method="bayesC")