Bayesian linear regression (BLR) models:

- unify mapping of genetic variants, estimation of genetic parameters (e.g. heritability), and prediction of disease risk

- handle different genetic architectures (few large effects, many small effects)

- scale to large data (e.g. using sparse LD)

In the Bayesian multiple regression model, the posterior density of the model parameters depends on the likelihood of the data given the parameters and on a prior probability for the model parameters.
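In symbols, the statement above is Bayes' theorem. The notation is assumed here for illustration and is not taken from the package: y denotes the data and \theta the model parameters (marker effects and variance components).

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta)
```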

The prior density of marker effects defines whether the model will induce variable selection and shrinkage or shrinkage only. Also, the choice of prior will define the extent and type of shrinkage induced. Ideally the choice of prior for the marker effect should reflect the genetic architecture of the trait, and will vary (perhaps a lot) across traits.
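The difference between "shrinkage only" and "variable selection and shrinkage" can be sketched for a single marker. The base R example below is a toy illustration, not the package's sampler: it assumes a least-squares estimate b_hat with known sampling variance s2, and compares a Gaussian prior with a mixture of a point mass at zero and the same Gaussian. All numeric values and variable names are assumptions chosen for display.

```r
# Toy one-marker illustration (all values are assumptions chosen for display).
b_hat <- c(0.02, 0.05, 0.10, 0.30)  # least-squares estimates of a marker effect
s2    <- 0.05^2                     # sampling variance of each estimate
tau2  <- 0.10^2                     # prior variance of a non-zero effect
pi1   <- 0.01                       # prior probability of a non-zero effect

# Gaussian prior (shrinkage only): every estimate is shrunk by the same factor.
shrink        <- tau2 / (tau2 + s2)
post_gaussian <- shrink * b_hat

# Mixture prior (selection and shrinkage): point mass at zero plus the Gaussian.
# Marginal density of b_hat under each mixture component:
lik_slab  <- dnorm(b_hat, mean = 0, sd = sqrt(tau2 + s2))
lik_spike <- dnorm(b_hat, mean = 0, sd = sqrt(s2))

# Posterior inclusion probability and posterior mean of the effect.
pip          <- pi1 * lik_slab / (pi1 * lik_slab + (1 - pi1) * lik_spike)
post_mixture <- pip * shrink * b_hat

round(cbind(b_hat, post_gaussian, pip, post_mixture), 4)
```

Under the Gaussian prior the shrinkage factor is the same for every marker, whereas under the mixture prior small estimates are pulled almost entirely to zero and large estimates are shrunk only slightly more than under the Gaussian prior.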

The following prior distributions are provided:

Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (same as Ridge Regression).

Bayes L: Assigning a double-exponential (Laplace) prior to marker effects gives the density used in the Bayesian LASSO.

Bayes A: similar to ridge regression, but with a t-distribution prior (rather than Gaussian) for the marker effects; the variance comes from an inverse-chi-square distribution instead of being fixed. Estimation is via Gibbs sampling.

Bayes C: uses a “rounded spike” (low-variance Gaussian) at the origin, so that many small effects can contribute to the polygenic component; this reduces the dimensionality of the model (which makes Gibbs sampling feasible).

Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001, 0.001, and 0.01.
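The equivalence stated for Bayes N (Gaussian prior, BLUP, ridge regression) can be checked numerically when the variances are treated as known. The base R sketch below does not call gbayes; the simulated data, the variances vb and ve, and all variable names are assumptions made for illustration. It compares the ridge solution with penalty lambda = ve / vb against the BLUP of marker effects from the corresponding mixed model.

```r
set.seed(1)                                  # seed chosen arbitrarily
n <- 50                                      # individuals
m <- 200                                     # markers
W <- scale(matrix(rnorm(n * m), nrow = n))   # centered and scaled genotypes
y <- rnorm(n)
y <- y - mean(y)                             # centered phenotype

vb <- 0.01                                   # assumed marker-effect variance
ve <- 1                                      # assumed residual variance
lambda <- ve / vb                            # implied ridge penalty

# Ridge regression: (W'W + lambda I)^{-1} W'y
b_ridge <- solve(crossprod(W) + diag(lambda, m), crossprod(W, y))

# BLUP of marker effects: vb W' V^{-1} y, with V = WW' vb + I ve
V <- tcrossprod(W) * vb + diag(ve, n)
b_blup <- vb * crossprod(W, solve(V, y))

# The two solutions agree up to numerical error.
max(abs(b_ridge - b_blup))
```

The second form only requires solving an n x n system, which is the cheaper route when there are more markers than individuals.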

Usage

gbayes(
  y = NULL,
  X = NULL,
  W = NULL,
  stat = NULL,
  covs = NULL,
  trait = NULL,
  fit = NULL,
  Glist = NULL,
  chr = NULL,
  rsids = NULL,
  b = NULL,
  bm = NULL,
  seb = NULL,
  LD = NULL,
  n = NULL,
  formatLD = "dense",
  vg = NULL,
  vb = NULL,
  ve = NULL,
  ssg_prior = NULL,
  ssb_prior = NULL,
  sse_prior = NULL,
  lambda = NULL,
  scaleY = TRUE,
  h2 = NULL,
  pi = 0.001,
  updateB = TRUE,
  updateG = TRUE,
  updateE = TRUE,
  updatePi = TRUE,
  adjustE = TRUE,
  models = NULL,
  nug = 4,
  nub = 4,
  nue = 4,
  verbose = FALSE,
  msize = 100,
  mask = NULL,
  GRMlist = NULL,
  ve_prior = NULL,
  vg_prior = NULL,
  tol = 0.001,
  nit = 100,
  nburn = 0,
  nit_local = NULL,
  nit_global = NULL,
  method = "mixed",
  algorithm = "mcmc"
)

Arguments

y

is a vector or matrix of phenotypes

X

is a matrix of covariates

W

is a matrix of centered and scaled genotypes

stat

dataframe with marker summary statistics

covs

is a list of summary statistics (output from internal cvs function)

trait

is an integer used to select a trait in the covs object

fit

is a list of results from gbayes

Glist

list of information about genotype matrix stored on disk

chr

is the chromosome for which to fit BLR models

rsids

is a character vector of rsids

b

is a vector or matrix of marginal marker effects

bm

is a vector or matrix of adjusted marker effects for the BLR model

seb

is a vector or matrix of standard error of marginal effects

LD

is a list with sparse LD matrices

n

is a scalar or vector of number of observations for each trait

formatLD

is a character specifying LD format (formatLD="dense" is default)

vg

is a scalar or matrix of genetic (co)variances

vb

is a scalar or matrix of marker (co)variances

ve

is a scalar or matrix of residual (co)variances

ssg_prior

is a scalar or matrix of prior genetic (co)variances

ssb_prior

is a scalar or matrix of prior marker (co)variances

sse_prior

is a scalar or matrix of prior residual (co)variances

lambda

is a vector or matrix of lambda values

scaleY

is a logical; if TRUE y is centered and scaled

h2

is the trait heritability

pi

is the proportion of markers in each marker variance class (e.g. pi=c(0.999,0.001); used if method="ssvs")

updateB

is a logical for updating marker (co)variances

updateG

is a logical for updating genetic (co)variances

updateE

is a logical for updating residual (co)variances

updatePi

is a logical for updating pi

adjustE

is a logical for adjusting residual variance

models

is a list structure with models evaluated in bayesC

nug

is a scalar or vector of prior degrees of freedom for genetic (co)variances

nub

is a scalar or vector of prior degrees of freedom for marker (co)variances

nue

is a scalar or vector of prior degrees of freedom for residual (co)variances

verbose

is a logical; if TRUE it prints more details during iteration

msize

number of markers used in computation of sparseld

mask

is a vector or matrix of TRUE/FALSE specifying if marker should be ignored

GRMlist

is a list providing information about GRM matrix stored in binary files on disk

ve_prior

is a scalar or matrix of prior residual (co)variances

vg_prior

is a scalar or matrix of prior genetic (co)variances

tol

is the tolerance, i.e. the convergence criterion used in gbayes

nit

is the number of iterations

nburn

is the number of burn-in iterations

nit_local

is the number of local iterations

nit_global

is the number of global iterations

method

specifies the methods used (method="bayesN","bayesA","bayesL","bayesC","bayesR")

algorithm

specifies the algorithm

Value

Returns a list structure including

b

vector or matrix (mxt) of posterior means for marker effects

d

vector or matrix (mxt) of posterior means for marker inclusion probabilities

vb

scalar or vector (t) of posterior means for marker variances

vg

scalar or vector (t) of posterior means for genomic variances

ve

scalar or vector (t) of posterior means for residual variances

rb

matrix (txt) of posterior means for marker correlations

rg

matrix (txt) of posterior means for genomic correlations

re

matrix (txt) of posterior means for residual correlations

pi

vector (1xnmodels) of posterior probabilities for models

h2

vector (1xt) of posterior means for heritability

param

a list of current parameters (same information as the items listed above), used to restart the analysis

stat

matrix (mxt) of marker information and effects used for genomic risk scoring
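As a rough sketch of how posterior mean marker effects are typically used for genomic risk scoring, the base R example below forms a score as the genotype-weighted sum of marker effects. It does not use the package's own scoring tools; W_new and b_hat are hypothetical stand-ins (simulated genotypes and made-up effects), not output from gbayes.

```r
set.seed(2)                                  # seed chosen arbitrarily
m <- 20                                      # markers
W_new <- matrix(rnorm(5 * m), nrow = 5)      # hypothetical scaled genotypes, 5 individuals

# Hypothetical posterior means of marker effects: mostly zero, three non-zero.
b_hat <- numeric(m)
b_hat[c(3, 8, 15)] <- c(0.4, -0.2, 0.1)

# Risk score for each individual: weighted sum of genotypes.
score <- drop(W_new %*% b_hat)
score
```

Markers with a posterior mean effect of zero contribute nothing, so only the non-zero entries of b_hat matter for the score.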

Author

Peter Sørensen

Examples



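# Simulate data and test functions
library(qgg)

# Genotypes: 1000 individuals x 100 markers
W <- matrix(rnorm(100000), nrow = 1000)

# Two sets of five causal markers (the sets may overlap by chance)
set1 <- sample(1:ncol(W), 5)
set2 <- sample(1:ncol(W), 5)
sets <- list(set1, set2)

# Phenotype = sum of causal marker genotypes + standard normal residual
g <- rowSums(W[, c(set1, set2)])
e <- rnorm(nrow(W), mean = 0, sd = 1)
y <- g + e

# Fit BLR models with different priors for the marker effects
fitM <- gbayes(y = y, W = W, method = "bayesN")
fitA <- gbayes(y = y, W = W, method = "bayesA")
fitL <- gbayes(y = y, W = W, method = "bayesL")
fitC <- gbayes(y = y, W = W, method = "bayesC")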
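Because the data are simulated, the true causal markers and the realised heritability are known and can serve as a reference when inspecting a fit. The base R sketch below repeats the simulation with a fixed seed (the seed is an assumption added here for reproducibility) and computes these reference values; it does not call gbayes.

```r
set.seed(3)                                  # seed added for reproducibility
W <- matrix(rnorm(100000), nrow = 1000)
set1 <- sample(1:ncol(W), 5)
set2 <- sample(1:ncol(W), 5)
g <- rowSums(W[, c(set1, set2)])             # genetic values
e <- rnorm(nrow(W), mean = 0, sd = 1)        # residuals
y <- g + e

# Realised heritability of the simulated trait: var(g) / var(y).
h2_sim <- var(g) / var(y)

# Indices of the markers that truly affect the trait.
causal <- sort(unique(c(set1, set2)))

h2_sim
causal
```

Assuming the fitted objects contain the items described under Value, the h2 item could be compared with h2_sim, and the largest entries of b (or of d, for methods that report inclusion probabilities) could be checked against causal.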