Bayesian linear regression (BLR) models:
- unified mapping of genetic variants, estimation of genetic parameters (e.g. heritability) and prediction of disease risk)
- handles different genetic architectures (few large, many small effects)
- scale to large data (e.g. sparse LD)
In the Bayesian multiple regression model the posterior density of the model parameters depend on the likelihood of the data given the parameters and a prior probability for the model parameters
The prior density of marker effects defines whether the model will induce variable selection and shrinkage or shrinkage only. Also, the choice of prior will define the extent and type of shrinkage induced. Ideally the choice of prior for the marker effect should reflect the genetic architecture of the trait, and will vary (perhaps a lot) across traits.
The following prior distributions are provided:
Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (same as Ridge Regression).
Bayes L: Assigning a double-exponential or Laplace prior is the density used in the Bayesian LASSO
Bayes A: similar to ridge regression but t-distribution prior (rather than Gaussian) for the marker effects ; variance comes from an inverse-chi-square distribution instead of being fixed. Estimation via Gibbs sampling.
Bayes C: uses a “rounded spike” (low-variance Gaussian) at origin many small effects can contribute to polygenic component, reduces the dimensionality of the model (makes Gibbs sampling feasible).
Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001 , 0.001 , and 0.01 .
Usage
gbayes(
y = NULL,
X = NULL,
W = NULL,
stat = NULL,
covs = NULL,
trait = NULL,
fit = NULL,
Glist = NULL,
chr = NULL,
rsids = NULL,
b = NULL,
bm = NULL,
seb = NULL,
LD = NULL,
n = NULL,
formatLD = "dense",
vg = NULL,
vb = NULL,
ve = NULL,
ssg_prior = NULL,
ssb_prior = NULL,
sse_prior = NULL,
lambda = NULL,
scaleY = TRUE,
h2 = NULL,
pi = 0.001,
updateB = TRUE,
updateG = TRUE,
updateE = TRUE,
updatePi = TRUE,
adjustE = TRUE,
models = NULL,
nug = 4,
nub = 4,
nue = 4,
verbose = FALSE,
msize = 100,
mask = NULL,
GRMlist = NULL,
ve_prior = NULL,
vg_prior = NULL,
tol = 0.001,
nit = 100,
nburn = 0,
nit_local = NULL,
nit_global = NULL,
method = "mixed",
algorithm = "mcmc"
)
Arguments
- y
is a vector or matrix of phenotypes
- X
is a matrix of covariates
- W
is a matrix of centered and scaled genotypes
- stat
dataframe with marker summary statistics
- covs
is a list of summary statistics (output from internal cvs function)
- trait
is an integer used for selection traits in covs object
- fit
is a list of results from gbayes
- Glist
list of information about genotype matrix stored on disk
- chr
is the chromosome for which to fit BLR models
- rsids
is a character vector of rsids
- b
is a vector or matrix of marginal marker effects
- bm
is a vector or matrix of adjusted marker effects for the BLR model
- seb
is a vector or matrix of standard error of marginal effects
- LD
is a list with sparse LD matrices
- n
is a scalar or vector of number of observations for each trait
- formatLD
is a character specifying LD format (formatLD="dense" is default)
- vg
is a scalar or matrix of genetic (co)variances
- vb
is a scalar or matrix of marker (co)variances
- ve
is a scalar or matrix of residual (co)variances
- ssg_prior
is a scalar or matrix of prior genetic (co)variances
- ssb_prior
is a scalar or matrix of prior marker (co)variances
- sse_prior
is a scalar or matrix of prior residual (co)variances
- lambda
is a vector or matrix of lambda values
- scaleY
is a logical; if TRUE y is centered and scaled
- h2
is the trait heritability
- pi
is the proportion of markers in each marker variance class (e.g. pi=c(0.999,0.001),used if method="ssvs")
- updateB
is a logical for updating marker (co)variances
- updateG
is a logical for updating genetic (co)variances
- updateE
is a logical for updating residual (co)variances
- updatePi
is a logical for updating pi
- adjustE
is a logical for adjusting residual variance
- models
is a list structure with models evaluated in bayesC
- nug
is a scalar or vector of prior degrees of freedom for prior genetic (co)variances
- nub
is a scalar or vector of prior degrees of freedom for marker (co)variances
- nue
is a scalar or vector of prior degrees of freedom for prior residual (co)variances
- verbose
is a logical; if TRUE it prints more details during iteration
- msize
number of markers used in compuation of sparseld
- mask
is a vector or matrix of TRUE/FALSE specifying if marker should be ignored
- GRMlist
is a list providing information about GRM matrix stored in binary files on disk
- ve_prior
is a scalar or matrix of prior residual (co)variances
- vg_prior
is a scalar or matrix of prior genetic (co)variances
- tol
is tolerance, i.e. convergence criteria used in gbayes
- nit
is the number of iterations
- nburn
is the number of burnin iterations
- nit_local
is the number of local iterations
- nit_global
is the number of global iterations
- method
specifies the methods used (method="bayesN","bayesA","bayesL","bayesC","bayesR")
- algorithm
specifies the algorithm
Value
Returns a list structure including
- b
vector or matrix (mxt) of posterior means for marker effects
- d
vector or matrix (mxt) of posterior means for marker inclusion probabilities
- vb
scalar or vector (t) of posterior means for marker variances
- vg
scalar or vector (t) of posterior means for genomic variances
- ve
scalar or vector (t) of posterior means for residual variances
- rb
matrix (txt) of posterior means for marker correlations
- rg
matrix (txt) of posterior means for genomic correlations
- re
matrix (txt) of posterior means for residual correlations
- pi
vector (1xnmodels) of posterior probabilities for models
- h2
vector (1xt) of posterior means for model probability
- param
a list current parameters (same information as item listed above) used for restart of the analysis
- stat
matrix (mxt) of marker information and effects used for genomic risk scoring
Examples
# Simulate data and test functions
W <- matrix(rnorm(100000),nrow=1000)
set1 <- sample(1:ncol(W),5)
set2 <- sample(1:ncol(W),5)
sets <- list(set1,set2)
g <- rowSums(W[,c(set1,set2)])
e <- rnorm(nrow(W),mean=0,sd=1)
y <- g + e
fitM <- gbayes(y=y, W=W, method="bayesN")
fitA <- gbayes(y=y, W=W, method="bayesA")
fitL <- gbayes(y=y, W=W, method="bayesL")
fitC <- gbayes(y=y, W=W, method="bayesC")