The greml function is used for the estimation of genomic parameters (co-variance, heritability and correlation) for linear mixed models using restricted maximum likelihood estimation (REML) and genomic prediction using best linear unbiased prediction (BLUP).
The linear mixed model can account for multiple genetic factors (fixed and random genetic marker effects), adjust for complex family relationships or population stratification and adjust for other non-genetic factors including lifestyle characteristics. Different genetic architectures (infinitesimal, few large and many small effects) is accounted for by modeling genetic markers in different sets as fixed or random effects and by specifying individual genetic marker weights. Different genetic models (e.g. additive and non-additive) can be specified by providing additive and non-additive genomic relationship matrices (GRMs) (constructed using grm). The GRMs can be accessed from the R environment or from binary files stored on disk facilitating the analyses of large-scale genetic data.
The output contains estimates of variance components, fixed and random effects, first and second derivatives of log-likelihood and the asymptotic standard deviation of parameter estimates.
Assessment of predictive accuracy (including correlation and R2, and AUC for binary phenotypes) can be obtained by providing greml with a data frame, or a list that contains sample IDs used in the validation (see examples for details).
Genomic parameters can also be estimated with DMU (http://www.dmu.agrsci.dk/DMU/) if interface =”DMU”. This option requires DMU to be installed locally, and the path to the DMU binary files has to be specified (see examples below for details).
Usage
greml(
y = NULL,
X = NULL,
GRMlist = NULL,
GRM = NULL,
theta = NULL,
ids = NULL,
validate = NULL,
maxit = 100,
tol = 1e-05,
bin = NULL,
ncores = 1,
wkdir = getwd(),
verbose = FALSE,
interface = "R",
fm = NULL,
data = NULL
)
Arguments
- y
is a vector or matrix of phenotypes
- X
is a design matrix for factors modeled as fixed effects
- GRMlist
is a list providing information about GRM matrix stored in binary files on disk
- GRM
is a list of one or more genomic relationship matrices
- theta
is a vector of initial values of co-variance for REML estimation
- ids
is a vector of individuals used in the analysis
- validate
is a data frame or list of individuals used in cross-validation (one column/row for each validation set)
- maxit
is the maximum number of iterations used in REML analysis
- tol
is tolerance, i.e. convergence criteria used in REML
- bin
is the directory for fortran binaries (e.g. DMU binaries dmu1 and dmuai)
- ncores
is the number of cores used for the analysis
- wkdir
is the working directory used for REML
- verbose
is a logical; if TRUE it prints more details during optimization
- interface
is used for specifying whether to use R or Fortran implementations of REML
- fm
is a formula with model statement for the linear mixed model
- data
is a data frame containing the phenotypic observations and fixed factors specified in the model statements
Value
returns a list structure including:
- llik
log-likelihood at convergence
- theta
covariance estimates from REML
- asd
asymptotic standard deviation
- b
vector of fixed effect estimates
- varb
vector of variances of fixed effect estimates
- g
vector or matrix of random effect estimates
- e
vector or matrix of residual effects
- accuracy
matrix of prediction accuracies (only returned if [validate?] is provided)
References
Lee, S. H., & van der Werf, J. H. (2006). An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genetics Selection Evolution, 38(1), 25.
Examples
# \donttest{
# Simulate data
W <- matrix(rnorm(1000000), ncol = 1000)
colnames(W) <- as.character(1:ncol(W))
rownames(W) <- as.character(1:nrow(W))
y <- rowSums(W[, 1:10]) + rowSums(W[, 501:510]) + rnorm(nrow(W))
# Create model
data <- data.frame(y = y, mu = 1)
fm <- y ~ 0 + mu
X <- model.matrix(fm, data = data)
# Compute GRM
GRM <- grm(W = W)
# REML analyses
fitG <- greml(y = y, X = X, GRM = list(GRM))
# REML analyses and cross validation
# Create marker sets
setsGB <- list(A = colnames(W)) # gblup model
setsGF <- list(C1 = colnames(W)[1:500], C2 = colnames(W)[501:1000]) # gfblup model
setsGT <- list(C1 = colnames(W)[1:10], C2 = colnames(W)[501:510]) # true model
GB <- lapply(setsGB, function(x) {grm(W = W[, x])})
GF <- lapply(setsGF, function(x) {grm(W = W[, x])})
GT <- lapply(setsGT, function(x) {grm(W = W[, x])})
n <- length(y)
fold <- 10
nvalid <- 5
validate <- replicate(nvalid, sample(1:n, as.integer(n / fold)))
cvGB <- greml(y = y, X = X, GRM = GB, validate = validate)
cvGF <- greml(y = y, X = X, GRM = GF, validate = validate)
cvGT <- greml(y = y, X = X, GRM = GT, validate = validate)
cvGB$accuracy
cvGF$accuracy
cvGT$accuracy
# }