Extracts selected rows (individuals, chosen by ID or row number) and columns (SNPs, chosen by rsid or column number) from a genotype matrix stored on disk for a given chromosome. Genotypes can optionally be imputed and scaled.

Usage

getG(
  Glist = NULL,
  chr = NULL,
  bedfiles = NULL,
  bimfiles = NULL,
  famfiles = NULL,
  ids = NULL,
  rsids = NULL,
  rws = NULL,
  cls = NULL,
  impute = TRUE,
  scale = FALSE
)

Arguments

Glist

A list structure containing information about genotypes stored on disk.

chr

An integer giving the chromosome from which genotypes are extracted. Required, even though the default in the signature is NULL.

bedfiles

A vector of PLINK bed-file names.

bimfiles

A vector of PLINK bim-file names.

famfiles

A vector of PLINK fam-file names.

ids

A vector of IDs of the individuals whose genotypes are to be extracted.

rsids

A vector of SNP identifiers (rsids) whose genotypes are to be extracted.

rws

A vector of row numbers to be extracted from the genotype matrix.

cls

A vector of column numbers to be extracted from the genotype matrix.

impute

A logical or integer. If TRUE, missing genotypes are replaced with their expected values (2 times the allele frequency). If set to an integer, missing values are replaced by that integer.

scale

A logical. If TRUE, the genotype markers are scaled to have a mean of zero and variance of one.
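
The impute and scale options described above can be illustrated with a small base R sketch. It is not the package's implementation: the helpers impute_expected() and scale_snps() and the toy matrix are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 SNPs (columns),
# coded as allele counts 0/1/2, with NA for missing calls.
G <- matrix(
  c(0, 1, 2, NA,
    2, 2, NA, 1,
    1, 0, 1, 1),
  nrow = 4,
  dimnames = list(paste0("id", 1:4), paste0("rs", 1:3))
)

# Hypothetical helper: replace each missing genotype with
# 2 * allele frequency, which equals the SNP's observed mean.
impute_expected <- function(G) {
  af <- colMeans(G, na.rm = TRUE) / 2      # allele frequency per SNP
  miss <- which(is.na(G), arr.ind = TRUE)  # (row, col) index of each NA
  G[miss] <- 2 * af[miss[, "col"]]
  G
}

# Hypothetical helper: centre each SNP to mean 0 and scale it to
# (sample) variance 1. A monomorphic SNP would yield NaN here.
scale_snps <- function(G) {
  W <- scale(G, center = TRUE, scale = TRUE)
  attr(W, "scaled:center") <- NULL
  attr(W, "scaled:scale") <- NULL
  W
}

Gi <- impute_expected(G)  # no NA values remain
W <- scale_snps(Gi)       # each column now has mean 0 and variance 1
```

Imputing before scaling keeps the column means unchanged by the filled-in values, since each missing entry is replaced by the mean of the observed genotypes for that SNP.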

Value

A matrix with extracted genotypic data. Rows correspond to individuals, and columns correspond to SNPs. Row names are set to individual IDs, and column names are set to rsids.

Details

Rows are selected by individual ID (ids) or row number (rws), and columns by SNP identifier (rsids) or column number (cls), within the chromosome given by chr. The extracted genotypes can optionally be imputed and scaled. If any of the supplied rsids are not found in the `Glist`, a warning is raised.
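
The warning for unknown rsids can be pictured with the following base R sketch. It is an illustration only: match_rsids() is a hypothetical helper, the list of available rsids is made up, and dropping the unmatched rsids after warning is an assumption rather than documented behaviour.

```r
# Hypothetical helper: map requested rsids to column numbers and
# warn about any that are not available.
match_rsids <- function(rsids, available) {
  cls <- match(rsids, available)
  not_found <- rsids[is.na(cls)]
  if (length(not_found) > 0) {
    warning("rsids not found: ", paste(not_found, collapse = ", "))
  }
  cls[!is.na(cls)]  # assumption: unmatched rsids are dropped
}

available <- c("rs1", "rs2", "rs3", "rs4")

# Capture the warning message instead of printing it.
msg <- NULL
cls <- withCallingHandlers(
  match_rsids(c("rs3", "rs9", "rs1"), available),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
```

The column numbers come back in the order the rsids were requested, which matches the common expectation that the columns of the extracted matrix follow the order of the rsids argument; whether the package preserves that order is not stated in this page.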