Genomic Medicine, Department of Health Science and Technology, Aalborg University, Denmark
Peter Sørensen
Center for Quantitative Genetics and Genomics, Aarhus University, Denmark
Introduction
This document demonstrates how to simulate data from a classical linear regression model and estimate parameters using Ordinary Least Squares (OLS), including inference and prediction intervals.
We also visualize the estimated vs. true regression coefficients.
R Code
Code
# ============================================================# Classical Linear Regression: Simulation + OLS Estimation# ============================================================set.seed(123)# --- 1. Simulate data ---------------------------------------n <-100# number of observationsp <-3# number of predictors (excluding intercept)X <-cbind(1, matrix(rnorm(n * p), n, p)) # include interceptbeta_true <-c(2, 0.5, -1, 1.5) # true coefficientssigma_true <-1# true residual SDy <- X %*% beta_true +rnorm(n, 0, sigma_true)# --- 2. Estimate parameters via OLS --------------------------beta_hat <-solve(t(X) %*% X) %*%t(X) %*% yresiduals <- y - X %*% beta_hatsigma2_hat <-as.numeric(t(residuals) %*% residuals / (n - (p +1)))# --- 3. Inference: SEs, t-stats, p-values, CI ---------------var_beta_hat <- sigma2_hat *solve(t(X) %*% X)se_beta <-sqrt(diag(var_beta_hat))t_stats <- beta_hat / se_betap_values <-2* (1-pt(abs(t_stats), df = n - (p +1)))alpha <-0.05t_crit <-qt(1- alpha/2, df = n - (p +1))ci_lower <- beta_hat - t_crit * se_betaci_upper <- beta_hat + t_crit * se_beta# Combine results into a tableresults <-data.frame(Estimate =as.numeric(beta_hat),SE = se_beta,t_value =as.numeric(t_stats),p_value =as.numeric(p_values),CI_lower =as.numeric(ci_lower),CI_upper =as.numeric(ci_upper))rownames(results) <-paste0("beta", 0:p)print(round(results, 4))