This document demonstrates how to simulate data from a classical linear regression model, estimate its parameters by Ordinary Least Squares (OLS), carry out inference on the coefficients, and construct prediction intervals.
We also visualize the estimated coefficients against their true values.
R Code
# ============================================================
# Classical Linear Regression: Simulation + OLS Estimation
# ============================================================
set.seed(123)

# --- 1. Simulate data ---------------------------------------
n <- 100                                   # number of observations
p <- 3                                     # number of predictors (excluding intercept)
X <- cbind(1, matrix(rnorm(n * p), n, p))  # design matrix with intercept column
beta_true  <- c(2, 0.5, -1, 1.5)           # true coefficients
sigma_true <- 1                            # true residual SD
y <- X %*% beta_true + rnorm(n, 0, sigma_true)

# --- 2. Estimate parameters via OLS --------------------------
beta_hat   <- solve(t(X) %*% X) %*% t(X) %*% y
residuals  <- y - X %*% beta_hat
sigma2_hat <- as.numeric(t(residuals) %*% residuals / (n - (p + 1)))

# --- 3. Inference: SEs, t-stats, p-values, CI ---------------
var_beta_hat <- sigma2_hat * solve(t(X) %*% X)
se_beta  <- sqrt(diag(var_beta_hat))
t_stats  <- beta_hat / se_beta
p_values <- 2 * (1 - pt(abs(t_stats), df = n - (p + 1)))
alpha    <- 0.05
t_crit   <- qt(1 - alpha / 2, df = n - (p + 1))
ci_lower <- beta_hat - t_crit * se_beta
ci_upper <- beta_hat + t_crit * se_beta

# Combine results into a table
results <- data.frame(
  Estimate = as.numeric(beta_hat),
  SE       = se_beta,
  t_value  = as.numeric(t_stats),
  p_value  = as.numeric(p_values),
  CI_lower = as.numeric(ci_lower),
  CI_upper = as.numeric(ci_upper)
)
rownames(results) <- paste0("beta", 0:p)
print(round(results, 4))
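The introduction also promises prediction intervals and a plot of estimated vs. true coefficients. Below is a minimal sketch of both, continuing from the objects created above (beta_hat, sigma2_hat, t_crit, ci_lower, ci_upper); the new predictor row x0 and the base-R plotting choices are illustrative assumptions, not part of the original code.

# --- 4. Prediction interval for a new observation ------------
# NOTE: x0 is a hypothetical new predictor row chosen for illustration.
x0       <- c(1, 0.2, -0.5, 1.0)            # intercept + 3 predictor values
y0_hat   <- as.numeric(t(x0) %*% beta_hat)  # point prediction
XtX_inv  <- solve(t(X) %*% X)
se_pred  <- as.numeric(sqrt(sigma2_hat * (1 + t(x0) %*% XtX_inv %*% x0)))
pi_lower <- y0_hat - t_crit * se_pred
pi_upper <- y0_hat + t_crit * se_pred
cat("Prediction:", round(y0_hat, 3),
    " 95% PI: [", round(pi_lower, 3), ",", round(pi_upper, 3), "]\n")

# --- 5. Plot estimated vs. true coefficients ------------------
plot(0:p, beta_true, pch = 4, col = "red",
     ylim = range(ci_lower, ci_upper, beta_true),
     xlab = "Coefficient index", ylab = "Value",
     main = "Estimated vs. true coefficients")
points(0:p, beta_hat, pch = 19, col = "blue")
arrows(0:p, ci_lower, 0:p, ci_upper, angle = 90, code = 3, length = 0.05)
legend("topright", legend = c("True value", "OLS estimate (95% CI)"),
       col = c("red", "blue"), pch = c(4, 19))

# Optional cross-check against R's built-in OLS fit:
# summary(lm(as.numeric(y) ~ X[, -1]))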