Advanced Regression Modelling

Advanced Statistical Data Analysis

Lecture Notes

ZHAW School of Engineering

Review of Multiple Linear Regression

Initial Remarks

Regression analysis is used to model the relationship between a response variable $Y$ and one or more explanatory variables $x^{(1)}, \dots, x^{(m)}$, where the relationship is masked by random noise.

Objectives of Regression Analysis

  1. General description of data structure.

  2. Assessment of the effect of explanatory variables on the response.

  3. Prediction of future observations.

Error Assumptions

The standard assumptions for the error terms $\mathcal{E}_i$ are:
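
In the usual textbook formulation, which is assumed here, these conditions can be written as:

$$
\begin{aligned}
E[\mathcal{E}_i] &= 0 \\
\text{Var}(\mathcal{E}_i) &= \sigma^2 \\
\text{Cov}(\mathcal{E}_i, \mathcal{E}_j) &= 0 \quad \text{for } i \neq j \\
\mathcal{E}_i &\sim \mathcal{N}(0, \sigma^2)
\end{aligned}
$$

In words: the errors have zero expectation and constant variance, are uncorrelated, and follow a normal distribution. Here $\sigma^2$ denotes the common error variance.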

Matrix Representation

To simplify notation, the regression equation (Definition 1) is written in matrix form:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\mathcal{E}}$$

where:
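
Under the usual convention, assumed here, $\mathbf{Y}$ is the $n \times 1$ vector of responses, $\mathbf{X}$ is the $n \times (m+1)$ design matrix whose first column of ones carries the intercept, $\boldsymbol{\beta}$ is the $(m+1) \times 1$ coefficient vector, and $\boldsymbol{\mathcal{E}}$ is the $n \times 1$ vector of errors. The following minimal sketch assembles these objects with NumPy for a toy data set. The data, the coefficients, and all variable names are hypothetical and serve only to illustrate the shapes.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations, m = 2 explanatory variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.5, 0.1, 0.9, 0.3])
n = x1.size

# Design matrix X: a column of ones for the intercept, then one column per variable.
X = np.column_stack([np.ones(n), x1, x2])

# Illustrative coefficients (beta_0, beta_1, beta_2) and an illustrative error vector.
beta = np.array([1.0, 2.0, -1.0])
errors = np.array([0.1, -0.2, 0.0, 0.1])

# The matrix form Y = X beta + E evaluates all n row-wise equations at once.
Y = X @ beta + errors

print(X.shape, beta.shape, Y.shape)
```

Each row of `X` holds one observation, so the product `X @ beta` evaluates the linear predictor for every observation in a single step.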

Tukey’s First-Aid Transformations

Tukey's first-aid transformations are standard recommendations for linearizing relationships and stabilizing variance when no specific domain theory guides the choice of transformation. They should be applied to explanatory variables and the response alike unless there is a valid reason to do otherwise:

| Data Type | Recommended Transformation |
|---|---|
| Concentrations and amounts | $\log(x)$ |
| Count data | $\sqrt{x}$ |
| Counted fractions / shares | $\tilde{x} = \text{logit}(x) = \log\left(\frac{x + 0.005}{1.01 - x}\right)$ |
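
As an illustration, the three transformations can be sketched as small NumPy helpers. The function names are hypothetical, and the sketch assumes that shares are given on the scale from 0 to 1 and that `log` denotes the natural logarithm.

```python
import numpy as np

def first_aid_log(x):
    """Log transform for concentrations and amounts (requires x > 0)."""
    return np.log(np.asarray(x, dtype=float))

def first_aid_sqrt(x):
    """Square-root transform for count data (requires x >= 0)."""
    return np.sqrt(np.asarray(x, dtype=float))

def first_aid_logit(x):
    """Offset logit for shares in [0, 1], as given in the table above."""
    x = np.asarray(x, dtype=float)
    # The offsets 0.005 and 1.01 keep the result finite at x = 0 and x = 1.
    return np.log((x + 0.005) / (1.01 - x))

# Example: shares at the boundaries stay finite after transformation.
shares = np.array([0.0, 0.5, 1.0])
transformed = first_aid_logit(shares)
print(transformed)
```

The offsets in the logit are what make the transformation usable for shares of exactly 0 or 1, where the plain logit would be undefined.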

Model Fitting and Diagnostics

Least Squares Estimation

The coefficients $\boldsymbol{\beta}$ are estimated by minimizing the sum of squared residuals.
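
As a short worked derivation, the criterion and its minimizer can be written out as follows. This is a sketch that assumes $\mathbf{X}$ has full column rank, so that $\mathbf{X}^T \mathbf{X}$ is invertible; $Q$ is a label introduced here for the sum of squares.

$$
\begin{aligned}
Q(\boldsymbol{\beta}) &= (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \\
\frac{\partial Q}{\partial \boldsymbol{\beta}} &= -2\,\mathbf{X}^T (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0} \\
\mathbf{X}^T \mathbf{X}\,\boldsymbol{\beta} &= \mathbf{X}^T \mathbf{Y}
\end{aligned}
$$

The last line is the system of normal equations, and solving it for $\boldsymbol{\beta}$ gives the minimizer.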

The OLS estimator is given by:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$$
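
The estimator can be checked numerically. The sketch below simulates data from a known linear model, solves the normal equations directly, and compares the result with NumPy's least-squares routine. The simulated data, the seed, and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Simulated data (hypothetical): n observations, two explanatory variables.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
X = np.column_stack([np.ones(n), x1, x2])  # intercept column plus predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0, 0.3, size=n)

# OLS via the normal equations: solve (X^T X) beta = X^T y.
# Solving the linear system avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from the least-squares solver, which is numerically more robust.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals are orthogonal to the columns of X at the minimum.
residuals = y - X @ beta_hat

print(beta_hat)
```

In practice a solver such as `np.linalg.lstsq` is usually preferable to inverting $\mathbf{X}^T \mathbf{X}$, because it remains stable when the explanatory variables are strongly correlated.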

Model Adequacy (Residual Analysis)

Model adequacy is checked using diagnostic plots:
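
A common set, assumed here rather than taken from the notes, comprises the Tukey-Anscombe plot (residuals against fitted values), the normal Q-Q plot, the scale-location plot, and the leverage plot. The sketch below computes the quantities behind each of these with NumPy on simulated data. The data and all variable names are illustrative, and the plotting itself is left out.

```python
import numpy as np
from statistics import NormalDist

# Simulated simple regression (hypothetical data).
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, size=n)

# Fit by least squares and form fitted values and raw residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
resid = y - fitted
p = X.shape[1]

# Leverages are the diagonal of the hat matrix H = Q Q^T (reduced QR of X).
Q, _ = np.linalg.qr(X)
leverage = np.sum(Q**2, axis=1)

# Standardized residuals: r_i / (sigma_hat * sqrt(1 - h_ii)).
sigma_hat = np.sqrt(resid @ resid / (n - p))
std_resid = resid / (sigma_hat * np.sqrt(1.0 - leverage))

# Theoretical normal quantiles for the Q-Q plot, at probabilities (i - 0.5) / n.
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = np.array([NormalDist().inv_cdf(pr) for pr in probs])

# (x, y) pairs for each of the four plots.
tukey_anscombe = (fitted, resid)
normal_qq = (theo_q, np.sort(std_resid))
scale_location = (fitted, np.sqrt(np.abs(std_resid)))
leverage_plot = (leverage, std_resid)
```

Systematic curvature in the first pair points to a misspecified mean structure, a trend in the scale-location pair to non-constant variance, departures from a straight line in the Q-Q pair to non-normal errors, and large standardized residuals at high leverage to influential observations.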