Linear regression (NMSA407)

Arnošt Komárek



Winter semester 2021–22

SIS pages of the course:    ENG    CZE


Lectures: Tuesday 11:30 in K1   
Tuesday 13:10 in K1   
Exercise class (SN): Tuesday 17:20 in K4    (Mgr. Stanislav Nagy, Ph.D.)
Exercise class (MM1): Thursday 15:40 in K4    (RNDr. Matúš Maciak, Ph.D.)
Exercise class (MM2): Thursday 17:20 in K11    (RNDr. Matúš Maciak, Ph.D.)
  • The language of both the lectures and all exercise classes is English if and only if at least one enrolled student is in the English variant of either the Probability, mathematical statistics and econometrics or the Financial and insurance mathematics study branch, or if there is an incoming student who was approved by the guarantor of either of those two study branches.
  • Personal communication with the lecturer and the exercise class instructors can always also be conducted in Czech or Slovak.


If allowed by PANdemIC measures, the lectures take place in a lecture room and combine slide projection with blackboard writing. More information will be provided during the first lecture. The slides and the course notes ("skripta") for the whole semester are available below. If you want to print them, do not print too many pages in advance: both slides and notes are subject to (minor) changes and/or corrections during the semester without further notice.

Notes (PDF)    last update 02/09/2021
Slides (PDF)    last update 02/09/2021


This course follows on closely from the bachelor study branch General Mathematics, especially its subbranch Stochastics. It therefore builds upon a solid command of classical mathematical thinking (theorem, proof, ...), on knowledge acquired in the basic courses (mathematical analysis, linear algebra, ...), and on intermediate knowledge of probability theory and mathematical statistics. The areas of general mathematics and mathematical statistics that are indispensable for following this course include:

  • Vector spaces, matrix calculus;
  • Probability space, conditional probability, conditional distribution, conditional expectation;
  • Elementary asymptotic results (laws of large numbers, central limit theorem for i.i.d. random variables and vectors, Cramér-Wold theorem, Cramér-Slutsky theorem);
  • Foundations of statistical inference (statistical test, confidence interval, standard error, consistency);
  • Basic procedures of statistical inference (asymptotic tests on expected value, one- and two-sample t-test, one-way analysis of variance, chi-square test of independence);
  • Maximum-likelihood theory including asymptotic results and the delta method;
  • Working knowledge of R, a free software environment for statistical computing and graphics.

This course is not a cook-book course on linear regression, and it does not make much sense to follow it without the knowledge described above.


  • Exam grade will be based on two parts:
    1. Written part composed of theoretical and semi-practical assignments (no computer analysis).
    2. Oral part.
All exams take place between January 10 and February 11, 2022. There will be at least five opportunities to take the exam spread over this period. There will be no later exam dates.


All information related to the exercise classes is (will be) available at the central exercise classes webpage.

The exercise classes are synchronized: the content of the classes held in the same week is approximately the same.


The course is supplemented by the R package mffSM, which contains the example datasets used throughout the course and a few additional small functions for processing a linear model fit. After download (from the link below, not from CRAN), the package can be installed in R in the standard way ("from a local repository"). The Windows binary file is intended for MS Windows users (as the title suggests); the source code is intended for users of other (mostly more reliable) operating systems, where compiling a package from its source is standard (Linux, Mac etc.). The mffSM package depends on the packages colorspace, lattice and car, which are available from CRAN in the standard way. All these dependencies should normally be installed automatically if mffSM is installed directly from the R console on an Internet-connected computer using the following command (or an appropriately modified analogue):

install.packages("PATH_WHERE_DOWNLOADED/mffSM_1.1.[tar.gz,zip]", repos = NULL)

Source code:   mffSM_1.1.tar.gz
Windows binary:
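If the dependencies are not pulled in automatically (with repos = NULL, R may not resolve them from CRAN), they can be installed first. The following is only a sketch: the helpers missing_packages and install_mffSM are hypothetical names introduced here, not part of mffSM, and the CRAN mirror URL and the file path are assumptions to be adapted.

```r
# CRAN dependencies of mffSM as listed in the text above.
deps <- c("colorspace", "lattice", "car")

# Hypothetical helper (not part of mffSM): which of the given packages
# are not installed yet?
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install the missing dependencies from CRAN first,
# then install mffSM from the downloaded local file.
# The mirror URL is an assumption; any CRAN mirror should work.
install_mffSM <- function(pkg_file, cran = "https://cloud.r-project.org") {
  stopifnot(file.exists(pkg_file))
  to_install <- missing_packages(deps)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = cran)
  }
  install.packages(pkg_file, repos = NULL)
}

# Usage (the path is a placeholder, replace it with the real download location):
# install_mffSM("PATH_WHERE_DOWNLOADED/mffSM_1.1.tar.gz")
```

Once the installation has finished, library("mffSM") should load the package without complaints about missing dependencies.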


The R tutorials show R analyses based on the theory given during the lectures. They also provide the code used to prepare most of the output and plots shown as illustrations during the lectures. The R tutorials may serve as a reference for the assignments performed during the exercise classes or required as homework.

The R scripts provided below assume that the content of the .Rprofile has been sourced at startup.
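R normally reads a .Rprofile by itself when it starts, but not if it was started with options such as --vanilla or from a different directory. In that case the file can be sourced by hand. This is a minimal sketch; it assumes the course .Rprofile has been saved in the current working directory, and profile_loaded is just an illustrative variable name.

```r
# Assumed location: a .Rprofile saved in the current working directory.
rprofile <- file.path(getwd(), ".Rprofile")

if (file.exists(rprofile)) {
  source(rprofile)            # run the settings the tutorial scripts rely on
  profile_loaded <- TRUE
} else {
  message("No .Rprofile found in ", getwd())
  profile_loaded <- FALSE
}
```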

1. Linear Model
  1. Simple illustration of a linear model (data Hosi0)    html    R code
2. Least Squares Estimation
  1. Matrix algebra background of linear regression    html    R code
  2. R function lm    html    R code
3. Basic Regression Diagnostics
  1. Basic Regression Diagnostics (data Cars2004nh)    html    R code
4. Parameterizations of Covariates
  1. Numeric covariate: simple transformation, polynomial regression, regression splines (data Houses1987)    html    R code
  2. Numeric covariate: regression splines (data Motorcycle)    html    R code
  3. Categorical nominal covariate (data Cars2004nh)    html    R code
  4. Categorical ordinal covariate (data Cars2004nh)    html    R code
5. Multiple Regression
  1. Numeric and categorical covariate (data Cars2004nh)    html    R code
  2. Two numeric covariates (data Cars2004nh)    html    R code
  3. Two categorical covariates (data Howells)    html    R code
  4. Multiple regression model (data Cars2004nh)    html    R code
  5. ANOVA tables of type I, II and III (data Cars2004nh)    html    R code
6. Normal Linear Model
  1. Inference in a model with the regression line (data Cars2004nh)    html    R code
  2. Joint inference on a vector of estimable parameters (data Cars2004nh)    html    R code
  3. Confidence interval for the model based mean, prediction interval (data Hosi0)    html    R code
  4. Confidence interval for the model based mean, prediction interval (data Kojeni)    html    R code
9. Checking Model Assumptions
  1. Partial residuals, Simpson's paradox (data Policie)    html    R code
  2. Partial residuals (data Cars2004nh)    html    R code
  3. Residual plots and tests on assumptions (data Cars2004nh)    html    R code
  4. Checking homoscedasticity (data Draha)    html    R code
  5. Checking uncorrelated errors (data Olympic)    html    R code
  6. Transformation of response: ANOVA with log-transformed response to get normality and homoscedasticity (data Houses1987)    html    R code
  7. Transformation of response: Regression with log-transformed response to stabilize the variance, Box–Cox transformation (data Cars2004nh)    html    R code
10. Problematic Regression Space
  1. Multicollinearity (data IQ)    html    R code
  2. Multicollinearity (data Cars2004nh)    html    R code
11. Unusual Observations
  1. Unusual observations (data Cars2004)    html    R code
14. Simultaneous Inference in a Linear Model
  1. Multiple comparison procedures (Tukey, Hothorn–Bretz–Westfall) (data Howells)    html    R code
  2. Multiple comparison procedures (Hothorn–Bretz–Westfall) (data Cars2004nh)    html    R code
  3. Confidence band around and for the regression function (data Kojeni)    html    R code
15. General Linear Model
  1. Weighted least squares (data Kojeni and wKojeni)    html    R code
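
To give a flavour of what the tutorials on least squares estimation and on the normal linear model cover, here is a small sketch that is not taken from the tutorials themselves. It assumes the built-in R dataset cars as a stand-in for the course data (so that it runs without mffSM), computes the least squares estimate from the normal equations, and compares it with lm. It then obtains a confidence interval for the model based mean and a prediction interval.

```r
# Built-in dataset 'cars' (stopping distance vs. speed), used here only
# as a stand-in for the course datasets.
data(cars)

# Least squares estimate from the normal equations: solve (X'X) b = X'y.
X <- cbind(1, cars$speed)            # model matrix: intercept and speed
y <- cars$dist
b_manual <- drop(solve(crossprod(X), crossprod(X, y)))

# The same model fitted by lm().
fit <- lm(dist ~ speed, data = cars)
b_lm <- unname(coef(fit))

# The two sets of estimates should agree up to numerical error.
print(all.equal(b_manual, b_lm))

# Confidence interval for the model based mean and prediction interval
# for a new observation, both at speed = 15.
new_obs <- data.frame(speed = 15)
ci_mean <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)
pi_new  <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
print(ci_mean)
print(pi_new)
```

The prediction interval is wider than the confidence interval because it also accounts for the error term of the new observation, not only for the uncertainty in the estimated mean.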

