Linear regression (NMSA407)

Arnošt Komárek

Winter semester 2021–22

SIS pages of the course:    ENG    CZE

TIMETABLE

Lectures: Tuesday 11:30 in K1   
Tuesday 13:10 in K1   
Exercise class (SN): Tuesday 17:20 in K4    (Mgr. Stanislav Nagy, Ph.D.)
Exercise class (MM1): Thursday 15:40 in K4    (RNDr. Matúš Maciak, Ph.D.)
Exercise class (MM2): Thursday 17:20 in K11    (RNDr. Matúš Maciak, Ph.D.)
  • The language of both the lectures and all exercise classes is English if and only if at least one enrolled student is in the English variant of either the Probability, mathematical statistics and econometrics or the Financial and insurance mathematics study branch, or if there is an incoming student approved by the guarantor of either of those two study branches.
  • Personal communication with the lecturer and the exercise class instructors can always also be conducted in Czech or Slovak.

EXAM

  • If allowed by the COVID regulations in force in January/February 2022, students will be required to attend the exam in person in a lecture room of the Karlín building of MFF UK. Online examinations will be conducted only if this is not possible, or in individual cases worthy of special treatment upon mutual agreement with the lecturer.
  • The exam grade will be based on two parts:
    1. Written part composed of theoretical and semi-practical assignments (no computer analysis).
    2. Oral part.
    For details, see this document (pdf).
  • A video summary of the whole semester (from 2020-21, with only minor changes compared to the current academic year) and additional information on the exam (partly specific to the previous academic year 2020-21) are available here (Stream, 81 min).

  • Sample exam assignment: Assignment (pdf)    Solution (pdf)    Comments (Stream, 25 min)

All exams take place between January 10 and February 11, 2022. The following exam dates have been open for enrollment in SIS:
  • Thursday 13/01
  • Thursday 20/01
  • Tuesday 25/01
  • Tuesday 01/02
  • Monday 07/02
The capacity of each exam date is 20. The oral part of the exam takes place in the afternoon of the same day.

MATERIALS

If allowed by pandemic measures, the lectures take place in a lecture room, combining slide projection with blackboard writing. More information will be provided during the first lecture. The slides and the course notes ("skripta") for the whole semester are available below. If you want to print them, do not print too many pages in advance: both the slides and the notes are subject to (smaller) changes and/or corrections during the semester without further notice.

Notes (PDF)    last update 02/09/2021
Slides (PDF)    last update 02/09/2021

RECORDINGS

Recordings contain roughly the same information as the in-person lectures and can be used either as a supplement to or as an occasional replacement for them.

The MP4 files can be downloaded and played offline. However, each file will be removed from here at some point (not earlier than the beginning of the week following the week for which the respective piece is intended). The stream is provided by stream.cuni.cz and requires a SIS login; the link will remain working until at least the end of the semester.

WEEK 1 (04/10 – 10/10)
Handnotes 1:     PDF       Video 1 (75 min):    Stream    
Handnotes 2/1:  PDF       Video 2/1 (61 min):    Stream    
 
WEEK 2 (11/10 – 18/10)
Handnotes 2/2:  PDF       Video 2/2 (68 min):    Stream    
Handnotes 2/3:  PDF       Video 2/3 (40 min):    Stream    
Handnotes 3:     PDF       Video 3 (60 min):    Stream    
 
WEEK 3 (19/10 – 24/10)
Handnotes 4/1:  PDF       Video 4/1 (43 min):    Stream    
Handnotes 4/2:  PDF       Video 4/2 (112 min):    Stream    
Handnotes 4/3:  PDF       Video 4/3 (105 min):    Stream    
 
WEEK 4 (25/10 – 31/10)
Handnotes 5/1:  PDF       Video 5/1 (36 min):    Stream    
Handnotes 5/2:  PDF       Video 5/2 (68 min):    Stream    
 
WEEK 5 (01/11 – 07/11)
Handnotes 5/4:  PDF       Video 5/4 (105 min):    Stream    
Handnotes 6/1:  PDF       Video 6/1 (88 min):    Stream    
Handnotes 6/2:  PDF       Video 6/2 (49 min):    Stream    
 
WEEK 6 (08/11 – 14/11)
Handnotes 7:     PDF       Video 7 (71 min):    Stream    
Handnotes 5/3:  PDF       Video 5/3 (58 min):    Stream    
 
WEEK 7 (15/11 – 21/11)
Handnotes 8/1:  PDF       Video 8/1 (70 min):    Stream    
Handnotes 8/2:  PDF       Video 8/2 (65 min):    Stream    
Handnotes 5/5:  PDF       Video 5/5 (115 min):    Stream    
 
WEEK 8 (22/11 – 28/11)
Handnotes 9/1:  PDF       Video 9/1 (35 min):    Stream    
Handnotes 9/2:  PDF       Video 9/2 (93 min):    Stream    
Handnotes 9/3:  PDF       Video 9/3 (87 min):    Stream    
 
WEEK 9 (29/11 – 05/12)
Handnotes 9/4:    PDF       Video 9/4 (26 min):    Stream    
Handnotes 10/1:  PDF       Video 10/1 (99 min):    Stream    
Handnotes 10/2:  PDF       Video 10/2 (110 min):    Stream    
 
WEEK 10 (06/12 – 12/12)
Handnotes 11/1:  PDF       Video 11/1 (95 min):    Stream    MP4
Handnotes 11/2:  PDF       Video 11/2 (62 min):    Stream    MP4
Handnotes 12/1:  PDF       Video 12 (54 min):    Stream    MP4
Handnotes 12/2:  PDF               
Supplement to 12:       R script    Data (RData)    Data description (PDF)
         Video, part 1 (69 min):    Stream    MP4
         Video, part 2 (73 min):    Stream    MP4
 
WEEK 11 (13/12 – 19/12)
Handnotes 14/1:  PDF       Video 14/1 (67 min):    Stream    MP4
Handnotes 14/2:  PDF       Video 14/2 (66 min):    Stream    MP4
Handnotes 14/3:  PDF       Video 14/3 (53 min):    Stream    MP4
 
WEEK 12 (20/12 – 26/12)
Handnotes 16/1:  PDF       Video 16/1 (75 min):    Stream    MP4
Handnotes 16/2:  PDF       Video 16/2 (42 min):    Stream    MP4
Handnotes 16/3:  PDF       Video 16/3 (89 min):    Stream    MP4
 
WEEK 13 (03/01 – 09/01)
Piece 13/1 mostly repeats material covered by previous lectures
as well as by the Mathematical Statistics 1 course.
Handnotes 13/1:  PDF       Video 13/1 (40 min):    Stream    MP4
Handnotes 13/2:  PDF       Video 13/2 (92 min):    Stream    MP4
 

ENTRY REQUIREMENTS

This course closely follows the bachelor study branch General Mathematics and especially its subbranch Stochastics. The course hence builds upon a solid grasp of classical mathematical thinking (theorem, proof, ...), knowledge acquired in the basic courses (mathematical analysis, linear algebra, ...), and intermediate knowledge of probability theory and mathematical statistics. The areas of general mathematics and mathematical statistics that are indispensable for following this course include:

  • Vector spaces, matrix calculus;
  • Probability space, conditional probability, conditional distribution, conditional expectation;
  • Elementary asymptotic results (laws of large numbers, central limit theorem for i.i.d. random variables and vectors, Cramér-Wold theorem, Cramér-Slutsky theorem);
  • Foundations of statistical inference (statistical test, confidence interval, standard error, consistency);
  • Basic procedures of statistical inference (asymptotic tests on expected value, one- and two-sample t-test, one-way analysis of variance, chi-square test of independence);
  • Maximum-likelihood theory including asymptotic results and the delta method;
  • Working knowledge of R, a free software environment for statistical computing and graphics.

This course is not a cookbook course on linear regression, and it does not make much sense to follow it without the knowledge described above.

EXERCISE CLASSES

All information related to the exercise classes is (will be) available at the central exercise classes webpage.

Exercise classes are synchronized: the content of the classes held in the same week is approximately the same.

SUPPLEMENTARY R PACKAGE

The course is supplemented by the R package mffSM, which contains the example datasets used throughout the course and a few additional small functions for processing a linear model fit. After download (from the link below, not from CRAN), the package can be installed in R in the standard way ("from a local repository"). The Windows binary file is intended for MS Windows users (as the title suggests); the source code is intended for users of other (mostly more reliable) operating systems where it is standard to compile a package from its source (Linux, Mac, etc.). The mffSM package no longer depends on the packages colorspace, lattice, and car, which are available in the standard way from CRAN. All those dependency packages should normally be installed automatically if the mffSM package is installed directly from the R console on an Internet-connected computer using the following command (or an appropriately modified analogue of it):

install.packages("PATH_WHERE_DOWNLOADED/mffSM_1.2.[tar.gz,zip]", repos = NULL)

Source code:   mffSM_1.2.tar.gz
Windows binary:   mffSM_1.2.zip
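
As a minimal sketch of the installation command above: the snippet below assumes that the downloaded file was saved to a hypothetical directory ~/Downloads (adjust download_dir to the actual location). The variable names download_dir, pkg_file and pkg_path are illustrative only and are not part of the package.

```r
# Hypothetical download location (an assumption); change it to where
# the file was actually saved.
download_dir <- "~/Downloads"

# Pick the Windows binary on MS Windows, the source tarball elsewhere.
pkg_file <- if (.Platform$OS.type == "windows") {
  "mffSM_1.2.zip"
} else {
  "mffSM_1.2.tar.gz"
}
pkg_path <- file.path(path.expand(download_dir), pkg_file)

if (file.exists(pkg_path)) {
  # repos = NULL tells R to install from the local file.
  install.packages(pkg_path, repos = NULL)
  library("mffSM")
} else {
  message("File not found: ", pkg_path)
}
```

If the file is not found, the snippet only prints a message and installs nothing.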
 

R TUTORIALS

The R tutorials show R analyses based on the theory presented during the lectures. They also provide the code used to prepare the majority of the output and plots used as illustrations during the lectures. The R tutorials may serve as a reference for the assignments performed during the exercise classes or required as homework.

The R scripts provided below assume that the content of the .Rprofile is sourced at start.

1. Linear Model
  1. Simple illustration of a linear model (data Hosi0)    html    R code
 
2. Least Squares Estimation
  1. Matrix algebra background of linear regression    html    R code
  2. R function lm    html    R code
 
3. Basic Regression Diagnostics
  1. Basic Regression Diagnostics (data Cars2004nh)    html    R code
 
4. Parameterizations of Covariates
  1. Numeric covariate: simple transformation, polynomial regression, regression splines (data Houses1987)    html    R code
  2. Numeric covariate: regression splines (data Motorcycle)    html    R code
  3. Categorical nominal covariate (data Cars2004nh)    html    R code
  4. Categorical ordinal covariate (data Cars2004nh)    html    R code
 
5. Multiple Regression
  1. Numeric and categorical covariate (data Cars2004nh)    html    R code
  2. Two numeric covariates (data Cars2004nh)    html    R code
  3. Two categorical covariates (data Howells)    html    R code
  4. Multiple regression model (data Cars2004nh)    html    R code
  5. ANOVA tables of type I, II and III (data Cars2004nh)    html    R code
 
6. Normal Linear Model
  1. Inference in a model with the regression line (data Cars2004nh)    html    R code
  2. Joint inference on a vector of estimable parameters (data Cars2004nh)    html    R code
  3. Confidence interval for the model based mean, prediction interval (data Hosi0)    html    R code
  4. Confidence interval for the model based mean, prediction interval (data Kojeni)    html    R code
 
9. Checking Model Assumptions
  1. Partial residuals, Simpson's paradox (data Policie)    html    R code
  2. Partial residuals (data Cars2004nh)    html    R code
  3. Residual plots and tests on assumptions (data Cars2004nh)    html    R code
  4. Checking homoscedasticity (data Draha)    html    R code
  5. Checking uncorrelated errors (data Olympic)    html    R code
  6. Transformation of response: ANOVA with log-transformed response to get normality and homoscedasticity (data Houses1987)    html    R code
  7. Transformation of response: Regression with log-transformed response to stabilize the variance, Box–Cox transformation (data Cars2004nh)    html    R code
 
10. Problematic Regression Space
  1. Multicollinearity (data IQ)    html    R code
  2. Multicollinearity (data Cars2004nh)    html    R code
 
11. Unusual Observations
  1. Unusual observations (data Cars2004)    html    R code
 
14. Simultaneous Inference in a Linear Model
  1. Multiple comparison procedures (Tukey, Hothorn–Bretz–Westfall) (data Howells)    html    R code
  2. Multiple comparison procedures (Hothorn–Bretz–Westfall) (data Cars2004nh)    html    R code
  3. Confidence band around and for the regression function (data Kojeni)    html    R code
 
15. General Linear Model
  1. Weighted least squares (data Kojeni and wKojeni)    html    R code
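
As a rough indication of the kind of analysis the tutorials listed above walk through, the following sketch fits a regression line by least squares, checks it against the normal equations, and computes a confidence interval for the mean and a prediction interval. It is not taken from the tutorials: it uses the built-in R dataset cars rather than a course dataset, so that it runs without the mffSM package, and the object names are illustrative.

```r
# Built-in dataset 'cars' (speed, stopping distance); an assumption made
# for this sketch, not one of the course datasets.
data(cars)

# Least squares fit via lm().
fit <- lm(dist ~ speed, data = cars)

# The same estimate obtained from the normal equations (X'X) b = X'y.
X <- model.matrix(fit)
y <- cars$dist
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Both routes should agree up to numerical error.
print(cbind(lm = coef(fit), normal_eq = drop(beta_hat)))

# 95% confidence intervals for the regression coefficients.
print(confint(fit))

# Confidence interval for the model based mean and prediction interval
# for a new observation at speed = 15.
new_obs <- data.frame(speed = 15)
print(predict(fit, newdata = new_obs, interval = "confidence"))
print(predict(fit, newdata = new_obs, interval = "prediction"))
```

The prediction interval is wider than the confidence interval for the mean because it also accounts for the error variance of the new observation.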
 

 
