**(NMSA 407) Linear regression**

**Lectures:** doc. RNDr. Arnošt Komárek, Ph.D.

**Lab sessions:**

| Day | Time | Room | Lecturer |
|---|---|---|---|
| Tu | 12:20 - 13:50 | K4 | Matúš Maciak |
| Tu | 14:00 - 15:30 | K11 | Stanislav Nagy |
| Th | 09:40 - 10:30 | K4 | Matúš Maciak |

**General Information**

Three parallel sessions, all taught in English, take place in the winter term 2019/2020. Each student attending one of these sessions is expected to be officially enrolled in the corresponding session in SIS. Any exceptions must be discussed and agreed upon with both lecturers. The three sessions are synchronized to cover the same topics, so the content of the classes held in the same week will be approximately the same.

In most classes, students will be required to work on the provided computers (lecture rooms K4 or K11). Alternatively, students may use their own laptops, provided the statistical software R is installed on them. Additional R packages will also be needed, so students using their own laptops should either have these packages installed in advance or have their laptops connected to the internet. The eduroam network is available in both lecture rooms.

- The **R software** can be obtained free of charge (GNU General Public License) from https://www.r-project.org, and distributions are available for Windows, Linux, and macOS.
- The standard installation contains some basic packages. Additional packages can be downloaded from the CRAN repository and installed directly from the R working environment. More details will be provided during the lectures when such packages are needed.
- Many tutorials in various languages can be found either here or simply by searching the web.
- For the lab sessions to run smoothly, an additional R package, **mffSM**, must also be installed. The package is not available on the CRAN repository and cannot be installed with the standard package installation. It can be downloaded from the course web site (see the SUPPLEMENTARY R PACKAGE section) and installed by running the command `install.packages("C:/WHERE_DOWNLOADED/mffSM_1.1.zip", repos = NULL)` from the R working environment. The Windows binary file is intended for MS Windows users (as the name suggests); the source code is intended for users who are used to compiling their software from source (mostly Linux and Mac users). When installing from source, add `type = "source"` to the command and use the path of the downloaded source file.

- The **mffSM** package depends on the packages *colorspace*, *lattice*, and *car*, all of which are available from CRAN in the standard way. Note that installing a package from a local file with `repos = NULL` does not resolve its dependencies automatically, so these packages should first be installed from CRAN on an internet-connected computer, e.g. with `install.packages(c("colorspace", "lattice", "car"))`.
- All computers available in lecture rooms K4 and K11 are equipped with the R software, and the **mffSM** package should be properly installed on all of them.
- The **mffSM** package contains most of the datasets we will be working with during the exercise classes. In addition, it contains some minor R functions that will be useful during the term.
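The installation steps above can be bundled into one call. The sketch below defines a hypothetical helper, `install_mffSM()` (the name is ours and it is not part of **mffSM** or the course materials), which first installs any of the CRAN dependencies that are missing and then installs **mffSM** from the downloaded file. It uses only base R functions; the CRAN mirror URL is an assumption, and a file ending in `.tar.gz` is assumed to be the source package.

```r
# Hypothetical helper (not part of mffSM): install the CRAN dependencies first,
# because install.packages() does not resolve dependencies when repos = NULL.
install_mffSM <- function(path,
                          deps = c("colorspace", "lattice", "car"),
                          repos = "https://cloud.r-project.org") {
  # A dependency whose namespace cannot be loaded is treated as missing
  installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
  to_install <- deps[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  if (grepl("\\.tar\\.gz$", path)) {
    # Assumed source package: compile it locally
    install.packages(path, repos = NULL, type = "source")
  } else {
    # e.g. the Windows binary .zip, installed exactly as in the course command
    install.packages(path, repos = NULL)
  }
  invisible(to_install)
}

# Usage (adjust the path to where the file was downloaded):
# install_mffSM("C:/WHERE_DOWNLOADED/mffSM_1.1.zip")
# library(mffSM)
```

The helper returns (invisibly) the names of the dependencies it had to install, which makes it easy to check what was missing on a given laptop.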

**Credit Requirements**

The credit requirements for the NMSA407 exercises consist of two main parts.

**Homework assignments**

All three homework assignments must be accepted.

**Final test**

The score in the final test must be at least 60 %.

**Syllabus & Script Files**

The syllabus will be updated as the semester progresses. The R script files provided below will be discussed during the sessions, where they will be run in R and explained (with a focus on the underlying statistical theory, not on the implementation of the R commands themselves). Thus, all students are expected to be familiar with R and able to handle R programming on their own.

**Lab session no.1** | *Tuesday (01/10) and Thursday (03/10)*

Working R script: download

**Lab session no.2** | *Tuesday (08/10) and Thursday (10/10)*

Working R script: download | Peat data file

**Lab session no.3** | *Tuesday (15/10) and Thursday (17/10)*

Working R script: download

**Lab session no.4** | *Tuesday (22/10) and Thursday (24/10)*

Working R script: download

**Lab session no.5** | *Tuesday (29/10) and Thursday (31/10)*

Working R script: download

**Lab session no.6** | *Tuesday (05/11) and Thursday (07/11)*

Working R script: download

**Lab session no.7** | *Tuesday (12/11) and Thursday (14/11)*

Working R script: download | Chicago data file

**Lab session no.8** | *Tuesday (19/11) and Thursday (21/11)*

Working R script: download

**Lab session no.9** | *Tuesday (26/11) and Thursday (28/11)*

Working R script: download

**Lab session no.10** | *Tuesday (03/12) and Thursday (05/12)*

Working R script: download | Boston data file

**Lab session no.11** | *Tuesday (10/12) and Thursday (12/12)*

Working R script: download | mana data file

**Lab session no.12** | *Tuesday (07/01) and Thursday (09/01)*

Working R script: download | soil data file | pribram data file

**Supplementary Material**

Additional material (sample final test tasks, a brief summary of maximum likelihood theory, etc.) can be found below. The supplementary material will be updated during the semester.

**Maximum likelihood theory**

Brief theoretical summary and illustrative examples: PDF file

**Final test: sample tasks**

A set of illustrative examples which could appear in the final test: PDF file

An example of the final test (version from 2017): PDF file

**Homework Assignments**

Altogether, there will be three homework assignments. Each assignment can be worked out by a group of one to three students, and different groups can be formed for different assignments. Groups of three students are preferred. For more details about the homework assignments, see the NMSA 407 Outline document.

**Homework assignment no.1**

All necessary instructions on what to do and where and when to submit your solutions are given in the description file. The data file can either be downloaded (see below) or loaded directly into the R working environment using the command specified in the description file.

→ **Assignment & Instructions:** PDF file

→ **Working Dataset:** TXT file

→ **Deadline:** Lab session no.3 (15/10 and 17/10)

**Homework assignment no.2**

All necessary instructions on what to do and where and when to submit your solutions are given in the description file. The data file can either be downloaded (see below) or loaded directly into the R working environment using the command specified in the description file.

→ **Assignment & Instructions:** PDF file

→ **Working Dataset:** RData file

→ **Deadline:** Lab session no.8 (19/11 and 21/11)

**Homework assignment no.3**

All necessary instructions on what to do and where and when to submit your solutions are given in the description file. The data file can either be downloaded (see below) or loaded directly into the R working environment using the command specified in the description file.

→ **Assignment & Instructions:** PDF file

→ **Working Dataset:** RData file

→ **Deadline:** 30 December 2019, 23:59 (PDF file submitted by email)

**Final test results**

- The minimum required score is 55 points.


- The (only) retake of the final test takes place on Monday, January 13, 2020 in K4, starting at 14:00. No other retakes will be available this term.

**Disclaimer**

Under the applicable Rules for the Organization of Studies at the Faculty of Mathematics and Physics of Charles University (dated 14 June 2017), and with reference to Art. 8, para. 2 of these rules, it is hereby declared that the nature of the course excludes the student's right to one regular and two resit attempts at obtaining the course credit. Obtaining the credit is governed exclusively by the rules stated above and described in detail in the NMSA 407 outline document.