Introduction

This workshop introduces mixed-effects regression modeling using R. The RMarkdown document for the tutorial can be downloaded here and the bib library here. You will find more elaborate explanations and additional examples here.

The workshop consists of two parts:

  1. Theoretical background and basics: this part deals with the main concepts and the underlying logic of linear and logistic (mixed-effects) regression models.

  2. Practical examples and potential issues: this part focuses on the practical implementation of linear and logistic mixed-effects models.

Preparation and session set up

# set options
options(stringsAsFactors = FALSE)      # no automatic conversion of strings to factors
options("scipen" = 100, "digits" = 10) # suppress scientific notation
# install packages
install.packages(c("boot", "car", "caret", "tidyverse",  "effects", "foreign", 
                   "Hmisc", "DT", "knitr", "lme4", "MASS", "mlogit", "msm", 
                   "QuantPsyc", "reshape2", "rms", "sandwich", "sfsmisc", "sjPlot", 
                   "vcd", "visreg", "MuMIn", "lmerTest"))

Once you have installed R and RStudio and initiated the session by executing the code shown above, you are good to go.
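The packages only need to be installed once, but they must be activated with library() in every new session before they can be used. As a minimal sketch, the code below activates the packages that the examples in this workshop rely on most heavily (this particular selection is an assumption; load whichever of the installed packages you need):

# activate packages (required in every new session)
library(tidyverse)  # data processing and visualization
library(lme4)       # mixed-effects regression models
library(lmerTest)   # p-values for lme4 models
library(sjPlot)     # tabulating and visualizing models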

Theoretical Background

Regression models are among the most widely used quantitative methods in the language sciences. They are used so widely because they are very flexible and can handle multiple predictors and responses. In general, regression models provide information about whether and how predictors (variables or interactions between variables) correlate with a certain response.

The most widely used regression models are linear regressions (for numeric responses) and logistic regressions (for binary or categorical responses); both types are covered in this workshop.

If regression models contain a random effect structure, which is used to model nestedness or dependence among data points, they are called mixed-effects models. Regressions that do not have a random effect component to model nestedness or dependence are referred to as fixed-effects regressions (we will have a closer look at the difference between fixed and random effects below).
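As a brief preview, the two types differ only in whether the model formula contains a random effect term. The sketch below assumes a hypothetical data set mydata with a numeric Response, a Predictor, and a grouping variable Subject:

# fixed-effects regression: no random effect component
m.fixed <- lm(Response ~ Predictor, data = mydata)
# mixed-effects regression: random intercepts for Subject capture
# the dependence among data points from the same subject
m.mixed <- lmer(Response ~ Predictor + (1 | Subject), data = mydata)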

There exists a wealth of literature focusing on regression analysis and the concepts it is based on. For instance, there are Achen (1982), Bortz (2006), Crawley (2005), Faraway (2002), Field, Miles, and Field (2012) (my personal favorite), Gries (2021), Levshina (2015), and Wilcox (2009), to name just a few. Introductions to regression modeling in R include Baayen (2008), Crawley (2012), Gries (2021), and Levshina (2015).

The idea behind regression analysis is expressed formally in the equation below, where \(f_{(x)}\) is the \(y\)-value we want to predict, \(\alpha\) is the intercept (the point where the regression line crosses the \(y\)-axis), \(\beta\) is the coefficient (the slope of the regression line), and \(\epsilon\) is the error term (the part of the response the predictors do not account for).

\(f_{(x)} = \alpha + \beta x + \epsilon\)

In other words, to estimate how much someone who is 180 cm tall weighs, we would multiply the coefficient (the slope of the line) by 180 (\(x\)) and add the value of the intercept (the point where the line crosses the \(y\)-axis). If we plug in the numbers from the regression model below, we get

-93.77 + 0.98 ∗ 180 = 82.63 (kg)
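In R, this prediction is simple arithmetic with the two model parameters:

# predicted weight (kg) for a height of 180 cm:
# intercept + coefficient * height
-93.77 + 0.98 * 180
## [1] 82.63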

Residuals are the distances between the observed data points and the regression line (shown as red lines in such plots); they represent the variance that the model fails to explain. The regression line is the line for which the sum of the squared residuals is minimal. The slope of the regression line is called the coefficient and the point where the regression line crosses the \(y\)-axis is called the intercept.
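A minimal sketch (using made-up height and weight values) shows how to extract these components from a fitted model in R:

# fit a simple linear model on hypothetical height/weight data
df <- data.frame(Height = c(160, 170, 180, 190),
                 Weight = c(63, 71, 84, 92))
m <- lm(Weight ~ Height, data = df)
coef(m)       # intercept (alpha) and coefficient (beta)
residuals(m)  # distances between the observed points and the line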

The basic principle