Introduction

This workshop introduces data processing and analysis for eye-tracking data in R. The RMarkdown document for the tutorial can be downloaded here and the bib library here. You can also download a shortened version of the RMarkdown document which only contains the processing chain here (here is the link to the html file of the document). You will find very helpful and detailed tutorials on how to perform analyses and visualize eye-tracking data using eyetrackingR here.

We will go through the following steps:

preparing the analysis
loading and combining xlsx spreadsheets
determining image boundaries
loading the master file
processing the data
- defining AOIs
- on- as well as off-target gazes
- removing imprecisions
- time binning
- cleaning
visualizing the data
analyzing the data (e.g., using a mixed-effects binomial logistic regression)
using the eyetrackingR package for data visualization and analysis

We will not address issues relating to adequate sample size and power. If you are interested in that, please check out this tutorial on the Language Technology and Data Analysis Laboratory website.

Preparation

This tutorial is based on R. If you have not installed R or RStudio, or if you are new to either of them, you will find an introduction to and more information on how to use R and RStudio here. For this tutorial, we need to install certain packages into an R library on our computer so that the scripts shown below run without errors. If you have already installed the packages mentioned below, you can skip ahead and ignore this section. To install the necessary packages, simply run the following code. Installing all of the packages may take between 1 and 5 minutes, so do not worry if it takes some time.

install.packages(c("tidyverse", "eyetrackingR", "data.table", "itsadug", "sjPlot", "lme4", "multcomp", "readxl", "here"))

Once you have installed the packages, please load them and set useful options as shown below.

# set options
options(stringsAsFactors = FALSE)       # do not convert strings to factors automatically
options("scipen" = 100, "digits" = 10) # suppress scientific notation, print up to 10 digits
# load packages
library(tidyverse)
library(eyetrackingR)
library(data.table)
library(itsadug)
library(lme4)
library(sjPlot)
library(multcomp)

Now that we have prepared our session, we can start with the data processing.

Data processing

During data processing, we load and prepare the data for further analysis and visualization.

Define paths

In a first step, we define the paths to the spreadsheets (datapath) and to the master file (a csv file with information about the experiment). In my case, the spreadsheets are in a folder called uploads, which sits inside a folder called data_exp_50674-v2 in my data folder. The master file is also in the data_exp_50674-v2 folder, but not in the uploads subfolder.

datapath <- here::here("data/data_exp_50674-v2", "uploads")
masterpath <- here::here("data/data_exp_50674-v2", "data_exp_50674-v2_task-etfm.csv")

Now that we have defined the paths, we continue.

Load data

Next, we load the data, which in our case consists of several spreadsheets (files ending in .xlsx).

We begin by creating a list of these xlsx files (the paths where these files are located on your computer).

fls <- list.files(datapath, full.names = TRUE)
# skip the first entry of the file list
fls <- fls[-1]
# inspect files
head(fls)

## [1] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx" 
## [2] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx"
## [3] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-11-2.xlsx"
## [4] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-12-2.xlsx"
## [5] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-13-2.xlsx"
## [6] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-14-2.xlsx"
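Dropping a file by its position in the list assumes that the unwanted file always sorts first. If the uploads folder can contain other files, a somewhat more robust option is to let list.files() return only names ending in .xlsx via its pattern argument. A minimal sketch, in which a temporary folder with hypothetical file names stands in for datapath:

```r
# temporary folder with hypothetical file names (stand-in for the uploads folder)
demo_dir <- file.path(tempdir(), "uploads_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(
  "notes.txt",
  "task-etfm-eyetracking_collection-1-2.xlsx",
  "task-etfm-eyetracking_collection-10-2.xlsx"
))))
# keep only files whose names end in .xlsx; full.names = TRUE returns full paths
demo_fls <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)
basename(demo_fls)
```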

Next, we use this list of paths to load the files. We also create two new columns: idname, which contains the path to the file, and trial, which tells us which trial the data come from.

datls <- lapply(fls, function(x){
  name <- x
  x <- readxl::read_xlsx(x) %>%
    # create id column (contains path)
    dplyr::mutate(idname = name) %>%
    # code trial
    dplyr::mutate(trial = stringr::str_remove_all(name, ".*collection-"))  %>%
    dplyr::mutate(trial = stringr::str_remove_all(trial, "-.*"))
  })
# inspect data
head(datls[1])

## [[1]]
## # A tibble: 295 × 25
##    `0`   filename   participant_id spreadsheet_row time_stamp time_elapsed type 
##    <lgl> <chr>               <dbl>           <dbl>      <dbl>        <dbl> <chr>
##  1 NA    eyetracki…        3699461              16    1.62e12          0   new …
##  2 NA    eyetracki…        3699461              16    1.62e12          0   zone 
##  3 NA    eyetracki…        3699461              16    1.62e12          0   zone 
##  4 NA    eyetracki…        3699461              16    1.62e12          0   zone 
##  5 NA    eyetracki…        3699461              16    1.62e12          0   zone 
##  6 NA    eyetracki…        3699461              16    1.62e12          0   zone 
##  7 NA    eyetracki…        3699461              16    1.62e12          0   zone 
##  8 NA    eyetracki…        3699461              16    1.62e12          0   pred…
##  9 NA    eyetracki…        3699461              16    1.62e12         17.3 pred…
## 10 NA    eyetracki…        3699461              16    1.62e12         33.3 pred…
## # ℹ 285 more rows
## # ℹ 18 more variables: screen_index <dbl>, x_pred <dbl>, y_pred <dbl>,
## #   x_pred_normalised <dbl>, y_pred_normalised <dbl>, convergence <dbl>,
## #   face_conf <dbl>, zone_name <chr>, zone_x <dbl>, zone_y <dbl>,
## #   zone_width <dbl>, zone_height <dbl>, zone_x_normalised <dbl>,
## #   zone_y_normalised <dbl>, zone_width_normalised <dbl>,
## #   zone_height_normalised <dbl>, idname <chr>, trial <chr>
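The two str_remove_all() calls extract the trial number from the file name: the first deletes everything up to and including "collection-", the second deletes the next hyphen and everything after it. The same logic can be checked in base R with sub(), here applied to a shortened version of one of the file paths listed above:

```r
# shortened version of one of the file paths listed above
path <- "uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx"
# step 1: drop everything up to and including "collection-"  -> "10-2.xlsx"
step1 <- sub(".*collection-", "", path)
# step 2: drop the first hyphen and everything after it      -> "10"
trial <- sub("-.*", "", step1)
trial
```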

We can now merge all the spreadsheets into one table, add a column called id that gives each row a unique identifier, and convert the participant_id and trial columns into factors.

edat <- data.table::rbindlist(datls) %>%
  # add id
  dplyr::mutate(id = 1:nrow(.)) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial))

First 6 rows of edat.
0	filename	participant_id	spreadsheet_row	time_stamp	type	screen_index	zone_name	zone_x	zone_y	zone_width	zone_height	zone_x_normalised	zone_y_normalised	zone_width_normalised	zone_height_normalised	idname	trial	id
NA	eyetracking_collection	3699461	16	1619598396444	new collection screen	2	NA	0.00000	0	0	0	0.0000000000	0.000000000	0.0000000000	0.0000000000	F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx	1	1
NA	eyetracking_collection	3699461	16	1619598396444	zone	2	screen	0.00000	0	1440	821	-0.1576769406	0.000000000	1.3150684932	1.0000000000	F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx	1	2
NA	eyetracking_collection	3699461	16	1619598396445	zone	2	gorilla	172.65625	0	1095	821	0.0000000000	0.000000000	1.0000000000	1.0000000000	F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx	1	3
NA	eyetracking_collection	3699461	16	1619598396445	zone	2	Zone1	172.65625	772	55	49	0.0000000000	0.940316687	0.0502283105	0.0596833130	F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx	1	4
NA	eyetracking_collection	3699461	16	1619598396445	zone	2	Right	774.65625	246	438	320	0.5497716895	0.299634592	0.4000000000	0.3897685749	F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx	1	5
NA	eyetracking_collection	3699461	16	1619598396445	zone	2	Left	227.65625	246	438	329	0.0502283105	0.299634592	0.4000000000	0.4007308161	F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx	1	6
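data.table::rbindlist() stacks the per-file tables in datls into one long table, after which every row receives a running id. A minimal base R sketch of the same idea on two small hypothetical tables (the x_pred values are made up for illustration):

```r
# two hypothetical per-file tables (stand-ins for the elements of datls)
datls_toy <- list(
  data.frame(participant_id = 3699461, trial = "1",  x_pred = c(618, 604)),
  data.frame(participant_id = 3699461, trial = "10", x_pred = c(550, 790))
)
# stack the tables row-wise (base R counterpart of data.table::rbindlist)
edat_toy <- do.call(rbind, datls_toy)
# give every row a unique identifier; seq_len() also behaves correctly for 0 rows
edat_toy$id <- seq_len(nrow(edat_toy))
# convert the grouping columns to factors
edat_toy$participant_id <- factor(edat_toy$participant_id)
edat_toy$trial <- factor(edat_toy$trial)
edat_toy
```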

Determine image boundaries

Non-normalized

Next, we define the image boundaries. In our example, there are two images: one on the right and one on the left. We use the zone_y, zone_x, zone_height, and zone_width columns to calculate the edges of each image. For the right image in trial 1, this gives:

top (upper border) = 246 (zone_y)
bottom (lower border) = 246 + 320 (zone_y + zone_height)
left (left border) = 774.65625 (zone_x)
right (right border) = 774.65625 + 438 (zone_x + zone_width)
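The same arithmetic can be written out in R as a quick check. The zone values are those of the right image in trial 1 in the first rows of edat; because screen y coordinates increase downwards, the bottom edge is zone_y plus the image height:

```r
# zone values of the right image in trial 1 (from the first rows of edat)
zone_x <- 774.65625; zone_y <- 246; zone_width <- 438; zone_height <- 320
# y grows downwards on screen, so bottom = top + height and right = left + width
edges <- c(top    = zone_y,
           bottom = zone_y + zone_height,
           left   = zone_x,
           right  = zone_x + zone_width)
edges
```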

ibs <- edat %>%
  dplyr::select(participant_id, trial, zone_name, zone_x, zone_y,zone_width, zone_height) %>%
  # get rid of superfluous rows
  dplyr::filter(zone_name == "Right"|zone_name == "Left") %>%
  na.omit() %>%
  # define image boundaries
  dplyr::mutate(top = zone_y,
                bottom = zone_y + zone_height,
                left = zone_x,
                right = zone_x + zone_width) %>%
  # remove superfluous columns
  dplyr::select(-zone_x, -zone_y, -zone_width, -zone_height)

First 10 rows of ibs.
participant_id	trial	zone_name	top	bottom	left	right
3699461	1	Right	246	684	774.656250	1212.656250
3699461	1	Left	246	684	227.656250	665.656250
3699461	10	Right	246	684	774.656250	1212.656250
3699461	10	Left	246	684	227.656250	665.656250
3699461	11	Right	246	684	774.671875	1212.671875
3699461	11	Left	246	684	227.671875	665.671875
3699461	12	Right	246	684	774.671875	1212.671875
3699461	12	Left	246	684	227.671875	665.671875
3699461	13	Right	246	684	774.656250	1212.656250
3699461	13	Left	246	684	227.656250	665.656250

Normalized

We can also use the normalized values.

If you do this, it is crucial that you use x_pred_normalised and y_pred_normalised, and not x_pred and y_pred, in your analysis!

As with the raw values, we use the zone_y_normalised, zone_x_normalised, zone_height_normalised, and zone_width_normalised columns to calculate the edges of the two images. For the right image in trial 1, this gives:

top (upper border) = 0.2996 (zone_y_normalised)
bottom (lower border) = 0.2996 + 0.3898 (zone_y_normalised + zone_height_normalised)
left (left border) = 0.5498 (zone_x_normalised)
right (right border) = 0.5498 + 0.4 (zone_x_normalised + zone_width_normalised)
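The normalized values appear to be expressed relative to the zone called gorilla (x = 172.65625, y = 0, width = 1095, height = 821 in the first rows of edat) rather than relative to the whole screen. This is an inference from the example values, not documented behaviour, but it can be checked with a few lines of arithmetic on the right image in trial 1:

```r
# reference zone "gorilla" from the first rows of edat (assumed frame of normalization)
ref_x <- 172.65625; ref_y <- 0; ref_width <- 1095; ref_height <- 821
# raw zone values of the right image in trial 1
zone_x <- 774.65625; zone_y <- 246; zone_width <- 438; zone_height <- 320
# express position and size as proportions of the reference zone
norm <- c(x      = (zone_x - ref_x) / ref_width,
          y      = (zone_y - ref_y) / ref_height,
          width  = zone_width / ref_width,
          height = zone_height / ref_height)
round(norm, 10)
```

The results match zone_x_normalised, zone_y_normalised, zone_width_normalised, and zone_height_normalised for this image, which is why normalized gaze predictions must be compared against normalized boundaries only.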

ibs_norm <- edat %>%
  dplyr::select(participant_id, trial, zone_name, zone_x_normalised, 
                zone_y_normalised, zone_width_normalised, zone_height_normalised) %>%
  # get rid of superfluous rows
  dplyr::filter(zone_name == "Right"|zone_name == "Left") %>%
  na.omit() %>%
  # define image boundaries
  dplyr::mutate(top = zone_y_normalised,
                bottom = zone_y_normalised + zone_height_normalised,
                left = zone_x_normalised,
                right = zone_x_normalised + zone_width_normalised) %>%
  # remove superfluous columns
  dplyr::select(-zone_x_normalised, -zone_y_normalised, -zone_width_normalised, -zone_height_normalised)

First 10 rows of ibs_norm.
participant_id	trial	zone_name	top	bottom	left	right
3699461	1	Right	0.299634592	0.699634592	0.5497716895	0.9497716895
3699461	1	Left	0.299634592	0.699634592	0.0502283105	0.4502283105
3699461	10	Right	0.299634592	0.699634592	0.5497716895	0.9497716895
3699461	10	Left	0.299634592	0.699634592	0.0502283105	0.4502283105
3699461	11	Right	0.299634592	0.699634592	0.5497716895	0.9497716895
3699461	11	Left	0.299634592	0.699634592	0.0502283105	0.4502283105
3699461	12	Right	0.299634592	0.699634592	0.5497716895	0.9497716895
3699461	12	Left	0.299634592	0.699634592	0.0502283105	0.4502283105
3699461	13	Right	0.299634592	0.699634592	0.5497716895	0.9497716895
3699461	13	Left	0.299634592	0.699634592	0.0502283105	0.4502283105

Adding edges

We now reshape the data so that the edges of both images are stored in separate columns. This gives four columns per image (left_ and right_ versions of bottomedge, leftedge, rightedge, and topedge) and one row per participant and trial.

ibs <- ibs %>%
  # lower-case the image position (left/right)
  dplyr::mutate(position = tolower(zone_name)) %>%
  # long format: one row per image edge
  tidyr::gather(edge, coordinate, top:right) %>%
  # build column names such as left_topedge or right_bottomedge
  dplyr::mutate(position_edge = paste0(position, "_", edge, "edge")) %>%
  dplyr::select(-zone_name, -position, -edge) %>%
  # wide format: one row per participant and trial, one column per edge
  tidyr::spread(position_edge, coordinate)
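The reshaping turns one row per image into one row per participant and trial, with columns named after the image position and the edge. A minimal base R sketch of the same transformation, using the trial-1 boundaries of both images with the bottom edge computed as zone_y + zone_height (246 + 320 for the right image, 246 + 329 for the left image):

```r
# trial-1 boundaries of both images (bottom = zone_y + zone_height)
ibs_toy <- data.frame(
  participant_id = "3699461", trial = "1",
  zone_name = c("Right", "Left"),
  top = c(246, 246), bottom = c(566, 575),
  left = c(774.65625, 227.65625), right = c(1212.65625, 665.65625)
)
edges <- c("top", "bottom", "left", "right")
# long format: one row per image edge, labelled e.g. "right_topedge"
long <- do.call(rbind, lapply(edges, function(e) data.frame(
  participant_id = ibs_toy$participant_id,
  trial          = ibs_toy$trial,
  position_edge  = paste0(tolower(ibs_toy$zone_name), "_", e, "edge"),
  coordinate     = ibs_toy[[e]]
)))
# wide format: one row per participant and trial, one column per edge
wide <- reshape(long, idvar = c("participant_id", "trial"),
                timevar = "position_edge", direction = "wide")
# reshape() prefixes the new columns with "coordinate."; remove that prefix
names(wide) <- sub("^coordinate\\.", "", names(wide))
wide
```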

First 10 rows of ibs.
participant_id	trial	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge
3699461	1	684	227.656250	665.656250	246	684	774.656250	1212.656250	246
3699461	10	684	227.656250	665.656250	246	684	774.656250	1212.656250	246
3699461	11	684	227.671875	665.671875	246	684	774.671875	1212.671875	246
3699461	12	684	227.671875	665.671875	246	684	774.671875	1212.671875	246
3699461	13	684	227.656250	665.656250	246	684	774.656250	1212.656250	246
3699461	14	684	227.671875	665.671875	246	684	774.671875	1212.671875	246
3699461	15	684	227.671875	665.671875	246	684	774.671875	1212.671875	246
3699461	16	684	227.656250	665.656250	246	684	774.656250	1212.656250	246
3699461	17	684	227.671875	665.671875	246	684	774.671875	1212.671875	246
3699461	18	684	227.671875	665.671875	246	684	774.671875	1212.671875	246

Process master file

Now that we have processed the data and defined the image boundaries, we load the master file. It contains information about the experiment, the individual trials, and the computer and browser used by each participant. In our example, the master file is called data_exp_50674-v2_task-etfm.csv and is located in the data_exp_50674-v2 folder inside the data folder.

mstr <- read_csv(masterpath) %>%
  # create participant column that matches the participant column in the data 
  dplyr::mutate(participant_id = `Participant Private ID`)

First 10 rows of mstr.
Event Index	UTC Timestamp	UTC Date	Local Timestamp	Local Timezone	Local Date	Experiment ID	Experiment Version	Tree Node Key	Repeat Key	Schedule ID	Participant Public ID	Participant Private ID	Participant Starting Group	Participant Status	Participant Completion Code	Participant External Session ID	Participant Device Type	Participant Device	Participant OS	Participant Browser	Participant Monitor Size	Participant Viewport Size	Checkpoint	Task Name	Task Version	Spreadsheet	Spreadsheet Name	Spreadsheet Row	Trial Number	Screen Number	Screen Name	Zone Name	Zone Type	Reaction Time	Reaction Onset	Response Type	Response	Attempt	Incorrect	X Coordinate	Y Coordinate	Timed Out	randomise_blocks	randomise_trials	display	ANSWER	audio_file	picture_left	picture_right	trial number	condition	target_gender	target_item	target_position	competitor	competitor_item	competitor_position	color	participant_id
1	1619598327535	28/04/2021 08:25:27	1619598327448	2	28/04/2021 10:25:27	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	NA	NA	BEGIN TASK	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	3699461
2	1619598392107	28/04/2021 08:26:32	1619598392021	2	28/04/2021 10:26:32	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	1	1	1	Screen 1	Zone1	eye_tracking	NA	NA	calibration succeeded	0	NA	1	NA	NA	NA	NA	NA	Calibration	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	3699461
3	1619598396097	28/04/2021 08:26:36	1619598396012	2	28/04/2021 10:26:36	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	1	1	1	Screen 1	Zone1	eye_tracking	68528.495	NA	NA	NA	NA	1	NA	NA	NA	NA	NA	Calibration	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	3699461
4	1619598396489	28/04/2021 08:26:36	1619598396412	2	28/04/2021 10:26:36	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	1	Screen 2	Zone1	fixation	449.982	NA	NA	NA	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461
5	1619598396597	28/04/2021 08:26:36	1619598396446	2	28/04/2021 10:26:36	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	2	Screen 1	Zone2	content_web_audio	22.245	NA	NA	AUDIO PLAY REQUESTED	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461
6	1619598401674	28/04/2021 08:26:41	1619598401595	2	28/04/2021 10:26:41	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	2	Screen 1	Zone1	eye_tracking	NA	NA	Left Time	2209.4900000083726	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461
7	1619598401780	28/04/2021 08:26:41	1619598401598	2	28/04/2021 10:26:41	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	2	Screen 1	Zone1	eye_tracking	NA	NA	Left Percent	43	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461
8	1619598401780	28/04/2021 08:26:41	1619598401598	2	28/04/2021 10:26:41	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	2	Screen 1	Zone1	eye_tracking	NA	NA	Right Time	2901.96499999729	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461
9	1619598401780	28/04/2021 08:26:41	1619598401599	2	28/04/2021 10:26:41	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	2	Screen 1	Zone1	eye_tracking	NA	NA	Right Percent	57	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461
10	1619598401780	28/04/2021 08:26:41	1619598401599	2	28/04/2021 10:26:41	50674	2	task-etfm	NA	12247303	BLIND	3699461	NA	complete	NA	NA	computer	Desktop or Laptop	Mac OS 10.14.6	Chrome 89.0.4389.128	1440x900	1440x821	NA	Eyetracking_Sample	3	Spreadsheet1	Spreadsheet1	16	1	2	Screen 1	Zone1	eye_tracking	NA	NA	A Time	1811.7099999799393	NA	1	NA	NA	NA	NA	1	Eyetracking	train.png	NA	train.png	cupboard.png	15	color	m	train	left	m	cupboard	right	black & brown	3699461

The master file contains a lot of information. Because unnecessary columns make the data harder to inspect and work with, we keep only the columns we need: participant_id, trial number (renamed to trial), condition, target_gender, target_position, Zone Type, Response, Correct, and target_item.

mstr_redux <- mstr %>%
  # select columns you need
  dplyr::select(participant_id, `trial number`, condition, target_gender, 
                target_position, `Zone Type`, Response, Correct, target_item) %>%
  # filter unique 
  unique() %>%
  # remove rows containing NA
  na.omit() %>%
  # filter out superfluous rows
  dplyr::filter(`Zone Type` == "response_button_image") %>%
  dplyr::rename(trial = `trial number`) %>%
  dplyr::mutate(trial = as.character(trial)) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial))

First 10 rows of mstr_redux.
participant_id	trial	condition	target_gender	target_position	Zone Type	Response	Correct	target_item
3699461	15	color	m	left	response_button_image	cupboard.png	0	train
3699461	2	same	f	right	response_button_image	carrot.png	0	rose
3699461	6	same	n	right	response_button_image	egg.png	0	piano
3699461	13	color	f	left	response_button_image	computer.png	0	hat
3699461	14	color	f	right	response_button_image	chair.png	0	church
3699461	8	differernt	f	right	response_button_image	banana.png	1	banana
3699461	5	same	n	left	response_button_image	bed.png	1	bed
3699461	12	different	n	right	response_button_image	door.png	0	bicycle
3699461	9	differernt	m	left	response_button_image	cheese.png	1	cheese
3699461	1	same	f	left	response_button_image	cup.png	0	fork
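The output above shows the condition label spelled both "different" and "differernt". Unless the two spellings are merged, they will be treated as separate conditions in any later grouping or model. Assuming both spellings denote the same condition (worth verifying against the experiment spreadsheet), a minimal base R sketch of the recoding on the condition values shown above:

```r
# condition labels of the last five rows shown above
condition <- c("color", "differernt", "same", "different", "differernt")
table(condition)
# recode the misspelled label so that both spellings form one condition
condition[condition == "differernt"] <- "different"
table(condition)
```

In the pipeline itself, the same recoding could be added to the mstr_redux chain as a mutate() step, for example dplyr::mutate(condition = dplyr::if_else(condition == "differernt", "different", condition)).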

Join data and masterfile

The next step consists of joining (or merging) the data (edat) with the information about the image boundaries (ibs). Before joining these two data sets, we clean the data in edat by

removing columns we do not need
factorizing participants and trials
removing rows without gaze information

edat <- edat %>%
  # remove superfluous columns
  dplyr::select(-`0`, -filename, -spreadsheet_row, -time_stamp, -screen_index, 
                -convergence, -zone_x, -zone_y, -zone_width,
                -zone_height, -zone_x_normalised, -zone_y_normalised, -zone_width_normalised,
                -zone_height_normalised, -idname, 
                
                # WARNING: If you work with normalized values, REPLACE the following 
                # with their non-normalized counterparts!
                
                -x_pred_normalised, -y_pred_normalised) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial)) %>%
  # remove rows without gaze information
  dplyr::filter(x_pred != 0,
                y_pred != 0)

First 10 rows of edat.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id
3699461	0.00000000	prediction	618	533	0.8164998461	NA	1	8
3699461	17.32499999	prediction	604	416	0.8164998461	NA	1	9
3699461	33.34500000	prediction	558	477	0.8164998461	NA	1	10
3699461	48.51500000	prediction	791	753	0.8164998461	NA	1	11
3699461	65.08999999	prediction	783	680	0.8164998461	NA	1	12
3699461	83.01000000	prediction	797	676	0.8164998461	NA	1	13
3699461	99.08499999	prediction	740	631	0.8164998461	NA	1	14
3699461	114.96000001	prediction	846	603	0.8164998461	NA	1	15
3699461	131.33000000	prediction	747	642	0.8164998461	NA	1	16
3699461	148.24499999	prediction	596	426	0.8162385964	NA	1	17

Now, we can join (or merge) edat with ibs (image boundaries).

edatibs <- left_join(edat, ibs, by = c("participant_id", "trial"))

First 10 rows of edatibs.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge
3699461	0.00000000	prediction	618	533	0.8164998461	NA	1	8	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	17.32499999	prediction	604	416	0.8164998461	NA	1	9	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	33.34500000	prediction	558	477	0.8164998461	NA	1	10	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	48.51500000	prediction	791	753	0.8164998461	NA	1	11	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	65.08999999	prediction	783	680	0.8164998461	NA	1	12	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	83.01000000	prediction	797	676	0.8164998461	NA	1	13	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	99.08499999	prediction	740	631	0.8164998461	NA	1	14	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	114.96000001	prediction	846	603	0.8164998461	NA	1	15	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	131.33000000	prediction	747	642	0.8164998461	NA	1	16	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
3699461	148.24499999	prediction	596	426	0.8162385964	NA	1	17	684	227.65625	665.65625	246	684	774.65625	1212.65625	246
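left_join() keeps every gaze sample in edat and attaches the boundaries of the matching participant and trial; samples from a participant-trial combination without an entry in ibs are kept but receive NA boundaries. The behaviour can be illustrated in base R with merge(all.x = TRUE) on small hypothetical tables, including a count of samples that found no boundaries:

```r
# hypothetical gaze samples: trial "99" has no image boundaries
gaze <- data.frame(participant_id = "3699461",
                   trial  = c("1", "1", "99"),
                   x_pred = c(618, 604, 500))
bounds <- data.frame(participant_id = "3699461", trial = "1",
                     left_leftedge = 227.65625)
# all.x = TRUE keeps all rows of gaze, i.e. a left join
joined <- merge(gaze, bounds, by = c("participant_id", "trial"), all.x = TRUE)
joined
# number of samples without matching boundaries
sum(is.na(joined$left_leftedge))
```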

Remove imprecision

It is also useful to remove data points with low precision. We therefore keep only data points whose face_conf value (the eye tracker's confidence that it has detected the participant's face) is at least .5 and remove all others.

edatibs <- edatibs %>%
  # filter imprecise data points
  dplyr::filter(face_conf >= .5)
# inspect
head(edatibs)

##    participant_id time_elapsed       type x_pred y_pred    face_conf zone_name
## 1:        3699461   0.00000000 prediction    618    533 0.8164998461      <NA>
## 2:        3699461  17.32499999 prediction    604    416 0.8164998461      <NA>
## 3:        3699461  33.34500000 prediction    558    477 0.8164998461      <NA>
## 4:        3699461  48.51500000 prediction    791    753 0.8164998461      <NA>
## 5:        3699461  65.08999999 prediction    783    680 0.8164998461      <NA>
## 6:        3699461  83.01000000 prediction    797    676 0.8164998461      <NA>
##    trial id left_bottomedge left_leftedge left_rightedge left_topedge
## 1:     1  8             684     227.65625      665.65625          246
## 2:     1  9             684     227.65625      665.65625          246
## 3:     1 10             684     227.65625      665.65625          246
## 4:     1 11             684     227.65625      665.65625          246
## 5:     1 12             684     227.65625      665.65625          246
## 6:     1 13             684     227.65625      665.65625          246
##    right_bottomedge right_leftedge right_rightedge right_topedge
## 1:              684      774.65625      1212.65625           246
## 2:              684      774.65625      1212.65625           246
## 3:              684      774.65625      1212.65625           246
## 4:              684      774.65625      1212.65625           246
## 5:              684      774.65625      1212.65625           246
## 6:              684      774.65625      1212.65625           246

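It can be informative to check how many samples the .5 threshold removes, for instance per participant, since a participant losing most of their samples may need to be excluded altogether. A minimal sketch with hypothetical participants and face_conf values:

```r
# hypothetical samples with face_conf values
samples <- data.frame(participant_id = c("a", "a", "a", "b", "b"),
                      face_conf      = c(0.82, 0.41, 0.77, 0.30, 0.95))
# flag samples below the .5 threshold
samples$low_conf <- samples$face_conf < .5
# proportion of samples the filter would remove, per participant
removed <- tapply(samples$low_conf, samples$participant_id, mean)
removed
# keep only samples with face_conf of at least .5
kept <- samples[samples$face_conf >= .5, ]
nrow(kept)
```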

Combining the data and metadata

Now we combine the collected data (edatibs, i.e., edat plus the image boundaries from ibs) with the metadata (the information from the reduced master file mstr_redux).

dat <- dplyr::left_join(edatibs, mstr_redux, by = c("participant_id", "trial"))
# inspect
head(dat)

##    participant_id time_elapsed       type x_pred y_pred    face_conf zone_name
## 1:        3699461   0.00000000 prediction    618    533 0.8164998461      <NA>
## 2:        3699461  17.32499999 prediction    604    416 0.8164998461      <NA>
## 3:        3699461  33.34500000 prediction    558    477 0.8164998461      <NA>
## 4:        3699461  48.51500000 prediction    791    753 0.8164998461      <NA>
## 5:        3699461  65.08999999 prediction    783    680 0.8164998461      <NA>
## 6:        3699461  83.01000000 prediction    797    676 0.8164998461      <NA>
##    trial id left_bottomedge left_leftedge left_rightedge left_topedge
## 1:     1  8             684     227.65625      665.65625          246
## 2:     1  9             684     227.65625      665.65625          246
## 3:     1 10             684     227.65625      665.65625          246
## 4:     1 11             684     227.65625      665.65625          246
## 5:     1 12             684     227.65625      665.65625          246
## 6:     1 13             684     227.65625      665.65625          246
##    right_bottomedge right_leftedge right_rightedge right_topedge condition
## 1:              684      774.65625      1212.65625           246      same
## 2:              684      774.65625      1212.65625           246      same
## 3:              684      774.65625      1212.65625           246      same
## 4:              684      774.65625      1212.65625           246      same
## 5:              684      774.65625      1212.65625           246      same
## 6:              684      774.65625      1212.65625           246      same
##    target_gender target_position             Zone Type Response Correct
## 1:             f            left response_button_image  cup.png       0
## 2:             f            left response_button_image  cup.png       0
## 3:             f            left response_button_image  cup.png       0
## 4:             f            left response_button_image  cup.png       0
## 5:             f            left response_button_image  cup.png       0
## 6:             f            left response_button_image  cup.png       0
##    target_item
## 1:        fork
## 2:        fork
## 3:        fork
## 4:        fork
## 5:        fork
## 6:        fork

First 10 rows of dat.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge	condition	target_gender	target_position	Zone Type	Response	target_item
3699461	0.00000000	prediction	618	533	0.8164998461	NA	1	8	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	17.32499999	prediction	604	416	0.8164998461	NA	1	9	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	33.34500000	prediction	558	477	0.8164998461	NA	1	10	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	48.51500000	prediction	791	753	0.8164998461	NA	1	11	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	65.08999999	prediction	783	680	0.8164998461	NA	1	12	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	83.01000000	prediction	797	676	0.8164998461	NA	1	13	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	99.08499999	prediction	740	631	0.8164998461	NA	1	14	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	114.96000001	prediction	846	603	0.8164998461	NA	1	15	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	131.33000000	prediction	747	642	0.8164998461	NA	1	16	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork
3699461	148.24499999	prediction	596	426	0.8162385964	NA	1	17	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork

AOI

We now use the edges of the images to determine whether each gaze fell within the AOI, i.e., inside the boundaries of the target image (1 = in the AOI, 0 = outside).

dat <- dat %>%
  # determine if participant's gaze was in AOI
  dplyr::mutate(AOI = ifelse(
    # if target is left image
    target_position == "left" &
      y_pred > left_topedge & 
      y_pred < left_bottomedge & 
      x_pred > left_leftedge & 
      x_pred < left_rightedge, 1,
    # if target is right image
    ifelse(target_position == "right" &
             y_pred > right_topedge   & 
             y_pred < right_bottomedge &
             x_pred  > right_leftedge & 
             x_pred <  right_rightedge, 1,
           0)))
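The nested ifelse() above is a point-in-rectangle test. The base-R sketch below makes that test explicit on two made-up gaze samples. The helper name in_aoi() and the toy values are hypothetical. The box edges are the left-image edges shown in the example rows above, and we assume screen coordinates in which y increases downwards (so top < bottom):

```r
# Hypothetical helper: TRUE if a gaze point (x, y) lies strictly inside a box.
# Assumes screen coordinates where y grows downwards, so top < bottom.
in_aoi <- function(x, y, left, right, top, bottom) {
  x > left & x < right & y > top & y < bottom
}

# Toy gaze samples (made up; coordinates borrowed from the rows above)
gaze <- data.frame(x_pred = c(618, 791), y_pred = c(533, 753))

# Left-image edges as shown in the example rows above
gaze$AOI <- as.integer(in_aoi(gaze$x_pred, gaze$y_pred,
                              left = 227.65625, right = 665.65625,
                              top = 246, bottom = 684))
gaze
```

The first sample lies inside the left image and is coded 1. The second lies to the right of it and is coded 0. This matches the AOI values of the corresponding rows in the table above.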

First 5 rows of dat.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge	condition	target_gender	target_position	Zone Type	Response	target_item	AOI
3699461	0.00000000	prediction	618	533	0.8164998461	NA	1	8	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	1
3699461	17.32499999	prediction	604	416	0.8164998461	NA	1	9	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	1
3699461	33.34500000	prediction	558	477	0.8164998461	NA	1	10	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	1
3699461	48.51500000	prediction	791	753	0.8164998461	NA	1	11	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	0
3699461	65.08999999	prediction	783	680	0.8164998461	NA	1	12	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	0

Binning

Define time bins (here 100 ms).

dat <- dat %>%
  # arrange by participant, trial, and time
  dplyr::arrange(participant_id, trial, time_elapsed) %>%
  # bin time stamps into 100 ms time bins
  dplyr::mutate(TimeBin = itsadug::timeBins(time_elapsed, 100, pos=0))
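We assume here that itsadug::timeBins() with pos = 0 labels each sample with the start of its bin. This is consistent with the TimeBin values 0, 100, 200, ... that appear in f1 further down. Under that assumption, binning amounts to rounding each time stamp down to the nearest multiple of the bin size. A base-R sketch with a hypothetical helper bin_start():

```r
# Hypothetical helper: label each time stamp with the start of its bin
# (assumed to mirror itsadug::timeBins(x, binsize, pos = 0))
bin_start <- function(time, binsize) {
  floor(time / binsize) * binsize
}

# Time stamps (ms) from the example rows above, plus one made-up later sample
times <- c(0, 17.32499999, 99.08499999, 114.96000001, 250)
bin_start(times, 100)
```

The first three samples fall into bin 0, the fourth into bin 100 and the last into bin 200.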

First 5 rows of dat.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge	condition	target_gender	target_position	Zone Type	Response	target_item	AOI
3699461	0.00000000	prediction	618	533	0.8164998461	NA	1	8	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	1
3699461	17.32499999	prediction	604	416	0.8164998461	NA	1	9	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	1
3699461	33.34500000	prediction	558	477	0.8164998461	NA	1	10	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	1
3699461	48.51500000	prediction	791	753	0.8164998461	NA	1	11	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	0
3699461	65.08999999	prediction	783	680	0.8164998461	NA	1	12	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	fork	0

Cleaning

dat <- dat %>%
  # clean condition (spelling error!)
  dplyr::mutate(condition = dplyr::case_when(condition == "color" ~ "color",
                                             condition == "same" ~ "same",
                                             condition == "different" ~ "different",
                                             condition == "differernt" ~ "different",
                                             TRUE ~ condition)) %>%
  # change correct from 0 vs 1 into correct vs incorrect
  dplyr::mutate(Correct = ifelse(Correct == 1, "Correct",
                                 ifelse(Correct == 0, "Incorrect", Correct)),
                Correct = factor(Correct))
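Instead of listing every level in case_when(), you can use a named lookup vector that maps each observed spelling to its canonical label. The minimal sketch below uses made-up values, and the name condition_map is hypothetical. One difference from the case_when() version: a spelling missing from the lookup vector becomes NA instead of being passed through unchanged.

```r
# Hypothetical lookup table: observed spelling -> canonical label
condition_map <- c(color = "color", same = "same",
                   different = "different", differernt = "different")

# Made-up raw values including the misspelling
raw <- c("same", "differernt", "color", "different")
cleaned <- unname(condition_map[raw])  # unknown spellings would become NA
cleaned

# Recode 0/1 accuracy codes into labelled factor levels
correct_raw <- c(1, 0, 1, 1)
Correct <- factor(ifelse(correct_raw == 1, "Correct", "Incorrect"))
table(Correct)
```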

First 5 rows of dat.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge	condition	target_gender	target_position	Zone Type	Response	Correct	target_item	AOI
3699461	0.00000000	prediction	618	533	0.8164998461	NA	1	8	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	Incorrect	fork	1
3699461	17.32499999	prediction	604	416	0.8164998461	NA	1	9	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	Incorrect	fork	1
3699461	33.34500000	prediction	558	477	0.8164998461	NA	1	10	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	Incorrect	fork	1
3699461	48.51500000	prediction	791	753	0.8164998461	NA	1	11	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	Incorrect	fork	0
3699461	65.08999999	prediction	783	680	0.8164998461	NA	1	12	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	same	f	left	response_button_image	cup.png	Incorrect	fork	0

Getting rid of incorrect observations

dat <- dat %>%
  dplyr::filter(Correct != "Incorrect")
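dplyr::filter() keeps only rows where the condition evaluates to TRUE. As a result, rows with a missing value in Correct are dropped along with the incorrect ones. The toy sketch below reproduces this in base R by treating NA as FALSE via which(). The data are made up.

```r
# Toy accuracy column with one missing value (made up)
toy <- data.frame(trial = 1:4,
                  Correct = factor(c("Correct", "Incorrect", NA, "Correct")))

# Correct != "Incorrect" yields NA for the missing row
keep <- toy$Correct != "Incorrect"
keep

# filter() drops rows where the condition is FALSE or NA;
# which() mimics that here by returning only the TRUE positions
toy[which(keep), ]
```

Only trials 1 and 4 survive. If missing accuracy codes should be kept, they have to be handled explicitly before filtering.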

First 5 rows of dat.
participant_id	time_elapsed	type	x_pred	y_pred	face_conf	zone_name	trial	id	left_bottomedge	left_leftedge	left_rightedge	left_topedge	right_bottomedge	right_leftedge	right_rightedge	right_topedge	condition	target_gender	target_position	Zone Type	Response	Correct	target_item	AOI
3699461	0.00000000	prediction	753	467	0.8142243928	NA	10	303	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	different	m	right	response_button_image	tree.png	Correct	tree	0
3699461	17.86500000	prediction	771	498	0.8142243928	NA	10	304	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	different	m	right	response_button_image	tree.png	Correct	tree	0
3699461	33.91999999	prediction	843	454	0.8142243928	NA	10	305	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	different	m	right	response_button_image	tree.png	Correct	tree	1
3699461	49.38000001	prediction	654	536	0.8142243928	NA	10	306	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	different	m	right	response_button_image	tree.png	Correct	tree	0
3699461	65.33499999	prediction	619	462	0.8142243928	NA	10	307	684	227.65625	665.65625	246	684	774.65625	1212.65625	246	different	m	right	response_button_image	tree.png	Correct	tree	0

Saving the data

You can now save the data in your data folder, if you like.

write.table(dat, here::here("data", "dat.txt"), sep = "\t", row.names = FALSE)

To re-load this data, you would use the following command:

reload <- read.delim(here::here("data", "dat.txt"), sep = "\t")
# inspect
reload[1:4, 1:4]

##   participant_id time_elapsed       type x_pred
## 1        3699461   0.00000000 prediction    753
## 2        3699461  17.86500000 prediction    771
## 3        3699461  33.91999999 prediction    843
## 4        3699461  49.38000001 prediction    654
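A quick way to confirm that a tab-separated export survives the round trip is to write a small data frame to a file and read it back. The sketch below writes to tempfile() instead of the data folder so that it runs anywhere. The toy values are made up.

```r
# Toy data with the kind of columns used above (values made up)
toy <- data.frame(participant_id = c(3699461L, 3699461L),
                  time_elapsed = c(0, 17.865),
                  type = c("prediction", "prediction"))

# Write to a temporary file and read it back in
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", row.names = FALSE)
back <- read.delim(path, sep = "\t")

# Same dimensions and the same values after the round trip
dim(back)
all.equal(toy, back)
```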

Data Viz

Prepare the data for visualization by calculating the proportion of gazes in the AOI per condition, time bin, and target gender.

f1 <- dat %>%
  # remove off-screen samples (negative coordinates) and samples after 4200 ms
  dplyr::filter(x_pred > 0,
                y_pred > 0,
                time_elapsed < 4200) %>%
  # grouping
  dplyr::group_by(condition, TimeBin, Correct, target_gender) %>%
  # summarise: calculate proportion of looks in AOI
  dplyr::summarise(Proportion = mean(AOI))
# inspect data
head(f1, 10)

## # A tibble: 10 × 5
## # Groups:   condition, TimeBin, Correct [4]
##    condition TimeBin Correct target_gender Proportion
##    <chr>       <dbl> <fct>   <chr>              <dbl>
##  1 color           0 Correct f                  0.278
##  2 color           0 Correct m                  0.429
##  3 color           0 Correct n                  0.545
##  4 color         100 Correct f                  0.353
##  5 color         100 Correct m                  0.407
##  6 color         100 Correct n                  1    
##  7 color         200 Correct f                  0.353
##  8 color         200 Correct m                  0.379
##  9 color         200 Correct n                  1    
## 10 color         300 Correct f                  0.312

First 5 rows of f1.
condition	TimeBin	Correct	target_gender	Proportion
color	0	Correct	f	0.2777777778
color	0	Correct	m	0.4285714286
color	0	Correct	n	0.5454545455
color	100	Correct	f	0.3529411765
color	100	Correct	m	0.4074074074
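AOI is coded 0/1, so its mean within each group is the proportion of samples that fell in the AOI. The base-R sketch below shows the same grouping with aggregate() on made-up data:

```r
# Made-up 0/1 AOI codes for two conditions and two time bins
toy <- data.frame(condition = rep(c("color", "same"), each = 4),
                  TimeBin   = rep(c(0, 100), times = 4),
                  AOI       = c(1, 0, 1, 1, 0, 0, 1, 0))

# Mean of a 0/1 variable = proportion of samples in the AOI
f_toy <- aggregate(AOI ~ condition + TimeBin, data = toy, FUN = mean)
names(f_toy)[names(f_toy) == "AOI"] <- "Proportion"
f_toy
```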

Line plot

ggplot(f1, aes(y = Proportion, x = TimeBin, color = condition)) +
  # lines for proportions
  geom_line() +
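  # dotted vertical line at the start of the period of interest (1900 ms)
  geom_vline(xintercept = 1900, linetype="dotted", color = "darkgrey", linewidth=.75) +
  # dotted vertical line at the end of the period of interest (3450 ms)
  geom_vline(xintercept = 3450, linetype="dotted", color = "darkgrey", linewidth=.75) +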
  # add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2800, y = .85, color = "gray20", size = 5) +
  # separate panels for each target_gender
  facet_grid(target_gender ~ .) +
  # black and white theme
  theme_bw() + 
  # no grid lines
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.position = "top",
        # define x-axis tick labels
        axis.text.x = element_text(angle = 45, vjust=0.6, size = 10)) +
  # define x-axis
  scale_x_continuous(name = "Time in Trial (ms)", 
                     limits = c(0,4000), 
                     breaks = seq(0,4000,1000), 
                     labels = seq(0, 4000, 1000)) +
  # define y-axis
  scale_y_continuous(name = "Proportion in AOI", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1,.2), 
                     labels = seq(0, 1, .2))

# save plot
ggsave(file = here::here("images","Fig01.png"), 
       height = 5,  width = 10,  dpi = 320)

Line plot with error bars

# line plot of mean AOI proportions with bootstrapped 95% confidence intervals
ggplot(dat, aes(x = TimeBin, y = AOI, group = condition, color = condition)) +
  stat_summary(fun = mean, geom = "line", aes(group = condition, color = condition)) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  # dotted vertical line at the start of the period of interest (1900 ms)
  geom_vline(xintercept = 1900, linetype="dotted", color = "darkgrey", linewidth=.75) +
  # dotted vertical line at the end of the period of interest (3450 ms)
  geom_vline(xintercept = 3450, linetype="dotted", color = "darkgrey", linewidth=.75) +
  # add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2800, y = .85, color = "gray20", size = 5) +            
  # def. font size
  theme_bw(base_size = 15) +  
  theme(axis.text.x = element_text(size=10, angle = 90),  
        axis.text.y = element_text(size=10, face="plain"),
        legend.position = "top",
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) +  
  # define x-axis
  scale_x_continuous(name = "Time in Trial (ms)", 
                     limits = c(0,4000), 
                     breaks = seq(0,4000,1000), 
                     labels = seq(0, 4000, 1000)) +
  # define y-axis
  scale_y_continuous(name = "Proportion in AOI", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1,.2), 
                     labels = seq(0, 1, .2))

## Warning: Removed 123 rows containing non-finite values (`stat_summary()`).
## Removed 123 rows containing non-finite values (`stat_summary()`).

# save plot
ggsave(file = here::here("images","Fig02.png"), 
       height = 5,  width = 10,  dpi = 320)

## Warning: Removed 123 rows containing non-finite values (`stat_summary()`).
## Removed 123 rows containing non-finite values (`stat_summary()`).

Smoothed line plot

ggplot(f1, aes(y = Proportion, x = TimeBin, color = condition, fill = condition)) +
  # lines for proportions
  geom_smooth(span = .2, alpha = .2) +
  # add vertical line
  geom_vline(xintercept = 1900, linetype="dotted", color = "darkgrey", size=.75) +
  # add vertical line
  geom_vline(xintercept = 3450, linetype="dotted", color = "darkgrey", size=.75) +
  # add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2700, y = 1.1, color = "gray20", size = 5) +
  # separate panels for each condition
  facet_grid(~target_gender) +
  # black and white theme
  theme_bw() + 
  # no grid lines
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.position = "top",
        # define x-axis tick labels
        axis.text.x = element_text(angle = 45, vjust=0.6, size = 10)) +
  # define x-axis
  scale_x_continuous(name = "Time in Trial (ms)", 
                     limits = c(0,4000), 
                     breaks = seq(0,4000,1000), 
                     labels = seq(0, 4000, 1000)) +
  # define y-axis
  scale_y_continuous(name = "Proportion in AOI", 
                     limits = c(-.3, 1.2), 
                     # breaks and labels must line up (0, .2, ..., 1)
                     breaks = seq(0, 1, .2), 
                     labels = seq(0, 1, .2))

# save plot
ggsave(file = here::here("images","Fig03.png"), 
       height = 5,  width = 10,  dpi = 320)

Statz

We use a mixed-effects binomial logistic regression to check whether condition (and its interaction with target gender) affects the probability that a gaze falls in the AOI during the period of interest between 1900 and 3450 ms (after the stimulus was shown).

We go over this without much explanation. However, if you want to know more about how mixed-effects models work, what to consider, and how to interpret them, Gries (2021), Winter (2019), and Field, Miles, and Field (2012) are highly recommended resources! You can also find additional information here or here.

# set options
options(contrasts  = c("contr.treatment", "contr.poly"))
#options(contrasts  = c("contr.helmert", "contr.poly"))
#options(contrasts  = c("contr.sum", "contr.poly"))

statzdat <- dat %>%
  dplyr::filter(time_elapsed > 1900 &
                  time_elapsed < 3450)
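With treatment contrasts (the option set above), each condition coefficient compares that level with the reference level. The reference is color, the alphabetically first level, which is why the summary further down reports conditiondifferent and conditionsame. The coding matrices R uses for a three-level factor can be inspected directly. The level names below are those of condition. Sum coding, one of the commented-out alternatives, is shown for comparison.

```r
# Levels of condition, in the (alphabetical) order R uses for factors
levs <- c("color", "different", "same")

# Treatment (dummy) coding: the reference level 'color' is all zeros
tr <- contr.treatment(levs)
tr

# Sum coding, for comparison: the last level is coded -1 in every column
cs <- contr.sum(levs)
cs
```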

Generate a baseline model with random intercepts for trial and target item.

# generate model
m0 <- glmer(AOI ~ (1  | trial) + (1 | target_item), 
                  family = binomial, 
                  data = statzdat, 
                  control=glmerControl(optimizer="bobyqa"))

Generate the final model by adding condition, target gender, and their interaction as fixed effects.

# generate model
m1 <- update(m0, .~.+ condition * target_gender)

Summarize the final model.

# summarize model
summary(m1)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: AOI ~ (1 | trial) + (1 | target_item) + condition + target_gender +  
##     condition:target_gender
##    Data: statzdat
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   2620.7   2684.1  -1299.3   2598.7     2342 
## 
## Scaled residuals: 
##        Min         1Q     Median         3Q        Max 
## -1.9018982 -0.5924560 -0.2865165  0.7971672  6.6556263 
## 
## Random effects:
##  Groups      Name        Variance  Std.Dev. 
##  trial       (Intercept) 0.5592683 0.7478424
##  target_item (Intercept) 0.4510510 0.6716033
## Number of obs: 2353, groups:  trial, 14; target_item, 14
## 
## Fixed effects:
##                                     Estimate Std. Error  z value   Pr(>|z|)    
## (Intercept)                       -2.7558947  0.7769111 -3.54725 0.00038928 ***
## conditiondifferent                 2.3636688  1.0573990  2.23536 0.02539364 *  
## conditionsame                      2.8272882  1.2887571  2.19381 0.02824907 *  
## target_genderm                     2.4553915  1.0618560  2.31236 0.02075794 *  
## target_gendern                     3.2118163  1.2778093  2.51353 0.01195285 *  
## conditiondifferent:target_genderm -3.0855267  1.4728957 -2.09487 0.03618244 *  
## conditionsame:target_genderm      -2.1498085  1.6440007 -1.30767 0.19098567    
## conditiondifferent:target_gendern -2.4414499  1.7815108 -1.37044 0.17055024    
## conditionsame:target_gendern      -4.3955408  1.9386026 -2.26738 0.02336728 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##                        (Intr) cndtnd cndtns trgt_gndrm trgt_gndrn
## cndtndffrnt            -0.734                                    
## conditionsm            -0.603  0.442                             
## targt_gndrm            -0.731  0.537  0.441                      
## targt_gndrn            -0.607  0.445  0.366  0.444               
## cndtndffrnt:trgt_gndrm  0.527 -0.718 -0.318 -0.721     -0.320    
## cndtnsm:trgt_gndrm      0.472 -0.346 -0.784 -0.646     -0.286    
## cndtndffrnt:trgt_gndrn  0.434 -0.592 -0.262 -0.318     -0.716    
## cndtnsm:trgt_gndrn      0.400 -0.293 -0.664 -0.292     -0.659    
##                        cndtndffrnt:trgt_gndrm cndtnsm:trgt_gndrm
## cndtndffrnt                                                     
## conditionsm                                                     
## targt_gndrm                                                     
## targt_gndrn                                                     
## cndtndffrnt:trgt_gndrm                                          
## cndtnsm:trgt_gndrm      0.465                                   
## cndtndffrnt:trgt_gndrn  0.425                  0.205            
## cndtnsm:trgt_gndrn      0.211                  0.521            
##                        cndtndffrnt:trgt_gndrn
## cndtndffrnt                                  
## conditionsm                                  
## targt_gndrm                                  
## targt_gndrn                                  
## cndtndffrnt:trgt_gndrm                       
## cndtnsm:trgt_gndrm                           
## cndtndffrnt:trgt_gndrn                       
## cndtnsm:trgt_gndrn      0.472

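Run post-hoc tests (Tukey contrasts between conditions). Because the model contains a condition × target gender interaction, these contrasts compare the conditions at the reference level of target gender (f).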

summary(glht(m1, mcp(condition="Tukey")))

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glmer(formula = AOI ~ (1 | trial) + (1 | target_item) + condition + 
##     target_gender + condition:target_gender, data = statzdat, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa"))
## 
## Linear Hypotheses:
##                         Estimate Std. Error z value Pr(>|z|)  
## different - color == 0 2.3636688  1.0573990 2.23536 0.064815 .
## same - color == 0      2.8272882  1.2887571 2.19381 0.071499 .
## same - different == 0  0.4636193  1.2544972 0.36957 0.927075  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
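The Tukey estimates can be checked by hand against the fixed effects. Because color and f are the reference levels, "different - color" and "same - color" are simply the conditiondifferent and conditionsame coefficients, and "same - different" is their difference. Each z value is the estimate divided by its standard error. The check below uses the values printed in the model summary:

```r
# Fixed-effect estimates copied from summary(m1) above
b_different <- 2.3636688
b_same      <- 2.8272882

# same - different: difference of the two treatment coefficients
same_minus_different <- b_same - b_different
same_minus_different   # about 0.4636, as in the post-hoc output

# z value = estimate / standard error (different - color)
z_different <- 2.3636688 / 1.0573990
z_different            # about 2.2354
```

The adjusted p-values, however, come from the multivariate normal correction in multcomp and cannot be reproduced with simple arithmetic.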

We now tabulate the results of the baseline and the final model.

# generate summary table
sjPlot::tab_model(m0, m1)

	AOI			AOI
Predictors	Odds Ratios	CI	p	Odds Ratios	CI	p
(Intercept)	0.56	0.26 – 1.18	0.125	0.06	0.01 – 0.29	<0.001
condition [different]				10.63	1.34 – 84.45	0.025
condition [same]				16.90	1.35 – 211.28	0.028
target gender [m]				11.65	1.45 – 93.37	0.021
target gender [n]				24.82	2.03 – 303.77	0.012
condition [different] × target gender [m]				0.05	0.00 – 0.82	0.036
condition [same] × target gender [m]				0.12	0.00 – 2.92	0.191
condition [different] × target gender [n]				0.09	0.00 – 2.86	0.171
condition [same] × target gender [n]				0.01	0.00 – 0.55	0.023
Random Effects
σ²	3.29			3.29
τ₀₀ (trial)	1.01			0.56
τ₀₀ (target_item)	0.98			0.45
ICC	0.38			0.23
N (trial)	14			14
N (target_item)	14			14
Observations	2353			2353
Marginal R² / Conditional R²	0.000 / 0.377			0.158 / 0.356
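The odds ratios in the table are the exponentiated log-odds estimates from summary(m1). The inverse logit of the intercept gives the predicted probability of an AOI gaze for the reference cell (condition color, target gender f) with both random intercepts set to zero. A base-R check using the estimates printed above:

```r
# Log-odds estimates copied from summary(m1) above
est <- c(intercept = -2.7558947,
         different = 2.3636688,
         same      = 2.8272882)

# Odds ratios: 0.06, 10.63 and 16.90, as in the table
odds_ratios <- round(exp(est), 2)
odds_ratios

# Predicted probability for the reference cell (color, f), random effects at 0
p_ref <- plogis(est[["intercept"]])
p_ref
```

Multiplying odds ratios works on the odds scale only. To obtain probabilities for other cells, add the relevant log-odds coefficients first and then apply plogis().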

And we visualize the fixed effects.

sjPlot::plot_model(m1)

eyetrackingR

Once the data is in a proper format, we can also use the eyetrackingR package for our analysis. The advantage of the eyetrackingR package is that it has many built-in functions that make the analysis of eye-tracking data a lot easier. There are also very helpful and detailed tutorials on how to analyze and visualize eye-tracking data with eyetrackingR.

Before we can use the eyetrackingR package, however, we need to create certain columns that the package expects.

In our case, we need to create the following columns:

a column specifying whether a gaze was in the AOI (which we will call OnTarget)
a column specifying whether a gaze was not in the AOI (which we will call OffTarget)
a trackloss_column (which we will call TrackLoss). This column flags the data points that we want to remove during the analysis. In our case, we code data points with a negative x- or y-coordinate, as well as data points recorded after 4200 ms, as TRUE (i.e., we treat them as trackloss).

dat <- dat %>%
  dplyr::mutate(TrackLoss = dplyr::case_when(x_pred < 0 ~ TRUE,
                                             y_pred < 0 ~ TRUE,
                                             time_elapsed > 4200 ~ TRUE,
                                             TRUE ~ FALSE)) %>%
  dplyr::mutate(OnTarget = dplyr::case_when(AOI == 1 ~ 1,
                                            TRUE ~ 0),
                OffTarget = dplyr::case_when(AOI == 1 ~ 0,
                                            TRUE ~ 1))
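The case_when() cascade returns TRUE as soon as one trackloss criterion applies. It is therefore equivalent to combining the criteria with |. The sketch below applies the same logic to four made-up samples in base R. It also shows that a missing AOI value ends up as OnTarget = 0, because AOI == 1 is not TRUE for NA.

```r
# Made-up samples: off-screen x, off-screen y, late sample, valid sample
toy <- data.frame(x_pred = c(-5, 600, 700, 650),
                  y_pred = c(400, -12, 500, 450),
                  time_elapsed = c(100, 200, 4300, 300),
                  AOI = c(0, 1, 1, NA))

# A sample counts as trackloss if any criterion applies
toy$TrackLoss <- toy$x_pred < 0 | toy$y_pred < 0 | toy$time_elapsed > 4200

# OnTarget = 1 only where AOI == 1; anything else (including NA) is 0
toy$OnTarget  <- ifelse(!is.na(toy$AOI) & toy$AOI == 1, 1, 0)
toy$OffTarget <- 1 - toy$OnTarget
toy
```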

Now that we have generated the required columns, we can convert our data into an eyetrackingR data object with make_eyetrackingr_data() and specify the columns that the eyetrackingR package requires.

data <- make_eyetrackingr_data(dat, 
                       participant_column = "participant_id",
                       trial_column = "trial",
                       time_column = "time_elapsed",
                       trackloss_column = "TrackLoss",
                       aoi_columns = c('OnTarget','OffTarget'),
                       treat_non_aoi_looks_as_missing = TRUE
)
# inspect data
head(data)

## # A tibble: 6 × 29
##   participant_id time_elapsed type       x_pred y_pred face_conf zone_name trial
##   <fct>                 <dbl> <chr>       <dbl>  <dbl>     <dbl> <chr>     <fct>
## 1 3699461                 0   prediction    753    467     0.814 <NA>      10   
## 2 3699461                17.9 prediction    771    498     0.814 <NA>      10   
## 3 3699461                33.9 prediction    843    454     0.814 <NA>      10   
## 4 3699461                49.4 prediction    654    536     0.814 <NA>      10   
## 5 3699461                65.3 prediction    619    462     0.814 <NA>      10   
## 6 3699461                85.0 prediction    633    438     0.814 <NA>      10   
## # ℹ 21 more variables: id <int>, left_bottomedge <dbl>, left_leftedge <dbl>,
## #   left_rightedge <dbl>, left_topedge <dbl>, right_bottomedge <dbl>,
## #   right_leftedge <dbl>, right_rightedge <dbl>, right_topedge <dbl>,
## #   condition <chr>, target_gender <chr>, target_position <chr>,
## #   `Zone Type` <chr>, Response <chr>, Correct <fct>, target_item <chr>,
## #   AOI <dbl>, TimeBin <dbl>, TrackLoss <lgl>, OnTarget <lgl>, OffTarget <lgl>

We can also tabulate the number of on-target gazes that remain in the data using the table() function.

table(data$OnTarget)

## 
## FALSE  TRUE 
##  3277  2644

In the next step, we specify the window that we want to inspect (in our case, the window starting at 1900 ms and ending at 3450 ms).

# subset to response window post word-onset
response_window <- subset_by_window(data, 
                                    window_start_time = 1900, 
                                    window_end_time = 3450, 
                                    rezero = FALSE)

We now check whether we need to remove data points because of trackloss.

# analyze amount of trackloss by subjects and trials
(trackloss <- trackloss_analysis(data = response_window))

## # A tibble: 30 × 6
##    participant_id trial Samples TracklossSamples TracklossForTrial
##    <fct>          <fct>   <dbl>            <dbl>             <dbl>
##  1 3699461        10         82                0                 0
##  2 3699461        11         76                0                 0
##  3 3699461        16         90                0                 0
##  4 3699461        17         93                0                 0
##  5 3699461        3          88                0                 0
##  6 3699461        4          91                0                 0
##  7 3699461        5          92                0                 0
##  8 3699461        7          84                0                 0
##  9 3699461        8          85                0                 0
## 10 3699461        9          62                0                 0
## # ℹ 20 more rows
## # ℹ 1 more variable: TracklossForParticipant <dbl>

Remove trials with more than 25% trackloss (trial_prop_thresh = .25).

# remove trials with > 25% of trackloss
response_window_clean <- clean_by_trackloss(data = response_window, 
                                            trial_prop_thresh = .25)
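Conceptually, per-trial cleaning computes the proportion of trackloss samples in each trial and drops the trials above the threshold. The base-R sketch below illustrates that idea on made-up data. It is not eyetrackingR's implementation. Whether a trial sitting exactly at the threshold is kept depends on the comparison the package uses internally, and the sketch assumes such a trial is kept.

```r
# Made-up samples for three trials (TRUE = trackloss)
toy <- data.frame(trial = rep(c("a", "b", "c"), each = 4),
                  TrackLoss = c(FALSE, FALSE, FALSE, FALSE,   # 0% trackloss
                                TRUE,  FALSE, FALSE, FALSE,   # 25%
                                TRUE,  TRUE,  FALSE, FALSE))  # 50%

# Proportion of trackloss samples per trial
prop_loss <- tapply(toy$TrackLoss, toy$trial, mean)
prop_loss

# Keep trials whose trackloss proportion does not exceed the threshold
threshold <- 0.25
keep_trials <- names(prop_loss)[prop_loss <= threshold]
toy_clean <- toy[toy$trial %in% keep_trials, ]
unique(toy_clean$trial)
```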

Aggregate the data into 50 ms time bins for the time-course analysis.

# aggregate across trials within subjects in time analysis
response <- make_time_sequence_data(response_window_clean, 
                                    time_bin_size = 50,
                                    predictor_columns = c("condition", "Correct"),
                                    aois = c("OnTarget", "OffTarget")
                            )

Visualize response data.

# visualize time results
plot(response, 
     predictor_column = "condition") + 
  theme_light() +
  coord_cartesian(ylim = c(0,1))

Citation & Session Info

Schweinberger, Martin. 2023. Processing and Analyzing Eye-Tracking Data in R. Workshop at UIT AcqVA Aurora. Tromsø: The Arctic University of Norway. url: https://slcladal.github.io/eyews.html (Version 2023.06.02).

@manual{schweinberger2023eyews,
  author = {Schweinberger, Martin},
  title = {Processing and Analyzing Eye-Tracking Data in R},
  note = {https://slcladal.github.io/eyews.html},
  year = {2023},
  organization = {Arctic University of Norway, AcqVA Aurora Center},
  address = {Tromsø},
  edition = {2023.06.02}
}

sessionInfo()

## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DT_0.28            kableExtra_1.3.4   knitr_1.42         multcomp_1.4-23   
##  [5] TH.data_1.1-1      MASS_7.3-60        survival_3.5-5     mvtnorm_1.1-3     
##  [9] sjPlot_2.8.14      lme4_1.1-33        Matrix_1.5-4.1     itsadug_2.4.1     
## [13] plotfunctions_1.4  mgcv_1.8-42        nlme_3.1-162       data.table_1.14.4 
## [17] eyetrackingR_0.2.0 lubridate_1.9.2    forcats_1.0.0      stringr_1.5.0     
## [21] dplyr_1.1.2        purrr_1.0.1        readr_2.1.4        tidyr_1.3.0       
## [25] tibble_3.2.1       ggplot2_3.4.2      tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.5         colorspace_2.1-0    deldir_1.0-6       
##   [4] ellipsis_0.3.2      sjlabelled_1.2.0    rprojroot_2.0.3    
##   [7] snakecase_0.11.0    htmlTable_2.4.1     estimability_1.4.1 
##  [10] parameters_0.20.2   base64enc_0.1-3     rstudioapi_0.14    
##  [13] farver_2.1.1        bit64_4.0.5         fansi_1.0.4        
##  [16] xml2_1.3.3          codetools_0.2-19    splines_4.2.2      
##  [19] cachem_1.0.6        sjmisc_2.8.9        Formula_1.2-4      
##  [22] jsonlite_1.8.4      nloptr_2.0.3        ggeffects_1.1.5    
##  [25] broom_1.0.3         cluster_2.1.4       png_0.1-8          
##  [28] effectsize_0.8.3    compiler_4.2.2      httr_1.4.4         
##  [31] sjstats_0.18.2      emmeans_1.8.4-1     backports_1.4.1    
##  [34] fastmap_1.1.0       lazyeval_0.2.2      cli_3.6.0          
##  [37] htmltools_0.5.4     tools_4.2.2         coda_0.19-4        
##  [40] gtable_0.3.1        glue_1.6.2          Rcpp_1.0.10        
##  [43] cellranger_1.1.0    jquerylib_0.1.4     vctrs_0.6.2        
##  [46] svglite_2.1.1       insight_0.19.1      xfun_0.39          
##  [49] rvest_1.0.3         timechange_0.1.1    lifecycle_1.0.3    
##  [52] zoo_1.8-11          scales_1.2.1        vroom_1.6.1        
##  [55] ragg_1.2.5          hms_1.1.2           parallel_4.2.2     
##  [58] sandwich_3.0-2      RColorBrewer_1.1-3  yaml_2.3.7         
##  [61] gridExtra_2.3       sass_0.4.5          rpart_4.1.19       
##  [64] latticeExtra_0.6-30 stringi_1.7.12      highr_0.10         
##  [67] bayestestR_0.13.0   checkmate_2.1.0     boot_1.3-28.1      
##  [70] rlang_1.1.1         pkgconfig_2.0.3     systemfonts_1.0.4  
##  [73] evaluate_0.20       lattice_0.21-8      labeling_0.4.2     
##  [76] htmlwidgets_1.6.1   bit_4.0.5           tidyselect_1.2.0   
##  [79] here_1.0.1          magrittr_2.0.3      R6_2.5.1           
##  [82] generics_0.1.3      Hmisc_4.8-0         foreign_0.8-84     
##  [85] pillar_1.9.0        withr_2.5.0         nnet_7.3-19        
##  [88] datawizard_0.6.5    performance_0.10.2  modelr_0.1.10      
##  [91] crayon_1.5.2        interp_1.1-3        utf8_1.2.3         
##  [94] tzdb_0.3.0          rmarkdown_2.20      jpeg_0.1-10        
##  [97] grid_4.2.2          readxl_1.4.2        digest_0.6.31      
## [100] webshot_0.5.4       xtable_1.8-4        textshaping_0.3.6  
## [103] munsell_0.5.0       viridisLite_0.4.1   bslib_0.4.2

References

Field, Andy, Jeremy Miles, and Zoe Field. 2012. Discovering Statistics Using R. Sage.

Gries, Stefan Th. 2021. Statistics for Linguistics Using R: A Practical Introduction. Berlin & New York: Mouton de Gruyter.

Winter, Bodo. 2019. Statistics for Linguists: An Introduction Using R. Routledge.

Processing and Analyzing Eye-Tracking Data in R

Martin Schweinberger

2023-06-02