Introduction

This workshop introduces data processing and analysis for eye-tracking data in R. The RMarkdown document for the tutorial can be downloaded here and the bib library here. You can also download a shortened version of the RMarkdown document, which contains only the processing chain, here (the html version of that document is here). Very helpful and detailed tutorials on how to analyse and visualize eye-tracking data with eyetrackingR are available here.

We will go through the following steps:

We will not address issues relating to adequate sample size and power. If you are interested in that, please check out this tutorial on the Language Technology and Data Analysis Laboratory website.

Preparation

This tutorial is based on R. If you have not installed R or RStudio, or if you are new to either of them, you will find an introduction to R and RStudio and more information on how to use them here. For this tutorial, we need to install certain packages into an R library on our computer so that the scripts shown below run without errors. Before turning to the code below, please install the packages by running the code below this paragraph. If you have already installed these packages, you can skip ahead and ignore this section. Installing all of the packages may take between 1 and 5 minutes, so do not worry if it takes some time.

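# readxl and here are used below to read the spreadsheets and to build file paths
install.packages(c("tidyverse", "eyetrackingR", "data.table", "itsadug", "sjPlot", "lme4", "multcomp", "readxl", "here"))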
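If you rerun the tutorial later, you may prefer to install only the packages that are still missing. The sketch below uses base R only; the helper name missing_packages is hypothetical and not part of any package.

```r
# Return the packages in `pkgs` that cannot be loaded on this machine
# (missing_packages is a hypothetical helper defined here).
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of failing when a package is absent
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Usage in this tutorial (not run here):
# to_install <- missing_packages(c("tidyverse", "eyetrackingR", "data.table",
#                                  "itsadug", "sjPlot", "lme4", "multcomp",
#                                  "readxl", "here"))
# if (length(to_install) > 0) install.packages(to_install)
```

Because the check only loads namespaces, it is quick and leaves already installed packages untouched.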

Once you have installed the packages, load them and set useful options as shown below.

# set options
options(stringsAsFactors = FALSE)         # do not convert strings to factors automatically
options("scipen" = 100, "digits" = 10)    # suppress scientific notation
# load packages
library(tidyverse)
library(eyetrackingR)
library(data.table)
library(itsadug)
library(lme4)
library(sjPlot)
library(multcomp)
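The scipen option matters here because the raw data contain 13-digit Unix timestamps (in milliseconds), which R would otherwise print in scientific notation. A small illustration using the first time_stamp value from the data:

```r
# A 13-digit timestamp as stored in the time_stamp column
ts <- 1619598396444

old <- options(scipen = 0, digits = 7)   # R's default settings
default_format <- format(ts)             # printed in scientific notation
options(scipen = 100)                    # strongly penalise scientific notation
fixed_format <- format(ts)               # printed with all digits
options(old)                             # restore the previous settings

default_format
fixed_format
```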

Now that we have prepared our session, we can start with the data processing.

Data processing

During data processing, we load and prepare the data for further analysis and visualization.

Define paths

First, we define the paths to the spreadsheets (datapath) and to the master file (a csv file with information about the experiment). In my case, the spreadsheets are in a folder called uploads inside the folder data_exp_50674-v2, which is in my data folder. The master file is also in data_exp_50674-v2, but not in the uploads folder. here::here() builds these paths relative to the root of the current project.

datapath <- here::here("data/data_exp_50674-v2", "uploads")
masterpath <- here::here("data/data_exp_50674-v2", "data_exp_50674-v2_task-etfm.csv")

Now that we have defined the paths, we continue.

Load data

Next, we load the data, which in our case consists of several spreadsheets (files ending in .xlsx).

We begin by extracting a list of these xlsx files (i.e., the paths where these files are located on your computer).

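# list all files in the uploads folder (with their full paths)
fls <- list.files(datapath, full.names = TRUE)
# drop the first file in the folder, which is not analysed here
fls <- fls[-1]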
# inspect files
head(fls)
## [1] "D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx" 
## [2] "D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx"
## [3] "D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-11-2.xlsx"
## [4] "D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-12-2.xlsx"
## [5] "D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-13-2.xlsx"
## [6] "D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-14-2.xlsx"
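Dropping the first listed file by position depends on which file happens to sort first in the folder. A sketch of a more robust alternative selects files by name, assuming (as in the listing above) that every trial spreadsheet contains eyetracking_collection and ends in .xlsx. The demo folder and the file hypothetical_other_file.csv are made up for illustration.

```r
# Build a throwaway folder that mimics the uploads folder
tmp <- file.path(tempdir(), "uploads_demo")
dir.create(tmp, showWarnings = FALSE)
demo_files <- c("hypothetical_other_file.csv",
                "50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx",
                "50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx")
file.create(file.path(tmp, demo_files))

# Keep only trial spreadsheets: names containing "eyetracking_collection"
# and ending in ".xlsx"
fls <- list.files(tmp, pattern = "eyetracking_collection.*\\.xlsx$",
                  full.names = TRUE)
basename(fls)
```

In the tutorial, the same pattern argument would be passed to list.files(datapath, ...).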

Next, we use this list of paths to load the files. We also create two new columns: idname, which contains the path to the file, and trial, which tells us which trial the data come from (the number that follows collection- in the file name).

datls <- lapply(fls, function(x){
  name <- x
  x <- readxl::read_xlsx(x) %>%
    # create id column (contains path)
    dplyr::mutate(idname = name) %>%
    # code trial
    dplyr::mutate(trial = stringr::str_remove_all(name, ".*collection-"))  %>%
    dplyr::mutate(trial = stringr::str_remove_all(trial, "-.*"))
  })
# inspect data
head(datls[1])
## [[1]]
## # A tibble: 295 x 25
##    `0`   filename participant_id spreadsheet_row time_stamp time_elapsed type 
##    <lgl> <chr>             <dbl>           <dbl>      <dbl>        <dbl> <chr>
##  1 NA    eyetrac~        3699461              16    1.62e12          0   new ~
##  2 NA    eyetrac~        3699461              16    1.62e12          0   zone 
##  3 NA    eyetrac~        3699461              16    1.62e12          0   zone 
##  4 NA    eyetrac~        3699461              16    1.62e12          0   zone 
##  5 NA    eyetrac~        3699461              16    1.62e12          0   zone 
##  6 NA    eyetrac~        3699461              16    1.62e12          0   zone 
##  7 NA    eyetrac~        3699461              16    1.62e12          0   zone 
##  8 NA    eyetrac~        3699461              16    1.62e12          0   pred~
##  9 NA    eyetrac~        3699461              16    1.62e12         17.3 pred~
## 10 NA    eyetrac~        3699461              16    1.62e12         33.3 pred~
## # ... with 285 more rows, and 18 more variables: screen_index <dbl>,
## #   x_pred <dbl>, y_pred <dbl>, x_pred_normalised <dbl>,
## #   y_pred_normalised <dbl>, convergence <dbl>, face_conf <dbl>,
## #   zone_name <chr>, zone_x <dbl>, zone_y <dbl>, zone_width <dbl>,
## #   zone_height <dbl>, zone_x_normalised <dbl>, zone_y_normalised <dbl>,
## #   zone_width_normalised <dbl>, zone_height_normalised <dbl>, idname <chr>,
## #   trial <chr>
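To see how the trial number is extracted, here is a base-R equivalent of the two stringr::str_remove_all() calls in the loading code, applied to two of the file names listed above. The helper name extract_trial is hypothetical.

```r
# Base-R version of the trial coding (extract_trial is a hypothetical helper)
extract_trial <- function(path) {
  x <- sub(".*collection-", "", path)  # drop everything up to "collection-"
  sub("-.*", "", x)                    # drop everything from the next "-"
}

extract_trial(c(
  "50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx",
  "50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx"
))
```

The result is character ("1", "10"), so when trial is later turned into a factor its levels are sorted alphabetically (1, 10, 11, ...), which is why the tables below list trials in that order.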

We can now combine all the spreadsheets into one table and add a column called id that gives each row a unique identifier. We also convert the participant_id and trial columns into factors.

edat <- data.table::rbindlist(datls) %>%
  # add id
  dplyr::mutate(id = 1:nrow(.)) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial))
First 6 rows of edat.
0 filename participant_id spreadsheet_row time_stamp time_elapsed type screen_index x_pred y_pred x_pred_normalised y_pred_normalised convergence face_conf zone_name zone_x zone_y zone_width zone_height zone_x_normalised zone_y_normalised zone_width_normalised zone_height_normalised idname trial id
NA eyetracking_collection 3699461 16 1619598396444 0 new collection screen 2 0 0 0 0 0 0 NA 0.00000 0 0 0 0.0000000000 0.000000000 0.0000000000 0.0000000000 D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx 1 1
NA eyetracking_collection 3699461 16 1619598396444 0 zone 2 0 0 0 0 0 0 screen 0.00000 0 1440 821 -0.1576769406 0.000000000 1.3150684932 1.0000000000 D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx 1 2
NA eyetracking_collection 3699461 16 1619598396445 0 zone 2 0 0 0 0 0 0 gorilla 172.65625 0 1095 821 0.0000000000 0.000000000 1.0000000000 1.0000000000 D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx 1 3
NA eyetracking_collection 3699461 16 1619598396445 0 zone 2 0 0 0 0 0 0 Zone1 172.65625 772 55 49 0.0000000000 0.940316687 0.0502283105 0.0596833130 D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx 1 4
NA eyetracking_collection 3699461 16 1619598396445 0 zone 2 0 0 0 0 0 0 Right 774.65625 246 438 320 0.5497716895 0.299634592 0.4000000000 0.3897685749 D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx 1 5
NA eyetracking_collection 3699461 16 1619598396445 0 zone 2 0 0 0 0 0 0 Left 227.65625 246 438 329 0.0502283105 0.299634592 0.4000000000 0.4007308161 D:/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx 1 6
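The combining step can be sketched in base R with two toy spreadsheets (the values are illustrative). Note that rbind() on data frames matches columns by name; data.table::rbindlist() also has a use.names argument, and use.names = TRUE makes it match by name rather than by position.

```r
# Two toy "spreadsheets" with the same columns (illustrative values)
sheets <- list(
  data.frame(participant_id = 3699461, trial = "1",  x_pred = c(618, 604)),
  data.frame(participant_id = 3699461, trial = "10", x_pred = c(558, 791))
)

combined <- do.call(rbind, sheets)        # stack rows, matching columns by name
combined$id <- seq_len(nrow(combined))    # unique row identifier
combined$participant_id <- factor(combined$participant_id)
combined$trial <- factor(combined$trial)
combined
```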

Determine image boundaries

Non-normalized

Next, we define the image boundaries. In our example, there are two images: one on the right and one on the left. We use the zone_x, zone_y, zone_width, and zone_height columns to calculate the edges of each image. For the right image in the first trial, for example:

  • top (upper border) = 246 (zone_y)

  • bottom (lower border) = 246 + 320 (zone_y + zone_height)

  • left (left border) = 774.66 (zone_x)

  • right (right border) = 774.66 + 438 (zone_x + zone_width)
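The edge arithmetic can be written as a small function and checked against the right image in trial 1 (zone_x = 774.65625, zone_y = 246, zone_width = 438, zone_height = 320, taken from the first rows of edat). It follows the convention used in this tutorial that zone_y marks the upper border; the function name zone_edges is hypothetical.

```r
# Edges of a rectangular zone from its origin and size
# (zone_edges is a hypothetical helper)
zone_edges <- function(zone_x, zone_y, zone_width, zone_height) {
  c(top    = zone_y,
    bottom = zone_y + zone_height,
    left   = zone_x,
    right  = zone_x + zone_width)
}

# Right image in trial 1
e <- zone_edges(774.65625, 246, 438, 320)
e
```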

ibs <- edat %>%
  dplyr::select(participant_id, trial, zone_name, zone_x, zone_y,zone_width, zone_height) %>%
  # get rid of superfluous rows
  dplyr::filter(zone_name == "Right"|zone_name == "Left") %>%
  na.omit() %>%
  # define image boundaries
  dplyr::mutate(top = zone_y,
                bottom = zone_y + zone_height,
                left = zone_x,
                right = zone_x + zone_width) %>%
  # remove superfluous columns
  dplyr::select(-zone_x, -zone_y, -zone_width, -zone_height)
First 10 rows of ibs.
participant_id trial zone_name top bottom left right
3699461 1 Right 246 684 774.656250 1212.656250
3699461 1 Left 246 684 227.656250 665.656250
3699461 10 Right 246 684 774.656250 1212.656250
3699461 10 Left 246 684 227.656250 665.656250
3699461 11 Right 246 684 774.671875 1212.671875
3699461 11 Left 246 684 227.671875 665.671875
3699461 12 Right 246 684 774.671875 1212.671875
3699461 12 Left 246 684 227.671875 665.671875
3699461 13 Right 246 684 774.656250 1212.656250
3699461 13 Left 246 684 227.656250 665.656250

Normalized

We can also use the normalized values.


If you do this, it is crucial that you use x_pred_normalised and y_pred_normalised, and not x_pred and y_pred, in your analysis!


As before, we define the image boundaries, but this time we use the zone_x_normalised, zone_y_normalised, zone_width_normalised, and zone_height_normalised columns. For the right image in the first trial:

  • top (upper border) = 0.2996 (zone_y_normalised)

  • bottom (lower border) = 0.2996 + 0.3898 (zone_y_normalised + zone_height_normalised)

  • left (left border) = 0.5498 (zone_x_normalised)

  • right (right border) = 0.5498 + 0.4 (zone_x_normalised + zone_width_normalised)
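Judging from the first rows of edat, the normalised values appear to be pixel values expressed relative to the zone named gorilla (zone_x = 172.65625, zone_y = 0, zone_width = 1095, zone_height = 821). This relationship is inferred from the numbers shown above, not from Gorilla's documentation, so treat the sketch below as a consistency check rather than a specification. The helper name normalise is hypothetical.

```r
# Express a pixel value relative to a reference zone (hypothetical helper):
# positions are shifted by the zone origin, sizes use origin = 0
normalise <- function(px, origin, extent) (px - origin) / extent

# Right image in trial 1, relative to the "gorilla" zone
x_n <- normalise(774.65625, 172.65625, 1095)  # zone_x_normalised
y_n <- normalise(246, 0, 821)                 # zone_y_normalised
w_n <- normalise(438, 0, 1095)                # zone_width_normalised
h_n <- normalise(320, 0, 821)                 # zone_height_normalised
c(x_n, y_n, w_n, h_n)
```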

ibs_norm <- edat %>%
  dplyr::select(participant_id, trial, zone_name, zone_x_normalised, 
                zone_y_normalised, zone_width_normalised, zone_height_normalised) %>%
  # get rid of superfluous rows
  dplyr::filter(zone_name == "Right"|zone_name == "Left") %>%
  na.omit() %>%
  # define image boundaries
  dplyr::mutate(top = zone_y_normalised,
                bottom = zone_y_normalised + zone_height_normalised,
                left = zone_x_normalised,
                right = zone_x_normalised + zone_width_normalised) %>%
  # remove superfluous columns
  dplyr::select(-zone_x_normalised, -zone_y_normalised, -zone_width_normalised, -zone_height_normalised)
First 10 rows of ibs_norm.
participant_id trial zone_name top bottom left right
3699461 1 Right 0.299634592 0.699634592 0.5497716895 0.9497716895
3699461 1 Left 0.299634592 0.699634592 0.0502283105 0.4502283105
3699461 10 Right 0.299634592 0.699634592 0.5497716895 0.9497716895
3699461 10 Left 0.299634592 0.699634592 0.0502283105 0.4502283105
3699461 11 Right 0.299634592 0.699634592 0.5497716895 0.9497716895
3699461 11 Left 0.299634592 0.699634592 0.0502283105 0.4502283105
3699461 12 Right 0.299634592 0.699634592 0.5497716895 0.9497716895
3699461 12 Left 0.299634592 0.699634592 0.0502283105 0.4502283105
3699461 13 Right 0.299634592 0.699634592 0.5497716895 0.9497716895
3699461 13 Left 0.299634592 0.699634592 0.0502283105 0.4502283105

Adding edges

We now reshape the data so that each edge is stored in a separate column. This gives us four columns for each image (left and right): bottomedge, leftedge, rightedge, and topedge, each prefixed with the image position (e.g., left_bottomedge).

ibs <- ibs %>%
  # position of the image: "left" or "right"
  dplyr::mutate(position = tolower(zone_name)) %>%
  # long format: one row per image and edge
  tidyr::pivot_longer(top:right, names_to = "edge", values_to = "coordinate") %>%
  # column names such as left_topedge or right_bottomedge
  dplyr::mutate(position_edge = paste0(position, "_", edge, "edge")) %>%
  dplyr::select(-zone_name, -position, -edge) %>%
  # wide format: one row per participant and trial, columns sorted alphabetically
  tidyr::pivot_wider(names_from = position_edge, values_from = coordinate,
                     names_sort = TRUE)
First 10 rows of ibs.
participant_id trial left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge
3699461 1 684 227.656250 665.656250 246 684 774.656250 1212.656250 246
3699461 10 684 227.656250 665.656250 246 684 774.656250 1212.656250 246
3699461 11 684 227.671875 665.671875 246 684 774.671875 1212.671875 246
3699461 12 684 227.671875 665.671875 246 684 774.671875 1212.671875 246
3699461 13 684 227.656250 665.656250 246 684 774.656250 1212.656250 246
3699461 14 684 227.671875 665.671875 246 684 774.671875 1212.671875 246
3699461 15 684 227.671875 665.671875 246 684 774.671875 1212.671875 246
3699461 16 684 227.656250 665.656250 246 684 774.656250 1212.656250 246
3699461 17 684 227.671875 665.671875 246 684 774.671875 1212.671875 246
3699461 18 684 227.671875 665.671875 246 684 774.671875 1212.671875 246
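The reshaping step above turns one row per image and edge into one row per participant and trial. The same long-to-wide move can be sketched with base R's reshape(), using the trial 1 values from the table above:

```r
# Long format: one row per image position and edge (trial 1 values from above)
long <- data.frame(
  participant_id = "3699461", trial = "1",
  position_edge = c("left_topedge", "left_bottomedge", "left_leftedge",
                    "left_rightedge", "right_topedge", "right_bottomedge",
                    "right_leftedge", "right_rightedge"),
  coordinate = c(246, 684, 227.65625, 665.65625,
                 246, 684, 774.65625, 1212.65625)
)

# Wide format: one row per participant and trial
wide <- reshape(long, idvar = c("participant_id", "trial"),
                timevar = "position_edge", direction = "wide")
# reshape() names the new columns "coordinate.<edge>"; strip the prefix
names(wide) <- sub("^coordinate\\.", "", names(wide))
wide
```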

Process master file

Now that we have processed the data and defined the image boundaries, we load the master file. It contains information about the experiment, the individual trials, and the computer and browser used by each participant. In our example, the master file is called data_exp_50674-v2_task-etfm.csv and is located in the folder data_exp_50674-v2 inside the data folder.

mstr <- read_csv(masterpath) %>%
  # create participant column that matches the participant column in the data 
  dplyr::mutate(participant_id = `Participant Private ID`)
First 10 rows of mstr.
Event Index UTC Timestamp UTC Date Local Timestamp Local Timezone Local Date Experiment ID Experiment Version Tree Node Key Repeat Key Schedule ID Participant Public ID Participant Private ID Participant Starting Group Participant Status Participant Completion Code Participant External Session ID Participant Device Type Participant Device Participant OS Participant Browser Participant Monitor Size Participant Viewport Size Checkpoint Task Name Task Version Spreadsheet Spreadsheet Name Spreadsheet Row Trial Number Screen Number Screen Name Zone Name Zone Type Reaction Time Reaction Onset Response Type Response Attempt Correct Incorrect Dishonest X Coordinate Y Coordinate Timed Out randomise_blocks randomise_trials display ANSWER audio_file picture_left picture_right trial number condition target_gender target_item target_position competitor competitor_item competitor_position color participant_id
1 1619598327535 28/04/2021 08:25:27 1619598327448 2 28/04/2021 10:25:27 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 NA NA BEGIN TASK NA NA NA NA NA NA NA NA NA 0 1 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3699461
2 1619598392107 28/04/2021 08:26:32 1619598392021 2 28/04/2021 10:26:32 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 1 1 1 Screen 1 Zone1 eye_tracking NA NA calibration succeeded 0 NA 0 1 0 NA NA NA NA NA Calibration NA NA NA NA NA NA NA NA NA NA NA NA NA 3699461
3 1619598396097 28/04/2021 08:26:36 1619598396012 2 28/04/2021 10:26:36 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 1 1 1 Screen 1 Zone1 eye_tracking 68528.495 NA NA NA NA 0 1 0 NA NA NA NA NA Calibration NA NA NA NA NA NA NA NA NA NA NA NA NA 3699461
4 1619598396489 28/04/2021 08:26:36 1619598396412 2 28/04/2021 10:26:36 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 1 Screen 2 Zone1 fixation 449.982 NA NA NA NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461
5 1619598396597 28/04/2021 08:26:36 1619598396446 2 28/04/2021 10:26:36 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 2 Screen 1 Zone2 content_web_audio 22.245 NA NA AUDIO PLAY REQUESTED NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461
6 1619598401674 28/04/2021 08:26:41 1619598401595 2 28/04/2021 10:26:41 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 2 Screen 1 Zone1 eye_tracking NA NA Left Time 2209.4900000083726 NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461
7 1619598401780 28/04/2021 08:26:41 1619598401598 2 28/04/2021 10:26:41 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 2 Screen 1 Zone1 eye_tracking NA NA Left Percent 43 NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461
8 1619598401780 28/04/2021 08:26:41 1619598401598 2 28/04/2021 10:26:41 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 2 Screen 1 Zone1 eye_tracking NA NA Right Time 2901.96499999729 NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461
9 1619598401780 28/04/2021 08:26:41 1619598401599 2 28/04/2021 10:26:41 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 2 Screen 1 Zone1 eye_tracking NA NA Right Percent 57 NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461
10 1619598401780 28/04/2021 08:26:41 1619598401599 2 28/04/2021 10:26:41 50674 2 task-etfm NA 12247303 BLIND 3699461 NA complete NA NA computer Desktop or Laptop Mac OS 10.14.6 Chrome 89.0.4389.128 1440x900 1440x821 NA Eyetracking_Sample 3 Spreadsheet1 Spreadsheet1 16 1 2 Screen 1 Zone1 eye_tracking NA NA A Time 1811.7099999799393 NA 0 1 0 NA NA NA NA 1 Eyetracking train.png NA train.png cupboard.png 15 color m train left m cupboard right black & brown 3699461

The master file contains a lot of information. Because unnecessary columns make the data harder to parse and work with, we keep only the columns we need: participant_id, trial number, condition, target_gender, target_position, Zone Type, Response, Correct, and target_item.

mstr_redux <- mstr %>%
  # select columns you need
  dplyr::select(participant_id, `trial number`, condition, target_gender, 
                target_position, `Zone Type`, Response, Correct, target_item) %>%
  # filter unique 
  unique() %>%
  # remove rows containing NA
  na.omit() %>%
  # filter out superfluous rows
  dplyr::filter(`Zone Type` == "response_button_image") %>%
  dplyr::rename(trial = `trial number`) %>%
  dplyr::mutate(trial = as.character(trial)) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial))
First 10 rows of mstr_redux.
participant_id trial condition target_gender target_position Zone Type Response Correct target_item
3699461 15 color m left response_button_image cupboard.png 0 train
3699461 2 same f right response_button_image carrot.png 0 rose
3699461 6 same n right response_button_image egg.png 0 piano
3699461 13 color f left response_button_image computer.png 0 hat
3699461 14 color f right response_button_image chair.png 0 church
3699461 8 differernt f right response_button_image banana.png 1 banana
3699461 5 same n left response_button_image bed.png 1 bed
3699461 12 different n right response_button_image door.png 0 bicycle
3699461 9 differernt m left response_button_image cheese.png 1 cheese
3699461 1 same f left response_button_image cup.png 0 fork
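The output above contains two spellings of one condition level: differernt and different. Assuming this is a typo in the spreadsheet rather than a separate condition, the levels could be merged before analysis. A base-R sketch on a toy vector:

```r
# Toy condition column with the misspelled level seen above
condition <- c("color", "same", "differernt", "different")

# Assumption: "differernt" is a misspelling of "different"
condition[condition == "differernt"] <- "different"
table(condition)
```

In the tutorial's pipeline, the same recoding could be applied to mstr_redux$condition.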

Join data and masterfile

The next step is to join (or merge) the data (edat) with the information about the image boundaries (ibs). Before joining these two data sets, we clean edat by

  • removing columns we do not need

  • factorizing participants and trials

  • removing rows without gaze information

edat <- edat %>%
  # remove superfluous columns
  dplyr::select(-`0`, -filename, -spreadsheet_row, -time_stamp, -screen_index, 
                -convergence, -zone_x, -zone_y, -zone_width,
                -zone_height, -zone_x_normalised, -zone_y_normalised, -zone_width_normalised,
                -zone_height_normalised, -idname, 
                
                # WARNING: If you work with normalized values, drop x_pred and y_pred
                # here instead, keep x_pred_normalised and y_pred_normalised,
                # and filter on the normalized columns below!
                
                -x_pred_normalised, -y_pred_normalised) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial)) %>%
  # remove rows without gaze information
  dplyr::filter(x_pred != 0,
                y_pred != 0)
First 10 rows of edat.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id
3699461 0.00000000 prediction 618 533 0.8164998461 NA 1 8
3699461 17.32499999 prediction 604 416 0.8164998461 NA 1 9
3699461 33.34500000 prediction 558 477 0.8164998461 NA 1 10
3699461 48.51500000 prediction 791 753 0.8164998461 NA 1 11
3699461 65.08999999 prediction 783 680 0.8164998461 NA 1 12
3699461 83.01000000 prediction 797 676 0.8164998461 NA 1 13
3699461 99.08499999 prediction 740 631 0.8164998461 NA 1 14
3699461 114.96000001 prediction 846 603 0.8164998461 NA 1 15
3699461 131.33000000 prediction 747 642 0.8164998461 NA 1 16
3699461 148.24499999 prediction 596 426 0.8162385964 NA 1 17

Now, we can join (or merge) edat with ibs (image boundaries).

edatibs <- left_join(edat, ibs, by = c("participant_id", "trial"))
First 10 rows of edatibs.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge
3699461 0.00000000 prediction 618 533 0.8164998461 NA 1 8 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 17.32499999 prediction 604 416 0.8164998461 NA 1 9 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 33.34500000 prediction 558 477 0.8164998461 NA 1 10 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 48.51500000 prediction 791 753 0.8164998461 NA 1 11 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 65.08999999 prediction 783 680 0.8164998461 NA 1 12 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 83.01000000 prediction 797 676 0.8164998461 NA 1 13 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 99.08499999 prediction 740 631 0.8164998461 NA 1 14 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 114.96000001 prediction 846 603 0.8164998461 NA 1 15 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 131.33000000 prediction 747 642 0.8164998461 NA 1 16 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
3699461 148.24499999 prediction 596 426 0.8162385964 NA 1 17 684 227.65625 665.65625 246 684 774.65625 1212.65625 246
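A useful check after a left join is that the number of rows is unchanged (each participant and trial should have exactly one row of image boundaries) and that no gaze sample lost its boundaries. A base-R sketch with toy data, using merge(all.x = TRUE) in place of dplyr::left_join():

```r
# Toy gaze samples and image boundaries (one row per participant and trial)
gaze <- data.frame(participant_id = "3699461", trial = c("1", "1", "10"),
                   x_pred = c(618, 604, 558))
bounds <- data.frame(participant_id = "3699461", trial = c("1", "10"),
                     left_leftedge = c(227.65625, 227.65625))

# Left join: keep all gaze samples
joined <- merge(gaze, bounds, by = c("participant_id", "trial"), all.x = TRUE)

# Same row count as before the join, and no missing boundaries
nrow(joined) == nrow(gaze)
sum(is.na(joined$left_leftedge))
```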

Remove imprecision

It is also useful to remove data points with low precision. We therefore remove data points whose face-model confidence (face_conf) is below .5, i.e., we keep only rows with a face_conf of at least .5.

edatibs <- edatibs %>%
  # filter imprecise data points
  dplyr::filter(face_conf >= .5)
# inspect
head(edatibs)
##    participant_id time_elapsed       type x_pred y_pred    face_conf zone_name
## 1:        3699461   0.00000000 prediction    618    533 0.8164998461      <NA>
## 2:        3699461  17.32499999 prediction    604    416 0.8164998461      <NA>
## 3:        3699461  33.34500000 prediction    558    477 0.8164998461      <NA>
## 4:        3699461  48.51500000 prediction    791    753 0.8164998461      <NA>
## 5:        3699461  65.08999999 prediction    783    680 0.8164998461      <NA>
## 6:        3699461  83.01000000 prediction    797    676 0.8164998461      <NA>
##    trial id left_bottomedge left_leftedge left_rightedge left_topedge
## 1:     1  8             684     227.65625      665.65625          246
## 2:     1  9             684     227.65625      665.65625          246
## 3:     1 10             684     227.65625      665.65625          246
## 4:     1 11             684     227.65625      665.65625          246
## 5:     1 12             684     227.65625      665.65625          246
## 6:     1 13             684     227.65625      665.65625          246
##    right_bottomedge right_leftedge right_rightedge right_topedge
## 1:              684      774.65625      1212.65625           246
## 2:              684      774.65625      1212.65625           246
## 3:              684      774.65625      1212.65625           246
## 4:              684      774.65625      1212.65625           246
## 5:              684      774.65625      1212.65625           246
## 6:              684      774.65625      1212.65625           246

Combining the data and metadata

Now we combine the collected data (edatibs, i.e., edat plus the image boundaries from ibs) with the metadata (the information from the reduced master file mstr_redux).

dat <- dplyr::left_join(edatibs, mstr_redux, by = c("participant_id", "trial"))
# inspect
head(dat)
##    participant_id time_elapsed       type x_pred y_pred    face_conf zone_name
## 1:        3699461   0.00000000 prediction    618    533 0.8164998461      <NA>
## 2:        3699461  17.32499999 prediction    604    416 0.8164998461      <NA>
## 3:        3699461  33.34500000 prediction    558    477 0.8164998461      <NA>
## 4:        3699461  48.51500000 prediction    791    753 0.8164998461      <NA>
## 5:        3699461  65.08999999 prediction    783    680 0.8164998461      <NA>
## 6:        3699461  83.01000000 prediction    797    676 0.8164998461      <NA>
##    trial id left_bottomedge left_leftedge left_rightedge left_topedge
## 1:     1  8             684     227.65625      665.65625          246
## 2:     1  9             684     227.65625      665.65625          246
## 3:     1 10             684     227.65625      665.65625          246
## 4:     1 11             684     227.65625      665.65625          246
## 5:     1 12             684     227.65625      665.65625          246
## 6:     1 13             684     227.65625      665.65625          246
##    right_bottomedge right_leftedge right_rightedge right_topedge condition
## 1:              684      774.65625      1212.65625           246      same
## 2:              684      774.65625      1212.65625           246      same
## 3:              684      774.65625      1212.65625           246      same
## 4:              684      774.65625      1212.65625           246      same
## 5:              684      774.65625      1212.65625           246      same
## 6:              684      774.65625      1212.65625           246      same
##    target_gender target_position             Zone Type Response Correct
## 1:             f            left response_button_image  cup.png       0
## 2:             f            left response_button_image  cup.png       0
## 3:             f            left response_button_image  cup.png       0
## 4:             f            left response_button_image  cup.png       0
## 5:             f            left response_button_image  cup.png       0
## 6:             f            left response_button_image  cup.png       0
##    target_item
## 1:        fork
## 2:        fork
## 3:        fork
## 4:        fork
## 5:        fork
## 6:        fork
First 10 rows of dat.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge condition target_gender target_position Zone Type Response Correct target_item
3699461 0.00000000 prediction 618 533 0.8164998461 NA 1 8 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 17.32499999 prediction 604 416 0.8164998461 NA 1 9 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 33.34500000 prediction 558 477 0.8164998461 NA 1 10 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 48.51500000 prediction 791 753 0.8164998461 NA 1 11 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 65.08999999 prediction 783 680 0.8164998461 NA 1 12 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 83.01000000 prediction 797 676 0.8164998461 NA 1 13 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 99.08499999 prediction 740 631 0.8164998461 NA 1 14 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 114.96000001 prediction 846 603 0.8164998461 NA 1 15 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 131.33000000 prediction 747 642 0.8164998461 NA 1 16 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork
3699461 148.24499999 prediction 596 426 0.8162385964 NA 1 17 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork

AOI

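We now use the edges of the two images to determine whether each gaze sample fell inside the area of interest (AOI), i.e., the image in the target position. A sample is coded 1 if it lies within the target image and 0 otherwise.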

dat <- dat %>%
  # determine if participant's gaze was in AOI
  dplyr::mutate(AOI = ifelse(
    # if target is left image
    target_position == "left" &
      y_pred > left_topedge & 
      y_pred < left_bottomedge & 
      x_pred > left_leftedge & 
      x_pred < left_rightedge, 1,
    # if target is right image
    ifelse(target_position == "right" &
             y_pred > right_topedge   & 
             y_pred < right_bottomedge &
             x_pred  > right_leftedge & 
             x_pred <  right_rightedge, 1,
           0)))
First 5 rows of dat.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge condition target_gender target_position Zone Type Response Correct target_item AOI
3699461 0.00000000 prediction 618 533 0.8164998461 NA 1 8 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 1
3699461 17.32499999 prediction 604 416 0.8164998461 NA 1 9 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 1
3699461 33.34500000 prediction 558 477 0.8164998461 NA 1 10 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 1
3699461 48.51500000 prediction 791 753 0.8164998461 NA 1 11 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 0
3699461 65.08999999 prediction 783 680 0.8164998461 NA 1 12 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 0
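The AOI rule is a bounding-box test. A sample counts as a look to the target if its x coordinate lies between the image's left and right edges and its y coordinate lies between the top and bottom edges. Screen y grows downwards, so the top edge has the smaller value. The base-R sketch below applies that test to the first five samples in the table above, using the left-image edges from the same table. The helper name in_aoi() is hypothetical and not part of any package.

```r
# Hypothetical helper: 1 if a gaze sample lies inside a rectangular AOI, else 0.
# Screen coordinates: y grows downwards, so top < bottom.
in_aoi <- function(x, y, left, right, top, bottom) {
  as.integer(x > left & x < right & y > top & y < bottom)
}

# first five gaze samples of participant 3699461 (target on the left)
samples <- data.frame(
  x_pred = c(618, 604, 558, 791, 783),
  y_pred = c(533, 416, 477, 753, 680)
)

# edges of the left image, taken from the table above
samples$AOI <- in_aoi(samples$x_pred, samples$y_pred,
                      left = 227.65625, right = 665.65625,
                      top = 246, bottom = 684)
samples$AOI
# returns 1 1 1 0 0, matching the AOI column above
```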

Binning

Next, we assign each sample to a time bin (here, bins of 100 ms).

dat <- dat %>%
  # arrange by participant, trial, and time
  dplyr::arrange(participant_id, trial, time_elapsed) %>%
  # assign each sample to a 100 ms bin (pos = 0 labels a bin by its start time)
  dplyr::mutate(TimeBin = itsadug::timeBins(time_elapsed, 100, pos=0))
First 5 rows of dat.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge condition target_gender target_position Zone Type Response Correct target_item AOI TimeBin
3699461 0.00000000 prediction 618 533 0.8164998461 NA 1 8 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 1 0
3699461 17.32499999 prediction 604 416 0.8164998461 NA 1 9 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 1 0
3699461 33.34500000 prediction 558 477 0.8164998461 NA 1 10 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 1 0
3699461 48.51500000 prediction 791 753 0.8164998461 NA 1 11 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 0 0
3699461 65.08999999 prediction 783 680 0.8164998461 NA 1 12 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png 0 fork 0 0
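With pos = 0, each sample is labelled with the lower edge of its 100 ms bin, which matches the TimeBin values in the table above: every sample between 0 and just under 100 ms gets 0. Assuming timeBins() behaves this way, the same labels can be computed in base R with floor(). The time stamps below are taken from the first rows of trial 1 shown earlier.

```r
# time stamps (ms) of early samples in trial 1, from the table shown earlier
time_elapsed <- c(0, 17.32499999, 99.08499999, 114.96000001, 148.24499999)
bin_size <- 100

# label each sample with the lower edge of the bin it falls into
TimeBin <- floor(time_elapsed / bin_size) * bin_size
TimeBin
# returns 0 0 0 100 100
```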

Cleaning

dat <- dat %>%
  # fix the misspelled condition level ("differernt" -> "different")
  dplyr::mutate(condition = dplyr::case_when(condition == "color" ~ "color",
                                             condition == "same" ~ "same",
                                             condition == "different" ~ "different",
                                             condition == "differernt" ~ "different",
                                             TRUE ~ condition)) %>%
  # recode Correct from 1/0 into "Correct"/"Incorrect" and store it as a factor
  dplyr::mutate(Correct = ifelse(Correct == 1, "Correct",
                                 ifelse(Correct == 0, "Incorrect", Correct)),
                Correct = factor(Correct))
First 5 rows of dat.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge condition target_gender target_position Zone Type Response Correct target_item AOI TimeBin
3699461 0.00000000 prediction 618 533 0.8164998461 NA 1 8 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png Incorrect fork 1 0
3699461 17.32499999 prediction 604 416 0.8164998461 NA 1 9 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png Incorrect fork 1 0
3699461 33.34500000 prediction 558 477 0.8164998461 NA 1 10 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png Incorrect fork 1 0
3699461 48.51500000 prediction 791 753 0.8164998461 NA 1 11 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png Incorrect fork 0 0
3699461 65.08999999 prediction 783 680 0.8164998461 NA 1 12 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 same f left response_button_image cup.png Incorrect fork 0 0
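A quick way to confirm that the recoding worked is to check that only the expected labels remain. The base-R sketch below uses small made-up vectors in place of dat$condition and dat$Correct and performs the same two recodings. With the real data, the same checks would be run on the columns of dat.

```r
# made-up values standing in for dat$condition and dat$Correct
condition <- c("same", "differernt", "color", "different")
Correct   <- c(0, 1, 1, 1)

# fix the misspelled condition level
condition[condition == "differernt"] <- "different"
# recode 1/0 into a factor with the levels "Correct" and "Incorrect"
Correct <- factor(Correct, levels = c(1, 0), labels = c("Correct", "Incorrect"))

# sanity check: only the three expected condition labels remain
stopifnot(all(condition %in% c("color", "same", "different")))
# cross-tabulate conditions against accuracy
table(condition, Correct)
```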

Next, we remove all samples from trials with incorrect responses.

dat <- dat %>%
  dplyr::filter(Correct != "Incorrect")
First 5 rows of dat.
participant_id time_elapsed type x_pred y_pred face_conf zone_name trial id left_bottomedge left_leftedge left_rightedge left_topedge right_bottomedge right_leftedge right_rightedge right_topedge condition target_gender target_position Zone Type Response Correct target_item AOI TimeBin
3699461 0.00000000 prediction 753 467 0.8142243928 NA 10 303 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 different m right response_button_image tree.png Correct tree 0 0
3699461 17.86500000 prediction 771 498 0.8142243928 NA 10 304 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 different m right response_button_image tree.png Correct tree 0 0
3699461 33.91999999 prediction 843 454 0.8142243928 NA 10 305 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 different m right response_button_image tree.png Correct tree 1 0
3699461 49.38000001 prediction 654 536 0.8142243928 NA 10 306 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 different m right response_button_image tree.png Correct tree 0 0
3699461 65.33499999 prediction 619 462 0.8142243928 NA 10 307 684 227.65625 665.65625 246 684 774.65625 1212.65625 246 different m right response_button_image tree.png Correct tree 0 0
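Because this filter drops every sample from an incorrectly answered trial, it can be useful to report how much data is lost. The sketch below does this at the trial level for a made-up set of six trials. With the real data, the same computation would be applied to one row per participant and trial.

```r
# made-up trial-level accuracy for one participant
trials <- data.frame(
  trial   = 1:6,
  Correct = factor(c("Incorrect", "Correct", "Correct",
                     "Correct", "Incorrect", "Correct"))
)

# keep only correctly answered trials (same condition as the filter above)
kept <- trials[trials$Correct != "Incorrect", ]

# share of trials removed by the filter
removed <- 1 - nrow(kept) / nrow(trials)
removed
```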

Saving the data

You can now save the processed data in your data folder if you like. The file path is built with here::here(), so the here package needs to be installed.

write.table(dat, here::here("data", "dat.txt"), sep = "\t", row.names = FALSE)

To re-load the data, you would use the following command:

reload <- read.delim(here::here("data", "dat.txt"), sep = "\t")
# inspect
reload[1:4, 1:4]
##   participant_id time_elapsed       type x_pred
## 1        3699461   0.00000000 prediction    753
## 2        3699461  17.86500000 prediction    771
## 3        3699461  33.91999999 prediction    843
## 4        3699461  49.38000001 prediction    654
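When re-loading, keep in mind that a text file does not store column types. Factor columns such as Correct come back as character columns (the read.delim() default since R 4.0.0), and whole-number columns come back as integers. The sketch below is a minimal round-trip check. It uses a small made-up data frame and a temporary file from tempfile() instead of the here::here() path, so it runs anywhere.

```r
# small made-up stand-in for dat
toy <- data.frame(participant_id = c(3699461, 3699461),
                  time_elapsed   = c(0, 17.865),
                  Correct        = factor(c("Correct", "Correct")),
                  AOI            = c(0, 1))

# write to a temporary tab-separated file and read it back
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", row.names = FALSE)
back <- read.delim(path, sep = "\t")

# factors come back as character, so convert before comparing
back$Correct <- factor(back$Correct)
# all.equal() treats 1 and 1L as equal, so integer columns pass
stopifnot(isTRUE(all.equal(toy, back)))
```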

Data Viz

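First, we prepare the data for visualization. We remove implausible samples (non-positive coordinates or time stamps of 4,200 ms or later) and calculate the proportion of samples in the AOI for each combination of condition, time bin, accuracy, and target gender.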

f1 <- dat %>%
  # remove implausible samples (non-positive coordinates) and samples at or after 4,200 ms
  dplyr::filter(x_pred > 0,
                y_pred > 0,
                time_elapsed < 4200) %>%
  # grouping
  dplyr::group_by(condition, TimeBin, Correct, target_gender) %>%
  # summarise: calculate proportion of looks in AOI
  dplyr::summarise(Proportion = mean(AOI))
# inspect data
head(f1, 10)
## # A tibble: 10 x 5
## # Groups:   condition, TimeBin, Correct [4]
##    condition TimeBin Correct target_gender Proportion
##    <chr>       <dbl> <fct>   <chr>              <dbl>
##  1 color           0 Correct f                  0.278
##  2 color           0 Correct m                  0.429
##  3 color           0 Correct n                  0.545
##  4 color         100 Correct f                  0.353
##  5 color         100 Correct m                  0.407
##  6 color         100 Correct n                  1    
##  7 color         200 Correct f                  0.353
##  8 color         200 Correct m                  0.379
##  9 color         200 Correct n                  1    
## 10 color         300 Correct f                  0.312
First 5 rows of f1.
condition TimeBin Correct target_gender Proportion
color 0 Correct f 0.2777777778
color 0 Correct m 0.4285714286
color 0 Correct n 0.5454545455
color 100 Correct f 0.3529411765
color 100 Correct m 0.4074074074
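AOI is coded 1 (in the AOI) or 0 (outside), so its mean within a group is the proportion of samples in the AOI. The sketch below shows the same filtering and aggregation in base R with aggregate(), applied to a handful of made-up samples.

```r
# made-up samples: AOI is 1 when the gaze fell on the target image
toy <- data.frame(
  condition     = c("color", "color", "color", "same", "same", "same"),
  TimeBin       = c(0, 0, 100, 0, 0, 100),
  target_gender = c("f", "f", "f", "m", "m", "m"),
  x_pred        = c(618, -5, 604, 558, 791, 783),
  y_pred        = c(533, 416, 477, 753, 680, 640),
  time_elapsed  = c(10, 20, 150, 10, 30, 120),
  AOI           = c(1, 0, 1, 0, 1, 1)
)

# drop implausible samples, as in the dplyr pipeline above
keep <- toy$x_pred > 0 & toy$y_pred > 0 & toy$time_elapsed < 4200

# mean of a 0/1 variable = proportion of samples in the AOI per group
f1_sketch <- aggregate(AOI ~ condition + TimeBin + target_gender,
                       data = toy[keep, ], FUN = mean)
names(f1_sketch)[names(f1_sketch) == "AOI"] <- "Proportion"
f1_sketch
```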

Line plot

p1 <- ggplot(f1, aes(y = Proportion, x = TimeBin, color = condition)) +
  # lines for proportions
  geom_line() +
  # dotted vertical lines at 1900 ms and 3450 ms
  # (linewidth requires ggplot2 >= 3.4.0; use size in older versions)
  geom_vline(xintercept = 1900, linetype = "dotted", color = "darkgrey", linewidth = .75) +
  geom_vline(xintercept = 3450, linetype = "dotted", color = "darkgrey", linewidth = .75) +
  # label the window between the two lines
  ggplot2::annotate(geom = "text", label = "Object", x = 2800, y = .85, color = "gray20", size = 5) +
  # separate panels for each target_gender
  facet_grid(target_gender ~ .) +
  # black and white theme
  theme_bw() + 
  # no grid lines, legend on top
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.position = "top",
        # define x-axis tick labels
        axis.text.x = element_text(angle = 45, vjust = 0.6, size = 10)) +
  # define x-axis
  scale_x_continuous(name = "Time in Trial (ms)", 
                     limits = c(0, 4000), 
                     breaks = seq(0, 4000, 1000), 
                     labels = seq(0, 4000, 1000)) +
  # define y-axis
  scale_y_continuous(name = "Proportion in AOI", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1, .2), 
                     labels = seq(0, 1, .2))
# display the plot
p1
# save the plot (ggsave() is called on its own, not added to the plot with +)
ggsave(filename = here::here("images", "Fig01.png"), plot = p1,
       height = 5, width = 10, dpi = 320)

Line plot with error bars
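The error bars in the plot below come from ggplot2's mean_cl_boot(). This function wraps Hmisc::smean.cl.boot() and returns the mean together with a nonparametric bootstrap confidence interval (95% by default). To illustrate the idea, here is a base-R sketch of a percentile bootstrap for one group of 0/1 AOI values. The values and the helper boot_ci() are made up for illustration, and the sketch is not the Hmisc implementation itself. Its output names (y, ymin, ymax) mirror the columns stat_summary() expects from fun.data.

```r
# made-up 0/1 AOI values of all samples in one condition and time bin
aoi <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 0)

# hypothetical helper: mean plus percentile bootstrap confidence interval
boot_ci <- function(x, B = 1000, conf = 0.95) {
  # resample x with replacement B times and keep each resample's mean
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- (1 - conf) / 2
  c(y    = mean(x),
    ymin = unname(quantile(boot_means, alpha)),
    ymax = unname(quantile(boot_means, 1 - alpha)))
}

set.seed(42)  # make the resampling reproducible
boot_ci(aoi)
```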

# line plot of mean AOI looks with bootstrapped 95% confidence intervals
# (mean_cl_boot() requires the Hmisc package to be installed)
ggplot(dat, aes(x = TimeBin, y = AOI, group = condition, color = condition)) +
  stat_summary(fun = mean, geom = "line", aes(group = condition, color = condition)) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  # dotted vertical lines at 1900 ms and 3450 ms
  # (linewidth requires ggplot2 >= 3.4.0; use size in older versions)
  geom_vline(xintercept = 1900, linetype = "dotted", color = "darkgrey", linewidth = .75) +
  geom_vline(xintercept = 3450, linetype = "dotted", color = "darkgrey", linewidth = .75) +
  # add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2800, y = .85, color = "gray20", size = 5) +            
  # def. font size
  theme_bw(base_size = 15) +  
  theme(axis.text.x = element_text(size=10, angle = 90),  
        axis.text.y = element_text(size=10, face="plain"),
        legend.position = "top",
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) +  
  # define x-axis
  scale_x_continuous(name = "Time in Trial (ms)", 
                     limits = c(0,<