This workshop introduces data processing and analysis for
eye-tracking data in R. The RMarkdown document for the tutorial can be
downloaded here and the bib library here. You can also download a
shortened version of the RMarkdown document that contains only the
processing chain here (here is the link to the html file of the
document). You will find very helpful and detailed tutorials on how to
perform analyses and visualize eye-tracking data using eyetrackingR
here.
We will go through the following steps:
eyetrackingR package for data visualization and analysis
We will not address issues relating to adequate sample size and power. If you are interested in that, please check out this tutorial on the Language Technology and Data Analysis Laboratory website.
This tutorial is based on R. If you have not installed R or RStudio, or if you are new to either of them, you will find an introduction to and more information on how to use R and RStudio here. For this tutorial, we need to install certain packages into an R library on our computer so that the scripts shown below run without errors. If you have already installed the packages mentioned below, you can skip this step. Otherwise, run the following code to install them. Installing all of the packages may take between 1 and 5 minutes, so there is no need to worry if it takes some time.
install.packages(c("tidyverse", "eyetrackingR", "data.table", "itsadug", "sjPlot", "lme4", "multcomp", "here"))
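If some of the packages are already installed, it can save time to install only those that are missing. The following is a minimal sketch of that idea in base R. The helper name missing_packages and the vector pkgs are made up for this illustration, and pkgs should be adapted to the packages you actually need.

```r
# hypothetical helper: return the packages from `pkgs` that are not yet installed
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# packages named in the installation step (adapt this vector as needed)
pkgs <- c("tidyverse", "eyetrackingR", "data.table", "itsadug",
          "sjPlot", "lme4", "multcomp")

# which of them still need to be installed?
to_install <- missing_packages(pkgs)
to_install
```

If to_install is not empty, the missing packages can then be installed with install.packages(to_install).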
Once you have installed R and RStudio and initiated the session by executing the code shown above, you are good to go.
Once you have installed the packages, please load them and set useful options as shown below.
# set options
options(stringsAsFactors = FALSE) # no automatic conversion of strings to factors
options("scipen" = 100, "digits" = 10) # suppress scientific notation
# load packages
library(tidyverse)
library(eyetrackingR)
library(data.table)
library(itsadug)
library(lme4)
library(sjPlot)
library(multcomp)
Now that we have prepared our session, we can start with the data processing.
During data processing, we load and prepare the data for further analysis and visualization.
In a first step, we define the paths to the spreadsheets (datapath)
and to the masterfile (masterpath, a csv file with information about the
experiment). In my case, the spreadsheets are in a folder called uploads,
which is inside a folder called data_exp_50674-v2 in my data folder. The
masterfile is also in the data_exp_50674-v2 folder, but it is not in the
uploads folder.
datapath <- here::here("data/data_exp_50674-v2", "uploads")
masterpath <- here::here("data/data_exp_50674-v2", "data_exp_50674-v2_task-etfm.csv")
Now that we have defined the paths, we continue.
We now want to load the data, which in our case consists of several
spreadsheets (files ending in xlsx).
We begin by extracting a list of these xlsx files (the paths where these files are located on your computer).
fls <- list.files(datapath, full.names = TRUE)
# drop the first entry of the list
fls <- fls[2:length(fls)]
# inspect files
head(fls)
## [1] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx"
## [2] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx"
## [3] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-11-2.xlsx"
## [4] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-12-2.xlsx"
## [5] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-13-2.xlsx"
## [6] "F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-14-2.xlsx"
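The code above drops the first entry of the file list by position. An alternative, shown here only as a sketch, is to select files by their extension with the pattern argument of list.files. The temporary folder and the file names below are made up for this illustration, and whether selecting by extension fits your folder layout is an assumption you should check.

```r
# create a temporary folder with two spreadsheets and one unrelated file
# (folder and file names are invented for this example)
tmp <- file.path(tempdir(), "uploads_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("collection-1-2.xlsx",
                                       "collection-10-2.xlsx",
                                       "notes.txt"))))

# list only the files whose name ends in .xlsx
xlsx_files <- list.files(tmp, pattern = "\\.xlsx$", full.names = TRUE)

# show the file names without the folder part
basename(xlsx_files)
```

The regular expression "\\.xlsx$" matches a literal dot followed by xlsx at the end of the file name, so notes.txt is not returned.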
Next, we can use this list (the paths) to load the files. In
addition, we will also create two new columns: a column called idname,
which contains the path to the file, and a column called trial, which
tells us what trial the data is from.
datls <- lapply(fls, function(x){
  # store the path before x is overwritten with the data
  name <- x
  x <- readxl::read_xlsx(x) %>%
    # create id column (contains path)
    dplyr::mutate(idname = name) %>%
    # code trial
    dplyr::mutate(trial = stringr::str_remove_all(name, ".*collection-")) %>%
    dplyr::mutate(trial = stringr::str_remove_all(trial, "-.*"))
})
# inspect data
head(datls[1])
## [[1]]
## # A tibble: 295 × 25
## `0` filename participant_id spreadsheet_row time_stamp time_elapsed type
## <lgl> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 NA eyetracki… 3699461 16 1.62e12 0 new …
## 2 NA eyetracki… 3699461 16 1.62e12 0 zone
## 3 NA eyetracki… 3699461 16 1.62e12 0 zone
## 4 NA eyetracki… 3699461 16 1.62e12 0 zone
## 5 NA eyetracki… 3699461 16 1.62e12 0 zone
## 6 NA eyetracki… 3699461 16 1.62e12 0 zone
## 7 NA eyetracki… 3699461 16 1.62e12 0 zone
## 8 NA eyetracki… 3699461 16 1.62e12 0 pred…
## 9 NA eyetracki… 3699461 16 1.62e12 17.3 pred…
## 10 NA eyetracki… 3699461 16 1.62e12 33.3 pred…
## # ℹ 285 more rows
## # ℹ 18 more variables: screen_index <dbl>, x_pred <dbl>, y_pred <dbl>,
## # x_pred_normalised <dbl>, y_pred_normalised <dbl>, convergence <dbl>,
## # face_conf <dbl>, zone_name <chr>, zone_x <dbl>, zone_y <dbl>,
## # zone_width <dbl>, zone_height <dbl>, zone_x_normalised <dbl>,
## # zone_y_normalised <dbl>, zone_width_normalised <dbl>,
## # zone_height_normalised <dbl>, idname <chr>, trial <chr>
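To see how the trial number is extracted from the file path, it helps to run the two removal steps on a single file name. The sketch below uses one of the file names listed earlier and base R's gsub() as a stand-in for stringr::str_remove_all(); for these two patterns both behave the same way.

```r
# one of the file names from the list above (folder part omitted)
name <- "50674-2-3699461-task-etfm-12247303-eyetracking_collection-10-2.xlsx"

# step 1: remove everything up to and including "collection-"
step1 <- gsub(".*collection-", "", name)
step1   # "10-2.xlsx"

# step 2: remove the first hyphen and everything after it
trial <- gsub("-.*", "", step1)
trial   # "10"
```

Note that the result is a character string, not a number, which matters when the trial column is later converted into a factor or used as a join key.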
We can now merge all the spreadsheets into one file and also add a
column called id that gives each row a unique identifier.
Furthermore, we convert the participant_id and the trial column into
factors.
edat <- data.table::rbindlist(datls) %>%
  # add id
  dplyr::mutate(id = 1:nrow(.)) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial))
0 | filename | participant_id | spreadsheet_row | time_stamp | time_elapsed | type | screen_index | x_pred | y_pred | x_pred_normalised | y_pred_normalised | convergence | face_conf | zone_name | zone_x | zone_y | zone_width | zone_height | zone_x_normalised | zone_y_normalised | zone_width_normalised | zone_height_normalised | idname | trial | id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NA | eyetracking_collection | 3699461 | 16 | 1619598396444 | 0 | new collection screen | 2 | 0 | 0 | 0 | 0 | 0 | 0 | NA | 0.00000 | 0 | 0 | 0 | 0.0000000000 | 0.000000000 | 0.0000000000 | 0.0000000000 | F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx | 1 | 1 |
NA | eyetracking_collection | 3699461 | 16 | 1619598396444 | 0 | zone | 2 | 0 | 0 | 0 | 0 | 0 | 0 | screen | 0.00000 | 0 | 1440 | 821 | -0.1576769406 | 0.000000000 | 1.3150684932 | 1.0000000000 | F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx | 1 | 2 |
NA | eyetracking_collection | 3699461 | 16 | 1619598396445 | 0 | zone | 2 | 0 | 0 | 0 | 0 | 0 | 0 | gorilla | 172.65625 | 0 | 1095 | 821 | 0.0000000000 | 0.000000000 | 1.0000000000 | 1.0000000000 | F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx | 1 | 3 |
NA | eyetracking_collection | 3699461 | 16 | 1619598396445 | 0 | zone | 2 | 0 | 0 | 0 | 0 | 0 | 0 | Zone1 | 172.65625 | 772 | 55 | 49 | 0.0000000000 | 0.940316687 | 0.0502283105 | 0.0596833130 | F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx | 1 | 4 |
NA | eyetracking_collection | 3699461 | 16 | 1619598396445 | 0 | zone | 2 | 0 | 0 | 0 | 0 | 0 | 0 | Right | 774.65625 | 246 | 438 | 320 | 0.5497716895 | 0.299634592 | 0.4000000000 | 0.3897685749 | F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx | 1 | 5 |
NA | eyetracking_collection | 3699461 | 16 | 1619598396445 | 0 | zone | 2 | 0 | 0 | 0 | 0 | 0 | 0 | Left | 227.65625 | 246 | 438 | 329 | 0.0502283105 | 0.299634592 | 0.4000000000 | 0.4007308161 | F:/data recovery/Uni/UiT/Workshops/ETWS/data/data_exp_50674-v2/uploads/50674-2-3699461-task-etfm-12247303-eyetracking_collection-1-2.xlsx | 1 | 6 |
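The following base R sketch repeats these steps on two toy data frames (all values are invented). It also illustrates one consequence of converting a character column into a factor: the levels are sorted as text, so trial "10" is ordered before trial "2". Whether this ordering matters depends on your analysis; the numeric ordering at the end is only one possible alternative.

```r
# two toy "spreadsheets" (values are made up for illustration)
datls_demo <- list(
  data.frame(participant_id = 1, trial = "2",  x_pred = c(10, 20)),
  data.frame(participant_id = 1, trial = "10", x_pred = c(30, 40))
)

# stack the list elements into one data frame
edat_demo <- do.call(rbind, datls_demo)

# add a running row identifier
edat_demo$id <- seq_len(nrow(edat_demo))

# default factor levels are sorted as text, so "10" comes before "2"
levels(factor(edat_demo$trial))

# numeric ordering of the levels, if that is preferred
trial_num <- factor(edat_demo$trial,
                    levels = sort(unique(as.numeric(edat_demo$trial))))
levels(trial_num)
```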
In a next step, we define image boundaries. In our example, we are
dealing with two images: one to the right and one to the left. We now
use the zone_y, zone_x, zone_height, and zone_width columns to
calculate the edges of the images. For the right image in the first
trial, this gives:
top (upper border) = 246 (zone_y)
bottom (lower border) = 246 + 320 (zone_y + zone_height)
left (left border) = 774.66 (zone_x)
right (right border) = 774.66 + 438 (zone_x + zone_width)
ibs <- edat %>%
  dplyr::select(participant_id, trial, zone_name, zone_x, zone_y, zone_width, zone_height) %>%
  # get rid of superfluous rows
  dplyr::filter(zone_name == "Right" | zone_name == "Left") %>%
  na.omit() %>%
  # define image boundaries
  dplyr::mutate(top = zone_y,
                # bottom = top + height (the tables shown below were created
                # with zone_width here, which is why they list 684)
                bottom = zone_y + zone_height,
                left = zone_x,
                right = zone_x + zone_width) %>%
  # remove superfluous columns
  dplyr::select(-zone_x, -zone_y, -zone_width, -zone_height)
participant_id | trial | zone_name | top | bottom | left | right |
---|---|---|---|---|---|---|
3699461 | 1 | Right | 246 | 684 | 774.656250 | 1212.656250 |
3699461 | 1 | Left | 246 | 684 | 227.656250 | 665.656250 |
3699461 | 10 | Right | 246 | 684 | 774.656250 | 1212.656250 |
3699461 | 10 | Left | 246 | 684 | 227.656250 | 665.656250 |
3699461 | 11 | Right | 246 | 684 | 774.671875 | 1212.671875 |
3699461 | 11 | Left | 246 | 684 | 227.671875 | 665.671875 |
3699461 | 12 | Right | 246 | 684 | 774.671875 | 1212.671875 |
3699461 | 12 | Left | 246 | 684 | 227.671875 | 665.671875 |
3699461 | 13 | Right | 246 | 684 | 774.656250 | 1212.656250 |
3699461 | 13 | Left | 246 | 684 | 227.656250 | 665.656250 |
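As a worked example of the formulas listed above (bottom = zone_y + zone_height, right = zone_x + zone_width), the sketch below computes the four edges for the zone named Right in the first trial. The zone values are copied from the data rows shown earlier; the object names zone and edges are made up for this illustration.

```r
# zone information for the "Right" image in trial 1 (from the rows shown earlier)
zone <- data.frame(zone_name = "Right",
                   zone_x = 774.65625, zone_y = 246,
                   zone_width = 438, zone_height = 320)

# compute the four edges of the image
edges <- with(zone, c(top    = zone_y,
                      bottom = zone_y + zone_height,
                      left   = zone_x,
                      right  = zone_x + zone_width))
edges
```

With these values, the top edge is 246, the bottom edge is 246 + 320 = 566, the left edge is 774.65625, and the right edge is 774.65625 + 438 = 1212.65625.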
We can also use the normalized values.
If you do this, it is crucial that you use x_pred_normalised and y_pred_normalised, and not x_pred and y_pred, in your analysis!
The procedure is the same as above: we use the zone_y_normalised,
zone_x_normalised, zone_height_normalised, and zone_width_normalised
columns to calculate the edges of the images. For the right image in
the first trial, this gives:
top (upper border) = 0.2996 (zone_y_normalised)
bottom (lower border) = 0.2996 + 0.3898 (zone_y_normalised + zone_height_normalised)
left (left border) = 0.5498 (zone_x_normalised)
right (right border) = 0.5498 + 0.4000 (zone_x_normalised + zone_width_normalised)
ibs_norm <- edat %>%
  dplyr::select(participant_id, trial, zone_name, zone_x_normalised,
                zone_y_normalised, zone_width_normalised, zone_height_normalised) %>%
  # get rid of superfluous rows
  dplyr::filter(zone_name == "Right" | zone_name == "Left") %>%
  na.omit() %>%
  # define image boundaries
  dplyr::mutate(top = zone_y_normalised,
                # bottom = top + height (the table shown below was created
                # with zone_width_normalised here, hence 0.6996)
                bottom = zone_y_normalised + zone_height_normalised,
                left = zone_x_normalised,
                right = zone_x_normalised + zone_width_normalised) %>%
  # remove superfluous columns
  dplyr::select(-zone_x_normalised, -zone_y_normalised, -zone_width_normalised, -zone_height_normalised)
participant_id | trial | zone_name | top | bottom | left | right |
---|---|---|---|---|---|---|
3699461 | 1 | Right | 0.299634592 | 0.699634592 | 0.5497716895 | 0.9497716895 |
3699461 | 1 | Left | 0.299634592 | 0.699634592 | 0.0502283105 | 0.4502283105 |
3699461 | 10 | Right | 0.299634592 | 0.699634592 | 0.5497716895 | 0.9497716895 |
3699461 | 10 | Left | 0.299634592 | 0.699634592 | 0.0502283105 | 0.4502283105 |
3699461 | 11 | Right | 0.299634592 | 0.699634592 | 0.5497716895 | 0.9497716895 |
3699461 | 11 | Left | 0.299634592 | 0.699634592 | 0.0502283105 | 0.4502283105 |
3699461 | 12 | Right | 0.299634592 | 0.699634592 | 0.5497716895 | 0.9497716895 |
3699461 | 12 | Left | 0.299634592 | 0.699634592 | 0.0502283105 | 0.4502283105 |
3699461 | 13 | Right | 0.299634592 | 0.699634592 | 0.5497716895 | 0.9497716895 |
3699461 | 13 | Left | 0.299634592 | 0.699634592 | 0.0502283105 | 0.4502283105 |
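Judging from the sample rows shown earlier, the normalised values appear to be expressed relative to the zone named gorilla (x offset 172.65625, width 1095, height 821). This is an inference from those rows, not documented behaviour, so treat the sketch below as a plausibility check rather than a definition. It recomputes the normalised values of the Right zone in trial 1 from its pixel values.

```r
# reference frame: the "gorilla" zone from the rows shown earlier (assumed reference)
ref_x <- 172.65625
ref_w <- 1095
ref_h <- 821

# pixel values of the "Right" zone in trial 1
right_zone <- c(x = 774.65625, y = 246, width = 438, height = 320)

# express position and size relative to the reference frame
normalised <- c(
  x      = (right_zone[["x"]] - ref_x) / ref_w,
  y      = right_zone[["y"]] / ref_h,
  width  = right_zone[["width"]] / ref_w,
  height = right_zone[["height"]] / ref_h
)
round(normalised, 4)
```

The results agree with the zone_x_normalised, zone_y_normalised, zone_width_normalised, and zone_height_normalised values listed for that row (0.5498, 0.2996, 0.4000, and 0.3898 after rounding).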
We now transform the data so that we have the information about the edges in separate columns. Thus, we have four columns each for the right and the left image: bottomedge, leftedge, rightedge, and topedge, prefixed with left_ or right_.
ibs <- ibs %>%
  dplyr::mutate(position = tolower(zone_name)) %>%
  tidyr::gather(edge, coordinate, top:right) %>%
  dplyr::mutate(position_edge = paste0(position, "_", edge, "edge")) %>%
  dplyr::select(-zone_name, -position, -edge) %>%
  tidyr::spread(position_edge, coordinate) %>%
  dplyr::group_by(participant_id, trial) %>%
  dplyr::summarise(left_bottomedge = left_bottomedge,
                   left_leftedge = left_leftedge,
                   left_rightedge = left_rightedge,
                   left_topedge = left_topedge,
                   right_bottomedge = right_bottomedge,
                   right_leftedge = right_leftedge,
                   right_rightedge = right_rightedge,
                   right_topedge = right_topedge)
participant_id | trial | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge |
---|---|---|---|---|---|---|---|---|---|
3699461 | 1 | 684 | 227.656250 | 665.656250 | 246 | 684 | 774.656250 | 1212.656250 | 246 |
3699461 | 10 | 684 | 227.656250 | 665.656250 | 246 | 684 | 774.656250 | 1212.656250 | 246 |
3699461 | 11 | 684 | 227.671875 | 665.671875 | 246 | 684 | 774.671875 | 1212.671875 | 246 |
3699461 | 12 | 684 | 227.671875 | 665.671875 | 246 | 684 | 774.671875 | 1212.671875 | 246 |
3699461 | 13 | 684 | 227.656250 | 665.656250 | 246 | 684 | 774.656250 | 1212.656250 | 246 |
3699461 | 14 | 684 | 227.671875 | 665.671875 | 246 | 684 | 774.671875 | 1212.671875 | 246 |
3699461 | 15 | 684 | 227.671875 | 665.671875 | 246 | 684 | 774.671875 | 1212.671875 | 246 |
3699461 | 16 | 684 | 227.656250 | 665.656250 | 246 | 684 | 774.656250 | 1212.656250 | 246 |
3699461 | 17 | 684 | 227.671875 | 665.671875 | 246 | 684 | 774.671875 | 1212.671875 | 246 |
3699461 | 18 | 684 | 227.671875 | 665.671875 | 246 | 684 | 774.671875 | 1212.671875 | 246 |
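The same long-to-wide reshaping can also be written with tidyr's pivot_longer() and pivot_wider(), which tidyr recommends over gather() and spread(). The sketch below is one possible way to do this and is not the code used to produce the table above. The toy data frame ibs_demo holds the trial 1 edges computed with bottom = zone_y + zone_height; its name is made up for this illustration.

```r
library(dplyr)
library(tidyr)

# toy image boundaries for trial 1 (one row per image)
ibs_demo <- data.frame(
  participant_id = "3699461", trial = "1",
  zone_name = c("Right", "Left"),
  top = c(246, 246), bottom = c(566, 575),
  left = c(774.65625, 227.65625), right = c(1212.65625, 665.65625)
)

ibs_wide <- ibs_demo %>%
  mutate(position = tolower(zone_name)) %>%
  # one row per image and edge
  pivot_longer(cols = top:right, names_to = "edge", values_to = "coordinate") %>%
  # build column names such as left_topedge
  mutate(position_edge = paste0(position, "_", edge, "edge")) %>%
  select(-zone_name, -position, -edge) %>%
  # one row per participant and trial, one column per image edge
  pivot_wider(names_from = position_edge, values_from = coordinate)

ibs_wide
```

The result has one row per participant and trial and eight edge columns, so no additional group_by() and summarise() step is needed in this version.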
Now that we have processed the data and defined the image boundaries,
we load the master file. The master file contains information about the
experiment, the individual trials, and the computer and browser used by
the participant. In our example, the master file is called
data_exp_50674-v2_task-etfm.csv and sits in the folder
data_exp_50674-v2, which is located in the data folder.
mstr <- read_csv(masterpath) %>%
  # create participant column that matches the participant column in the data
  dplyr::mutate(participant_id = `Participant Private ID`)
Event Index | UTC Timestamp | UTC Date | Local Timestamp | Local Timezone | Local Date | Experiment ID | Experiment Version | Tree Node Key | Repeat Key | Schedule ID | Participant Public ID | Participant Private ID | Participant Starting Group | Participant Status | Participant Completion Code | Participant External Session ID | Participant Device Type | Participant Device | Participant OS | Participant Browser | Participant Monitor Size | Participant Viewport Size | Checkpoint | Task Name | Task Version | Spreadsheet | Spreadsheet Name | Spreadsheet Row | Trial Number | Screen Number | Screen Name | Zone Name | Zone Type | Reaction Time | Reaction Onset | Response Type | Response | Attempt | Correct | Incorrect | Dishonest | X Coordinate | Y Coordinate | Timed Out | randomise_blocks | randomise_trials | display | ANSWER | audio_file | picture_left | picture_right | trial number | condition | target_gender | target_item | target_position | competitor | competitor_item | competitor_position | color | participant_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1619598327535 | 28/04/2021 08:25:27 | 1619598327448 | 2 | 28/04/2021 10:25:27 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | NA | NA | BEGIN TASK | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 1 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3699461 |
2 | 1619598392107 | 28/04/2021 08:26:32 | 1619598392021 | 2 | 28/04/2021 10:26:32 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 1 | 1 | 1 | Screen 1 | Zone1 | eye_tracking | NA | NA | calibration succeeded | 0 | NA | 0 | 1 | 0 | NA | NA | NA | NA | NA | Calibration | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3699461 |
3 | 1619598396097 | 28/04/2021 08:26:36 | 1619598396012 | 2 | 28/04/2021 10:26:36 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 1 | 1 | 1 | Screen 1 | Zone1 | eye_tracking | 68528.495 | NA | NA | NA | NA | 0 | 1 | 0 | NA | NA | NA | NA | NA | Calibration | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3699461 |
4 | 1619598396489 | 28/04/2021 08:26:36 | 1619598396412 | 2 | 28/04/2021 10:26:36 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 1 | Screen 2 | Zone1 | fixation | 449.982 | NA | NA | NA | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
5 | 1619598396597 | 28/04/2021 08:26:36 | 1619598396446 | 2 | 28/04/2021 10:26:36 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 2 | Screen 1 | Zone2 | content_web_audio | 22.245 | NA | NA | AUDIO PLAY REQUESTED | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
6 | 1619598401674 | 28/04/2021 08:26:41 | 1619598401595 | 2 | 28/04/2021 10:26:41 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 2 | Screen 1 | Zone1 | eye_tracking | NA | NA | Left Time | 2209.4900000083726 | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
7 | 1619598401780 | 28/04/2021 08:26:41 | 1619598401598 | 2 | 28/04/2021 10:26:41 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 2 | Screen 1 | Zone1 | eye_tracking | NA | NA | Left Percent | 43 | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
8 | 1619598401780 | 28/04/2021 08:26:41 | 1619598401598 | 2 | 28/04/2021 10:26:41 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 2 | Screen 1 | Zone1 | eye_tracking | NA | NA | Right Time | 2901.96499999729 | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
9 | 1619598401780 | 28/04/2021 08:26:41 | 1619598401599 | 2 | 28/04/2021 10:26:41 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 2 | Screen 1 | Zone1 | eye_tracking | NA | NA | Right Percent | 57 | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
10 | 1619598401780 | 28/04/2021 08:26:41 | 1619598401599 | 2 | 28/04/2021 10:26:41 | 50674 | 2 | task-etfm | NA | 12247303 | BLIND | 3699461 | NA | complete | NA | NA | computer | Desktop or Laptop | Mac OS 10.14.6 | Chrome 89.0.4389.128 | 1440x900 | 1440x821 | NA | Eyetracking_Sample | 3 | Spreadsheet1 | Spreadsheet1 | 16 | 1 | 2 | Screen 1 | Zone1 | eye_tracking | NA | NA | A Time | 1811.7099999799393 | NA | 0 | 1 | 0 | NA | NA | NA | NA | 1 | Eyetracking | train.png | NA | train.png | cupboard.png | 15 | color | m | train | left | m | cupboard | right | black & brown | 3699461 |
The master file contains a lot of information. As retaining columns
with unnecessary information makes the data difficult to parse and work
with, we remove the columns that we do not need and retain only the
following: participant_id, trial number, condition, target_gender,
target_position, Zone Type, Response, Correct, and target_item.
mstr_redux <- mstr %>%
  # select columns you need
  dplyr::select(participant_id, `trial number`, condition, target_gender,
                target_position, `Zone Type`, Response, Correct, target_item) %>%
  # filter unique
  unique() %>%
  # remove rows containing NA
  na.omit() %>%
  # filter out superfluous rows
  dplyr::filter(`Zone Type` == "response_button_image") %>%
  dplyr::rename(trial = `trial number`) %>%
  dplyr::mutate(trial = as.character(trial)) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial))
participant_id | trial | condition | target_gender | target_position | Zone Type | Response | Correct | target_item |
---|---|---|---|---|---|---|---|---|
3699461 | 15 | color | m | left | response_button_image | cupboard.png | 0 | train |
3699461 | 2 | same | f | right | response_button_image | carrot.png | 0 | rose |
3699461 | 6 | same | n | right | response_button_image | egg.png | 0 | piano |
3699461 | 13 | color | f | left | response_button_image | computer.png | 0 | hat |
3699461 | 14 | color | f | right | response_button_image | chair.png | 0 | church |
3699461 | 8 | differernt | f | right | response_button_image | banana.png | 1 | banana |
3699461 | 5 | same | n | left | response_button_image | bed.png | 1 | bed |
3699461 | 12 | different | n | right | response_button_image | door.png | 0 | bicycle |
3699461 | 9 | differernt | m | left | response_button_image | cheese.png | 1 | cheese |
3699461 | 1 | same | f | left | response_button_image | cup.png | 0 | fork |
The next step consists in joining (or merging) the data (edat) with
the information about the image boundaries (ibs). Before joining these
two data sets, we clean the data in edat by
removing columns we do not need
factorizing participants and trials
removing rows without gaze information
edat <- edat %>%
  # remove superfluous columns
  dplyr::select(-`0`, -filename, -spreadsheet_row, -time_stamp, -screen_index,
                -convergence, -zone_x, -zone_y, -zone_width,
                -zone_height, -zone_x_normalised, -zone_y_normalised, -zone_width_normalised,
                -zone_height_normalised, -idname,
                # WARNING: If you work with normalized values, REPLACE the following
                # with their non-normalized counterparts!
                -x_pred_normalised, -y_pred_normalised) %>%
  # convert participant_id and trial into factors
  dplyr::mutate(participant_id = factor(participant_id),
                trial = factor(trial)) %>%
  # remove rows without gaze information
  dplyr::filter(x_pred != 0,
                y_pred != 0)
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id |
---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 |
3699461 | 83.01000000 | prediction | 797 | 676 | 0.8164998461 | NA | 1 | 13 |
3699461 | 99.08499999 | prediction | 740 | 631 | 0.8164998461 | NA | 1 | 14 |
3699461 | 114.96000001 | prediction | 846 | 603 | 0.8164998461 | NA | 1 | 15 |
3699461 | 131.33000000 | prediction | 747 | 642 | 0.8164998461 | NA | 1 | 16 |
3699461 | 148.24499999 | prediction | 596 | 426 | 0.8162385964 | NA | 1 | 17 |
Now, we can join (or merge) edat with ibs (image boundaries).
edatibs <- dplyr::left_join(edat, ibs, by = c("participant_id", "trial"))
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 83.01000000 | prediction | 797 | 676 | 0.8164998461 | NA | 1 | 13 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 99.08499999 | prediction | 740 | 631 | 0.8164998461 | NA | 1 | 14 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 114.96000001 | prediction | 846 | 603 | 0.8164998461 | NA | 1 | 15 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 131.33000000 | prediction | 747 | 642 | 0.8164998461 | NA | 1 | 16 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 148.24499999 | prediction | 596 | 426 | 0.8162385964 | NA | 1 | 17 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
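Because a left join keeps all rows of the first table, gaze rows without matching image boundaries end up with NA in the edge columns. One way to check for such rows before joining is dplyr's anti_join(), which returns the rows of the first table that have no match in the second. The sketch below uses two small invented data frames (gaze_demo and bounds_demo) and is meant as an optional sanity check, not as part of the processing chain above.

```r
library(dplyr)

# toy gaze data with three trials (values invented for illustration)
gaze_demo <- data.frame(participant_id = "3699461",
                        trial = c("1", "2", "3"),
                        x_pred = c(618, 604, 558))

# toy image boundaries that cover only trials 1 and 2
bounds_demo <- data.frame(participant_id = "3699461",
                          trial = c("1", "2"),
                          left_leftedge = 227.65625)

# gaze rows that would get NA boundaries in a left join
unmatched <- anti_join(gaze_demo, bounds_demo,
                       by = c("participant_id", "trial"))
unmatched
```

Here, the row for trial 3 is returned because bounds_demo contains no boundaries for that trial.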
It is also useful to remove data points that have low precision. We
thus remove data points with a predicted accuracy lower than .5
(face_conf should be at least .5).
edatibs <- edatibs %>%
  # filter imprecise data points
  dplyr::filter(face_conf >= .5)
# inspect
head(edatibs)
## participant_id time_elapsed type x_pred y_pred face_conf zone_name
## 1: 3699461 0.00000000 prediction 618 533 0.8164998461 <NA>
## 2: 3699461 17.32499999 prediction 604 416 0.8164998461 <NA>
## 3: 3699461 33.34500000 prediction 558 477 0.8164998461 <NA>
## 4: 3699461 48.51500000 prediction 791 753 0.8164998461 <NA>
## 5: 3699461 65.08999999 prediction 783 680 0.8164998461 <NA>
## 6: 3699461 83.01000000 prediction 797 676 0.8164998461 <NA>
## trial id left_bottomedge left_leftedge left_rightedge left_topedge
## 1: 1 8 684 227.65625 665.65625 246
## 2: 1 9 684 227.65625 665.65625 246
## 3: 1 10 684 227.65625 665.65625 246
## 4: 1 11 684 227.65625 665.65625 246
## 5: 1 12 684 227.65625 665.65625 246
## 6: 1 13 684 227.65625 665.65625 246
## right_bottomedge right_leftedge right_rightedge right_topedge
## 1: 684 774.65625 1212.65625 246
## 2: 684 774.65625 1212.65625 246
## 3: 684 774.65625 1212.65625 246
## 4: 684 774.65625 1212.65625 246
## 5: 684 774.65625 1212.65625 246
## 6: 684 774.65625 1212.65625 246
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 |
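When filtering on a threshold like this, it can be informative to check how much data is retained. The base R sketch below does this on a toy data frame; the values in gaze_demo are invented, and the threshold of .5 is the one used above.

```r
# toy gaze data (values invented for illustration)
gaze_demo <- data.frame(x_pred = c(618, 604, 558, 791),
                        face_conf = c(0.82, 0.45, 0.50, 0.91))

# TRUE for rows that meet the threshold (values equal to .5 are kept)
keep <- gaze_demo$face_conf >= .5

# proportion of rows retained
mean(keep)   # 0.75

# filtered data
gaze_clean <- gaze_demo[keep, ]
nrow(gaze_clean)   # 3
```

If a large share of the data is removed for a participant or trial, that may be worth inspecting before continuing with the analysis.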
Now we combine the collected data (edatibs = edat plus image
boundaries ibs) with the metadata (the information from the reduced
master file mstr_redux).
dat <- dplyr::left_join(edatibs, mstr_redux, by = c("participant_id", "trial"))
# inspect
head(dat)
## participant_id time_elapsed type x_pred y_pred face_conf zone_name
## 1: 3699461 0.00000000 prediction 618 533 0.8164998461 <NA>
## 2: 3699461 17.32499999 prediction 604 416 0.8164998461 <NA>
## 3: 3699461 33.34500000 prediction 558 477 0.8164998461 <NA>
## 4: 3699461 48.51500000 prediction 791 753 0.8164998461 <NA>
## 5: 3699461 65.08999999 prediction 783 680 0.8164998461 <NA>
## 6: 3699461 83.01000000 prediction 797 676 0.8164998461 <NA>
## trial id left_bottomedge left_leftedge left_rightedge left_topedge
## 1: 1 8 684 227.65625 665.65625 246
## 2: 1 9 684 227.65625 665.65625 246
## 3: 1 10 684 227.65625 665.65625 246
## 4: 1 11 684 227.65625 665.65625 246
## 5: 1 12 684 227.65625 665.65625 246
## 6: 1 13 684 227.65625 665.65625 246
## right_bottomedge right_leftedge right_rightedge right_topedge condition
## 1: 684 774.65625 1212.65625 246 same
## 2: 684 774.65625 1212.65625 246 same
## 3: 684 774.65625 1212.65625 246 same
## 4: 684 774.65625 1212.65625 246 same
## 5: 684 774.65625 1212.65625 246 same
## 6: 684 774.65625 1212.65625 246 same
## target_gender target_position Zone Type Response Correct
## 1: f left response_button_image cup.png 0
## 2: f left response_button_image cup.png 0
## 3: f left response_button_image cup.png 0
## 4: f left response_button_image cup.png 0
## 5: f left response_button_image cup.png 0
## 6: f left response_button_image cup.png 0
## target_item
## 1: fork
## 2: fork
## 3: fork
## 4: fork
## 5: fork
## 6: fork
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge | condition | target_gender | target_position | Zone Type | Response | Correct | target_item |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 83.01000000 | prediction | 797 | 676 | 0.8164998461 | NA | 1 | 13 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 99.08499999 | prediction | 740 | 631 | 0.8164998461 | NA | 1 | 14 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 114.96000001 | prediction | 846 | 603 | 0.8164998461 | NA | 1 | 15 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 131.33000000 | prediction | 747 | 642 | 0.8164998461 | NA | 1 | 16 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
3699461 | 148.24499999 | prediction | 596 | 426 | 0.8162385964 | NA | 1 | 17 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork |
We now use the edges of the images to determine whether each gaze fell inside the area of interest (AOI).
dat <- dat %>%
  # determine if participant's gaze was in AOI
  dplyr::mutate(AOI = ifelse(
    # if target is left image
    target_position == "left" &
      y_pred > left_topedge &
      y_pred < left_bottomedge &
      x_pred > left_leftedge &
      x_pred < left_rightedge, 1,
    # if target is right image
    ifelse(target_position == "right" &
             y_pred > right_topedge &
             y_pred < right_bottomedge &
             x_pred > right_leftedge &
             x_pred < right_rightedge, 1,
           0)))
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge | condition | target_gender | target_position | Zone Type | Response | Correct | target_item | AOI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 1 |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 1 |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 1 |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 0 |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 0 |
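The bounding-box logic can be checked by hand on a couple of rows. The following is a minimal sketch in base R, assuming a hypothetical helper `in_box()` (it is not part of the tutorial or of any package) and using the first and fourth gaze samples from the table above, both from a trial in which the target is the left image.

```r
# Hypothetical helper (an assumption for illustration): TRUE if the gaze
# point (x, y) lies strictly inside the rectangle given by its four edges.
# Screen coordinates: y grows downwards, so top < bottom.
in_box <- function(x, y, left, right, top, bottom) {
  x > left & x < right & y > top & y < bottom
}

# two gaze samples copied from the table above (target image on the left)
toy <- data.frame(x_pred = c(618, 791), y_pred = c(533, 753))

# edges of the left image, copied from the table above
toy$AOI <- as.numeric(in_box(toy$x_pred, toy$y_pred,
                             left = 227.65625, right = 665.65625,
                             top = 246, bottom = 684))
toy
```

The first sample falls inside the left image, while the second lies below its bottom edge (753 > 684), which agrees with the AOI values in the table.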
Define time bins (here 100 ms).
dat <- dat %>%
  # arrange by participant, trial, and time
  dplyr::arrange(participant_id, trial, time_elapsed) %>%
  # bin times into 100 ms time bins
  dplyr::mutate(TimeBin = itsadug::timeBins(time_elapsed, 100, pos = 0))
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge | condition | target_gender | target_position | Zone Type | Response | Correct | target_item | AOI | TimeBin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 1 | 0 |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 1 | 0 |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 1 | 0 |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 0 | 0 |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | 0 | fork | 0 | 0 |
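To see what the binning does, here is a small base R sketch. It assumes that `pos = 0` labels each bin by its lower bound (as the itsadug documentation describes), so that the same labels can be obtained by rounding down to the nearest multiple of the bin size. The timestamps are copied from the data shown earlier; the variable names are chosen for illustration only.

```r
# gaze timestamps in ms, copied from the data shown earlier
time_elapsed <- c(0, 17.325, 99.085, 114.96, 148.245)

# bin width in ms
bin_size <- 100

# label each sample with the lower bound of its 100 ms bin
TimeBin <- floor(time_elapsed / bin_size) * bin_size
TimeBin
```

All samples recorded before 100 ms end up in bin 0, and the samples at roughly 115 ms and 148 ms end up in bin 100.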
dat <- dat %>%
  # clean condition (spelling error!)
  dplyr::mutate(condition = dplyr::case_when(condition == "color" ~ "color",
                                             condition == "same" ~ "same",
                                             condition == "different" ~ "different",
                                             condition == "differernt" ~ "different",
                                             TRUE ~ condition)) %>%
  # change correct from 0 vs 1 into correct vs incorrect
  dplyr::mutate(Correct = ifelse(Correct == 1, "Correct",
                                 ifelse(Correct == 0, "Incorrect", Correct)),
                Correct = factor(Correct))
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge | condition | target_gender | target_position | Zone Type | Response | Correct | target_item | AOI | TimeBin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 618 | 533 | 0.8164998461 | NA | 1 | 8 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | Incorrect | fork | 1 | 0 |
3699461 | 17.32499999 | prediction | 604 | 416 | 0.8164998461 | NA | 1 | 9 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | Incorrect | fork | 1 | 0 |
3699461 | 33.34500000 | prediction | 558 | 477 | 0.8164998461 | NA | 1 | 10 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | Incorrect | fork | 1 | 0 |
3699461 | 48.51500000 | prediction | 791 | 753 | 0.8164998461 | NA | 1 | 11 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | Incorrect | fork | 0 | 0 |
3699461 | 65.08999999 | prediction | 783 | 680 | 0.8164998461 | NA | 1 | 12 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | same | f | left | response_button_image | cup.png | Incorrect | fork | 0 | 0 |
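The same kind of cleaning can be sketched on a toy vector in base R. The values below are made up for illustration (only the misspelling "differernt" is taken from the data), and the factor levels are set explicitly here, which differs slightly from the alphabetical ordering that `factor()` produces by default.

```r
# toy labels (made up), including the misspelling found in the data
condition <- c("color", "same", "different", "differernt")
Correct <- c(1, 0, 1, 1)

# inspect the labels first: misspellings show up as extra categories
table(condition)

# map the misspelled label onto the intended one
condition[condition == "differernt"] <- "different"

# recode 0/1 as a labelled factor
Correct <- factor(Correct, levels = c(0, 1),
                  labels = c("Incorrect", "Correct"))
table(condition, Correct)
```

Tabulating a column before and after recoding is a quick way to confirm that no unexpected categories remain.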
dat <- dat %>%
  dplyr::filter(Correct != "Incorrect")
participant_id | time_elapsed | type | x_pred | y_pred | face_conf | zone_name | trial | id | left_bottomedge | left_leftedge | left_rightedge | left_topedge | right_bottomedge | right_leftedge | right_rightedge | right_topedge | condition | target_gender | target_position | Zone Type | Response | Correct | target_item | AOI | TimeBin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3699461 | 0.00000000 | prediction | 753 | 467 | 0.8142243928 | NA | 10 | 303 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | different | m | right | response_button_image | tree.png | Correct | tree | 0 | 0 |
3699461 | 17.86500000 | prediction | 771 | 498 | 0.8142243928 | NA | 10 | 304 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | different | m | right | response_button_image | tree.png | Correct | tree | 0 | 0 |
3699461 | 33.91999999 | prediction | 843 | 454 | 0.8142243928 | NA | 10 | 305 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | different | m | right | response_button_image | tree.png | Correct | tree | 1 | 0 |
3699461 | 49.38000001 | prediction | 654 | 536 | 0.8142243928 | NA | 10 | 306 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | different | m | right | response_button_image | tree.png | Correct | tree | 0 | 0 |
3699461 | 65.33499999 | prediction | 619 | 462 | 0.8142243928 | NA | 10 | 307 | 684 | 227.65625 | 665.65625 | 246 | 684 | 774.65625 | 1212.65625 | 246 | different | m | right | response_button_image | tree.png | Correct | tree | 0 | 0 |
Saving the data
You can now save the data in your data folder, if you like.
write.table(dat, here::here("data", "dat.txt"), sep = "\t", row.names = F)
To re-load this data, you would use the following command:
reload <- read.delim(here::here("data", "dat.txt"), sep = "\t")
# inspect
reload[1:4, 1:4]
## participant_id time_elapsed type x_pred
## 1 3699461 0.00000000 prediction 753
## 2 3699461 17.86500000 prediction 771
## 3 3699461 33.91999999 prediction 843
## 4 3699461 49.38000001 prediction 654
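If you want to convince yourself that saving and re-loading does not alter the data, a round trip through a temporary file is a simple check. This is a sketch with a two-row toy data frame; it uses `tempfile()` instead of the `data` folder so that it runs anywhere, and the object names are chosen for illustration.

```r
# toy data frame with a numeric and a character column
toy <- data.frame(participant_id = c(3699461, 3699461),
                  time_elapsed = c(0, 17.865),
                  type = c("prediction", "prediction"))

# write to a temporary tab-separated file and read it back in
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", row.names = FALSE)
reloaded <- read.delim(path, sep = "\t")

# check that column names and values survived the round trip
identical(names(toy), names(reloaded))
all.equal(toy$time_elapsed, reloaded$time_elapsed)
```

Note that column types can change on re-loading (for example, whole numbers may come back as integers), so it is worth inspecting the structure of the re-loaded data with `str()`.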
Prepare data for a visualization
f1 <- dat %>%
  # remove "weird" data points
  dplyr::filter(x_pred > 0,
                y_pred > 0,
                time_elapsed < 4200) %>%
  # grouping
  dplyr::group_by(condition, TimeBin, Correct, target_gender) %>%
  # summarise: calculate proportion of looks in AOI
  dplyr::summarise(Proportion = mean(AOI))
# inspect data
head(f1, 10)
## # A tibble: 10 × 5
## # Groups: condition, TimeBin, Correct [4]
## condition TimeBin Correct target_gender Proportion
## <chr> <dbl> <fct> <chr> <dbl>
## 1 color 0 Correct f 0.278
## 2 color 0 Correct m 0.429
## 3 color 0 Correct n 0.545
## 4 color 100 Correct f 0.353
## 5 color 100 Correct m 0.407
## 6 color 100 Correct n 1
## 7 color 200 Correct f 0.353
## 8 color 200 Correct m 0.379
## 9 color 200 Correct n 1
## 10 color 300 Correct f 0.312
condition | TimeBin | Correct | target_gender | Proportion |
---|---|---|---|---|
color | 0 | Correct | f | 0.2777777778 |
color | 0 | Correct | m | 0.4285714286 |
color | 0 | Correct | n | 0.5454545455 |
color | 100 | Correct | f | 0.3529411765 |
color | 100 | Correct | m | 0.4074074074 |
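Because AOI is coded as 0 or 1, the mean of AOI within a group is simply the proportion of gaze samples that fell inside the AOI. The sketch below illustrates this with made-up toy data and base R's `aggregate()` instead of dplyr; the numbers are not taken from the workshop data.

```r
# made-up toy data: binary AOI codes for two conditions and two time bins
toy <- data.frame(
  condition = c("color", "color", "color", "same", "same", "same"),
  TimeBin   = c(0, 0, 100, 0, 0, 0),
  AOI       = c(1, 0, 1, 1, 1, 0)
)

# mean of a 0/1 variable = proportion of 1s within each group
prop <- aggregate(AOI ~ condition + TimeBin, data = toy, FUN = mean)
names(prop)[names(prop) == "AOI"] <- "Proportion"
prop
```

In the toy data, one of the two "color" samples in bin 0 is in the AOI, so the proportion for that cell is 0.5.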
ggplot(f1, aes(y = Proportion, x = TimeBin, color = condition)) +
# lines for proportions
geom_line() +
# add vertical line
geom_vline(xintercept = 1900, linetype="dotted", color = "darkgrey", size=.75) +
# add vertical line
geom_vline(xintercept = 3450, linetype="dotted", color = "darkgrey", size=.75) +
# add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2800, y = .85, color = "gray20", size = 5) +
  # separate panels for each target_gender
facet_grid(target_gender ~ .) +
# black and white theme
theme_bw() +
# no grid lines
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "top",
# define x-axis tick labels
axis.text.x = element_text(angle = 45, vjust=0.6, size = 10)) +
# define x-axis
scale_x_continuous(name = "Time in Trial (ms)",
limits = c(0,4000),
breaks = seq(0,4000,1000),
labels = seq(0, 4000, 1000)) +
# define y-axis
scale_y_continuous(name = "Proportion in AOI",
limits = c(0, 1),
breaks = seq(0, 1,.2),
labels = seq(0, 1, .2))
# save plot
ggsave(file = here::here("images","Fig01.png"),
height = 5, width = 10, dpi = 320)
# line plot with error bars
ggplot(dat, aes(x=TimeBin, y= AOI, group = condition, color = condition)) +
stat_summary(fun = mean, geom = "line", aes(group= condition, color = condition)) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
# add vertical line
geom_vline(xintercept = 1900, linetype="dotted", color = "darkgrey", size=.75) +
# add vertical line
geom_vline(xintercept = 3450, linetype="dotted", color = "darkgrey", size=.75) +
# add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2800, y = .85, color = "gray20", size = 5) +
  # def. font size
theme_bw(base_size = 15) +
theme(axis.text.x = element_text(size=10, angle = 90),
axis.text.y = element_text(size=10, face="plain"),
legend.position = "top",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
# define x-axis
scale_x_continuous(name = "Time in Trial (ms)",
limits = c(0,4000),
breaks = seq(0,4000,1000),
labels = seq(0, 4000, 1000)) +
# define y-axis
scale_y_continuous(name = "Proportion in AOI",
limits = c(0, 1),
breaks = seq(0, 1,.2),
labels = seq(0, 1, .2))
## Warning: Removed 123 rows containing non-finite values (`stat_summary()`).
## Removed 123 rows containing non-finite values (`stat_summary()`).
# save plot
ggsave(file = here::here("images","Fig02.png"),
height = 5, width = 10, dpi = 320)
## Warning: Removed 123 rows containing non-finite values (`stat_summary()`).
## Removed 123 rows containing non-finite values (`stat_summary()`).
ggplot(f1, aes(y = Proportion, x = TimeBin, color = condition, fill = condition)) +
# lines for proportions
geom_smooth(span = .2, alpha = .2) +
# add vertical line
geom_vline(xintercept = 1900, linetype="dotted", color = "darkgrey", size=.75) +
# add vertical line
geom_vline(xintercept = 3450, linetype="dotted", color = "darkgrey", size=.75) +
# add text
  ggplot2::annotate(geom = "text", label = "Object", x = 2700, y = 1.1, color = "gray20", size = 5) +
  # separate panels for each target_gender
facet_grid(~target_gender) +
# black and white theme
theme_bw() +
# no grid lines
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "top",
# define x-axis tick labels
axis.text.x = element_text(angle = 45, vjust=0.6, size = 10)) +
# define x-axis
scale_x_continuous(name = "Time in Trial (ms)",
limits = c(0,4000),
breaks = seq(0,4000,1000),
labels = seq(0, 4000, 1000)) +
# define y-axis
scale_y_continuous(name = "Proportion in AOI",
limits = c(-.3, 1.2),
breaks = seq(0, 1, .2),
labels = seq(0, 1, .2))
# save plot
ggsave(file = here::here("images","Fig03.png"),
height = 5, width = 10, dpi = 320)
We use a mixed-effects binomial logistic regression to check if the conditions affect the proportion of AOI gazes during a period of interest (after the stimulus was shown).
We go over this without much explanation. However, if you want to know more about how mixed-effects models work, what to consider, and how to interpret them, Gries (2021), Winter (2019), and Field, Miles, and Field (2012) are highly recommended resources! You can also find additional information here or here.
# set options
options(contrasts = c("contr.treatment", "contr.poly"))
#options(contrasts = c("contr.helmert", "contr.poly"))
#options(contrasts = c("contr.sum", "contr.poly"))
statzdat <- dat %>%
  dplyr::filter(time_elapsed > 1900 &
                  time_elapsed < 3450)
Generate base-line model.
# generate model
m0 <- glmer(AOI ~ (1 | trial) + (1 | target_item),
            family = binomial,
            data = statzdat,
            control = glmerControl(optimizer = "bobyqa"))
Generate final model.
# generate model
m1 <- update(m0, .~.+ condition * target_gender)
Summarize the final model.
# summarize model
summary(m1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: AOI ~ (1 | trial) + (1 | target_item) + condition + target_gender +
## condition:target_gender
## Data: statzdat
## Control: glmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 2620.7 2684.1 -1299.3 2598.7 2342
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.9018982 -0.5924560 -0.2865165 0.7971672 6.6556263
##
## Random effects:
## Groups Name Variance Std.Dev.
## trial (Intercept) 0.5592683 0.7478424
## target_item (Intercept) 0.4510510 0.6716033
## Number of obs: 2353, groups: trial, 14; target_item, 14
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.7558947 0.7769111 -3.54725 0.00038928 ***
## conditiondifferent 2.3636688 1.0573990 2.23536 0.02539364 *
## conditionsame 2.8272882 1.2887571 2.19381 0.02824907 *
## target_genderm 2.4553915 1.0618560 2.31236 0.02075794 *
## target_gendern 3.2118163 1.2778093 2.51353 0.01195285 *
## conditiondifferent:target_genderm -3.0855267 1.4728957 -2.09487 0.03618244 *
## conditionsame:target_genderm -2.1498085 1.6440007 -1.30767 0.19098567
## conditiondifferent:target_gendern -2.4414499 1.7815108 -1.37044 0.17055024
## conditionsame:target_gendern -4.3955408 1.9386026 -2.26738 0.02336728 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtnd cndtns trgt_gndrm trgt_gndrn
## cndtndffrnt -0.734
## conditionsm -0.603 0.442
## targt_gndrm -0.731 0.537 0.441
## targt_gndrn -0.607 0.445 0.366 0.444
## cndtndffrnt:trgt_gndrm 0.527 -0.718 -0.318 -0.721 -0.320
## cndtnsm:trgt_gndrm 0.472 -0.346 -0.784 -0.646 -0.286
## cndtndffrnt:trgt_gndrn 0.434 -0.592 -0.262 -0.318 -0.716
## cndtnsm:trgt_gndrn 0.400 -0.293 -0.664 -0.292 -0.659
## cndtndffrnt:trgt_gndrm cndtnsm:trgt_gndrm
## cndtndffrnt
## conditionsm
## targt_gndrm
## targt_gndrn
## cndtndffrnt:trgt_gndrm
## cndtnsm:trgt_gndrm 0.465
## cndtndffrnt:trgt_gndrn 0.425 0.205
## cndtnsm:trgt_gndrn 0.211 0.521
## cndtndffrnt:trgt_gndrn
## cndtndffrnt
## conditionsm
## targt_gndrm
## targt_gndrn
## cndtndffrnt:trgt_gndrm
## cndtnsm:trgt_gndrm
## cndtndffrnt:trgt_gndrn
## cndtnsm:trgt_gndrn 0.472
Run post-hoc tests.
summary(glht(m1, mcp(condition="Tukey")))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: glmer(formula = AOI ~ (1 | trial) + (1 | target_item) + condition +
## target_gender + condition:target_gender, data = statzdat,
## family = binomial, control = glmerControl(optimizer = "bobyqa"))
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## different - color == 0 2.3636688 1.0573990 2.23536 0.064815 .
## same - color == 0 2.8272882 1.2887571 2.19381 0.071499 .
## same - different == 0 0.4636193 1.2544972 0.36957 0.927075
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
We now tabulate the results of the final model.
# generate summary table
sjPlot::tab_model(m0, m1)
Predictors | m0 (AOI): Odds Ratios | m0: CI | m0: p | m1 (AOI): Odds Ratios | m1: CI | m1: p |
---|---|---|---|---|---|---|
(Intercept) | 0.56 | 0.26 – 1.18 | 0.125 | 0.06 | 0.01 – 0.29 | <0.001 |
condition [different] | | | | 10.63 | 1.34 – 84.45 | 0.025 |
condition [same] | | | | 16.90 | 1.35 – 211.28 | 0.028 |
target gender [m] | | | | 11.65 | 1.45 – 93.37 | 0.021 |
target gender [n] | | | | 24.82 | 2.03 – 303.77 | 0.012 |
condition [different] × target gender [m] | | | | 0.05 | 0.00 – 0.82 | 0.036 |
condition [same] × target gender [m] | | | | 0.12 | 0.00 – 2.92 | 0.191 |
condition [different] × target gender [n] | | | | 0.09 | 0.00 – 2.86 | 0.171 |
condition [same] × target gender [n] | | | | 0.01 | 0.00 – 0.55 | 0.023 |
Random Effects | | | | | | |
σ2 | 3.29 | | | 3.29 | | |
τ00 trial | 1.01 | | | 0.56 | | |
τ00 target_item | 0.98 | | | 0.45 | | |
ICC | 0.38 | | | 0.23 | | |
N trial | 14 | | | 14 | | |
N target_item | 14 | | | 14 | | |
Observations | 2353 | | | 2353 | | |
Marginal R2 / Conditional R2 | 0.000 / 0.377 | | | 0.158 / 0.356 | | |
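The odds ratios in the table are the exponentiated fixed-effect estimates (log-odds) from the model summary. The sketch below reproduces three of them, and also recomputes the ICC of the final model under the assumption that it is the share of the random-intercept variances in the total variance, with the residual variance of a logistic model fixed at π²/3 (this assumption reproduces the tabulated values of σ2 and ICC). The object names are chosen for illustration.

```r
# fixed-effect estimates (log-odds) copied from the summary of the final model
log_odds <- c("(Intercept)" = -2.7558947,
              "conditiondifferent" = 2.3636688,
              "conditionsame" = 2.8272882)

# odds ratios are the exponentiated log-odds
odds_ratios <- exp(log_odds)
round(odds_ratios, 2)

# random-intercept variances copied from the summary of the final model
tau00 <- c(trial = 0.5592683, target_item = 0.4510510)

# assumed residual variance of the logistic distribution
sigma2 <- pi^2 / 3

# share of the total variance attributable to the random intercepts
icc <- sum(tau00) / (sum(tau00) + sigma2)
round(icc, 2)
```

An odds ratio above 1 means that the odds of a gaze falling in the AOI are higher than at the reference level, while an odds ratio below 1 means that they are lower.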
And we visualize the fixed effects.
sjPlot::plot_model(m1)
Once the data is in a proper format, we can also use the eyetrackingR package for our analysis. The advantage of using the eyetrackingR package is that it has many built-in functions that make the analysis of eye-tracking data a lot easier. Also, there are very helpful and detailed tutorials on how to perform analyses and visualize eye-tracking data using eyetrackingR.
Before we can use the eyetrackingR package, however, we need to create certain columns in our data that the eyetrackingR package expects.
In our case, we need to create:
a column specifying if a gaze was in the AOI (which we will call OnTarget);
a column specifying if a gaze was not in the AOI (which we will call OffTarget);
a trackloss_column (which we will call TrackLoss). This column contains information about data points that we want to remove during the analysis. In our case, we will code data points that have a negative x- or y-coordinate as well as data points that occurred after 4200 ms as TRUE (meaning that we consider them cases of trackloss).
dat <- dat %>%
  dplyr::mutate(TrackLoss = dplyr::case_when(x_pred < 0 ~ TRUE,
                                             y_pred < 0 ~ TRUE,
                                             time_elapsed > 4200 ~ TRUE,
                                             TRUE ~ FALSE)) %>%
  dplyr::mutate(OnTarget = dplyr::case_when(AOI == 1 ~ 1,
                                            TRUE ~ 0),
                OffTarget = dplyr::case_when(AOI == 1 ~ 0,
                                             TRUE ~ 1))
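The three columns can be illustrated on a toy data frame with base R. The values below are made up so that each trackloss rule fires once; the rules themselves (negative coordinates, time after 4200 ms) follow the description above.

```r
# made-up gaze samples: one clean, one with negative x, one with negative y,
# and one recorded after 4200 ms
toy <- data.frame(x_pred = c(618, -5, 700, 640),
                  y_pred = c(533, 400, -12, 500),
                  time_elapsed = c(100, 200, 300, 4300),
                  AOI = c(1, 0, 1, 0))

# a sample counts as trackloss if any of the three rules applies
toy$TrackLoss <- toy$x_pred < 0 | toy$y_pred < 0 | toy$time_elapsed > 4200

# OnTarget and OffTarget are complementary 0/1 codes
toy$OnTarget <- as.numeric(toy$AOI == 1)
toy$OffTarget <- 1 - toy$OnTarget
toy
```

Only the first sample is free of trackloss, and every sample is coded as either on target or off target, never both.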
Now that we have generated the required columns in our data, we can generate an eyetrackingr_data object and specify the columns that the eyetrackingR package requires.
data <- make_eyetrackingr_data(dat,
                               participant_column = "participant_id",
                               trial_column = "trial",
                               time_column = "time_elapsed",
                               trackloss_column = "TrackLoss",
                               aoi_columns = c('OnTarget','OffTarget'),
                               treat_non_aoi_looks_as_missing = TRUE
)
# inspect data
head(data)
## # A tibble: 6 × 29
## participant_id time_elapsed type x_pred y_pred face_conf zone_name trial
## <fct> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <fct>
## 1 3699461 0 prediction 753 467 0.814 <NA> 10
## 2 3699461 17.9 prediction 771 498 0.814 <NA> 10
## 3 3699461 33.9 prediction 843 454 0.814 <NA> 10
## 4 3699461 49.4 prediction 654 536 0.814 <NA> 10
## 5 3699461 65.3 prediction 619 462 0.814 <NA> 10
## 6 3699461 85.0 prediction 633 438 0.814 <NA> 10
## # ℹ 21 more variables: id <int>, left_bottomedge <dbl>, left_leftedge <dbl>,
## # left_rightedge <dbl>, left_topedge <dbl>, right_bottomedge <dbl>,
## # right_leftedge <dbl>, right_rightedge <dbl>, right_topedge <dbl>,
## # condition <chr>, target_gender <chr>, target_position <chr>,
## # `Zone Type` <chr>, Response <chr>, Correct <fct>, target_item <chr>,
## # AOI <dbl>, TimeBin <dbl>, TrackLoss <lgl>, OnTarget <lgl>, OffTarget <lgl>
We can also tabulate the number of on-target gazes that remain in the data using the table function.
table(data$OnTarget)
##
## FALSE TRUE
## 3277 2644
In a next step, we specify the window that we want to inspect (in our case, we want to check the window starting at 1900 ms and ending at 3450 ms).
# subset to response window post word-onset
response_window <- subset_by_window(data,
                                    window_start_time = 1900,
                                    window_end_time = 3450,
                                    rezero = FALSE)
We now check to see if we need to remove data points.
# analyze amount of trackloss by subjects and trials
(trackloss <- trackloss_analysis(data = response_window))
## # A tibble: 30 × 6
## participant_id trial Samples TracklossSamples TracklossForTrial
## <fct> <fct> <dbl> <dbl> <dbl>
## 1 3699461 10 82 0 0
## 2 3699461 11 76 0 0
## 3 3699461 16 90 0 0
## 4 3699461 17 93 0 0
## 5 3699461 3 88 0 0
## 6 3699461 4 91 0 0
## 7 3699461 5 92 0 0
## 8 3699461 7 84 0 0
## 9 3699461 8 85 0 0
## 10 3699461 9 62 0 0
## # ℹ 20 more rows
## # ℹ 1 more variable: TracklossForParticipant <dbl>
Remove trials with too much trackloss (here, trials in which more than 25% of the samples are trackloss, i.e. trial_prop_thresh = .25).
# remove trials with > 25% of trackloss
response_window_clean <- clean_by_trackloss(data = response_window,
                                            trial_prop_thresh = .25)
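The idea behind a trial-level trackloss threshold can be sketched in base R. This is not how clean_by_trackloss is implemented; it only illustrates the logic on made-up toy data: compute the proportion of trackloss samples per trial and keep the trials at or below the threshold.

```r
# made-up toy data: trial 1 has no trackloss, trial 2 has 50% trackloss
toy <- data.frame(trial = c(1, 1, 1, 1, 2, 2, 2, 2),
                  TrackLoss = c(FALSE, FALSE, FALSE, FALSE,
                                TRUE, TRUE, FALSE, FALSE))

# proportion of trackloss samples per trial (mean of a logical vector)
prop_loss <- tapply(toy$TrackLoss, toy$trial, mean)
prop_loss

# keep only trials whose trackloss proportion does not exceed the threshold
keep_trials <- names(prop_loss)[prop_loss <= 0.25]
toy_clean <- toy[as.character(toy$trial) %in% keep_trials, ]
toy_clean
```

In the workshop data, the trackloss analysis above reports no trackloss samples in the rows shown, so a threshold of this kind would leave those trials untouched.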
Extract response data.
# aggregate across trials within subjects in time analysis
response <- make_time_sequence_data(response_window_clean,
                                    time_bin_size = 50,
                                    predictor_columns = c("condition", "Correct"),
                                    aois = c("OnTarget", "OffTarget")
)
Visualize response data.
# visualize time results
plot(response,
predictor_column = "condition") +
theme_light() +
coord_cartesian(ylim = c(0,1))
Schweinberger, Martin. 2023. Processing and Analyzing Eye-Tracking Data in R. Workshop at UIT AcqVA Aurora. Tromsø: The Arctic University of Norway. url: https://slcladal.github.io/eyews.html (Version 2023.06.02).
@manual{schweinberger2023eyews,
author = {Schweinberger, Martin},
title = {Processing and Analyzing Eye-Tracking Data in R},
note = {https://slcladal.github.io/eyews.html},
year = {2023},
organization = {Arctic University of Norway, AcqVA Aurora Center},
address = {Tromsø},
edition = {2023.06.02}
}
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DT_0.28 kableExtra_1.3.4 knitr_1.42 multcomp_1.4-23
## [5] TH.data_1.1-1 MASS_7.3-60 survival_3.5-5 mvtnorm_1.1-3
## [9] sjPlot_2.8.14 lme4_1.1-33 Matrix_1.5-4.1 itsadug_2.4.1
## [13] plotfunctions_1.4 mgcv_1.8-42 nlme_3.1-162 data.table_1.14.4
## [17] eyetrackingR_0.2.0 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
## [21] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
## [25] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] minqa_1.2.5 colorspace_2.1-0 deldir_1.0-6
## [4] ellipsis_0.3.2 sjlabelled_1.2.0 rprojroot_2.0.3
## [7] snakecase_0.11.0 htmlTable_2.4.1 estimability_1.4.1
## [10] parameters_0.20.2 base64enc_0.1-3 rstudioapi_0.14
## [13] farver_2.1.1 bit64_4.0.5 fansi_1.0.4
## [16] xml2_1.3.3 codetools_0.2-19 splines_4.2.2
## [19] cachem_1.0.6 sjmisc_2.8.9 Formula_1.2-4
## [22] jsonlite_1.8.4 nloptr_2.0.3 ggeffects_1.1.5
## [25] broom_1.0.3 cluster_2.1.4 png_0.1-8
## [28] effectsize_0.8.3 compiler_4.2.2 httr_1.4.4
## [31] sjstats_0.18.2 emmeans_1.8.4-1 backports_1.4.1
## [34] fastmap_1.1.0 lazyeval_0.2.2 cli_3.6.0
## [37] htmltools_0.5.4 tools_4.2.2 coda_0.19-4
## [40] gtable_0.3.1 glue_1.6.2 Rcpp_1.0.10
## [43] cellranger_1.1.0 jquerylib_0.1.4 vctrs_0.6.2
## [46] svglite_2.1.1 insight_0.19.1 xfun_0.39
## [49] rvest_1.0.3 timechange_0.1.1 lifecycle_1.0.3
## [52] zoo_1.8-11 scales_1.2.1 vroom_1.6.1
## [55] ragg_1.2.5 hms_1.1.2 parallel_4.2.2
## [58] sandwich_3.0-2 RColorBrewer_1.1-3 yaml_2.3.7
## [61] gridExtra_2.3 sass_0.4.5 rpart_4.1.19
## [64] latticeExtra_0.6-30 stringi_1.7.12 highr_0.10
## [67] bayestestR_0.13.0 checkmate_2.1.0 boot_1.3-28.1
## [70] rlang_1.1.1 pkgconfig_2.0.3 systemfonts_1.0.4
## [73] evaluate_0.20 lattice_0.21-8 labeling_0.4.2
## [76] htmlwidgets_1.6.1 bit_4.0.5 tidyselect_1.2.0
## [79] here_1.0.1 magrittr_2.0.3 R6_2.5.1
## [82] generics_0.1.3 Hmisc_4.8-0 foreign_0.8-84
## [85] pillar_1.9.0 withr_2.5.0 nnet_7.3-19
## [88] datawizard_0.6.5 performance_0.10.2 modelr_0.1.10
## [91] crayon_1.5.2 interp_1.1-3 utf8_1.2.3
## [94] tzdb_0.3.0 rmarkdown_2.20 jpeg_0.1-10
## [97] grid_4.2.2 readxl_1.4.2 digest_0.6.31
## [100] webshot_0.5.4 xtable_1.8-4 textshaping_0.3.6
## [103] munsell_0.5.0 viridisLite_0.4.1 bslib_0.4.2