Summary: I am analyzing the time difference between a stimulus that occurred (A or B) and a possible response from the user.
The dataset has the following structure:
structure(list(User = c("005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844"), Date = c("25.11.2015 13:59",
"03.12.2015 09:32", "07.12.2015 08:18", "08.12.2015 19:40", "08.12.2015 19:40",
"22.12.2015 08:52", "22.12.2015 08:50", "22.12.2015 15:42", "22.12.2015 20:46",
"05.01.2016 11:33", "05.01.2016 11:35", "05.01.2016 13:22", "05.01.2016 13:21",
"05.01.2016 13:22", "06.01.2016 09:18", "14.02.2016 22:47", "20.02.2016 21:27",
"01.04.2016 13:52", "24.07.2016 07:03", "04.08.2016 08:25"),
Hour = c(1645L, 1833L, 1928L, 1963L, 1963L, 2288L, 2288L,
2295L, 2300L, 2627L, 2627L, 2629L, 2629L, 2629L, 2649L, 3598L,
3741L, 4717L, 7447L, 7712L), StimuliA = c(1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L), StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), Responses = c(0L,
1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 0L)), .Names = c("User", "Date", "Hour", "StimuliA",
"StimuliB", "Responses"), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
Additional information on the data: Every row in the data table is an event log entry in which a User either perceived a certain stimulus or performed an action (Response). Hour: the number of hours since the start of the project at which the event occurred.
Goal: The overall goal is to measure the time between a stimulus and the response, if there was one. I would like to create a loop that goes through the dataset for every User and, if the value of a stimulus is 1, checks whether the user responded later, and then creates one vector with the time differences for A and one for B.
Question: Would I do this with a for loop that goes through every User, checks the perceived stimulus and, if its value is 1, checks whether the same User ID has the value 1 in the closest following Response, and then compares the two dates?
Subquestions // Things I am struggling with
Desired result:
Stimuli A c=(11253, 2122, 56969), Stimuli B c=(19512,107)
The code I have produced so far is not very helpful. I was experimenting with for loops and if statements, and also with the ifelse function.
I am a newbie with R. I have taken multiple classes on DataCamp, but I am still struggling to apply them to my own work on my master's thesis. Thanks for all the help.
Additional Info:
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
SQL-style syntax should get you your answer and is the conventional way to query tabular data like this. The data.table package makes this kind of syntax accessible.
# load the necessary library
library(data.table)
# convert the data frame from the question (assumed to be stored in dt) to a data.table
dt <- data.table(dt)
# parse the Date strings into POSIXct date-times
dt[, Date := as.POSIXct(Date, format = "%d.%m.%Y %H:%M")]
# keep a copy of the date to join on, so the original is not lost during the join
dt[, rollDate := Date]
# create a table with the Stimuli A events and set the key for sorting/joining
stima.dt <- dt[StimuliA == 1, .(User, rollDate, Date, Hour, StimuliA)]
setkey(stima.dt, User, rollDate)
# same for Stimuli B
stimb.dt <- dt[StimuliB == 1, .(User, rollDate, Date, Hour, StimuliB)]
setkey(stimb.dt, User, rollDate)
# same for the responses table
resp.dt <- dt[Responses == 1, .(User, rollDate, Date, Hour, Responses)]
setkey(resp.dt, User, rollDate)
# join each Stimuli A event to the closest following response (rolling join)
stim.a <- resp.dt[stima.dt, roll = -Inf]
# time difference in minutes: response date (Date) minus stimulus date (i.Date)
stim.a[, difftime(Date, i.Date, units = "mins")]
# join each Stimuli B event to the closest following response (rolling join)
stim.b <- resp.dt[stimb.dt, roll = -Inf]
# time difference in minutes: response date (Date) minus stimulus date (i.Date)
stim.b[, difftime(Date, i.Date, units = "mins")]
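To see what roll = -Inf does in isolation, here is a minimal sketch on made-up toy data. The integer minute values and the column names t, resp_t, stim_t and wait are my own assumptions for illustration, not part of the question's data. As far as I understand the rolling join, each stimulus is matched to the next response at or after it, so a response logged at exactly the same timestamp as the stimulus would count as a match with a delay of 0.

```r
library(data.table)

# Toy data: integer "minutes" stand in for timestamps (made-up values).
resp <- data.table(User = "u1", t = c(10L, 50L, 90L))
stim <- data.table(User = "u1", t = c(5L, 20L, 95L))

# Keep a copy of each time, because the join column ends up holding the stimulus value.
resp[, resp_t := t]
stim[, stim_t := t]
setkey(resp, User, t)
setkey(stim, User, t)

# roll = -Inf: for each stimulus row, roll the next response backwards onto it.
joined <- resp[stim, roll = -Inf]
joined[, wait := resp_t - stim_t]
joined
```

The stimulus at 95 has no later response, so its wait should come out as NA rather than being matched to an earlier response.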
Here's how you can do that with dplyr. First, you need to transform your Date column to a POSIXct object. Then, make sure the rows are ordered by Date with arrange. You then add a time difference column using mutate. You can then filter for rows where Stimuli A or B is 1 and is followed by a Response equal to 1.
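library(dplyr)
# parse the Date strings into POSIXct date-times
df$Date <- as.POSIXct(strptime(df$Date, "%d.%m.%Y %H:%M"))
df %>%
  arrange(User, Date) %>%
  group_by(User) %>%  # group first so that lead() never crosses users
  mutate(difftime = difftime(lead(Date), Date, units = "mins")) %>%
  # keep stimuli whose next event is a response
  filter((StimuliA == 1 | StimuliB == 1) & lead(Responses) == 1)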
  User                                 Date                 Hour StimuliA StimuliB Responses difftime
  <chr>                                <dttm>              <int>    <int>    <int>     <int> <time>
1 005b98f3-5b1b-4d10-bdea-a55d012b2844 2015-11-25 13:59:00  1645        1        0         0 11253 mins
2 005b98f3-5b1b-4d10-bdea-a55d012b2844 2015-12-07 08:18:00  1928        1        0         0 2122 mins
3 005b98f3-5b1b-4d10-bdea-a55d012b2844 2015-12-08 19:40:00  1963        0        1         0 19510 mins
4 005b98f3-5b1b-4d10-bdea-a55d012b2844 2016-01-05 11:35:00  2627        0        1         0 106 mins
5 005b98f3-5b1b-4d10-bdea-a55d012b2844 2016-01-06 09:18:00  2649        1        0         0 56969 mins
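Note that the filter above only keeps stimuli whose very next event is a response. If you also want the delay for stimuli that are followed by other events before the next response, a minimal base R sketch could look like the following. The helper name minutes_to_next_response and its arguments are hypothetical, and the function assumes you pass it the stimulus and response times of a single user.

```r
# Hypothetical helper: minutes from each stimulus to the next strictly later response.
# stim_time, resp_time: POSIXct vectors belonging to ONE user (assumption).
minutes_to_next_response <- function(stim_time, resp_time) {
  resp_time <- sort(resp_time)
  vapply(seq_along(stim_time), function(k) {
    later <- resp_time[resp_time > stim_time[k]]
    if (length(later) == 0) return(NA_real_)  # no response after this stimulus
    as.numeric(difftime(later[1], stim_time[k], units = "mins"))
  }, numeric(1))
}

# Small usage example with a few timestamps taken from the question's data.
fmt  <- "%d.%m.%Y %H:%M"
stim <- as.POSIXct(c("25.11.2015 13:59", "22.12.2015 20:46"), format = fmt, tz = "UTC")
resp <- as.POSIXct(c("03.12.2015 09:32", "05.01.2016 13:21"), format = fmt, tz = "UTC")
minutes_to_next_response(stim, resp)
```

For several users you could call it once per user, for example on the pieces returned by split(df, df$User). This version uses a strict "later than" comparison, so a response at the same timestamp as the stimulus is not counted; whether that is what you want depends on how your events are logged.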