Loop through dataframe in R and measure time difference between two values

Summary: I am analyzing the time difference between a stimulus that occurred (A or B) and a possible response from the user.

The dataset has the following structure:

    structure(list(User = c("005b98f3-5b1b-4d10-bdea-a55d012b2844",
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844", "005b98f3-5b1b-4d10-bdea-a55d012b2844", 
"005b98f3-5b1b-4d10-bdea-a55d012b2844"), Date = c("25.11.2015 13:59", 
"03.12.2015 09:32", "07.12.2015 08:18", "08.12.2015 19:40", "08.12.2015 19:40", 
"22.12.2015 08:52", "22.12.2015 08:50", "22.12.2015 15:42", "22.12.2015 20:46", 
"05.01.2016 11:33", "05.01.2016 11:35", "05.01.2016 13:22", "05.01.2016 13:21", 
"05.01.2016 13:22", "06.01.2016 09:18", "14.02.2016 22:47", "20.02.2016 21:27", 
"01.04.2016 13:52", "24.07.2016 07:03", "04.08.2016 08:25"), 
    Hour = c(1645L, 1833L, 1928L, 1963L, 1963L, 2288L, 2288L, 
    2295L, 2300L, 2627L, 2627L, 2629L, 2629L, 2629L, 2649L, 3598L, 
    3741L, 4717L, 7447L, 7712L), StimuliA = c(1L, 0L, 1L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L), StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), Responses = c(0L, 
    1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 
    1L, 1L, 1L, 0L)), .Names = c("User", "Date", "Hour", "StimuliA", 
"StimuliB", "Responses"), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Additional information on the data: Every row in the data table is an event log entry where a User either perceived a certain stimulus or performed an action (Response). Hour: the number of hours since the start of the project at which the event occurred.

Goal: The overall goal is to measure the time between a stimulus and the response (if there was one). I would like to create a loop that goes through the dataset for every User and, if the value of a Stimuli column is 1, checks whether there is a later response from that user, and then creates one vector with the values for A and one for B.

Question: Would I do this with a for loop that goes through every User, checks the perceived Stimuli and, where the value is 1, checks whether the same User ID has the value 1 in the closest Response, and then compares the two dates?

Subquestions // Things I am struggling with

  1. How do I actually loop through every row, check it against the conditional statement, and execute a command if it is TRUE? (ifelse?)
  2. How would I then save the value of another cell in this row as part of that command?
  3. How do I then tell R to look for the closest Response of the same User ID (chronologically) and calculate the time difference between those two values?
  4. How do I finally create a vector with those calculated values?

Desired result:

Stimuli A c=(11253, 2122, 56969), Stimuli B c=(19512,107)

The code I have produced so far is not very helpful. I was experimenting with for loops and if statements, and also with the ifelse function.

I am a newbie with R. I have taken multiple classes on DataCamp, but I am still struggling to apply them to my own work for my master's thesis. Thanks for all the help.

Additional Info:

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

SQL-style syntax should get you your answer; it is the conventional way to query tabular data like this. The data.table package makes this sort of syntax accessible in R.

#import necessary library
library(data.table)

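#convert to a data.table (assumes the question's data frame is stored in dt)
dt <- as.data.table(dt)

#convert Date field from character to POSIXct date-time
dt$Date <- as.POSIXct(dt$Date, format="%d.%m.%Y %H:%M")
#create a copy of the date field to join on, so the original Date is not lost during the join
dt$rollDate <- dt$Date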

#create table with stimuliA and set key for sorting/joining purposes
stima.dt <- dt[StimuliA==1,.(User,rollDate,Date,Hour,StimuliA)]
setkey(stima.dt,User,rollDate)

#Same for stimuliB
stimb.dt <- dt[StimuliB==1,.(User,rollDate,Date,Hour,StimuliB)]
setkey(stimb.dt,User,rollDate)

#same for responses table
resp.dt <- dt[Responses==1,.(User,rollDate,Date,Hour,Responses)]
setkey(resp.dt,User,rollDate)

#Join stimuli A table to the closest following response (roll=-Inf rolls the next response backwards)
stim.a<-resp.dt[stima.dt,roll=-Inf]

#calculate time differences in minutes (Date = response time, i.Date = stimulus time)
stim.a[,difftime(Date,i.Date,units="mins")]

#Join stimuli B table to the closest following response
stim.b<-resp.dt[stimb.dt,roll=-Inf]

#calculate time differences in minutes
stim.b[,difftime(Date,i.Date,units="mins")]
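To make the rolling join easier to follow, here is a minimal, self-contained sketch on four rows taken from the question's data (two Stimuli A events, each followed by a response). The names `events`, `stim`, `resp`, `StimDate` and `RespDate` are illustrative and not part of the answer above, and `tz = "UTC"` is an assumption made only so the result does not depend on the local time zone.

```r
library(data.table)

# Four events from the question: stimulus A, response, stimulus A, response
events <- data.table(
  User = "005b98f3-5b1b-4d10-bdea-a55d012b2844",
  Date = as.POSIXct(
    c("25.11.2015 13:59", "03.12.2015 09:32",
      "07.12.2015 08:18", "08.12.2015 19:40"),
    format = "%d.%m.%Y %H:%M", tz = "UTC"),  # tz is an assumption
  StimuliA  = c(1L, 0L, 1L, 0L),
  Responses = c(0L, 1L, 0L, 1L)
)

# Separate tables for stimuli and responses; rollDate is the join column,
# StimDate/RespDate keep the original timestamps (hypothetical names)
stim <- events[StimuliA == 1, .(User, rollDate = Date, StimDate = Date)]
resp <- events[Responses == 1, .(User, rollDate = Date, RespDate = Date)]
setkey(stim, User, rollDate)
setkey(resp, User, rollDate)

# For every stimulus, take the next response of the same user
# (roll = -Inf carries the next observation backwards)
joined <- resp[stim, roll = -Inf]

# Minutes between each stimulus and its matched response
minutes_a <- joined[, as.numeric(difftime(RespDate, StimDate, units = "mins"))]
minutes_a
```

For these four rows, `minutes_a` holds the first two Stimuli A values from the desired result (11253 and 2122). A stimulus with no later response would get `NA` in `RespDate`, and therefore `NA` minutes.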

Here's how you can do that with dplyr. First, convert the Date column to a POSIXct object. Then make sure the rows are ordered by Date with arrange. Next, add a time-difference column using mutate. Finally, filter for rows where Stimuli A or B is 1 and the immediately following row has a Response equal to 1.

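library(dplyr)

#convert Date from character to POSIXct
df$Date <- as.POSIXct(strptime(df$Date,"%d.%m.%Y %H:%M"))
df %>%
  arrange(User,Date)%>%
  #group first so that lead() never looks across two different users
  group_by(User)%>%
  mutate(difftime= difftime(lead(Date),Date, units = "mins") ) %>%
  #keep stimuli whose next event is a response
  filter((StimuliA==1 | StimuliB==1) & lead(Responses)==1)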

                                  User                Date  Hour StimuliA StimuliB Responses   difftime
                                 <chr>              <dttm> <int>    <int>    <int>     <int>     <time>
1 005b98f3-5b1b-4d10-bdea-a55d012b2844 2015-11-25 13:59:00  1645        1        0         0 11253 mins
2 005b98f3-5b1b-4d10-bdea-a55d012b2844 2015-12-07 08:18:00  1928        1        0         0  2122 mins
3 005b98f3-5b1b-4d10-bdea-a55d012b2844 2015-12-08 19:40:00  1963        0        1         0 19510 mins
4 005b98f3-5b1b-4d10-bdea-a55d012b2844 2016-01-05 11:35:00  2627        0        1         0   106 mins
5 005b98f3-5b1b-4d10-bdea-a55d012b2844 2016-01-06 09:18:00  2649        1        0         0 56969 mins
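The filter above keeps only stimuli whose very next event is a response. If a stimulus should instead be matched to the next response even when other stimuli come in between, the loop idea from the question can be sketched in base R as follows. This is only an illustration under stated assumptions: `minutes_to_next_response` is a hypothetical helper, the rows are a subset of the question's Stimuli B data, `tz = "UTC"` is assumed, and the strict `>` comparison is a choice that ignores a response logged in the same minute as the stimulus.

```r
# Toy event log: a subset of the question's rows for one user
events <- data.frame(
  User = "005b98f3-5b1b-4d10-bdea-a55d012b2844",
  Date = as.POSIXct(
    c("22.12.2015 20:46", "05.01.2016 11:33", "05.01.2016 11:35",
      "05.01.2016 13:21", "05.01.2016 13:22", "04.08.2016 08:25"),
    format = "%d.%m.%Y %H:%M", tz = "UTC"),  # tz is an assumption
  StimuliB  = c(1L, 1L, 1L, 0L, 0L, 1L),
  Responses = c(0L, 0L, 0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each stimulus time, minutes until the first
# strictly later response, or NA if no response follows
minutes_to_next_response <- function(stim_times, resp_times) {
  out <- rep(NA_real_, length(stim_times))
  for (i in seq_along(stim_times)) {
    later <- resp_times[resp_times > stim_times[i]]
    if (length(later) > 0) {
      out[i] <- as.numeric(difftime(min(later), stim_times[i], units = "mins"))
    }
  }
  out
}

# Loop over users so that a stimulus is only matched to the same user's responses
stim_b_minutes <- numeric(0)
for (u in unique(events$User)) {
  user_rows  <- events[events$User == u, ]
  stim_times <- user_rows$Date[user_rows$StimuliB == 1]
  resp_times <- user_rows$Date[user_rows$Responses == 1]
  stim_b_minutes <- c(stim_b_minutes,
                      minutes_to_next_response(stim_times, resp_times))
}
stim_b_minutes
```

Here the first three stimuli are all matched to the response at 05.01.2016 13:21, and the last stimulus has no later response, so its entry is `NA`. The same loop with `StimuliA` in place of `StimuliB` would give the vector for Stimuli A.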
