
Computing a new variable based on timestamps and a group variable in R

I have a dataset containing continuous data of subjects' emotional responses to different stimuli. You can download an example data file here: https://www.dropbox.com/s/gvn27b6can2ka8s/example%20data.txt?dl=0 .

The stimulus variable has 3 different values (NBL, EGR, KGR). I need to split two of them (EGR and KGR) into three parts each, based on time, stimulus and subject. What I want to end up with is a new stimulus variable with the values NBL, EGR_1, EGR_2, EGR_3, KGR_1, KGR_2, KGR_3.

KGR_1
Begins: First row containing KGR for each value of “subject”
Ends: 304 seconds after first row of KGR

KGR_2
Begins: First row after last row of KGR_1
Ends: 90 seconds after first row of KGR_2

KGR_3
Begins: First row after last row of KGR_2
Ends: Last row containing KGR for each value of “subject”

EGR_1
Begins: First row containing EGR for each value of “subject”
Ends: 304 seconds after first row of EGR

EGR_2
Begins: First row after last row of EGR_1
Ends: 91 seconds after first row of EGR_2

EGR_3
Begins: First row after last row of EGR_2
Ends: Last row containing EGR for each value of “subject”
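The rules above can be expressed per subject/stimulus block as a small function. This is a minimal sketch, not the asker's code: split_block() is a hypothetical helper, it assumes elapsed seconds since the block's first row are already computed and sorted, and it treats each "Ends" boundary as inclusive. Note that part 2 is measured from its own first row, so at 7-8 rows per second its end lands a fraction of a second after 394 s (KGR) or 395 s (EGR), not exactly on it.

```r
# Hypothetical helper (not from the question): label the rows of ONE
# subject/stimulus block following the rules above.
#   elapsed : seconds since the block's first row, sorted ascending
#   stim    : "KGR" or "EGR"
#   len1    : part 1 ends this many seconds after the block's first row
#   len2    : part 2 ends this many seconds after part 2's own first row
#             (90 for KGR, 91 for EGR)
split_block <- function(elapsed, stim, len1 = 304, len2 = 90) {
  if (length(elapsed) == 0) return(character(0))
  part <- rep(3L, length(elapsed))        # default: part 3 (the remainder)
  part[elapsed <= len1] <- 1L             # part 1: up to and including len1
  start2 <- which(elapsed > len1)[1]      # first row after part 1 ends
  if (!is.na(start2)) {
    end2 <- elapsed[start2] + len2        # part 2 ends len2 s after its first row
    part[elapsed > len1 & elapsed <= end2] <- 2L
  }
  paste0(stim, "_", part)
}

# Returns KGR_1 three times, KGR_2 three times, KGR_3 twice
split_block(c(0, 150, 304, 304.2, 350, 394.1, 394.3, 500), "KGR")
```

To use it on the full data, apply it separately to each subject's KGR and EGR block (for example inside group_by(subject, stimulus)), with len2 = 90 for KGR and 91 for EGR, and leave the NBL rows unchanged.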

I approached the problem with this code:

library(dplyr)

exampledata <- exampledata %>%
  # first 8 characters = HH:MM:SS; anything after the 8th character (e.g. fractions of a second) is dropped
  mutate(time = as.POSIXct(strptime(substr(time, 1, 8), "%H:%M:%S"))) %>%
  group_by(subject) %>%
  # seconds since the previous row of the same subject
  mutate(dt_secs = as.numeric(difftime(time, lag(time), units = 'secs'))) %>%
  tidyr::replace_na(list(dt_secs = 0)) %>%
  group_by(subject, stimulus) %>%
  # running time within each subject/stimulus block, plus boundary flags
  mutate(cum_time = cumsum(dt_secs),
         is_first_for_event = cum_time == min(cum_time),
         is_last_for_event = cum_time == max(cum_time),
         KGR_1_end = (stimulus == "KGR") & (cum_time == 304),
         KGR_2_start = (stimulus == "KGR") & (cum_time == 305),
         KGR_2_end = (stimulus == "KGR") & (cum_time == 394),
         KGR_3_start = (stimulus == "KGR") & (cum_time == 395),
         EGR_1_end = (stimulus == "EGR") & (cum_time == 304),
         EGR_2_start = (stimulus == "EGR") & (cum_time == 305),
         EGR_2_end = (stimulus == "EGR") & (cum_time == 395),
         EGR_3_start = (stimulus == "EGR") & (cum_time == 396))

The problem is that the data were analysed at 7-8 frames per second, so there are several rows per second and this doesn't quite work: several rows share each whole-second cum_time value, so a flag such as KGR_1_end is TRUE on more than one row. I tried using milliseconds instead of seconds as units, without success. Perhaps the approach is fine, but in that case I don't know how to get from what I end up with (multiple TRUE values) to the new variable.
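On the milliseconds: substr(time, 1, 8) keeps only the first eight characters (HH:MM:SS), so all rows within one second get the same timestamp. A minimal base-R sketch, assuming the time column holds strings such as "00:18:02.125" (the actual format in the example file is not shown here, so adjust the format string), keeps the fractional part with the %OS conversion and measures elapsed seconds per subject/stimulus block with ave(). The toy rows are hypothetical.

```r
# Assumption: `time` holds strings like "00:18:02.125" (HH:MM:SS plus a
# fractional part). The real format in the example file may differ.
d <- data.frame(
  subject  = "VP24",
  stimulus = "EGR",
  time     = c("00:18:02.000", "00:18:02.125", "00:18:02.875", "00:18:03.250"),
  stringsAsFactors = FALSE
)

# Question's approach: first 8 characters only, so the three rows within
# 00:18:02 collapse onto the same whole-second timestamp.
whole <- as.POSIXct(substr(d$time, 1, 8), format = "%H:%M:%S", tz = "UTC")

# %OS reads the seconds *including* their fractional part.
frac <- as.POSIXct(d$time, format = "%H:%M:%OS", tz = "UTC")

# Seconds since the first row of each subject/stimulus block.
since_first <- function(x) x - min(x)
d$elapsed_whole <- ave(as.numeric(whole), d$subject, d$stimulus, FUN = since_first)
d$elapsed       <- ave(as.numeric(frac),  d$subject, d$stimulus, FUN = since_first)

d[, c("time", "elapsed_whole", "elapsed")]
```

With whole seconds the first three rows all get 0; with %OS every row gets its own elapsed value, so a cut-off such as 304 s separates rows instead of matching a whole group of them.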

I am really new to R and to coding in general.

I am not completely sure what you are trying to do: you create all kinds of variables (like is_first_for_event) whose relation to your goal, calculating a new stimulus variable, is unclear.

I am also unsure whether I understand your question correctly, but I interpreted it as: you want to encode the time interval in the stimulus label.

library(dplyr)

exampledata <- exampledata %>%
  arrange(time) %>%
  mutate(time = as.POSIXct(strptime(substr(time, 1, 8), "%H:%M:%S"))) %>%
  group_by(subject) %>%
  mutate(dt_secs = as.numeric(difftime(time, lag(time), units = 'secs'))) %>%
  tidyr::replace_na(list(dt_secs = 0)) %>%
  group_by(subject, stimulus) %>%
  mutate(cum_time = cumsum(dt_secs),
         # case_when() returns the value of the first matching condition,
         # so each "<" test only needs the upper bound of its interval
         new_stimulus_var = case_when(
           stimulus == "KGR" & cum_time < 304 ~ "KGR_1",
           stimulus == "KGR" & cum_time < 394 ~ "KGR_2",
           stimulus == "KGR" & cum_time >= 394 ~ "KGR_3",
           stimulus == "EGR" & cum_time < 304 ~ "EGR_1",
           stimulus == "EGR" & cum_time < 395 ~ "EGR_2",
           stimulus == "EGR" & cum_time >= 395 ~ "EGR_3",
           TRUE ~ stimulus  # NBL keeps its original label
         ))
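One detail worth checking against the rules in the question: dt_secs is the difference to the previous row of the same subject, so the first row of a block that follows another stimulus carries the gap to that earlier row, and cum_time for that block starts there rather than at 0. A base-R sketch on hypothetical toy data (parsed seconds with the fractional part kept) that compares this with measuring elapsed time from each block's own first row, then applies the same cut-offs as the case_when() above via findInterval(). NBL keeps its label, as the question asks.

```r
# Hypothetical toy data for one subject: NBL, then EGR, then KGR blocks;
# `t` is already parsed to seconds (fractional part kept).
d <- data.frame(
  subject  = "VP24",
  stimulus = c("NBL", "NBL", "EGR", "EGR", "EGR", "KGR", "KGR", "KGR", "KGR"),
  t        = c(10, 10.5, 11, 11.125, 12, 20, 324, 324.5, 420),
  stringsAsFactors = FALSE
)
d <- d[order(d$subject, d$t), ]

# Lag difference within subject, cumulated within block (as in the answer):
# the first EGR row inherits 0.5 s, the first KGR row inherits 8 s.
dt <- ave(d$t, d$subject, FUN = function(x) c(0, diff(x)))
d$cum_time <- ave(dt, d$subject, d$stimulus, FUN = cumsum)

# Alternative: seconds since the block's own first row, so every block starts at 0.
d$elapsed <- ave(d$t, d$subject, d$stimulus, FUN = function(x) x - min(x))

# Same cut-offs as the case_when(); findInterval() returns 0 below the first
# cut-off, 1 between the two and 2 from the second one on.
cuts <- list(KGR = c(304, 394), EGR = c(304, 395))
d$new_stimulus_var <- d$stimulus            # NBL keeps its label
for (s in names(cuts)) {
  rows <- d$stimulus == s
  if (any(rows)) {
    part <- findInterval(d$elapsed[rows], cuts[[s]]) + 1L
    d$new_stimulus_var[rows] <- paste0(s, "_", part)
  }
}
d
```

Here the labels happen to agree, but cum_time for the KGR block is shifted by 8 s; with a longer gap between blocks that shift could move rows across a boundary.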

which results in (first 10 rows shown):

   time                Neutral     Happy   Sad  Angry Surprised   Scared Disgusted Contempt stimulus subject dt_secs cum_time new_stimulus_var
   <dttm>                <dbl>     <dbl> <dbl>  <dbl>     <dbl>    <dbl>     <dbl>    <dbl> <chr>    <chr>     <dbl>    <dbl> <chr>           
 1 2020-01-15 00:18:02   0.171 0.0000511 0.885 0.0625  0.000939 0.000170   0.00383 0        EGR      VP24          0        0 EGR_1           
 2 2020-01-15 00:18:02   0.163 0.0000461 0.893 0.0592  0.000864 0.000173   0.00374 0        EGR      VP24          0        0 EGR_1           
 3 2020-01-15 00:18:02   0.176 0.0000422 0.883 0.0633  0.000788 0.000171   0.00374 0.000355 EGR      VP24          0        0 EGR_1           
 4 2020-01-15 00:18:02   0.206 0.0000428 0.862 0.0728  0.000714 0.000212   0.00357 0.000238 EGR      VP24          0        0 EGR_1           
 5 2020-01-15 00:18:02   0.236 0.0000450 0.851 0.0723  0.000612 0.000246   0.00343 0.000488 EGR      VP24          0        0 EGR_1           
 6 2020-01-15 00:18:02   0.236 0.0000435 0.855 0.0674  0.000502 0.000269   0.00320 0.00416  EGR      VP24          0        0 EGR_1           
 7 2020-01-15 00:18:02   0.206 0.0000671 0.875 0.0566  0.000428 0.000283   0.00428 0.00528  EGR      VP24          0        0 EGR_1           
 8 2020-01-15 00:18:03   0.171 0.0000850 0.897 0.0450  0.000405 0.000775   0.00880 0.00463  EGR      VP24          1        1 EGR_1           
 9 2020-01-15 00:18:03   0.137 0.0000848 0.919 0.0350  0.000370 0.000875   0.0104  0.00495  EGR      VP24          0        1 EGR_1           
10 2020-01-15 00:18:03   0.165 0.0000933 0.930 0.0274  0.000410 0.00107    0.00980 0.0141   EGR      VP24          0        1 EGR_1  
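The printed rows only cover the start of EGR_1, so they cannot show whether the later boundaries fall where intended. One way to check is to list the first and last elapsed second and the row count of every segment. The sketch below does this on synthetic timestamps (a hypothetical 500 s EGR block at 8 rows per second, not the example file), labelled with the EGR cut-offs from the case_when() above.

```r
# Synthetic stand-in for one EGR block: 8 rows per second for 500 s.
elapsed <- seq(0, 500, by = 1 / 8)

# EGR cut-offs from the case_when(): < 304, < 395, >= 395
part <- paste0("EGR_", findInterval(elapsed, c(304, 395)) + 1L)

# First/last elapsed second and number of rows per segment
by_part <- split(elapsed, part)
seg <- data.frame(
  segment = names(by_part),
  first   = vapply(by_part, min, numeric(1)),
  last    = vapply(by_part, max, numeric(1)),
  rows    = lengths(by_part),
  row.names = NULL,
  stringsAsFactors = FALSE
)
seg
```

Each segment should start right after the previous one ends (0, 304 and 395 s here). On the real data, tapply(exampledata$cum_time, exampledata$new_stimulus_var, range) gives a similar overview per label.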

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For questions, contact yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM