Create a new column depending on different values in other columns R

Question

I have a large data set; a shortened version looks like this:

> df
Stimulus    TimeDiff
S102        10332.4
S 66        1095.4
S103        2987.8
S 77        551.4
S112        3015.2
S 66        566.6
S114        5999.8
S 88        403.8
S104        4679.4
S 88        655.2

I want to create a new column, df$Accuracy, that labels responses as correct, incorrect, or missed. The label depends on certain values in df$Stimulus (only S 88, S 66 and S 77) and on df$TimeDiff. For example, if S 88 is preceded by S114 or S104 and df$TimeDiff for that row is less than 710, then df$Accuracy should be "incorrect". The data set would then look like this:

> df
Stimulus    TimeDiff     Accuracy
S102        10332.4      NA 
S 66        1095.4       NA
S103        2987.8       NA
S 77        551.4        NA
S112        3015.2       NA
S 66        566.6        NA
S114        5999.8       NA
S 88        403.8        incorrect
S104        4679.4       NA
S 88        655.2        incorrect

What is the best way to do it?

Answer 1

You can use ifelse() together with lag() from dplyr. Note that the stimulus codes in the data contain a space ("S 88", not "S88"), so they must be matched exactly:

library(dplyr)
# lag(Stimulus) is the stimulus on the previous row (NA for the first row)
df$Accuracy <- with(df, ifelse(Stimulus %in% c('S 88', 'S 66', 'S 77') &
                                   lag(Stimulus) %in% c('S114', 'S104') &
                                   TimeDiff < 710, 'incorrect', NA))
df
#   Stimulus TimeDiff  Accuracy
#1      S102  10332.4      <NA>
#2      S 66   1095.4      <NA>
#3      S103   2987.8      <NA>
#4      S 77    551.4      <NA>
#5      S112   3015.2      <NA>
#6      S 66    566.6      <NA>
#7      S114   5999.8      <NA>
#8      S 88    403.8 incorrect
#9      S104   4679.4      <NA>
#10     S 88    655.2 incorrect
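
The question also mentions "correct" responses and misses, but only the "incorrect" rule is actually spelled out. As a rough sketch of how several labels could be assigned in one pass, dplyr's case_when() can take one condition per label. The "miss" rule below is purely hypothetical (a placeholder to show the pattern, not something stated in the question), and the helper column prev is an illustrative name:

```r
library(dplyr)

# Sample data from the question (the codes "S 66", "S 77", "S 88" contain a space).
df <- data.frame(
  Stimulus = c("S102", "S 66", "S103", "S 77", "S112",
               "S 66", "S114", "S 88", "S104", "S 88"),
  TimeDiff = c(10332.4, 1095.4, 2987.8, 551.4, 3015.2,
               566.6, 5999.8, 403.8, 4679.4, 655.2),
  stringsAsFactors = FALSE
)

responses <- c("S 88", "S 66", "S 77")

df <- df %>%
  mutate(
    prev = dplyr::lag(Stimulus),  # stimulus on the preceding row (NA for row 1)
    Accuracy = case_when(
      # Rule taken from the question.
      Stimulus %in% responses & prev %in% c("S114", "S104") & TimeDiff < 710 ~ "incorrect",
      # HYPOTHETICAL rule, for illustration only: replace with the real criterion.
      Stimulus %in% responses & prev %in% c("S114", "S104") ~ "miss",
      # Everything else stays NA.
      TRUE ~ NA_character_
    )
  ) %>%
  select(-prev)
```

Conditions are checked in order and the first match wins, so the more specific rule should come first. On the sample data only rows 8 and 10 match, both as "incorrect".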

Answer 2

We can use data.table for this. It should be efficient because := assigns in place, without copying the data. Here shift(Stimulus) returns the value from the previous row.

library(data.table)
setDT(df)[Stimulus %chin% c("S 88", "S 66", "S 77") & shift(Stimulus) %chin%
          c("S114", "S104") & TimeDiff < 710, Accuracy := "incorrect"]
df
#    Stimulus TimeDiff  Accuracy
# 1:     S102  10332.4        NA
# 2:     S 66   1095.4        NA
# 3:     S103   2987.8        NA
# 4:     S 77    551.4        NA
# 5:     S112   3015.2        NA
# 6:     S 66    566.6        NA
# 7:     S114   5999.8        NA
# 8:     S 88    403.8 incorrect
# 9:     S104   4679.4        NA
#10:     S 88    655.2 incorrect
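
If neither package is available, the same previous-row comparison can be sketched in base R. This is a minimal illustration, not part of either answer; the names prev and hit are arbitrary helpers:

```r
# Sample data from the question (the codes "S 66", "S 77", "S 88" contain a space).
df <- data.frame(
  Stimulus = c("S102", "S 66", "S103", "S 77", "S112",
               "S 66", "S114", "S 88", "S104", "S 88"),
  TimeDiff = c(10332.4, 1095.4, 2987.8, 551.4, 3015.2,
               566.6, 5999.8, 403.8, 4679.4, 655.2),
  stringsAsFactors = FALSE
)

# Shift Stimulus down by one row: drop the last value, pad the front with NA.
prev <- c(NA, head(df$Stimulus, -1))

# Rows where the rule from the question applies.
hit <- df$Stimulus %in% c("S 88", "S 66", "S 77") &
  prev %in% c("S114", "S104") &
  df$TimeDiff < 710

# Start with NA everywhere, then label the matching rows.
df$Accuracy <- NA_character_
df$Accuracy[hit] <- "incorrect"
```

Because %in% returns FALSE for NA, the first row (which has no predecessor) is simply left as NA.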

data

df <- structure(list(Stimulus = c("S102", "S 66", "S103", "S 77", "S112", 
"S 66", "S114", "S 88", "S104", "S 88"), TimeDiff = c(10332.4, 
1095.4, 2987.8, 551.4, 3015.2, 566.6, 5999.8, 403.8, 4679.4, 
655.2)), .Names = c("Stimulus", "TimeDiff"), class = "data.frame", 
row.names = c(NA, -10L))

2 answers

solution1
1 ACCEPTED 2016-08-30 12:06:08

solution2
0 2016-08-30 14:20:12
