
Creating new columns for consecutive TRUEs in R

I want to create new columns that are TRUE when the number of consecutive wins reaches two, three, and so on. So I would like rows 3, 6, 7 and 8 to be TRUE in a new column called "twoconswins", and rows 7 and 8 to be TRUE in a new column called "threeconswins", and so on. What is the best way to do this?

>         id        date team teamscore opponent opponentscore home   win
>9         9 2005-10-05  DET         5      STL             1    1  TRUE
>38       38 2005-10-09  DET         6      CAL             3    1  TRUE
>48       48 2005-10-10  DET         2      VAN             4    1 FALSE
>88       88 2005-10-17  DET         3      SJS             2    1  TRUE
>110     110 2005-10-21  DET         3      ANA             2    1  TRUE
>148     148 2005-10-27  DET         5      CHI             2    1  TRUE
>179     179 2005-11-01  DET         4      CHI             1    1  TRUE
>194     194 2005-11-03  DET         3      EDM             4    1 FALSE
>212     212 2005-11-05  DET         1      PHO             4    1 FALSE

I assumed that row 1 in your count is the header, so that the rows that should actually evaluate to TRUE are 2, 5, 6 and 7 for "twoconswins", and 6 and 7 for "threeconswins".

You could do:

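library(data.table)
# shift(x, n, fill) lags x by n rows; a row is TRUE only when this game
# and the previous one (or two) games were all wins
df$twoconswins <-  (df$win & shift(df$win, 1, NA)) == TRUE
df$threeconswins <- (df$win & shift(df$win, 1, NA) & shift(df$win, 2, NA)) == TRUE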

I think this could be generalized further, though, especially if 50 consecutive wins are possible and you'd like to create columns for those as well.
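One way to avoid writing one shift() term per streak length is to count the current winning streak directly and compare it against a threshold. The following is a minimal sketch in base R, not part of the original answer; it assumes the games are already sorted by date, and the stand-alone `win` vector (copied from the sample data) and the name `streak` are illustrative.

```r
# Wins from the sample data, in game order
win <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)

# rle() gives the length of each run of equal values; sequence() counts
# 1, 2, 3, ... within each run. Multiplying by win zeroes out the losing runs,
# leaving the length of the winning streak that ends at each game.
streak <- sequence(rle(win)$lengths) * win

# A game belongs to a streak of at least k wins when streak >= k
twoconswins   <- streak >= 2
threeconswins <- streak >= 3
```

With this approach a column for any streak length k is just `streak >= k`, so no lagged copies of the column are needed.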

If you would also like to create the new columns automatically, in case a streak of, say, 500 consecutive wins ever occurs, you could do this:

df <- read.table(text =
                      'id   date     team teamscore opponent opponentscore home   win
             9         9 2005-10-05  DET         5      STL             1    1  TRUE
             38       38 2005-10-09  DET         6      CAL             3    1  TRUE
             48       48 2005-10-10  DET         2      VAN             4    1  FALSE
             88       88 2005-10-17  DET         3      SJS             2    1  TRUE
             110     110 2005-10-21  DET         3      ANA             2    1  TRUE
             148     148 2005-10-27  DET         5      CHI             2    1  TRUE
             179     179 2005-11-01  DET         4      CHI             1    1  TRUE
             194     194 2005-11-03  DET         3      EDM             4    1 FALSE
             212     212 2005-11-05  DET         1      PHO             4    1 FALSE',
 header = TRUE)



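# Run-length encode the win column: one row per run of equal values
r <- rle(df$win)
rles <- data.frame(values = r$values, lengths = r$lengths)

# Longest winning streak: look only at the lengths of the TRUE runs
maxconwins <- max(rles$lengths[rles$values])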

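library(data.table)  # provides shift()

for (k in seq_len(maxconwins)) {
  lags <- seq_len(k)
  # Builds e.g. "shift(df$win, 1, NA) & shift(df$win, 2, NA)" for k = 2.
  # Only lagged values are combined, so nr<k>conswins on a row describes
  # the k games before that row (add df$win to include the current game).
  partialstring <- paste0("shift(df$win, ", lags, ", NA)", collapse = " & ")
  fullstring <- paste0("df$nr", k, "conswins <- (", partialstring, ") == TRUE")
  eval(parse(text = fullstring))
}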

df[1:maxconwins,9:12][upper.tri(df[1:maxconwins,9:12], diag = TRUE)] <- NA

   > df[,8:12]
      win nr1conswins nr2conswins nr3conswins nr4conswins
9    TRUE          NA          NA          NA          NA
38   TRUE        TRUE          NA          NA          NA
48  FALSE        TRUE        TRUE          NA          NA
88   TRUE       FALSE       FALSE       FALSE          NA
110  TRUE        TRUE       FALSE       FALSE       FALSE
148  TRUE        TRUE        TRUE       FALSE       FALSE
179  TRUE        TRUE        TRUE        TRUE       FALSE
194 FALSE        TRUE        TRUE        TRUE        TRUE
212 FALSE       FALSE       FALSE       FALSE       FALSE

BTW, I only added the last line because (FALSE & TRUE & TRUE & NA) == TRUE evaluates to FALSE, while you would probably like these cells to be NA. I made sure of this by setting the upper triangle (including the diagonal) of the square submatrix to NA afterwards. For readability I hard-coded the column numbers 9 and 12 here, but you could compute them instead if you'd like.
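The behaviour described above follows from R's three-valued logic: a single FALSE decides an AND even when another operand is NA, whereas TRUE & NA stays NA. A small sketch to illustrate (the variable names are only for demonstration):

```r
# One FALSE is enough to settle an AND, regardless of NA
a <- FALSE & NA                               # FALSE
# With TRUE the result depends on the unknown value, so it stays NA
b <- TRUE & NA                                # NA
# The expression from the text: the leading FALSE wins over the trailing NA
d <- (FALSE & TRUE & TRUE & NA) == TRUE       # FALSE
```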

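UPDATE: When using the Reduce() function as suggested by Frank, you could use this for loop instead of the one above:

for (k in seq_len(maxconwins)) {
  # shift() returns a plain vector rather than a list when n has length 1,
  # so build the list of lags explicitly to keep Reduce() correct for k = 1
  lags <- lapply(seq_len(k), function(i) shift(df$win, i))
  df[[paste0("nr", k, "conswins")]] <- Reduce(`&`, lags) == TRUE
}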
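If the current game should count towards the streak, as in the two hand-written columns at the top of this answer, the same Reduce() idea can be written without eval(parse()). This is only a sketch under some assumptions: `lag_vec` is a hypothetical base-R stand-in for data.table's shift(), and the data frame is rebuilt from a plain `win` vector so the snippet runs on its own.

```r
# Hypothetical helper: lag x by n positions, padding the start with NA
lag_vec <- function(x, n) c(rep(NA, n), head(x, length(x) - n))

win <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
df <- data.frame(win = win)

# Longest run of TRUE values
r <- rle(win)
maxconwins <- max(r$lengths[r$values])

for (k in seq_len(maxconwins)) {
  # Lags 0 .. k-1: the current game plus the k-1 games before it
  lags <- lapply(0:(k - 1), function(i) lag_vec(win, i))
  # AND all lagged copies together; assign by name instead of eval(parse())
  df[[paste0("nr", k, "conswins")]] <- Reduce(`&`, lags)
}
```

Because a loss in the current game already forces FALSE, NA only appears in the first rows where a win cannot yet be confirmed as part of a long enough streak, so no separate upper-triangle fix is needed in this variant.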
