I want to create new columns that are TRUE when the team has two, three, etc. consecutive wins. So I would like rows 3, 6, 7 and 8 to be TRUE in a new column called "twoconswins", rows 7 and 8 to be TRUE in a new column called "threeconswins", and so on. What is the best way to do this?
     id       date team teamscore opponent opponentscore home   win
9     9 2005-10-05  DET         5      STL             1    1  TRUE
38   38 2005-10-09  DET         6      CAL             3    1  TRUE
48   48 2005-10-10  DET         2      VAN             4    1 FALSE
88   88 2005-10-17  DET         3      SJS             2    1  TRUE
110 110 2005-10-21  DET         3      ANA             2    1  TRUE
148 148 2005-10-27  DET         5      CHI             2    1  TRUE
179 179 2005-11-01  DET         4      CHI             1    1  TRUE
194 194 2005-11-03  DET         3      EDM             4    1 FALSE
212 212 2005-11-05  DET         1      PHO             4    1 FALSE
I assume you counted the header as row 1, so that data rows 2, 5, 6 and 7 should evaluate to TRUE for "twoconswins", and data rows 6 and 7 for "threeconswins".
You could do:
library(data.table)
df$twoconswins <- (df$win & shift(df$win, 1, NA)) == TRUE
df$threeconswins <- (df$win & shift(df$win, 1, NA) & shift(df$win, 2, NA)) == TRUE
I think this could be more vectorized though, especially if 50 consecutive wins are possible and you would like a column for each streak length.
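One possible way to vectorize this, as a minimal base R sketch rather than a definitive answer: count the current winning streak at every row with rle() and sequence(), then compare that count with the streak length you want. The name streak is my own, and the sketch assumes the win column contains no NA values.

```r
# Sample 'win' column from the question (assumed to contain no NA values)
df <- data.frame(win = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE))

# Lengths of the runs of identical values in 'win'
runs <- rle(df$win)

# Position of each row within its run; multiplying by 'win' sets losing rows to 0,
# so 'streak' is the number of consecutive wins up to and including the current row
streak <- sequence(runs$lengths) * df$win

df$twoconswins   <- streak >= 2
df$threeconswins <- streak >= 3
```

Because every column is just a comparison against streak, adding a column for 50 consecutive wins costs one more line and no extra lagging.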
If you would like to create the new columns automatically, in case a streak of, say, 500 consecutive wins ever occurs, you could do this:
df <- read.table(text =
'id date team teamscore opponent opponentscore home win
9 9 2005-10-05 DET 5 STL 1 1 TRUE
38 38 2005-10-09 DET 6 CAL 3 1 TRUE
48 48 2005-10-10 DET 2 VAN 4 1 FALSE
88 88 2005-10-17 DET 3 SJS 2 1 TRUE
110 110 2005-10-21 DET 3 ANA 2 1 TRUE
148 148 2005-10-27 DET 5 CHI 2 1 TRUE
179 179 2005-11-01 DET 4 CHI 1 1 TRUE
194 194 2005-11-03 DET 3 EDM 4 1 FALSE
212 212 2005-11-05 DET 1 PHO 4 1 FALSE',
header = TRUE)
# Run-length encoding of the win column; the longest run of TRUE is the longest streak
rles <- rle(df$win)
maxconwins <- max(rles$lengths[rles$values])
for(x in 1: maxconwins){
x <- seq(1,x)
partialstring <- paste("shift(df$win,", x, ",NA)", collapse = " & ")
fullstring <- paste0("df$nr", max(x), "conswins <- (", partialstring, ") == TRUE")
eval(parse(text = fullstring))
}
df[1:maxconwins,9:12][upper.tri(df[1:maxconwins,9:12], diag = TRUE)] <- NA
> df[,8:12]
win nr1conswins nr2conswins nr3conswins nr4conswins
9 TRUE NA NA NA NA
38 TRUE TRUE NA NA NA
48 FALSE TRUE TRUE NA NA
88 TRUE FALSE FALSE FALSE NA
110 TRUE TRUE FALSE FALSE FALSE
148 TRUE TRUE TRUE FALSE FALSE
179 TRUE TRUE TRUE TRUE FALSE
194 FALSE TRUE TRUE TRUE TRUE
212 FALSE FALSE FALSE FALSE FALSE
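BTW, I only added the last line (the upper.tri() assignment) because (FALSE & TRUE & TRUE & NA) == TRUE evaluates to FALSE, while you would probably like these cells to be NA. I made sure of this by setting the upper triangle (including the diagonal) of the square submatrix to NA afterwards. For readability I hard-coded the column numbers 9 and 12 here, but you could compute them instead if you'd like. Note that the generated expressions start at shift(df$win, 1), so nrKconswins is TRUE when the K games before the current row were all wins; the current row's own result is not included, unlike the twoconswins and threeconswins columns above.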
UPDATE: Using the Reduce() function as suggested by Frank, you could replace the for loop above with this one:
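for (k in seq_len(maxconwins)) {
  # Build the lags as a list: shift(df$win, 1:k) returns a plain vector when k is 1,
  # which Reduce() would collapse to a single value
  lags <- lapply(seq_len(k), function(i) shift(df$win, i))
  df[[paste0("nr", k, "conswins")]] <- Reduce(`&`, lags) == TRUE
}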
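If you would rather avoid eval(parse()) and data.table altogether, the same idea can be sketched in base R. This is only an illustration under a few assumptions: add_conswins and its win_col argument are hypothetical names of my own, the win column is assumed to contain no NA values, and, unlike the loops above (which start at lag 1), the columns here include the current game and contain no NA for the first rows.

```r
# Hypothetical helper: adds one nr<k>conswins column per possible streak length
add_conswins <- function(df, win_col = "win") {
  win  <- df[[win_col]]
  runs <- rle(win)

  # Consecutive wins up to and including the current row (0 on a loss)
  streak <- sequence(runs$lengths) * win

  # Longest winning run; 0 if there are no wins at all, so no columns are added
  maxconwins <- max(c(0L, runs$lengths[runs$values]))

  for (k in seq_len(maxconwins)) {
    df[[paste0("nr", k, "conswins")]] <- streak >= k
  }
  df
}

df  <- data.frame(win = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE))
out <- add_conswins(df)
```

The column names are built with paste0() and assigned through [[<-, so no code has to be generated as text.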