I have the following data:
Cust <- c(1,1,1,1,1,2,2,2,2,3)
Date <- c("2017-07-10","2017-07-10","2017-07-10","2017-07-10","2017-07-11","2017-07-15","2017-07-15","2017-07-15","2017-06-19","2017-07-19")
TCode <- c(123,123,125,125,124,231,231,234,236,332)
H <- c("A","B","C","D","E","FF","G","H","J","GG")
df <- data.frame(Cust,Date,TCode,H)
Now I need to create a new column, df$New_col, such that if Cust[i] == Cust[i-1] and TCode[i] == TCode[i-1], then df$New_col[i] is the same as df$New_col[i-1]; otherwise it is the previous value plus 1. The counter starts again at 1 whenever the Cust number changes.
NOTE: For every new value in df$Cust, the first occurrence in df$New_col will always be 1. The total number of rows is greater than 1M, so the solution needs to be dynamic.
The output needed is as follows:
Using base R
df$New_col <- with(df, ave(TCode, Cust, FUN = function(x) match(x, unique(x))))
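Note that match(x, unique(x)) numbers the distinct TCode values by first appearance within each Cust, so a TCode that comes back after a different one would get its old number again. That does not happen in the sample data, but if the rule should strictly compare each row with the previous row, a run-based counter may be closer to what is asked. A minimal sketch, assuming TCode has no NA values; run_id is a hypothetical helper name, not part of base R:

```r
# Sample data from the question (only the columns needed here)
df <- data.frame(
  Cust  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3),
  TCode = c(123, 123, 125, 125, 124, 231, 231, 234, 236, 332)
)

# Hypothetical helper: the first element starts run 1, and the counter
# increases by 1 every time a value differs from the one before it.
# Assumes x contains no NA values.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Apply the run counter separately within each customer
df$New_col <- with(df, ave(TCode, Cust, FUN = run_id))
df$New_col
```

On the sample data this should give the same column as the match/unique version. The two differ only when a TCode repeats non-consecutively within one customer.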
Using data.table
library(data.table)
setDT(df)
df[, New_col := rleid(TCode), by = Cust]
Using dplyr with rleid from data.table
library(dplyr)
library(data.table)
df %>%
  group_by(Cust) %>%
  mutate(New_col = rleid(TCode))
Gives us:
    Cust       Date TCode  H New_col
 1:    1 2017-07-10   123  A       1
 2:    1 2017-07-10   123  B       1
 3:    1 2017-07-10   125  C       2
 4:    1 2017-07-10   125  D       2
 5:    1 2017-07-11   124  E       3
 6:    2 2017-07-15   231 FF       1
 7:    2 2017-07-15   231  G       1
 8:    2 2017-07-15   234  H       2
 9:    2 2017-06-19   236  J       3
10:    3 2017-07-19   332 GG       1
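If loading data.table only for rleid is not wanted, the same run counter can probably be written with dplyr alone. This is a sketch rather than a tested benchmark for 1M+ rows: it compares each TCode with the previous one via lag() inside each Cust group and accumulates the changes. Recent dplyr releases (1.1.0 and later, as far as I know) also offer consecutive_id(), which may be a shorter way to express this.

```r
library(dplyr)

# Sample data from the question (only the columns needed here)
df <- data.frame(
  Cust  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3),
  TCode = c(123, 123, 125, 125, 124, 231, 231, 234, 236, 332)
)

res <- df %>%
  group_by(Cust) %>%
  # lag() uses the group's first TCode as its default, so the first row
  # of every customer compares equal to itself and starts the count at 1
  mutate(New_col = cumsum(TCode != lag(TCode, default = first(TCode))) + 1) %>%
  ungroup()

res$New_col
```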