Create a new data frame column based on the values of two other columns

Let's say I have a data frame with two variables and 213,005 observations. It looks like this:

df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205), 
        date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", "2012/01/01", "2012/01/01", "2012/01/05"))

I need to create a new column called "new" that numbers each different "nr" value according to its "date" value. It should look like this:

df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205), 
        date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", 
                  "2012/01/01", "2012/01/01", "2012/01/05"), 
        new=c(1, 2, 3, 4, 5, 5, 6))

(nr=233, date=2012/01/02) => (new=1)

(nr=233, date=2012/01/01) => (new=2) ...

For (nr=234, date=2012/01/01) there should be two identical rows with new=5; repeated rows should stay in the data frame.

Does anyone know how to do that? Any help would be much appreciated. Thank you!

I'm not entirely sure I understand the logic, but it seems like you want to group by both columns. Here's a simple data.table solution using .GRP:

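library(data.table)
# Convert df to a data.table by reference, then assign the group counter .GRP
# to every row of each (nr, date) group; the trailing [] prints the result
setDT(df)[, new := .GRP, by = .(nr, date)][]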
#     nr       date new
# 1: 233 2012/01/02   1
# 2: 233 2012/01/01   2
# 3: 232 2012/01/01   3
# 4: 231 2012/01/02   4
# 5: 234 2012/01/01   5
# 6: 234 2012/01/01   5
# 7: 205 2012/01/05   6

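Using base R:

# Paste the columns of each row into a single key string
v1 <- do.call(paste, df)
# Fix the factor levels in order of first appearance, then take the integer codes
df$new <- as.numeric(factor(v1, levels=unique(v1)))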
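
A variant of the same base R idea, as a minimal sketch rather than a definitive implementation: it builds the key from the two grouping columns only (so any other columns in df are ignored) and uses match() against the unique keys. The variable name key and the "|" separator are assumptions chosen for illustration; any separator that cannot occur in the data would do.

```r
# Sample data from the question
df <- data.frame(
  nr   = c(233, 233, 232, 231, 234, 234, 205),
  date = c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
           "2012/01/01", "2012/01/01", "2012/01/05")
)

# Build one key per row from the two grouping columns only.
# The "|" separator is an arbitrary choice (assumption): it must not occur in the data.
key <- paste(df$nr, df$date, sep = "|")

# match() returns, for each row, the position of its key among the unique keys,
# so pairs are numbered in order of first appearance and duplicate rows share an id.
df$new <- match(key, unique(key))

df$new
# [1] 1 2 3 4 5 5 6
```

Because match() returns integers, df$new is an integer column here, whereas the as.numeric(factor(...)) version gives a double column with the same values.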
