
How to assign a value for a column based on another column value in R?

I have a data frame:

 df <- data.frame(col1 = c("A", "B", "C", "D", "A"),
                  col2 = c(1, 1, 1, 1, 5),
                  col3 = c(2L, 1L, 1L, 1L, 1L))

I want to add an additional column, col4, whose values are based on col2. Rows that have the same value in col2 should have the same value in col4 as well.

As a workaround, I generated the result in the following way:

x <- df[!duplicated(df$col2),]
x$col4 <- paste("newValue", seq_len(nrow(x)), sep = "_")

df_new <- merge(x, df, by ="col2")

df_new <- df_new[,c("col2","col4", "col1.y", "col3.y")]

This works, but I think there must be a better way to do it. Thank you!

You could try dense_rank() from dplyr:

library(dplyr)
df %>% 
    mutate(col4 = dense_rank(col2),
           col4_new = paste0("newValue_", col4))

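This gives something very similar to the desired output in your question, though I'm not sure exactly what you're looking for. Note that dense_rank() numbers the groups by the sorted value of col2 rather than by order of first appearance, and rows with identical values in col2 always get the same value in col4, whatever the row order. If you also want those rows to sit next to each other in the result, arrange the df first and then use dense_rank: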

df %>% 
    arrange(col2) %>% 
    mutate(col4 = dense_rank(col2),
           col4_new = paste0("newValue_", col4))

This should work for a data.frame of arbitrary size.
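
To make the ordering concrete, here is a minimal base-R sketch. It assumes col2 has no NA values, in which case match(x, sort(unique(x))) gives the same ranks as dense_rank(x). The unsorted vector is made up for illustration and is not from the question.

```r
# Hypothetical unsorted input (not the data from the question)
col2 <- c(5, 1, 5, 3, 1)

# Rank by sorted value: a base-R stand-in for dense_rank(col2), assuming no NAs
by_value <- match(col2, sort(unique(col2)))

# Number by order of first appearance, as the merge workaround in the question does
by_appearance <- match(col2, unique(col2))

by_value
#[1] 3 1 3 2 1
by_appearance
#[1] 1 2 1 3 2
```

Both give equal labels to equal values of col2; they only differ in which group is called 1, 2, and so on. For the data in the question the two numberings coincide.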

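Maybe this helps. Note that cumsum(!duplicated(...)) relies on equal values of col2 sitting in adjacent rows, as they do here: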

df$col4 <- paste0("newValue_", cumsum(!duplicated(df$col2)))
df$col4
#[1] "newValue_1" "newValue_1" "newValue_1" "newValue_1" "newValue_2"

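Or we can use match, which numbers the values by order of first appearance and does not need the rows to be sorted: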

with(df, paste0("newValue_", match(col2, unique(col2))))
#[1] "newValue_1" "newValue_1" "newValue_1" "newValue_1" "newValue_2"

Or it can be done with factor, which gives the same numbering as match:

with(df, paste0("newValue_", as.integer(factor(col2, levels = unique(col2)))))
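
A quick check on data where equal col2 values are not adjacent shows how the three one-liners differ. This is only a sketch; df2 and the via_* names are hypothetical and not part of the question.

```r
# Hypothetical data where equal col2 values are NOT in adjacent rows
df2 <- data.frame(col1 = c("A", "B", "C", "D"),
                  col2 = c(1, 5, 1, 5))

# cumsum(!duplicated()) only increments at first occurrences, so a later
# repeat inherits whatever the running count happens to be at that row
via_cumsum <- paste0("newValue_", cumsum(!duplicated(df2$col2)))

# match() and factor() look every value up in unique(col2) instead
via_match  <- paste0("newValue_", match(df2$col2, unique(df2$col2)))
via_factor <- paste0("newValue_",
                     as.integer(factor(df2$col2, levels = unique(df2$col2))))

via_cumsum
#[1] "newValue_1" "newValue_2" "newValue_2" "newValue_2"
via_match
#[1] "newValue_1" "newValue_2" "newValue_1" "newValue_2"
identical(via_match, via_factor)
#[1] TRUE
```

So for data that is not grouped by col2, the match or factor version is the safer choice; the cumsum version is fine once the rows are sorted by col2.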
