I have a data frame:
df <- data.frame(col1 = c("A", "B", "C", "D", "A"),
                 col2 = c(1, 1, 1, 1, 5),
                 col3 = c(2L, 1L, 1L, 1L, 1L),
                 stringsAsFactors = FALSE)
I want to add an additional column, col4, with values based on col2. Rows that have the same value in col2 should have the same value in col4 as well.
As a workaround, I generated the result in the following way:
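# Keep one row per distinct col2 value
x <- df[!duplicated(df$col2), ]
# Label each distinct value in order of first appearance
x$col4 <- paste("newValue", seq_len(nrow(x)), sep = "_")
# Join the labels back onto the full data; col1 and col3 get .x/.y suffixes
df_new <- merge(x, df, by = "col2")
df_new <- df_new[, c("col2", "col4", "col1.y", "col3.y")]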
This works, but I suspect there is a better way to do this. Thank you!
You could try dense_rank() from dplyr:
library(dplyr)
df %>%
  mutate(col4 = dense_rank(col2),
         col4_new = paste0("newValue_", col4))
This gives something very similar to the desired output in your question, but I'm not sure exactly what you're looking for. dense_rank() already gives all rows with identical values in col2 the same value in col4; if you also want those rows grouped together in the output, arrange the df first and then use dense_rank:
df %>%
  arrange(col2) %>%
  mutate(col4 = dense_rank(col2),
         col4_new = paste0("newValue_", col4))
This should work for a data.frame of arbitrary size.
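One detail worth checking is how the labels are numbered. dense_rank() numbers values by their sorted order, whereas the workaround in the question numbers them by first appearance. The two agree on the example data (1 comes before 5 both ways) but can differ on unsorted input. The sketch below uses a made-up vector and a base R expression, match(x, sort(unique(x))), which should agree with dense_rank() for vectors without missing values; that equivalence is an assumption here, not something stated in the answer.

```r
# Hypothetical input: values are not sorted and 5 appears first
x <- c(5, 1, 5, 3)

# Number by sorted value (assumed to match dplyr::dense_rank for non-NA input)
by_value <- match(x, sort(unique(x)))

# Number by order of first appearance (what the question's workaround does)
by_first <- match(x, unique(x))

by_value
#[1] 3 1 3 2
by_first
#[1] 1 2 1 3
```

Either numbering gives identical values the same label; which one you want depends on whether the label order should follow the values or the row order.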
Maybe this helps (note that this relies on equal values of col2 being adjacent, as they are here):
df$col4 <- paste0("newValue_", cumsum(!duplicated(df$col2)))
df$col4
#[1] "newValue_1" "newValue_1" "newValue_1" "newValue_1" "newValue_2"
Or we can use match:
with(df, paste0("newValue_", match(col2, unique(col2))))
#[1] "newValue_1" "newValue_1" "newValue_1" "newValue_1" "newValue_2"
Or it can be done with factor:
with(df, paste0("newValue_", as.integer(factor(col2, levels = unique(col2)))))
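The three base R variants give the same result on the example data, but they are not interchangeable in general. As a rough check on a made-up vector where a repeated value is not adjacent to its first occurrence, the cumsum(!duplicated()) version assigns the repeat the running count rather than its original group number, while match and factor both look the value up:

```r
# Hypothetical input: the value 2 repeats, but not in adjacent positions
x <- c(2, 7, 2, 4)

via_cumsum <- cumsum(!duplicated(x))                  # running count of new values
via_match  <- match(x, unique(x))                     # position in unique(x)
via_factor <- as.integer(factor(x, levels = unique(x)))

via_cumsum
#[1] 1 2 2 3
via_match
#[1] 1 2 1 3
via_factor
#[1] 1 2 1 3
```

With via_cumsum the second 2 ends up in the same group as 7, so if col2 may be unsorted, the match or factor form is the safer choice (or sort by col2 first).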