Assign unique ID based on values in EITHER of two columns
This is not a duplicate of this question. Please read questions entirely before labeling duplicates.
I have a data.frame like so:
library(tidyverse)
tibble(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)
color shape
<chr> <chr>
1 blue triangle
2 blue square
3 red circle
4 green hexagon
5 purple hexagon
I'd like to add a group_id column like this:
color shape group_id
<chr> <chr> <dbl>
1 blue triangle 1
2 blue square 1
3 red circle 2
4 green hexagon 3
5 purple hexagon 3
The difficulty is that I want to group by unique values of color or shape: two rows belong to the same group if they share either value. I suspect the solution might be to use list-columns, but I can't figure out how.
We can use duplicated in base R:
df1$group_id <- cumsum(!Reduce(`|`, lapply(df1, duplicated)))
Output:
df1
# A tibble: 5 x 3
# color shape group_id
# <chr> <chr> <int>
#1 blue triangle 1
#2 blue square 1
#3 red circle 2
#4 green hexagon 3
#5 purple hexagon 3
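To see why the one-liner works, it may help to unpack it into its intermediate steps. The sketch below rebuilds the example data as a plain data.frame; the names dup_by_col, seen_before, and group_id are illustrative and not part of the original answer.

```r
df1 <- data.frame(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)

# duplicated() flags values that already appeared earlier in each column
dup_by_col <- lapply(df1, duplicated)
# color: FALSE  TRUE FALSE FALSE FALSE
# shape: FALSE FALSE FALSE FALSE  TRUE

# a row continues a group if EITHER column repeats an earlier value
seen_before <- Reduce(`|`, dup_by_col)
# FALSE  TRUE FALSE FALSE  TRUE

# every row that is not a repeat starts a new group and bumps the counter
group_id <- cumsum(!seen_before)
# 1 1 2 3 3
```

Because the counter only ever increases, a repeated row is assigned the id of the most recently started group, which matches the expected result here because related rows sit next to each other.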
Or using tidyverse:
library(dplyr)
library(purrr)
df1 %>%
  mutate(group_id = map(., duplicated) %>%
           reduce(`|`) %>%
           `!` %>%
           cumsum)
Data:
df1 <- structure(list(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon")
), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
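One caveat worth noting: the cumsum approach relies on related rows being adjacent, since a repeated row simply inherits the current counter. If the data may arrive in arbitrary order, one possible alternative is to propagate the smallest row id through every shared value until nothing changes. The following is a minimal base R sketch under that assumption; assign_group_id and the shuffled example data are hypothetical and do not come from the original answer.

```r
# Hypothetical helper: rows end up in the same group when they are linked,
# directly or through a chain of rows, by a shared value in any of `cols`.
assign_group_id <- function(df, cols = names(df)) {
  id <- seq_len(nrow(df))
  repeat {
    new_id <- id
    for (col in cols) {
      # within each distinct value of the column, adopt the smallest id
      new_id <- ave(new_id, df[[col]], FUN = min)
    }
    if (all(new_id == id)) break  # converged: no label changed
    id <- new_id
  }
  # renumber the surviving labels as 1, 2, 3, ... in order of appearance
  match(id, unique(id))
}

# Example where the two "blue" rows are not adjacent
shuffled <- data.frame(
  color = c("blue", "red", "blue"),
  shape = c("triangle", "circle", "square")
)

cumsum(!Reduce(`|`, lapply(shuffled, duplicated)))
# 1 2 2  (row 3 is attached to the "red" group)

assign_group_id(shuffled)
# 1 2 1  (row 3 rejoins the "blue" group)
```

On the data from the question, both approaches return 1 1 2 3 3. The loop stops because the ids can only decrease, though for large inputs a graph library's connected-components routine would likely be a better fit.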