Assign unique ID based on values in EITHER of two columns

This is not a duplicate of this question. Please read questions entirely before labeling duplicates.

I have a data.frame like so:

library(tidyverse)

tibble(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)

  color  shape   
  <chr>  <chr>   
1 blue   triangle
2 blue   square  
3 red    circle  
4 green  hexagon 
5 purple hexagon 

I'd like to add a group_id column like this:

  color  shape    group_id
  <chr>  <chr>       <dbl>
1 blue   triangle        1
2 blue   square          1
3 red    circle          2
4 green  hexagon         3
5 purple hexagon         3

The difficulty is that I want to group by unique values of color or shape: rows that share either value should get the same group_id. I suspect the solution might be to use list-columns, but I can't figure out how.

We can use duplicated in base R:

df1$group_id <- cumsum(!Reduce(`|`, lapply(df1, duplicated)))

Output:

df1
# A tibble: 5 x 3
#  color  shape    group_id
#  <chr>  <chr>       <int>
#1 blue   triangle        1
#2 blue   square          1
#3 red    circle          2
#4 green  hexagon         3
#5 purple hexagon         3

Or using tidyverse:

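library(dplyr)
library(purrr)

df1 %>%
    # flag repeated values per column, OR the flags together,
    # then start a new group at every row that is not flagged
    mutate(group_id = cumsum(!reduce(map(., duplicated), `|`)))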

Data:

df1 <- structure(list(color = c("blue", "blue", "red", "green", "purple"
), shape = c("triangle", "square", "circle", "hexagon", "hexagon"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
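One caveat worth noting: because cumsum carries forward the ID of the preceding row, the approach above appears to rely on linked rows sitting next to each other, as they do in the example data. If the rows could arrive in any order, a sketch along the following lines might help. The helper assign_group_id and the shuffled data are hypothetical and not part of the original answer; the idea is to repeatedly give every row the smallest ID found among rows that share a value in any column, until nothing changes.

```r
# Hypothetical helper (an assumption, not from the original answer):
# rows that share a value in ANY column end up in the same group,
# regardless of row order.
assign_group_id <- function(df) {
  id <- seq_len(nrow(df))            # start with one group per row
  repeat {
    new_id <- id
    for (col in names(df)) {
      # within each distinct value of this column, take the smallest id
      new_id <- ave(new_id, df[[col]], FUN = min)
    }
    if (identical(new_id, id)) break # stop once the ids are stable
    id <- new_id
  }
  match(id, unique(id))              # renumber as 1, 2, 3, ... by first appearance
}

# Illustrative data: the two "blue" rows are not adjacent
shuffled <- data.frame(
  color = c("blue", "red", "blue"),
  shape = c("triangle", "circle", "square")
)

cumsum(!Reduce(`|`, lapply(shuffled, duplicated)))
# 1 2 2  (the second "blue" row is attached to the "red" group)

assign_group_id(shuffled)
# 1 2 1
```

On the question's original data both approaches give 1 1 2 3 3. The loop trades the brevity of the one-liner for independence from row order; for large data a graph library's connected-components routine would likely be a better fit.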
