
Check if a dataframe column value is present in a list in R

I have a master list of colors, as shown below:

master <- list("Beige" = c("light brown", "light golden", "skin"),
                      "off-white" = c("off white", "cream", "light cream", "dirty white"),
                      "Metallic" = c("steel","silver"),
                      "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
                      "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
                      "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry","rosered"),
                      "Turquoise" = c("aqua marine", "jade green"),
                      "Yellow" = c("fresh lime")
                     )

This is the dataframe column I have:

df$color <- c('multi color','purple','steel','metallic','off white','raisin','strawberry','magenta','skin','Beige','Jade Green','cream','multi-colored','offwhite','rosered',"light cream")

Now I want to check whether each value in the column matches either a list key or one of the list values.

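For example:
1) If a df column value such as off white comes first, it should be looked up among the list keys (Beige, off-white, Metallic, ...); if it is present there, return that key.
2) It should also be looked up among all the values of those keys; if it is one of a key's values, e.g. light cream, then it should be treated as off-white.
3) Case does not matter, e.g. OffWhITe == offwhite, and neither do spaces, e.g. off white == offwhite.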
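Rule 3 can be sketched as a small normalization step applied to both sides before comparing. The helper name normalize_color is hypothetical, and dropping hyphens as well as spaces is an assumption (the rule only mentions case and spaces), made so that off-white and offwhite compare equal.

```r
# Hypothetical helper: lower-case the input, then drop everything that is
# not a letter or digit (spaces, hyphens, ...). Removing hyphens is an
# assumption that goes slightly beyond rule 3.
normalize_color <- function(x) gsub("[^a-z0-9]", "", tolower(x))

normalized <- normalize_color(c("OffWhITe", "off white", "off-white"))
print(normalized)
# [1] "offwhite" "offwhite" "offwhite"
```

With both the column and the list entries normalized this way, a plain equality test is enough for the case and spacing variants.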

OUTPUT
This should be the expected output:

df$output <- c("Multi-colored","Purple","Metallic","Metallic","off-white","Purple","Red","Purple","Beige","Beige","Turquoise","off-white","Multi-colored","off-white","Red","off-white")

Edit
c("multi color", "mixed colors", "mix", "rainbow","multicolored","MultI-cOlored","multi-colored","MultiColORed","Multi-colored") should all be treated as Multi-colored

Perhaps we can stack the list into a single data frame and then do a stringdist_right_join from fuzzyjoin:

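library(dplyr)
library(tidyr)      # provides unnest()
library(fuzzyjoin)
library(tibble)
# Stack the list into one row per (name, color) pair, then fuzzy-join it to df
enframe(master, value = 'color') %>%
      unnest(c(color)) %>% 
      type.convert(as.is = TRUE) %>% 
      stringdist_right_join(df %>%
             mutate(rn = row_number()), by = 'color', max_dist = 3) %>% 
      # fall back to the original color when no list value matched
      transmute(color = color.y, output = coalesce(name, color.y))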
# A tibble: 19 x 2
#   color         output       
#   <chr>         <chr>        
# 1 multi color   Multi-colored
# 2 purple        purple       
# 3 steel         Metallic     
# 4 metallic      metallic     
# 5 off white     off-white    
# 6 raisin        Purple       
# 7 strawberry    Red          
# 8 strawberry    Red          
# 9 magenta       Purple       
#10 skin          Beige        
#11 skin          Multi-colored
#12 Beige         Beige        
#13 Jade Green    Turquoise    
#14 cream         off-white    
#15 cream         Purple       
#16 multi-colored Multi-colored
#17 offwhite      off-white    
#18 rosered       Red          
#19 light cream   off-white    
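The fuzzy join above returns 19 rows for 16 inputs (skin, for instance, matches both Beige and Multi-colored) and leaves purple and metallic unmapped, because the list names are not among the stacked values. As an alternative, here is a minimal base R sketch that matches exactly after normalization. It assumes that lower-casing and removing spaces and hyphens is sufficient; normalize_color, lookup_from and lookup_to are hypothetical names, and anything that still fails to match comes back as NA.

```r
master <- list("Beige" = c("light brown", "light golden", "skin"),
               "off-white" = c("off white", "cream", "light cream", "dirty white"),
               "Metallic" = c("steel", "silver"),
               "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
               "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
               "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry",
                         "cherry", "rosered"),
               "Turquoise" = c("aqua marine", "jade green"),
               "Yellow" = c("fresh lime"))

df <- data.frame(color = c("multi color", "purple", "steel", "metallic",
                           "off white", "raisin", "strawberry", "magenta",
                           "skin", "Beige", "Jade Green", "cream",
                           "multi-colored", "offwhite", "rosered", "light cream"))

# Hypothetical helper: lower-case, then drop spaces, hyphens and other
# non-alphanumeric characters (an assumption about what should compare equal)
normalize_color <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Lookup table: every key and every synonym, normalized, maps back to its key
keys        <- names(master)
lookup_from <- normalize_color(c(keys, unlist(master, use.names = FALSE)))
lookup_to   <- c(keys, rep(keys, lengths(master)))

# match() returns the first hit, or NA when a color is not in the lookup
df$output <- lookup_to[match(normalize_color(df$color), lookup_from)]
print(df)
```

On the sample data this yields one row per input, and the output column agrees with the expected output given in the question. Misspellings would still need a distance-based match such as the join above.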

Data

df <- structure(list(color = c("multi color", "purple", "steel", "metallic", 
"off white", "raisin", "strawberry", "magenta", "skin", "Beige", 
"Jade Green", "cream", "multi-colored", "offwhite", "rosered", 
"light cream")), class = "data.frame", row.names = c(NA, -16L
))

