Match Group Data with user data and get groups

I have two datasets. The first lists the products each user has, like this:

User Product
A .   1
A .   2
A .   3
B .   1
B .   3
B .   4

The second lists the products that make up each group:

Group Product
X1 .   1
X1 .   2
X1 .   4
X2 .   1
X2 .   3

My requirement: if all products in a group are present for a user, then that user belongs to the group. The result would look like this:

User X1 X2
A .   1  0
B .   0  1
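The membership rule above is a subset test: a user is in a group when every product of the group is among the user's products, which base R can express as `all(g %in% u)`. A minimal sketch on the sample tables; `user_products`, `group_products` and `belongs` are hypothetical names, and the products are typed in by hand rather than read from the tables:

```r
# Products per user and per group, copied from the sample tables
user_products  <- list(A = c(1, 2, 3), B = c(1, 3, 4))
group_products <- list(X1 = c(1, 2, 4), X2 = c(1, 3))

# Rows = users, columns = groups; 1 if the user has every product of the group
belongs <- sapply(group_products, function(g)
  sapply(user_products, function(u) as.integer(all(g %in% u))))

print(belongs)
```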

I have tried doing this manually with loops and matching with custom functions, but my actual data is quite large and those solutions are not satisfactory.

Need help on this.

You can accomplish this with some tidyverse code.

First, some dot-less data (I took the dots to be unnecessary; correct me if I'm wrong):

x1 <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
User Product
A    1
A    2
A    3
B    1
B    3
B    4')
x2 <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Group Product
X1    1
X1    2
X1    4
X2    1
X2    3')
out <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
User X1 X2
A    1  0
B    0  1')

The needed packages:

library(dplyr)
library(tidyr)
library(purrr)

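# One row per User (and per Group), with its products nested in a
# list-column; nest(new_col = cols) is the tidyr >= 1.0 form of nest(.key = ...)
x1n <- nest(x1, x1prod = Product)
x2n <- nest(x2, x2prod = Product)

crossing(User = x1n$User, Group = x2n$Group) %>%
  left_join(x1n, by = "User") %>%
  left_join(x2n, by = "Group") %>%
  # TRUE when every product of the group (.y) is among the user's products (.x)
  mutate(allx = map2_lgl(x1prod, x2prod, ~ all(.y$Product %in% .x$Product)))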
# # A tibble: 4 x 5
#   User  Group x1prod           x2prod           allx 
#   <chr> <chr> <list>           <list>           <lgl>
# 1 A     X1    <tibble [3 x 1]> <tibble [3 x 1]> FALSE
# 2 A     X2    <tibble [3 x 1]> <tibble [2 x 1]> TRUE 
# 3 B     X1    <tibble [3 x 1]> <tibble [3 x 1]> FALSE
# 4 B     X2    <tibble [3 x 1]> <tibble [2 x 1]> TRUE 

This is of course not your desired result, but I show that output to demonstrate what nesting is doing: row by row, we compare x1prod (a single column, Product) with x2prod (same structure). From here, simply removing columns and spreading is sufficient:

crossing(User = x1n$User, Group = x2n$Group) %>%
  left_join(x1n, by = "User") %>%
  left_join(x2n, by = "Group") %>%
  mutate(allx = map2_lgl(x1prod, x2prod, ~ all(.y$Product %in% .x$Product))) %>%
  select(-x1prod, -x2prod) %>%
  spread(Group, allx)
# # A tibble: 2 x 3
#   User  X1    X2   
#   <chr> <lgl> <lgl>
# 1 A     FALSE TRUE 
# 2 B     FALSE TRUE 
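`spread()` still works but is superseded by `pivot_wider()` in tidyr 1.0.0 and later. A sketch of the same pipeline with `pivot_wider()`, which also converts the logical result to the 0/1 coding the question asked for. It assumes tidyr >= 1.0.0 (also needed for the `nest(x1prod = Product)` syntax) plus dplyr and purrr, and rebuilds the sample data so it runs on its own:

```r
library(dplyr)
library(tidyr)
library(purrr)

x1 <- data.frame(User = c("A", "A", "A", "B", "B", "B"),
                 Product = c(1, 2, 3, 1, 3, 4), stringsAsFactors = FALSE)
x2 <- data.frame(Group = c("X1", "X1", "X1", "X2", "X2"),
                 Product = c(1, 2, 4, 1, 3), stringsAsFactors = FALSE)

# One row per user / group, products nested in a list-column
x1n <- nest(x1, x1prod = Product)
x2n <- nest(x2, x2prod = Product)

res <- crossing(User = x1n$User, Group = x2n$Group) %>%
  left_join(x1n, by = "User") %>%
  left_join(x2n, by = "Group") %>%
  # TRUE when every product of the group is among the user's products
  mutate(allx = map2_lgl(x1prod, x2prod, ~ all(.y$Product %in% .x$Product))) %>%
  select(-x1prod, -x2prod) %>%
  # 0/1 coding as in the question, then one column per group
  mutate(allx = as.integer(allx)) %>%
  pivot_wider(names_from = Group, values_from = allx)

print(res)
```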

(I'm also assuming your desired output is slightly mistaken, as A does not have "4" from group X1.)

Another answer, using only dplyr and a loop:

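library(dplyr)

# df1: the user/product table, df2: the group/product table
# (for example x1 and x2 as read in the answer above)
df1 <- x1
df2 <- x2

# Called once per user via do(): returns a one-row data frame with the user
# and a 0/1 column per group (1 = the user has every product of that group)
myFunction = function(df1, df2, user, group, product){
  user = deparse(substitute(user))
  product = deparse(substitute(product))
  group = deparse(substitute(group))
  # [[ ]] behaves the same for data frames and tibbles; as.character() on the
  # column keeps factor labels ("A" rather than the integer code "1")
  answer = data.frame(User = as.character(df1[[user]][1]), stringsAsFactors = FALSE)
  for(i in unique(as.character(df2[[group]]))){
    group_products = df2[[product]][df2[[group]] == i]
    answer[[i]] = if_else(all(group_products %in% df1[[product]]), 1, 0)
  }
  return(answer)
}

df1 %>% group_by(User) %>% do(myFunction(., df2, User, Group, Product))

# A tibble: 2 x 3
# Groups:   User [2]
  User     X1    X2
  <chr> <dbl> <dbl>
1 A         0     1
2 B         0     1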
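Since the question mentions large data, a loop-free alternative is to build user-by-product and group-by-product incidence matrices and multiply them: entry [u, g] of the product counts how many of group g's products user u has, and u belongs to g when that count equals g's size. This is a swapped-in technique, not the loop answer's method; `U`, `G`, `shared` and `membership` are illustrative names. The sketch uses dense base R matrices; with many distinct products, sparse matrices (for example from the Matrix package) would likely be needed to keep memory down:

```r
# Sample data from the question
x1 <- data.frame(User = c("A", "A", "A", "B", "B", "B"),
                 Product = c(1, 2, 3, 1, 3, 4))
x2 <- data.frame(Group = c("X1", "X1", "X1", "X2", "X2"),
                 Product = c(1, 2, 4, 1, 3))

# Shared product levels so both matrices have the same columns
prods <- sort(unique(c(x1$Product, x2$Product)))

# 0/1 incidence matrices (duplicates collapse to 1)
U <- (unclass(table(x1$User,  factor(x1$Product, levels = prods))) > 0) * 1  # users x products
G <- (unclass(table(x2$Group, factor(x2$Product, levels = prods))) > 0) * 1  # groups x products

# shared[u, g] = number of group g's products that user u has
shared <- U %*% t(G)

# Member when the user has all of the group's products (column-wise compare)
membership <- sweep(shared, 2, rowSums(G), "==") * 1L

print(membership)
```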

Here's a solution using only dplyr and tidyr:

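library(dplyr)
library(tidyr)

user_product <- data.frame(User = rep(LETTERS[1:2], each = 3), Product = c(1:3, 1, 3, 4))
group_product <- data.frame(Group = c("x1", "x1", "x1", "x2", "x2"), Product = c(1,2,4,1,3))

# Attach to each user product the groups containing it, then bring in all of
# those groups' products (Product.x = user's product, Product.y = group's product)
left_join(user_product, group_product, by = "Product") %>%
  left_join(group_product, by = "Group") %>%
  group_by(User, Group) %>%
  # TRUE when every product of the group is among the user's products
  summarize(
    test = all(Product.y %in% Product.x)
  ) %>%
  # fill = FALSE: a user sharing no product with a group is not a member
  spread(Group, test, fill = FALSE)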

# A tibble: 2 x 3
# Groups:   User [2]
  User  x1    x2   
  <fct> <lgl> <lgl>
1 A     FALSE TRUE 
2 B     FALSE TRUE

This is similar to what @r2evans already shared, but much less verbose, easier to understand, and with one less package dependency (no purrr).
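Another option, sketched here as an alternative rather than a fix, is to count instead of comparing sets (often called relational division): join on Product, count how many of each group's products every user has, and compare that with the group's size. It skips the second join that pairs every user product with every group product, which may help with large data, though that is an expectation rather than a measured result. `group_size` and `result` are illustrative names; base `merge()` does the join, and tidyr >= 1.0.0 is assumed for `pivot_wider()`. Users with no product in any group do not appear in the result:

```r
library(dplyr)
library(tidyr)

user_product  <- data.frame(User = c("A", "A", "A", "B", "B", "B"),
                            Product = c(1, 2, 3, 1, 3, 4), stringsAsFactors = FALSE)
group_product <- data.frame(Group = c("X1", "X1", "X1", "X2", "X2"),
                            Product = c(1, 2, 4, 1, 3), stringsAsFactors = FALSE)

# Number of distinct products in each group
group_size <- group_product %>% distinct() %>% count(Group, name = "n_group")

result <- merge(distinct(user_product), distinct(group_product), by = "Product") %>%
  # How many of each group's products the user has
  count(User, Group, name = "n_shared") %>%
  left_join(group_size, by = "Group") %>%
  mutate(member = as.integer(n_shared == n_group)) %>%
  select(User, Group, member) %>%
  # User/group pairs with no shared product become 0 instead of NA
  complete(User, Group, fill = list(member = 0L)) %>%
  pivot_wider(names_from = Group, values_from = member)

print(result)
```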
