How do I match a set of observations against pairs (2-tuples)?

Question

Say I have a data frame with a list of names and the companies they have as clients:

name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe", "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")

df1 <- data.frame(name, company)

Then I have a second data frame listing pairs of companies that are working together on projects:

company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")

df2 <- data.frame(company1, company2)

My preferred outcome would be something like this:

  name      A     B     C     D     E     F     G     No of sets
1 Anne      1     1     0     0     0     0     0     1
2 David     0     0     0     1     1     1     1     1
3 Joe       1     1     1     0     0     0     0     2
4 Mary      0     0     1     1     1     0     0     1

So this counts the number of "sets" for each name that match the sets (rows) in df2. For example, Anne has 1s for A and B, which matches row 1 in df2. Joe has A, B, and C; both A and B (row 1) and B and C (row 2) are rows in df2, so Joe's row has two matches.

Answer 1

I think this might work for you. Let me know. It doesn't match your expected result because you didn't include H, which I presumed to be a typo? Likewise, should Mary's No_of_sets also equal 2? Mary has C, D, and E, and both C E and D E are rows in df2.

# Tabulate the frequency of name x company combinations
r <- as.data.frame.matrix(table(df1$name, df1$company))
r
#>       A B C D E F G H
#> Anne  1 1 0 0 0 0 0 0
#> David 0 0 0 1 1 1 1 1
#> Joe   1 1 1 0 0 0 0 0
#> Mary  0 0 1 1 1 0 0 0

# Get "sets" of companies working together
s <- paste(df2$company1, df2$company2)
s
#> [1] "A B" "B C" "C E" "D E" "E G" "F A" "G B" "H C"

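# Get all potential company sets (pairs) associated with each name.
# Note: apply() returns a list here because the names yield different numbers
# of pairs, and combn() errors for a name with fewer than two companies.
m <- apply(r, MARGIN = 1, FUN = function(x) combn(names(which(x==1)), 2))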

# Intersect sets of companies potentially working together (m) with
# companies actually working together (df2)
# (You could use a nested apply here, but I thought that it
# would be too opaque. Looping is a little more clear.)
for(name in rownames(r)){
  pairs <- m[[name]]
  ppairs <- apply(pairs, 2, paste0, collapse = " ")
  r[which(rownames(r)==name),"No_of_sets"] <- length(intersect(ppairs, s))
}
r
#>       A B C D E F G H No_of_sets
#> Anne  1 1 0 0 0 0 0 0          1
#> David 0 0 0 1 1 1 1 1          2
#> Joe   1 1 1 0 0 0 0 0          2
#> Mary  0 0 1 1 1 0 0 0          2

Created on 2021-10-19 by the reprex package (v2.0.1)
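
As a side note, the approach above compares pasted strings, so a pair only matches when it is written in the same order that combn() produces (for example, "F A" in df2 would never match a generated "A F"). Below is a minimal sketch of an order-insensitive variant that skips combn() entirely: for each row of df2 it checks whether a name has both companies. The names `member`, `known`, `pairs`, `both`, `no_of_sets`, and `result` are my own, not from the question, and I'm assuming df2 contains no duplicated or reversed duplicate pairs (those would be counted twice).

```r
# Data copied from the question
name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe",
          "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")
df1 <- data.frame(name, company)

company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")
df2 <- data.frame(company1, company2)

# Logical membership matrix: one row per name, one column per company
# (`member` is a hypothetical name)
member <- unclass(table(df1$name, df1$company)) > 0

# Keep only pairs whose companies both appear in df1; as.character() guards
# against factor columns, which would otherwise index by integer code
c1 <- as.character(df2$company1)
c2 <- as.character(df2$company2)
known <- c1 %in% colnames(member) & c2 %in% colnames(member)
pairs <- data.frame(c1 = c1[known], c2 = c2[known])

# For every name (row) and every pair (column): does the name have both?
both <- member[, pairs$c1, drop = FALSE] & member[, pairs$c2, drop = FALSE]

# Number of matching pairs per name
no_of_sets <- rowSums(both)

# Same layout as the result above: indicator columns plus the count
result <- cbind(as.data.frame.matrix(table(df1$name, df1$company)),
                No_of_sets = no_of_sets)
result
```

On the question's data this should give the same counts as the loop above (Anne 1, David 2, Joe 2, Mary 2). It also returns 0 rather than an error for a name with only one company, since no pair can have both of its companies present.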
