How do I match a group of observations with a dyad?
Say I have a data frame with a list of names and the companies they have as clients:
name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe", "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")
df1 <- data.frame(name, company)
Then I have a second data frame listing pairs of companies that are working together on projects:
company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")
df2 <- data.frame(company1, company2)
My preferred outcome would be something like this:
   name A B C D E F G No of sets
1  Anne 1 1 0 0 0 0 0          1
2 David 0 0 0 1 1 1 1          1
3   Joe 1 1 1 0 0 0 0          2
4  Mary 0 0 1 1 1 0 0          1
So this counts, for each name, the number of "sets" that match the pairs in df2. For example, Anne has 1s for A and B, which matches row 1 in df2. Joe has A, B, and C; both A and B and B and C are rows in df2, so Joe's row has two matches.
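To make the counting rule concrete, here is a minimal sketch that checks a single person by hand. The names `clients`, `dyads`, and `matched` are illustrative only and are not part of the data frames above; the sketch assumes a pair counts as a match whenever both of its companies are clients of that person.

```r
# Joe's clients, as listed in df1
clients <- c("A", "B", "C")

# The collaborating pairs, copied from df2
dyads <- data.frame(
  company1 = c("A", "B", "C", "D", "E", "F", "G", "H"),
  company2 = c("B", "C", "E", "E", "G", "A", "B", "C")
)

# A pair is a match when both of its companies are among Joe's clients
matched <- dyads$company1 %in% clients & dyads$company2 %in% clients

# Rows of dyads that Joe matches (A-B and B-C)
dyads[matched, ]

# Number of matching pairs for Joe
sum(matched)
#> [1] 2
```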
I think this might work for you. Let me know. It doesn't match your expected result because you didn't include H, which I presumed to be a typo? Likewise, should Mary's No_of_sets also equal 2?
# Tabulate the frequency of name x company combinations
r <- as.data.frame.matrix(table(df1$name, df1$company))
r
#>       A B C D E F G H
#> Anne  1 1 0 0 0 0 0 0
#> David 0 0 0 1 1 1 1 1
#> Joe   1 1 1 0 0 0 0 0
#> Mary  0 0 1 1 1 0 0 0
# Get "sets" of companies working together
s <- paste(df2$company1, df2$company2)
s
#> [1] "A B" "B C" "C E" "D E" "E G" "F A" "G B" "H C"
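# Get all potential company sets associated with each name
# (apply() returns a list here because the names have different numbers
# of companies; combn() needs at least two companies per name)
m <- apply(r, MARGIN = 1, FUN = function(x) combn(names(which(x == 1)), 2))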
# Intersect sets of companies potentially working together (m) with
# companies actually working together (df2)
# (You could use a nested apply here, but I thought that it
# would be too opaque. Looping is a little more clear.)
for (name in rownames(r)) {
  pairs <- m[[name]]
  ppairs <- apply(pairs, 2, paste0, collapse = " ")
  r[which(rownames(r) == name), "No_of_sets"] <- length(intersect(ppairs, s))
}
r
#>       A B C D E F G H No_of_sets
#> Anne  1 1 0 0 0 0 0 0          1
#> David 0 0 0 1 1 1 1 1          2
#> Joe   1 1 1 0 0 0 0 0          2
#> Mary  0 0 1 1 1 0 0 0          2
Created on 2021-10-19 by the reprex package (v2.0.1)
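One caveat about the loop above: intersect() compares the pasted strings literally, so a pair stored as "F A" in df2 can never match the "A F" that combn() produces. If the pairs are meant to be unordered, a vectorised variant along the following lines may be worth trying. This is only a sketch under that assumption; `has`, `both`, and `counts` are names introduced here for illustration.

```r
# Same example data as in the question
df1 <- data.frame(
  name = c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe",
           "David", "David", "David", "David", "David"),
  company = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")
)
df2 <- data.frame(
  company1 = c("A", "B", "C", "D", "E", "F", "G", "H"),
  company2 = c("B", "C", "E", "E", "G", "A", "B", "C")
)

# Work with character vectors so column lookup is by name, not factor code
c1 <- as.character(df2$company1)
c2 <- as.character(df2$company2)

# Shared company levels, so every company in df2 gets a column
# even if nobody in df1 lists it as a client
companies <- sort(unique(c(as.character(df1$company), c1, c2)))

# Logical name x company membership matrix
has <- unclass(table(df1$name, factor(df1$company, levels = companies))) > 0

# One column per row of df2: TRUE when the person has both companies
both <- has[, c1, drop = FALSE] & has[, c2, drop = FALSE]

# Number of matching pairs per person
counts <- rowSums(both)
counts
```

On this data it gives 1, 2, 2, 2 for Anne, David, Joe, and Mary, the same as the loop, because nobody here holds both companies of a reversed pair. Unlike intersect(), it counts a pair once per row of df2, so duplicate rows in df2 would be counted more than once. It also does not fail for a name with a single company, where combn() would raise an error.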