
How do I match a group of observations with a dyad?

Say I have a data frame with a list of names and the companies they have as clients:

name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe", "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")

df1 <- data.frame(name, company)

Then I have a second data frame listing pairs of companies that are working together on projects:

company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")

df2 <- data.frame(company1, company2)

My preferred outcome would be something like this:

  name      A     B     C     D     E     F     G     No of sets
1 Anne      1     1     0     0     0     0     0     1
2 David     0     0     0     1     1     1     1     1
3 Joe       1     1     1     0     0     0     0     2
4 Mary      0     0     1     1     1     0     0     1

So this counts the number of "sets" for each name that match the sets in df2. For example, Anne has 1s for A and B, which matches row 1 in df2. Joe has A, B, and C; both (A, B) and (B, C) are rows in df2, so Joe's row has two matches.

I think this might work for you; let me know. It doesn't match your expected result exactly, because your table leaves out company H, which I presume is a typo. Likewise, shouldn't Mary's No_of_sets also equal 2?
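To show why I'd expect 2 for Mary, here is a quick check of her row on its own. It is only a sketch: it re-creates df2 from the question, and `mary` and `mary_sets` are names I made up for the illustration.

```r
# Company pairs from df2 in the question
df2 <- data.frame(
  company1 = c("A", "B", "C", "D", "E", "F", "G", "H"),
  company2 = c("B", "C", "E", "E", "G", "A", "B", "C")
)

# Mary's clients according to df1 (hypothetical helper vector)
mary <- c("C", "D", "E")

# Keep the rows of df2 where both companies are among Mary's clients
mary_sets <- df2[df2$company1 %in% mary & df2$company2 %in% mary, ]
mary_sets

# Number of matching sets for Mary
nrow(mary_sets)
#> [1] 2
```

Both (C, E) and (D, E) are rows of df2, so Mary seems to have two matches rather than one.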

# Tabulate the frequency of name x company combinations
r <- as.data.frame.matrix(table(df1$name, df1$company))
r
#>       A B C D E F G H
#> Anne  1 1 0 0 0 0 0 0
#> David 0 0 0 1 1 1 1 1
#> Joe   1 1 1 0 0 0 0 0
#> Mary  0 0 1 1 1 0 0 0

# Get "sets" of companies working together
s <- paste(df2$company1, df2$company2)
s
#> [1] "A B" "B C" "C E" "D E" "E G" "F A" "G B" "H C"

# Get all potential company sets (pairs) associated with each name.
# apply() returns a list here because the names have different numbers of pairs.
m <- apply(r, MARGIN = 1, FUN = function(x) combn(names(which(x==1)), 2))

# Intersect sets of companies potentially working together (m) with
# companies actually working together (df2)
# (You could use a nested apply here, but I thought that it
# would be too opaque. Looping is a little more clear.)
for(name in rownames(r)){
  pairs <- m[[name]]
  ppairs <- apply(pairs, 2, paste0, collapse = " ")
  r[which(rownames(r)==name),"No_of_sets"] <- length(intersect(ppairs, s))
}
r
#>       A B C D E F G H No_of_sets
#> Anne  1 1 0 0 0 0 0 0          1
#> David 0 0 0 1 1 1 1 1          2
#> Joe   1 1 1 0 0 0 0 0          2
#> Mary  0 0 1 1 1 0 0 0          2

Created on 2021-10-19 by the reprex package (v2.0.1)
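One caveat: the comparison above is on strings, and combn() builds pairs in column order ("A F"), so a dyad that df2 lists the other way round ("F A") is never matched. That makes no difference for the four names here, but if your dyads are meant to be unordered, a minimal sketch is to sort each pair before pasting. `normalise_pair` and `count_sets` are hypothetical helper names of my own, and the guard for names with fewer than two companies is an assumption about how you would want those cases counted (combn() would otherwise stop with an error).

```r
# Data from the question
name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe",
          "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")
df1 <- data.frame(name, company, stringsAsFactors = FALSE)

company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")
df2 <- data.frame(company1, company2, stringsAsFactors = FALSE)

# Hypothetical helper: label a pair the same way whichever order it is given in
normalise_pair <- function(a, b) paste(pmin(a, b), pmax(a, b))

# Dyads from df2, order-normalised and de-duplicated
s <- unique(normalise_pair(df2$company1, df2$company2))

# Hypothetical helper: number of df2 dyads fully contained in one client list
count_sets <- function(clients) {
  clients <- unique(clients)
  if (length(clients) < 2) return(0L)  # combn() needs at least two items
  pairs <- combn(clients, 2)           # 2 x k matrix of candidate pairs
  sum(normalise_pair(pairs[1, ], pairs[2, ]) %in% s)
}

# One count per name, in alphabetical order of name
counts <- vapply(split(df1$company, df1$name), count_sets, integer(1))
counts
```

For the question's data this gives the same counts as the table above (Anne 1, David 2, Joe 2, Mary 2); the two approaches would only differ for someone holding both companies of a reversed dyad such as F and A.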


