R: compare two groups of vectors
I have built two recommendation systems and would like to compare the products they recommend, to see how many products they have in common. I joined the two results into a data frame: the columns from one recommendation system start with "z", those from the other with "b".
Example data:
df <- data.frame(z1 = c("a", "s", "d"), z2 = c("z", "x", "c"), z3 = c("q", "w", "e"),
b1 = c("w", "a", "e"), b2 = c("a", "i", "r"), b3 = c("z", "w", "y"))
ID z1 z2 z3 b1 b2 b3
1 a z q q a z
2 s x w a i r
3 d c e r e y
Desired results:
ID z1 z2 z3 b1 b2 b3 mutual_recommendation
1 a z q q a z 3
2 s x w a i r 0
3 d c e r e y 1
The problem is that the order might not be the same, and comparing every pair of columns with case or ifelse would mean writing out a lot of combinations, especially once the number of Top-N recommendations changes to 10.
We can use apply to loop over the rows of the subset of the dataset (with the 'ID' column removed) and get the length of the intersect of the first 3 and the next 3 elements:
df$mutual_recommendation <- apply(df[-1], 1, FUN = function(x)
length(intersect(x[1:3], x[4:6])))
df$mutual_recommendation
#[1] 3 0 1
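Since the question mentions moving to Top-10, the fixed positions 1:3 and 4:6 could be replaced by selecting the columns by their "z" and "b" prefixes. The following is only a sketch under assumptions: the data frame is rebuilt by hand from the table printed in the question (including the 'ID' column), and count_mutual is a hypothetical helper name, not part of the answer above.

```r
# Data rebuilt from the table shown in the question (assumption: ID column included).
df <- data.frame(
  ID = 1:3,
  z1 = c("a", "s", "d"), z2 = c("z", "x", "c"), z3 = c("q", "w", "e"),
  b1 = c("q", "a", "r"), b2 = c("a", "i", "e"), b3 = c("z", "r", "y"),
  stringsAsFactors = FALSE
)

# Select each system's columns by prefix, so the same code works for Top-3 or Top-10.
z_cols <- grep("^z", names(df), value = TRUE)
b_cols <- grep("^b", names(df), value = TRUE)

# Hypothetical helper: per-row count of values shared by the two column groups.
count_mutual <- function(data, cols_a, cols_b) {
  a <- as.matrix(data[cols_a])
  b <- as.matrix(data[cols_b])
  vapply(
    seq_len(nrow(data)),
    function(i) length(intersect(a[i, ], b[i, ])),
    integer(1)
  )
}

df$mutual_recommendation <- count_mutual(df, z_cols, b_cols)
df$mutual_recommendation
#[1] 3 0 1
```

Selecting by name also means the result does not depend on whether the 'ID' column is present or where it sits.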
Here is another solution (note: I changed the data.frame code so that it produces the data frame actually shown below it in the question; the two do not match):
> library(dplyr)
> df %>% mutate(mutual_recommendation=apply(df,1,function(x) sum(x[1:3] %in% x[4:6]) ))
z1 z2 z3 b1 b2 b3 mutual_recommendation
1 a z q q a z 3
2 s x w a i r 0
3 d c e r e y 1
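The modified data.frame call is not shown in this answer, so the version below is a guess at what it might look like, rebuilt from the printed table (without an 'ID' column). It applies the same %in% count in base R, so the sketch does not depend on dplyr, and it also illustrates one difference from the intersect approach: the two agree only as long as a row holds no repeated products within one system.

```r
# Assumed reconstruction of the table printed in the question (no ID column).
df <- data.frame(
  z1 = c("a", "s", "d"), z2 = c("z", "x", "c"), z3 = c("q", "w", "e"),
  b1 = c("q", "a", "r"), b2 = c("a", "i", "e"), b3 = c("z", "r", "y"),
  stringsAsFactors = FALSE
)

# Same row-wise count as in the answer: how many of the first 3 values
# also appear among the next 3.
df$mutual_recommendation <- apply(df, 1, function(x) sum(x[1:3] %in% x[4:6]))

# The counts are 3, 0 and 1, matching the output printed above.
stopifnot(all(df$mutual_recommendation == c(3, 0, 1)))

# Caveat: with duplicates, %in% counts every occurrence, intersect counts distinct values.
dup_in <- sum(c("a", "a", "b") %in% c("a", "c"))               # 2
dup_intersect <- length(intersect(c("a", "a", "b"), c("a", "c")))  # 1
```

If a recommender can never return the same product twice in one Top-N list, either form gives the same count.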