
How to find the similarity in R?

I have the data set shown below. It records which shop sells which book.

library(tibble)  # provides tribble()

df <- tribble(
 ~shop,  ~book_id,  
  "A",       1,      
  "B",       1,      
  "C",       2,      
  "D",       3,      
  "E",       3,      
  "A",       3,      
  "B",       4,      
  "C",       5,      
  "D",       1,      
)

In the data set,

  • shop A sells books 1 and 3
  • shop B sells books 1 and 4
  • shop C sells books 2 and 5
  • shop D sells books 3 and 1
  • shop E sells only book 3

Now I want to calculate the Jaccard index for each pair of shops. For instance, take shop A and shop B. Between them they sell three different books (book 1, book 3, book 4), but only one book is sold by both shops (book 1). So the Jaccard index here should be 33.3% (1/3).

Here is a sample of the desired data:

df <- tribble(
  ~shop_1, ~shop_2, ~similarity,  
    "A",    "B",         33.3,  
    "B",    "A",         33.3,
    "A",    "C",          0,
    "C",    "A",          0,
    "A",    "D",         100,
    "D",    "A",         100,
    "A",    "E",          50,
    "E",    "A",          50,
)

Any comments or assistance would be really appreciated. Thanks in advance.

I don't know of a package for this, but you can write your own function. I guess that by similarity you mean something like this:

similarity <- function(x, y) {
  k <- length(intersect(x, y))
  n <- length(union(x, y))
  k / n
}
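
As a quick sanity check, the function can be tried on the book sets listed in the question. This is only a sketch: the vector names shop_a, shop_b, shop_d and shop_e are made up for illustration, and the function is repeated so the snippet runs on its own.

```r
# Jaccard index: size of the intersection divided by size of the union
similarity <- function(x, y) {
  k <- length(intersect(x, y))
  n <- length(union(x, y))
  k / n
}

# Book ids per shop, copied from the question (names are illustrative)
shop_a <- c(1, 3)
shop_b <- c(1, 4)
shop_d <- c(3, 1)
shop_e <- 3

similarity(shop_a, shop_b)  # 1 shared book out of 3 distinct books: 1/3
similarity(shop_a, shop_d)  # same books in a different order: 1
similarity(shop_a, shop_e)  # 1 shared book out of 2 distinct books: 0.5
```

Note that the function returns a fraction between 0 and 1, so it has to be multiplied by 100 to get the percentages shown in the question.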

Then you can use tidyr::crossing to combine the data frame with itself, so that every shop is paired with every other shop:

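library(tidyverse)  # dplyr, tidyr and purrr are all used below
# One row per shop, with its book ids collected in a list column
dfg <- df %>% group_by(shop) %>% summarise(books = list(book_id))
# Suffix the column names, pair up all shops, drop self-pairs, score each pair
crossing(dfg %>% set_names(paste0, "_A"), dfg %>% set_names(paste0, "_B")) %>% 
  filter(shop_A != shop_B) %>% 
  mutate(similarity = map2_dbl(books_A, books_B, similarity))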
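
If you would rather avoid extra packages, roughly the same table can be built in base R. The sketch below is an alternative, not the approach above: it swaps crossing for expand.grid, rebuilds the question's data as a plain data frame, and assumes you want percentages rounded to one decimal. The names books and pairs are made up for illustration.

```r
# Sample data from the question, as a plain data frame
df <- data.frame(
  shop    = c("A", "B", "C", "D", "E", "A", "B", "C", "D"),
  book_id = c(1, 1, 2, 3, 3, 3, 4, 5, 1)
)

# Jaccard index: size of the intersection divided by size of the union
similarity <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Named list with one vector of book ids per shop
books <- split(df$book_id, df$shop)

# All ordered pairs of distinct shops
pairs <- expand.grid(shop_1 = names(books), shop_2 = names(books),
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$shop_1 != pairs$shop_2, ]
rownames(pairs) <- NULL

# Jaccard index per pair, as a percentage rounded to one decimal
pairs$similarity <- round(
  100 * unname(mapply(similarity, books[pairs$shop_1], books[pairs$shop_2])),
  1
)

# Rows for shop A, to compare with the desired output in the question
pairs[pairs$shop_1 == "A", ]
```

For shop A this gives 33.3 against B, 0 against C, 100 against D and 50 against E, which matches the values requested in the question. Every pair appears in both orders, so the five shops produce 20 rows.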
