How to find similarity in R?

Question

I have a data set, shown below:

It shows which shop sells which book.

library(tibble)  # provides tribble()

df <- tribble(
  ~shop, ~book_id,
  "A",   1,
  "B",   1,
  "C",   2,
  "D",   3,
  "E",   3,
  "A",   3,
  "B",   4,
  "C",   5,
  "D",   1
)

In the data set:

shop A sells 1, 3
shop B sells 1, 4
shop C sells 2, 5
shop D sells 3, 1
shop E sells only 3

Now I want to calculate the Jaccard index. For instance, take shop A and shop B. Together they sell three different books (book 1, book 3, book 4), but only one book is sold by both shops (book 1). So the Jaccard index here should be 33.3% (1/3).

Here is a sample of the desired output:

# Named `desired` so it does not overwrite the input `df` above
desired <- tribble(
  ~shop_1, ~shop_2, ~similarity,
  "A",     "B",     33.3,
  "B",     "A",     33.3,
  "A",     "C",     0,
  "C",     "A",     0,
  "A",     "D",     100,
  "D",     "A",     100,
  "A",     "E",     50,
  "E",     "A",     50
)

Any comments or assistance would be much appreciated. Thanks in advance.

Answer 1

I don't know of a package for this, but you can write your own function. I guess by similarity you mean something like this:

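similarity <- function(x, y) {
  k <- length(intersect(x, y))  # items found in both sets
  n <- length(union(x, y))      # distinct items across both sets
  k / n                         # Jaccard index, as a fraction between 0 and 1
}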
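As a quick sanity check, here is a minimal sketch that applies the function to shops A and B from the question. The vector names `books_A` and `books_B` are only illustrative, and the function is repeated so the snippet runs on its own.

```r
# Jaccard index as defined above: |intersection| / |union|
similarity <- function(x, y) {
  k <- length(intersect(x, y))
  n <- length(union(x, y))
  k / n
}

books_A <- c(1, 3)  # shop A, taken from the question's data
books_B <- c(1, 4)  # shop B

# One shared book (1) out of three distinct books (1, 3, 4)
frac <- similarity(books_A, books_B)
frac                  # 0.3333333
round(100 * frac, 1)  # 33.3, the percentage the question asks for
```

Note that the function returns a fraction, so multiply by 100 if you want percentages. Two empty vectors would give `NaN` (0/0), which may need special handling if a shop can have no books.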

Then you can use tidyr::crossing to cross the data frame with itself:

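library(tidyverse)  # dplyr, tidyr and purrr are all used below
# One row per shop, with its books collected into a list column
dfg <- df %>% group_by(shop) %>% summarise(books = list(book_id))
# Pair every shop with every other shop and score each pair
crossing(dfg %>% set_names(paste0, "_A"), dfg %>% set_names(paste0, "_B")) %>% 
  filter(shop_A != shop_B) %>% 
  mutate(similarity = map2_dbl(books_A, books_B, similarity))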
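If you would rather avoid the tidyverse, the same idea can be sketched in base R. This is an alternative under a few assumptions, not the approach above: the column names `shop_1` and `shop_2` and the percentage scaling are taken from the desired output in the question, and the data is rebuilt as a plain data frame so the snippet is self-contained.

```r
# Input data from the question, as a plain data frame
df <- data.frame(
  shop    = c("A", "B", "C", "D", "E", "A", "B", "C", "D"),
  book_id = c(1, 1, 2, 3, 3, 3, 4, 5, 1)
)

similarity <- function(x, y) length(intersect(x, y)) / length(union(x, y))

# Named list: one vector of book ids per shop
books <- split(df$book_id, df$shop)

# All ordered pairs of shops, excluding a shop paired with itself
pairs <- expand.grid(shop_1 = names(books), shop_2 = names(books),
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$shop_1 != pairs$shop_2, ]

# Jaccard index per pair, expressed as a percentage
pairs$similarity <- 100 * mapply(
  function(a, b) similarity(books[[a]], books[[b]]),
  pairs$shop_1, pairs$shop_2,
  USE.NAMES = FALSE
)

# Sort for readability and reset the row names
pairs <- pairs[order(pairs$shop_1, pairs$shop_2), ]
rownames(pairs) <- NULL
pairs
```

Each pair appears in both orders (A/B and B/A), as in the desired output. With five shops that gives 20 rows; for many shops you may prefer to compute each unordered pair only once.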

Answer 1 (accepted), 2020-06-02 10:24:09
