R: List combinations of items with a count of each
I am trying to see how many records I have for each combination of Products. Some Accounts have a couple of different products, some have 3 or 4. I've done a group by, which gives the number of Products attached to each Account:
test <- data %>%
  unique() %>%
  group_by(ACCOUNT) %>%
  summarise(number = n())
What I am trying to do next is group the Product combinations so I have a count for each of a+b, b+c, a+b+c, a+b+m, m+n etc. I don't expect all the possible combinations to exist, and I don't know the largest number of products held together (although it's probably about 5 or 6). That's one of the things I'm trying to work out.
Edited to add sample data
| Account | Product |
| ------- | ------- |
| 1       | a       |
| 1       | b       |
| 1       | c       |
| 2       | a       |
| 2       | c       |
| 3       | a       |
| 3       | c       |
| 4       | a       |
| 4       | b       |
Desired results: each unique combination is counted separately.
| Product combo | Count |
| ------------- | ----- |
| ab            | 1     |
| ac            | 2     |
| abc           | 1     |
I use a `;` separator because it seems nicer, but here is a `dplyr` version:
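library(dplyr)

df %>%
  group_by(Account) %>%
  # one row per account: its products, sorted and pasted into a single key
  summarize(combo = paste(sort(Product), collapse = ";"), .groups = "drop") %>%
  # count accounts per key; name = "Count" matches the column shown below
  count(combo, name = "Count")
# # A tibble: 3 × 2
#   combo Count
#   <chr> <int>
# 1 a;b       1
# 2 a;b;c     1
# 3 a;c       2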
Using this data:
df = read.table(text = 'Account Product
1 a
1 b
1 c
2 a
2 c
3 a
3 c
4 a
4 b', header = TRUE)
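The question also calls `unique()` first and asks what the largest number of combined products is. A minimal sketch of how the same pipeline might cover both, assuming duplicate Account/Product rows can occur. The extra `4 b` row and the column name `n_products` are hypothetical, added only for illustration:

```r
library(dplyr)

# Sample data from above, plus one duplicated row (hypothetical) to show
# why de-duplicating before pasting matters.
df <- read.table(text = "Account Product
1 a
1 b
1 c
2 a
2 c
3 a
3 c
4 a
4 b
4 b", header = TRUE)

combos <- df %>%
  distinct(Account, Product) %>%   # drop repeated Account/Product pairs
  group_by(Account) %>%
  summarize(
    combo      = paste(sort(Product), collapse = ";"),
    n_products = n(),              # number of products in this account's combo
    .groups    = "drop"
  ) %>%
  count(combo, n_products, name = "Count") %>%
  arrange(desc(n_products))        # largest combinations first

combos
max(combos$n_products)             # largest number of products held together
```

Without the `distinct()` step, account 4 would be keyed as `a;b;b` and counted as a separate combination.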