数据帧R中值组合的计数

Question

I have a dataframe like so: 我有一个像这样的数据框：

    df<-structure(list(id = c("A", "A", "A", "B", "B", "C", "C", "D", 
"D", "E", "E"), expertise = c("r", "python", "julia", "python", 
"r", "python", "julia", "python", "julia", "r", "julia")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -11L), .Names = c("id", 
"expertise"), spec = structure(list(cols = structure(list(id = structure(list(), class = c("collector_character", 
"collector")), expertise = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("id", "expertise")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df
   id expertise
1   A         r
2   A    python
3   A     julia
4   B    python
5   B         r
6   C    python
7   C     julia
8   D    python
9   D     julia
10  E         r
11  E     julia

I can get the overall counts of "expertise" by using: 我可以使用以下方法获得“专业知识”的总数：

library(dplyr)    
df %>% group_by(expertise) %>% mutate (counts_overall= n())

However what I want is the counts for combinations of expertise values. 但是我想要的是专业知识价值组合的计数。 In other words how many "id" had the same combination of two expertise eg "r" and"julia"? 换句话说，有多少“ id”具有两个专业知识（例如“ r”和“ julia”）的相同组合？ Here is a desired output: 这是所需的输出：

df_out<-structure(list(expertise1 = c("r", "r", "python"), expertise2 = c("python", 
"julia", "julia"), count = c(2L, 2L, 3L)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -3L), .Names = c("expertise1", 
"expertise2", "count"), spec = structure(list(cols = structure(list(
    expertise1 = structure(list(), class = c("collector_character", 
    "collector")), expertise2 = structure(list(), class = c("collector_character", 
    "collector")), count = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("expertise1", "expertise2", "count"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df_out
  expertise1 expertise2 count
1          r     python     2
2          r      julia     2
3     python      julia     3

Answer 1

The linked answer from latemail's comment creates a matrix Latemail评论的链接答案将创建一个矩阵

crossprod(table(df) > 0)

  expertise expertise julia python r julia 4 3 2 python 3 4 2 r 2 2 3

while the OP expects a dataframe in long format. OP希望使用长格式的数据帧。

1) cross join 1）交叉加入

Below is a data.table solution which uses the CJ() ( cross join ) function: 以下是使用CJ() （交叉 data.table ）函数的data.table解决方案：

library(data.table)
setDT(df)[, CJ(expertise, expertise)[V1 < V2], by = id][
  , .N, by = .(expertise1 = V1, expertise2 = V2)]

  expertise1 expertise2 N 1: julia python 3 2: julia r 2 3: python r 2

CJ(expertise, expertise)[V1 < V2] is the data.table equivalent for t(combn(df$expertise, 2)) or combinat::combn2(df$expertise) . CJ(expertise, expertise)[V1 < V2]是data.table等效t(combn(df$expertise, 2))或combinat::combn2(df$expertise) 。

2) self-join 2）自我加入

Here is another variant which uses a self-join : 这是另一个使用自联接的变体：

library(data.table)
setDT(df)[df, on = "id", allow = TRUE][
  expertise < i.expertise, .N, by = .(expertise1 = expertise, expertise2 = i.expertise)]

  expertise1 expertise2 N 1: python r 2 2: julia r 2 3: julia python 3

Answer 2

A solution not as efficient as crossprod-table approach but easy to understand: 一种解决方案不如交叉产品表方法有效，但易于理解：

library(tidyr)

df %>% group_by(id) %>%
    summarize(expertise = list(combn(sort(expertise), 2, FUN = paste, collapse = '_'))) %>%
    unnest(expertise) %>%
    group_by(expertise) %>%
    summarize(count = n()) %>%
    separate(expertise, c('expertise1', 'expertise2'), sep = '_')

# # A tibble: 3 x 3
#   expertise1 expertise2 count
#   <chr>      <chr>      <int>
# 1 julia      python         3
# 2 julia      r              2
# 3 python     r              2

数据帧R中值组合的计数

问题描述

2 个解决方案

解决方案1
4 已采纳 2018-08-20 04:55:16

1) cross join 1）交叉加入

2) self-join 2）自我加入

解决方案2
2 2018-08-20 02:13:52

数据帧R中值组合的计数

问题描述

2 个解决方案

解决方案1 4 已采纳 2018-08-20 04:55:16

1) cross join 1）交叉加入

2) self-join 2）自我加入

解决方案2 2 2018-08-20 02:13:52

解决方案1
4 已采纳 2018-08-20 04:55:16

解决方案2
2 2018-08-20 02:13:52