
How to calculate the proportion of each combination in a data frame?

Let's say we have a data.frame as follows:

  A B C
1 1 1 1
2 1 0 1
3 1 0 1
4 0 1 0
5 0 0 1

As output, I want something containing this:

ABC = 0.2

AC = 0.4

B = 0.2

C = 0.2

but for a much larger data.frame. Does anyone know an elegant way to do this? If so, please let me know. Thank you.

If M is your matrix, you can do:

table(apply(M, 1, function(v) paste0(names(v[v==1]), collapse = ""))) / nrow(M)

With your example:

> M <- cbind(A = c(1,1,1,0,0), B = c(1,0,0,1,0), C = c(1,1,1,0,1))
> table(apply(M, 1, function(v) paste0(names(v[v==1]), collapse = ""))) / nrow(M)

ABC  AC   B   C 
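0.2 0.4 0.2 0.2

Another option, working directly on the data frame d (defined under DATA below), is to locate the 1s with which():

ind = which(d == 1, arr.ind = TRUE)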
table(sapply(split(colnames(d)[ind[,2]], ind[,1]), paste, collapse = "-"))/NROW(d)

#A-B-C   A-C     B     C 
#  0.2   0.4   0.2   0.2

DATA

d = structure(list(A = c(1L, 1L, 1L, 0L, 0L),
                   B = c(1L, 0L, 0L, 1L, 0L),
                   C = c(1L, 1L, 1L, 0L, 1L)),
              class = "data.frame",
              row.names = c("1", "2", "3", "4", "5"))
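
One caveat with the which() approach may be worth checking on larger data: a row that contains no 1s produces no entries in ind, so it never reaches the table, while NROW() still counts it in the denominator. The following is a minimal sketch of that behaviour; the data frame d2 and the name res are hypothetical and not part of the original answer.

```r
# Hypothetical data: like the example, but the last row is all zeros.
d2 <- data.frame(A = c(1L, 1L, 0L, 0L),
                 B = c(1L, 0L, 0L, 0L),
                 C = c(1L, 1L, 1L, 0L))

# Same steps as the answer above, applied to d2.
ind <- which(d2 == 1, arr.ind = TRUE)
res <- table(sapply(split(colnames(d2)[ind[, 2]], ind[, 1]),
                    paste, collapse = "-")) / NROW(d2)

res       # three combinations; the all-zero row has no entry
sum(res)  # less than 1, because the all-zero row is only in the denominator
```

If the all-zero combination should be reported as well, the paste-based answers on this page keep it, labelled with an empty string.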

Using imap from purrr, we can replace the 1s with the column name and the 0s with empty strings. Pasting the columns together and calling prop.table on the resulting table then gives the desired output:

library(purrr)

df %>% 
  imap(~ifelse(.x, .y, '')) %>% 
  do.call(what = paste0) %>% 
  table %>% 
  prop.table
# .
# ABC  AC   B   C 
# 0.2 0.4 0.2 0.2 

If your data is very large, it would be faster to relabel the table at the end rather than first replacing the ones and zeros with new columns of "A", "B" and "C". The proportions are the same as above, and imap is not needed (the %>% pipe can come from magrittr instead of purrr).

out <- 
  df %>% 
    do.call(what = paste0) %>% 
    table %>% 
    prop.table

names(out) <- sapply(strsplit(names(out), ''), 
                     function(x) paste(LETTERS[which(x == '1')], collapse = ''))
out
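
The relabelling step above uses LETTERS, which only matches when the columns are named A, B, C, ... in that order. As a hedged sketch (not the original answerer's code), the same idea can be written in base R with names(df), so that arbitrary column names work. The data frame df is assumed to be the one from the question, and the name patterns is introduced here for illustration.

```r
# Example data from the question; the name `df` follows the answer above.
df <- data.frame(A = c(1, 1, 1, 0, 0),
                 B = c(1, 0, 0, 1, 0),
                 C = c(1, 1, 1, 0, 1))

# Paste each row's 0/1 values into a pattern string such as "101".
patterns <- do.call(paste0, df)

# Proportion of rows sharing each pattern.
out <- prop.table(table(patterns))

# Relabel each pattern with the names of the columns that are 1.
# names(df) replaces LETTERS so the labels follow the actual column names.
names(out) <- vapply(strsplit(names(out), ""),
                     function(x) paste(names(df)[x == "1"], collapse = ""),
                     character(1))
out
```

Because the table is sorted by the 0/1 pattern before relabelling, the combinations may appear in a different order than in the earlier answers; the proportions themselves are unchanged. The approach also assumes every cell is a single character (0 or 1) once pasted.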
