Count the number of times a vector/row matches a data frame

Question

I have a large data frame with "positive" (1) or "negative" (0) data points.

Example data:

my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0), 
  marker_b = c(0,1,1,1), marker_c = c(0,1,1,0), marker_d = c(0,1,0,1))


  cell marker_a marker_b marker_c marker_d
1    1        1        0        0        0
2    2        0        1        1        1
3    3        0        1        1        0
4    4        0        1        0        1
...

I have a second data.frame with every possible combination of positive and negative markers that a cell in my_data can have:

combinations_df <- expand.grid(
    marker_a = c(0, 1), 
    marker_b = c(0, 1), 
    marker_c = c(0, 1), 
    marker_d = c(0, 1)
)

   marker_a marker_b marker_c marker_d
1         0        0        0        0
2         1        0        0        0
3         0        1        0        0
4         1        1        0        0
5         0        0        1        0
6         1        0        1        0
7         0        1        1        0
8         1        1        1        0
9         0        0        0        1
10        1        0        0        1
11        0        1        0        1
12        1        1        0        1
13        0        0        1        1
14        1        0        1        1
15        0        1        1        1
16        1        1        1        1

How can I match each row (combination) of combinations_df against every row of my_data and get a data.frame with the final count for each combination?

Example of expected output:

      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
1 14969 15223 15300 14779 14844 16049 15374 15648 15045 15517 15116 15405 14990 15347 14432 15569

Answer 1

I'm guessing the data.table way is fairly efficient:

library(data.table)
setDT(my_data)

my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ]


    marker_a marker_b marker_c marker_d N
 1:        0        0        0        0 0
 2:        1        0        0        0 1
 3:        0        1        0        0 0
 4:        1        1        0        0 0
 5:        0        0        1        0 0
 6:        1        0        1        0 0
 7:        0        1        1        0 1
 8:        1        1        1        0 0
 9:        0        0        0        1 0
10:        1        0        0        1 0
11:        0        1        0        1 1
12:        1        1        0        1 0
13:        0        0        1        1 0
14:        1        0        1        1 0
15:        0        1        1        1 1
16:        1        1        1        1 0

If you only care about combinations that show up in the data, "chain" a filtering command:

my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ][ N > 0 ]


   marker_a marker_b marker_c marker_d N
1:        1        0        0        0 1
2:        0        1        1        0 1
3:        0        1        0        1 1
4:        0        1        1        1 1

Alternatively, in that case you don't even need combinations_df:

my_data[, .N, by = marker_a:marker_d ]


   marker_a marker_b marker_c marker_d N
1:        1        0        0        0 1
2:        0        1        1        1 1
3:        0        1        1        0 1
4:        0        1        0        1 1
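
The question asks for a single row of counts numbered 1 to 16 rather than a long table. A minimal sketch of one way to get there from the first join, assuming data.table is installed; the names res and counts are only illustrative:

```r
library(data.table)

my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
  marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
  marker_c = c(0, 1), marker_d = c(0, 1))
setDT(my_data)

# One output row per row of combinations_df, in the same order,
# with N = 0 for combinations that never occur in my_data.
res <- my_data[combinations_df, on = names(combinations_df), .N, by = .EACHI]

# Keep only the counts and label them by combination number.
counts <- setNames(res$N, seq_len(nrow(res)))
counts
```

With the four example cells, combinations 2, 7, 11 and 15 get a count of 1 and the rest get 0. Use t(counts) if a one-row matrix is preferred over a named vector.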

Answer 2

Your combinations are effectively written in binary, so there is no need for a join, just a little math. Try this:

setNames(tabulate(as.matrix(my_data[,2:5])%*%2^(0:3)+1,16),1:16)
# 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 
# 0  1  0  0  0  0  1  0  0  0  1  0  0  0  1  0
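
The one-liner works because expand.grid varies its first column fastest, so row i of combinations_df is the binary number i - 1 with marker_a as the lowest bit. A step-by-step sketch of the same idea, with a check of that assumption; the names weights, idx and combo_idx are only illustrative:

```r
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
  marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
  marker_c = c(0, 1), marker_d = c(0, 1))

# Weight each marker by a power of two: a = 1, b = 2, c = 4, d = 8.
weights <- 2^(0:3)

# Encode every cell's marker pattern as an integer between 1 and 16.
idx <- as.vector(as.matrix(my_data[, 2:5]) %*% weights) + 1

# Sanity check: the same encoding maps row i of combinations_df to i.
combo_idx <- as.vector(as.matrix(combinations_df) %*% weights) + 1
stopifnot(all(combo_idx == seq_len(nrow(combinations_df))))

# Count how many cells fall on each of the 16 codes, zeros included.
counts <- setNames(tabulate(idx, nbins = nrow(combinations_df)),
                   seq_len(nrow(combinations_df)))
counts
```

The shortcut depends on the columns of my_data being in the same order as in combinations_df and on every value being exactly 0 or 1; if the combinations were built or sorted differently, the check would fail and a join would be the safer choice.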

Answer 3

Perhaps you need:

setNames(sapply(do.call(paste0, combinations_df ), 
         function(x) sum(do.call(paste0, my_data[-1])==x)), 1:nrow(combinations_df ))
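
The code above builds a string key per row and then compares every combination against every cell, which may be slow for a large my_data. A possible variant of the same key idea, offered only as a sketch, counts all keys in one pass with table(); it assumes my_data is still a plain data.frame (after setDT, my_data[-1] would drop the first row rather than the first column), and data_keys and combo_keys are illustrative names:

```r
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
  marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
  marker_c = c(0, 1), marker_d = c(0, 1))

# One string key per row, e.g. "1000" when only marker_a is positive.
data_keys  <- do.call(paste0, my_data[-1])
combo_keys <- do.call(paste0, combinations_df)

# Using all 16 combinations as factor levels makes table() report
# a zero for combinations that never occur in my_data.
counts <- table(factor(data_keys, levels = combo_keys))

# Drop the table attributes and label the counts by combination number.
counts <- setNames(as.vector(counts), seq_along(combo_keys))
counts
```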

3 answers

Answer 1
1 accepted 2016-10-11 14:44:42

Answer 2
1 2016-10-11 15:36:49

Answer 3
0 2016-10-11 14:31:32
