Count how many times a vector/row matches a data frame
I have a large data frame with "positive" (1) or "negative" (0) data points.
Data example:
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
marker_b = c(0,1,1,1), marker_c = c(0,1,1,0), marker_d = c(0,1,0,1))
cell marker_a marker_b marker_c marker_d
1 1 1 0 0 0
2 2 0 1 1 1
3 3 0 1 1 0
4 4 0 1 0 1
...
I have a different data.frame with all the possible combinations of positive and negative markers that any my_data$cell can have:
combinations_df <- expand.grid(
marker_a = c(0, 1),
marker_b = c(0, 1),
marker_c = c(0, 1),
marker_d = c(0, 1)
)
marker_a marker_b marker_c marker_d
1 0 0 0 0
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
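Because `expand.grid()` varies its first column fastest, row i of the table above encodes the number i - 1 in binary, with marker_a as the lowest bit. A quick check of that ordering (a sketch assuming exactly the four markers shown; the names `codes` is illustrative only):

```r
combinations_df <- expand.grid(
  marker_a = c(0, 1),
  marker_b = c(0, 1),
  marker_c = c(0, 1),
  marker_d = c(0, 1)
)

# Weight marker_a by 1, marker_b by 2, marker_c by 4, marker_d by 8
codes <- as.vector(as.matrix(combinations_df) %*% 2^(0:3))

# Row i of combinations_df should carry the binary code i - 1
stopifnot(all(codes == 0:15))
```

This ordering is what makes an arithmetic (join-free) lookup possible: a row's binary code plus one is its row number in combinations_df.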
How can I get a data.frame where each row/combination is matched against every row of my_data, returning the final count for each combination?
Example of expected output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 14969 15223 15300 14779 14844 16049 15374 15648 15045 15517 15116 15405 14990 15347 14432 15569
I'm guessing the data.table way is fairly efficient:
library(data.table)
setDT(my_data)
my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ]
marker_a marker_b marker_c marker_d N
1: 0 0 0 0 0
2: 1 0 0 0 1
3: 0 1 0 0 0
4: 1 1 0 0 0
5: 0 0 1 0 0
6: 1 0 1 0 0
7: 0 1 1 0 1
8: 1 1 1 0 0
9: 0 0 0 1 0
10: 1 0 0 1 0
11: 0 1 0 1 1
12: 1 1 0 1 0
13: 0 0 1 1 0
14: 1 0 1 1 0
15: 0 1 1 1 1
16: 1 1 1 1 0
If you only care about combinations that show up in the data, "chain" a filtering command:
my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ][ N > 0 ]
marker_a marker_b marker_c marker_d N
1: 1 0 0 0 1
2: 0 1 1 0 1
3: 0 1 0 1 1
4: 0 1 1 1 1
Alternatively, in this case you don't even need combinations_df:
my_data[, .N, by = marker_a:marker_d ]
marker_a marker_b marker_c marker_d N
1: 1 0 0 0 1
2: 0 1 1 1 1
3: 0 1 1 0 1
4: 0 1 0 1 1
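Without data.table, a similar count of only the observed combinations can be sketched in base R with `aggregate()`. This is an alternative, not part of the answer above; it assumes the four marker columns from the example, and the row order of the result may differ from data.table's:

```r
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
                      marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0),
                      marker_d = c(0, 1, 0, 1))

# Count rows (cells) per observed marker combination
observed <- aggregate(cell ~ marker_a + marker_b + marker_c + marker_d,
                      data = my_data, FUN = length)

# Rename the count column to match data.table's .N
names(observed)[names(observed) == "cell"] <- "N"
print(observed)
```

Like `.N` with `by`, this only returns combinations present in the data; combinations with a count of 0 do not appear.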
You are writing your combinations in "binary", so there is no need for any join, just a little math. Try this:
# Row codes in binary (marker_a = lowest bit, as in expand.grid); +1 gives 1-based bins
setNames(tabulate(as.matrix(my_data[, 2:5]) %*% 2^(0:3) + 1, 16), 1:16)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0
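The same one-liner unpacked step by step, assuming (as the answer does) that the marker columns are columns 2 to 5 of my_data, in the same order as combinations_df. The intermediate names `markers`, `codes`, `idx` and `counts` are illustrative only:

```r
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
                      marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0),
                      marker_d = c(0, 1, 0, 1))

markers <- as.matrix(my_data[, 2:5])     # 0/1 matrix, one row per cell
codes <- as.vector(markers %*% 2^(0:3))  # binary code per cell, marker_a = lowest bit
idx <- codes + 1                         # 1-based row index into combinations_df
counts <- tabulate(idx, nbins = 16)      # number of cells per combination row
names(counts) <- 1:16
print(counts)
```

Passing `nbins = 16` keeps combinations that never occur in the result with a count of 0, which matches the expected output's format.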
Perhaps you need:
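data_keys <- do.call(paste0, my_data[, -1])  # one "0101"-style key per cell; [, -1] drops the cell column for data.frame and data.table alike
setNames(sapply(do.call(paste0, combinations_df),
                function(x) sum(data_keys == x)), 1:nrow(combinations_df))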
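The string-key idea can also be vectorized with `match()` instead of looping with `sapply()`. This is an alternative sketch, not the original answer; selecting the marker columns by `names(combinations_df)` avoids relying on column positions, and the names `data_keys`, `combo_keys`, `hits` and `counts` are illustrative:

```r
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
                      marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0),
                      marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
                               marker_c = c(0, 1), marker_d = c(0, 1))

# One string key per row, e.g. "1000" for cell 1
data_keys <- do.call(paste0, my_data[names(combinations_df)])
combo_keys <- do.call(paste0, combinations_df)

# Row of combinations_df that each cell matches, then count cells per row
hits <- match(data_keys, combo_keys)
counts <- tabulate(hits, nbins = nrow(combinations_df))
names(counts) <- seq_len(nrow(combinations_df))
print(counts)
```

Each cell's key is looked up once, so the work grows with the number of cells rather than with cells times combinations.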