Match with multiple criteria without loop in R

I have a data frame displaying a set of conditions, for example:

B = data.frame(col1 = 1:10, col2 = 11:20 )

For example, the first row says that when col1 = 1, col2 = 11. I also have another data frame with the numbers that should meet these conditions, for example:

A = data.frame(col1 = c(1:11,1:11), col2 = c(11:21,11:21), col3 = 101:122)

I would like to return the sum of the values in col3 of data frame A for all rows that meet the conditions in B. For example, using the first row in B this value is:

sum(A$col3[which(A$col1 == B$col1[1] & A$col2 == B$col2[1])])
#[1] 213

that is, the sum of the entries in col3 in the 1st and 12th rows of A. I need to find a vector containing these sums, one for each row of B. I know how to do this with a loop, but in my data A and B are very large and have many conditions, so I was wondering whether there is a way to do the same thing without the loop. Thank you.
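For reference, the loop version mentioned above might look like the following. This is only a sketch on the example data; the name sums is chosen here for illustration and is not from the original post. It gives a baseline against which the loop-free answers can be checked.

```r
# Example data from the question
A <- data.frame(col1 = c(1:11, 1:11), col2 = c(11:21, 11:21), col3 = 101:122)
B <- data.frame(col1 = 1:10, col2 = 11:20)

# One sum per row of B: add up A$col3 over the rows of A
# whose col1 and col2 both match that row of B
sums <- numeric(nrow(B))
for (i in seq_len(nrow(B))) {
  sums[i] <- sum(A$col3[A$col1 == B$col1[i] & A$col2 == B$col2[i]])
}
sums
```

The first element reproduces the single-row calculation shown earlier.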

Solution in base R

# Sum col3 over identical (col1, col2) rows
A.summed <- aggregate(col3 ~ col1 + col2, data = A, sum)

# Select col1 col2 combinations that are also present in B
A.summed.sub <- subset(A.summed, paste(col1, col2) %in% paste(B$col1, B$col2))
#   col1 col2 col3
#1     1   11  213
#2     2   12  215
#3     3   13  217
#4     4   14  219
#5     5   15  221
#6     6   16  223
#7     7   17  225
#8     8   18  227
#9     9   19  229
#10   10   20  231

Or the same as a one-liner:

A.summed.sub <- subset(aggregate(col3 ~ col1 + col2, data = A, sum), paste(col1, col2) %in% paste(B$col1, B$col2))

# Add summed col3 to dataframe B by matching col1 col2 combinations
B$col3 <- A.summed[match(paste(B$col1, B$col2), paste(A.summed$col1, A.summed$col2)), "col3"]
B
#   col1 col2 col3
#1     1   11  213
#2     2   12  215
#3     3   13  217
#4     4   14  219
#5     5   15  221
#6     6   16  223
#7     7   17  225
#8     8   18  227
#9     9   19  229
#10   10   20  231
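As a variation on the base R approach, merging on both key columns avoids building pasted string keys. This is a sketch on the question's example data, not part of the original answer; result is a name chosen here for illustration, and all.x = TRUE is an assumption about the desired behaviour: it keeps rows of B that have no match in A, with col3 set to NA.

```r
# Example data from the question
A <- data.frame(col1 = c(1:11, 1:11), col2 = c(11:21, 11:21), col3 = 101:122)
B <- data.frame(col1 = 1:10, col2 = 11:20)

# Sum col3 for each distinct (col1, col2) pair in A
A.summed <- aggregate(col3 ~ col1 + col2, data = A, FUN = sum)

# Left join onto B using both columns as the key;
# merge() sorts the result by the key columns by default
result <- merge(B, A.summed, by = c("col1", "col2"), all.x = TRUE)
result
```

Because the key columns are compared directly, this does not depend on how the values print when converted to strings.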

A solution using dplyr. A2 is the final output. The idea is to group by col1 and col2 and calculate the sum of col3. semi_join then filters the result, keeping only the rows whose col1 and col2 values match a row in B.

library(dplyr)

A2 <- A %>%
  group_by(col1, col2) %>%
  summarise(col3 = sum(col3)) %>%
  semi_join(B, by = c("col1", "col2")) %>%
  ungroup()
A2
# # A tibble: 10 x 3
#     col1  col2  col3
#    <int> <int> <int>
#  1     1    11   213
#  2     2    12   215
#  3     3    13   217
#  4     4    14   219
#  5     5    15   221
#  6     6    16   223
#  7     7    17   225
#  8     8    18   227
#  9     9    19   229
# 10    10    20   231

We can do a join using the on argument in data.table:

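library(data.table)
# setDT() converts A to a data.table by reference;
# by = .EACHI evaluates the sum once for each row of B
setDT(A)[B, .(col3 = sum(col3)), on = .(col1, col2), by = .EACHI]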
#    col1 col2 col3
# 1:    1   11  213
# 2:    2   12  215
# 3:    3   13  217
# 4:    4   14  219
# 5:    5   15  221
# 6:    6   16  223
# 7:    7   17  225
# 8:    8   18  227
# 9:    9   19  229
#10:   10   20  231
