R: subset a data frame conditioned on the column sums of the subset

I have a data frame manipulation question.

I would like to find the subset of rows of data frame "data1" whose column sums equal the values in another data frame, "data2".

Here is my code:

AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)

A1<-c(5)
B1<-c(18)

data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)

library(dplyr)
subset(data1, ((sum(AA) ==data2$A1 )  &&  (sum(BB) ==data2$B1 ) ) )

Is there an algorithm that would help here?

Thanks!

This solution only handles the case where the sum is taken over exactly two rows. To test other subset sizes, change the second argument of the combn call (2 here) to create those combinations. final_data is the final output. If more than one subset can match, you may want to keep the matches as a list instead of taking only the first one.

# Prepare example datasets
AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)

A1<-c(5)
B1<-c(18)

data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)

# Load packages
library(tidyverse)

# Use combn to enumerate every pair of row numbers (one pair per row)
row_indices <- as.data.frame(t(combn(seq_len(nrow(data1)), 2)))

# Split row_indices into a list of one-row data frames, one per pair
row_list <- row_indices %>%
  rowid_to_column() %>%
  split(f = .$rowid)

# Subset data1 using each pair of row numbers in row_list
sub_list <- map(row_list, function(dt){
  temp_data <- data1 %>% filter(row_number() %in% c(dt$V1, dt$V2))
  return(temp_data)
})

# Calculate the column sums of each data frame in sub_list
# (funs() is deprecated in dplyr, so the function is passed directly)
sub_list2 <- map(sub_list, function(dt){
  dt2 <- dt %>% 
    summarise_all(sum) %>%
    setNames(c("A1", "B1"))
  return(dt2)
})

# Compare each data frame in sub_list2 with data2
# Find the one that is the same and store the logical results in result_indices
result_indices <- map_lgl(sub_list2, function(dt) setequal(dt, data2))

# Get the final output: the first matching subset
# (this line errors if no pair of rows matches)
final_data <- sub_list[result_indices][[1]]

final_data
  AA BB
1  2  5
2  3 13
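
The code above fixes the subset size at two rows. As a minimal sketch of how the same idea could extend to every subset size, the base R version below loops over all sizes and collects every match in a list. The helper name find_matching_subsets is hypothetical (it is not part of the original answer or of any package), and the comparison uses exact equality, which assumes integer-valued data like the example.

```r
# Example data from the question
data1 <- data.frame(AA = c(2, 3, 1, 4, 9), BB = c(5, 13, 9, 1, 2))
data2 <- data.frame(A1 = 5, B1 = 18)

# Hypothetical helper: return every row subset of `df` whose column sums
# equal `target` (a numeric vector with one entry per column of `df`).
find_matching_subsets <- function(df, target) {
  matches <- list()
  for (k in seq_len(nrow(df))) {
    # Each column of `idx` holds one combination of k row numbers
    idx <- combn(nrow(df), k)
    for (j in seq_len(ncol(idx))) {
      rows <- idx[, j]
      sums <- colSums(df[rows, , drop = FALSE])
      # Exact comparison; assumes integer-valued data (no tolerance)
      if (all(sums == target)) {
        matches[[length(matches) + 1]] <- df[rows, , drop = FALSE]
      }
    }
  }
  matches
}

# unlist() turns the one-row data2 into a plain numeric vector
all_matches <- find_matching_subsets(data1, unlist(data2))
all_matches
```

This checks all 2^n - 1 non-empty subsets of an n-row data frame, so it is only practical for small inputs. For the example data it finds the same single match as above (rows 1 and 2).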
