R: Subset a data frame conditionally, based on the column sums of the subset

Question

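I have a data frame manipulation question.

I want to find a subset of the rows of the data frame `data1` whose column sums equal the corresponding values in another data frame, `data2`.

Here is my code: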

AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)

A1<-c(5)
B1<-c(18)

data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)

library(dplyr)
subset(data1, ((sum(AA) ==data2$A1 )  &&  (sum(BB) ==data2$B1 ) ) )

Is there another algorithm that would help here?

Thanks!

Answer 1

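This solution only covers the case where the sums are computed from any two rows. To test a different number of rows, create those combinations by changing the number passed to the combn function. final_data is the final output. If there is more than one match, you may want to keep final_data as a list.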
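As a quick illustration of what that number controls (a sketch only, not part of the original answer): `combn(n, m)` returns a matrix whose columns are all the size-`m` combinations of `1:n`, so changing `m` changes how many rows are summed at a time.

```r
# All pairs of row numbers for a 5-row data frame, one combination per column
pairs <- combn(5, 2)
dim(pairs)

# Changing the second argument to 3 gives all triples of row numbers instead
triples <- combn(5, 3)
dim(triples)
```

The answer's code transposes this matrix with `t()` so that each combination becomes one row.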

# Prepare example datasets
AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)

A1<-c(5)
B1<-c(18)

data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)

# Load packages
library(tidyverse)

# Use combn to find out all the combination of row number
row_indices <- as.data.frame(t(combn(1:nrow(data1), 2)))

# Prepare a list of data frame. Each data frame is one row from row_indices
row_list <- row_indices %>%
  rowid_to_column() %>%
  split(f = .$rowid)

# Based on row_list to subset data1
sub_list <- map(row_list, function(dt){
  temp_data <- data1 %>% filter(row_number() %in% c(dt$V1, dt$V2))
  return(temp_data)
})

# Calculate the column sums of each data frame in sub_list
sub_list2 <- map(sub_list, function(dt){
  dt2 <- dt %>% 
    summarise_all(sum) %>%
    setNames(c("A1", "B1"))
  return(dt2)
})

# Compare each data frame in sub_list2 with data2
# Find the one that is the same and store the logical results in result_indices
result_indices <- map_lgl(sub_list2, function(dt) setequal(dt, data2))

# Get the final output
final_data <- sub_list[result_indices][[1]]

final_data
  AA BB
1  2  5
2  3 13
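The code above fixes the subset size at two rows and keeps only the first match. A minimal base R sketch that tries every subset size and keeps all matches in a list might look like the following. `find_matching_subsets` is a hypothetical helper name, not something from the original answer, and the search is brute force, so it is only practical for small data frames (the number of subsets grows as 2^n).

```r
# Example data from the question
data1 <- data.frame(AA = c(2, 3, 1, 4, 9), BB = c(5, 13, 9, 1, 2))
data2 <- data.frame(A1 = 5, B1 = 18)

# Hypothetical helper: return every subset of rows of `df` whose column sums
# equal the values in `target` (columns are compared by position, not by name)
find_matching_subsets <- function(df, target) {
  target_sums <- unlist(target[1, ], use.names = FALSE)
  matches <- list()
  for (k in seq_len(nrow(df))) {
    # Each column of `combos` is one combination of k row numbers
    combos <- combn(nrow(df), k)
    for (j in seq_len(ncol(combos))) {
      rows <- combos[, j]
      sums <- unname(colSums(df[rows, , drop = FALSE]))
      # all.equal tolerates small floating-point differences
      if (isTRUE(all.equal(sums, target_sums))) {
        matches[[length(matches) + 1]] <- df[rows, , drop = FALSE]
      }
    }
  }
  matches
}

result <- find_matching_subsets(data1, data2)
result
```

With the question's data, `result` is a list holding a single data frame made of rows 1 and 2 of `data1`, the same subset as `final_data` above.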

Answer 1
2 Accepted 2017-08-25 17:10:34
