How to output the sum of specific rows from one data frame to a new column in another data frame?

Ultimately, I would like df2 to hold certain dates together with the sum of the df1 values that fall in the date range ending on each of those dates.

df1 = data.frame("date"=c("10/01/2020","10/02/2020","10/03/2020","10/04/2020","10/05/2020",
                          "10/06/2020","10/07/2020","10/08/2020","10/09/2020","10/10/2020"),
                 "value"=c(1:10))
df1
> df1
   date       value
1  10/01/2020     1
2  10/02/2020     2
3  10/03/2020     3
4  10/04/2020     4
5  10/05/2020     5
6  10/06/2020     6
7  10/07/2020     7
8  10/08/2020     8
9  10/09/2020     9
10 10/10/2020    10
df2 = data.frame("date"=c("10/05/2020","10/10/2020"))
df2
> df2
  date
1 10/05/2020
2 10/10/2020

I realize this is incorrect, but I am not sure how to define df2$value as the sums of certain df1$value rows:

df2$value = filter(df1, c(sum(1:5),sum(6:10)))
df2

I would like the output to look like this:

> df2
   date       value
1  10/05/2020    15
2  10/10/2020    40

We may use a non-equi join after converting the 'date' columns to Date class:

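library(lubridate)
library(data.table)
# convert both tables to data.table and parse the dates as month/day/year
setDT(df1)[, date := mdy(date)]
setDT(df2)[, date := mdy(date)]
# each window starts the day after the previous df2 date,
# or on the first of the month for the first row
df2[, start_date := fcoalesce(shift(date) + days(1), floor_date(date, 'month'))]

# non-equi join: for each df2 row, sum the df1 values inside its window
df1[df2, .(value = sum(value)), on = .(date >= start_date, 
      date <= date), by = .EACHI][, -1, with = FALSE]

-output

         date value
       <Date> <int>
1: 2020-10-05    15
2: 2020-10-10    40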
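
To make the window logic behind the join easier to follow, here is a minimal base R sketch of the same idea. It rebuilds the question's data from scratch and assumes the dates are month/day/year (October 2020); the names `first_of_month` and `start_date` are only illustrative, chosen to mirror the start_date column used in the join.

```r
# Fresh copies of the question's data, dates already of class Date
df1 <- data.frame(
  date  = as.Date(sprintf("2020-10-%02d", 1:10)),
  value = 1:10
)
df2 <- data.frame(date = as.Date(c("2020-10-05", "2020-10-10")))

# Window start: the day after the previous df2 date, or the first of the
# month for the first row
first_of_month <- as.Date(format(df2$date[1], "%Y-%m-01"))
start_date <- c(first_of_month, head(df2$date, -1) + 1)

# For each window, sum the df1 values whose date lies inside it
df2$value <- vapply(
  seq_along(start_date),
  function(i) sum(df1$value[df1$date >= start_date[i] & df1$date <= df2$date[i]]),
  integer(1)
)
df2
```

This loops over the df2 rows, so it is likely slower than the join on large tables, but it shows exactly which rows each sum covers.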


Another option is to create a grouping variable with findInterval and then sum by group (this assumes the 'date' columns are already of class Date, as above):

library(dplyr)
df1 %>% 
  group_by(grp = findInterval(date, df2$date, left.open = TRUE)) %>% 
  summarise(date = last(date), value = sum(value)) %>% 
  select(-grp)

-output

# A tibble: 2 × 2
  date       value
  <date>     <int>
1 2020-10-05    15
2 2020-10-10    40
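
The role of left.open = TRUE can be seen in a small sketch with plain numbers. As an assumption for illustration only, the day of the month stands in for each date in df1 (which here also equals the value column), and `days`, `breaks`, and `grp` are made-up names.

```r
# Day-of-month numbers stand in for the ten dates in df1;
# the breakpoints stand in for the two dates in df2
days   <- 1:10
breaks <- c(5, 10)

# Default: intervals are closed on the left, so day 5 starts a new group
findInterval(days, breaks)
# 0 0 0 0 1 1 1 1 1 2

# left.open = TRUE: intervals are closed on the right, so day 5 ends its group
grp <- findInterval(days, breaks, left.open = TRUE)
grp
# 0 0 0 0 0 1 1 1 1 1

# Summing by group gives 15 and 40, the values wanted in the question
sums <- tapply(days, grp, sum)
sums
```

Without left.open = TRUE, each cutoff date would be counted in the following group rather than the one it closes.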

Here is another approach using dplyr and lubridate:

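library(lubridate)
library(dplyr)
library(tidyr)  # fill() comes from tidyr

df1 %>% 
  # dmy() reads "10/05/2020" as 10 May 2020; for month/day/year data,
  # use mdy() and change the two dates below to match
  mutate(date = dmy(date)) %>%
  # keep only the two cutoff dates and blank out the rest
  mutate(date = if_else(date == "2020-05-10" |
                      date == "2020-10-10", date, NA_Date_)) %>% 
  # fill each blank with the next cutoff date below it
  fill(date, .direction = "up") %>% 
  group_by(date) %>% 
  summarise(value = sum(value))

-output

  date       value
  <date>     <int>
1 2020-05-10    15
2 2020-10-10    40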
