How to aggregate between two dates in R?

Below are the two tables:

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

I want to display the result like this:

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

I used the following code:

for(i in 1:nrow(Table1)){
  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]

  code = aggregate(list(Table2$Qty),
                   by = list(Table1$Date, Table1$OldPrice, Table1$NewPrice,
                             Date = Table2$Date > startDate & Table2$Date <= endDate),
                   FUN = sum)
}

I want the quantity to be aggregated between the given dates in the first table, i.e. between the first and second dates, the second and third dates, and so on.

Thanks in advance!

We can do a rolling join with data.table (here df1 and df2 stand for Table1 and Table2):

library(data.table)
res <- setDT(df1)[df2, roll = -Inf, on = .(Date)][, .(Qty = sum(Qty)),
           .(OldPrice, NewPrice)][df1, on = .(OldPrice, NewPrice)][is.na(Qty), Qty := 0]
setcolorder(res, c(names(df1), "Qty"))
res
#                   Date OldPrice NewPrice Qty
#1: 2014-06-12 09:32:56        0       10   0
#2: 2014-06-27 16:13:36       10       12  12
#3: 2014-08-12 22:41:47       12       13   9
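
For readers without data.table, the matching that `roll = -Inf` performs (each quantity row is attached to the next price-change date) can be approximated in base R. This is only a sketch under assumptions: the data frames are rebuilt by hand from the question, the time zone is fixed to UTC for reproducibility, and the helper names `idx`, `closing_row`, `keep` and `totals` are made up for illustration. The last Table2 row falls after the final date in Table1, so this sketch does not count it anywhere.

```r
# Data rebuilt from the question (time zone fixed to UTC as an assumption)
df1 <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
df2 <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33", "2014-09-29 22:41:47"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3, 6)
)

# For each df2 timestamp, find i such that df1$Date[i] < t <= df1$Date[i + 1]
idx <- findInterval(as.numeric(df2$Date), as.numeric(df1$Date), left.open = TRUE)

# The df1 row that closes the interval receives the quantity
closing_row <- idx + 1

# Rows after the last df1 date have no closing row and are dropped
keep <- closing_row <= nrow(df1)

# Sum per closing row; rows of df1 with no matching quantities come back as NA
totals <- tapply(df2$Qty[keep],
                 factor(closing_row[keep], levels = seq_len(nrow(df1))),
                 sum)
totals[is.na(totals)] <- 0

df1$Qty <- as.vector(totals)
df1
```

With `left.open = TRUE` the intervals are open on the left and closed on the right, which matches the `> startDate & <= endDate` condition in the question.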

A slightly more verbose approach with dplyr and tidyr:

library(dplyr)
library(tidyr)

full_join(Table1, Table2, by = "Date") %>% 
  arrange(Date) %>% 
  fill(OldPrice, NewPrice, .direction = "up") %>% 
  group_by(OldPrice, NewPrice) %>% 
  summarize(Qty = sum(Qty, na.rm = TRUE)) %>% 
  ungroup() %>% 
  select(Qty) %>% 
  bind_cols(Table1, .)

#                  Date OldPrice NewPrice Qty
# 1 2014-06-12 09:32:56        0       10   0
# 2 2014-06-27 16:13:36       10       12  12
# 3 2014-08-12 22:41:47       12       13   9

You started with a for loop, so you could do the following the for-loop way:

df1 <- read.table(text=
"'Date'                   'OldPrice'   'NewPrice'
'2014-06-12 09:32:56'     '0'          '10'
'2014-06-27 16:13:36'     '10'         '12'
'2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors=F,header=T)

df2 <- read.table(text=
"'Date'                  'Qty'
'2014-06-15 18:09:23'     '5'
'2014-06-19 12:04:29'     '4'
'2014-06-22 13:21:34'     '3'
'2014-06-29 19:01:22'     '6'
'2014-07-01 18:02:33'     '3'" , stringsAsFactors=F, header=T)

df1 <- df1[with(df1, order(Date)),] #order df1 by Date
df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) #convert into datetime formats
values <- vector("list", length = nrow(df1)) #declare a list of specific length of df1

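for(i in seq_len(nrow(df1) - 1)){ #stop before the last row, since it has no following date to compare with
  for(j in seq_len(nrow(df2))){
    #keep quantities that fall strictly between two consecutive df1 dates
    if(df2$Date[j] > df1$Date[i] && df2$Date[j] < df1$Date[i+1]){
      values[[i]] <- append(values[[i]], df2$Qty[j])
    }
  }
}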

df1$Quantity <- c(0, sapply(values, sum)[1:(nrow(df1)-1)]) #replace the leading quantity value with 0 (as per your example)

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13        9

Obviously this is more work, but it could help if you are set on using for loops.
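
If the nested loops feel heavy, the same window test can be done with a single pass over the intervals. Below is a minimal sketch in base R, assuming the data from the question with the time zone fixed to UTC; the names `between` and `in_window` are only illustrative. Note that the upper bound here is inclusive (`<=`), as in the question's own attempt, whereas the loop above uses `<`.

```r
# Data rebuilt from the question (time zone fixed to UTC as an assumption)
df1 <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
df2 <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33", "2014-09-29 22:41:47"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3, 6)
)

n <- nrow(df1)

# One sum per pair of consecutive df1 dates
between <- vapply(seq_len(n - 1), function(i) {
  # TRUE for df2 rows inside the window (df1$Date[i], df1$Date[i + 1]]
  in_window <- df2$Date > df1$Date[i] & df2$Date <= df1$Date[i + 1]
  sum(df2$Qty[in_window])
}, numeric(1))

# The first row has no preceding window, so it gets 0 as in the example output
df1$Quantity <- c(0, between)
df1
```

The comparison is vectorised over all of `df2` at once, so only one explicit iteration over the rows of `df1` is left.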
