How to aggregate between the two dates in R?

Below are the two tables:

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

I want to display the result in this manner:

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

I used the following code:

for(i in 1:nrow(Table1)){

  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]


 code=aggregate(list(Table2$Qty),
by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate  & Table2$Date <= endDate), FUN=sum)

}

I want the quantity to be aggregated between the dates given in the first table, i.e. between the first and second dates, between the second and third dates, and so on. In addition, the quantity for the last date needs to include everything up to the last date in the other table.

Thanks in advance!

Compared with your previous question, I can see what you mean in this one.

This gives exactly the output you provided:

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13       15

It is produced by the following code (the explanation follows the code):

# your data and preparation
df1 <- read.table(text =
                    "'Date'                   'OldPrice'   'NewPrice'
                  '2014-06-12 09:32:56'     '0'          '10'
                  '2014-06-27 16:13:36'     '10'         '12'
                  '2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors = FALSE, header = TRUE)

df2 <- read.table(text =
                    "'Date'                  'Qty'
                  '2014-06-15 18:09:23'     '5'
                  '2014-06-19 12:04:29'     '4'
                  '2014-06-22 13:21:34'     '3'
                  '2014-06-29 19:01:22'     '6'
                  '2014-07-01 18:02:33'     '3'
                  '2014-09-29 22:41:47'     '6'", stringsAsFactors = FALSE, header = TRUE)

# convert both Date columns to date-time and sort df1 by Date
df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date)
df1 <- df1[order(df1$Date), ]

# one list slot per interval between consecutive df1 dates, plus two slots
# for df2 rows that fall before the first or after the last df1 date
n_int <- nrow(df1) - 1
values <- vector("list", length = n_int + 2)
names(values) <- c(seq_len(n_int), "out_of_time_dates_before", "out_of_time_dates_after")

# producing the main outputs: put every df2 quantity into its slot
# intervals are (start, end], so a df2 date equal to a df1 date is not dropped
for (j in seq_len(nrow(df2))) {
  d <- df2$Date[j]
  for (i in seq_len(n_int)) {
    if (d > df1$Date[i] && d <= df1$Date[i + 1]) {
      values[[i]] <- append(values[[i]], df2$Qty[j])
    }
  }
  if (d <= min(df1$Date)) {
    values[["out_of_time_dates_before"]] <- append(values[["out_of_time_dates_before"]], df2$Qty[j])
  } else if (d > max(df1$Date)) {
    values[["out_of_time_dates_after"]] <- append(values[["out_of_time_dates_after"]], df2$Qty[j])
  }
}

# aggregating the quantity per date range; sum() of an empty slot is 0
totals <- sapply(values, sum)
df1$Quantity <- c(0, unname(totals[seq_len(n_int)])) # leading quantity is 0 (as per your example)
# quantities dated before the first or after the last df1 date go to the first and last rows
df1$Quantity[1] <- df1$Quantity[1] + totals[["out_of_time_dates_before"]]
df1$Quantity[nrow(df1)] <- df1$Quantity[nrow(df1)] + totals[["out_of_time_dates_after"]]

I think you have interesting problems to solve, but the way they were communicated to the SO community was a bit unfortunate. For instance, it was not clear what to do with the dates in df2 that come before any available range in df1, so in the code above I add these quantities (if there are any) to the very first date. This is more generic, and it mirrors what you expected for the dates in df2 that come after the range of dates in df1 (they are added to the last date of df1).
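
The same bucketing rule can also be sketched without explicit loops. This is only an illustrative alternative, not part of the original answer. It assumes base R's `findInterval()` with `left.open = TRUE` (available since R 3.3.0), treats every interval as (start, end], and parses the timestamps as UTC so the example does not depend on the local time zone. The name `target_row` is made up for this sketch.

```r
# Sample data copied from the question (UTC is an assumption of this sketch)
df1 <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
df2 <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33", "2014-09-29 22:41:47"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3, 6)
)

df1 <- df1[order(df1$Date), ]  # findInterval() needs sorted break points
n <- nrow(df1)

# For each df2 date: how many df1 dates lie strictly before it (0 .. n).
# left.open = TRUE makes the intervals (start, end].
idx <- findInterval(as.numeric(df2$Date), as.numeric(df1$Date), left.open = TRUE)

# Interval i is credited to row i + 1; dates on or before the first df1 date
# go to row 1, and dates after the last df1 date stay on row n.
target_row <- pmin(idx + 1, n)

# Sum the quantities credited to each row (rows with no matches get 0)
df1$Quantity <- sapply(seq_len(n), function(k) sum(df2$Qty[target_row == k]))
print(df1)
```

For the sample data this gives Quantity values of 0, 12 and 15, the same as the loop version. The call to `pmin()` is what folds the trailing df2 rows into the last df1 date, which is the behaviour requested in the question.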
