
Aggregate values in data frame based on an array of start and end dates - R

Sample data:

    Date_End   <- c("1999-08-30","1999-09-07","1999-09-20","1999-09-27","1999-10-04","1999-10-12")
    Date_Start <- c("1999-08-24" ,"1999-08-30" ,"1999-09-13" ,"1999-09-20" ,"1999-09-27" ,"1999-10-04") 
    as.Date(Date_Start, "%Y-%m-%d" )
    as.Date(Date_End, "%Y-%m-%d" )
    df1 <- data.frame(Date_Start,Date_End)  
    c1 <- data.frame(seq(as.Date('1999-08-24'), as.Date('1999-10-12'), by = 1))
    c2 <- sample(100, size = nrow(c1), replace = TRUE)
    df2 <- data.frame(c2,c1)
    names(df2) <- c("unit","date")
    df2 <- zoo(df2)

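I have an array of start and end dates in df1 and a time series in df2. I want to use an aggregate function (mainly sum) to get the sum of df2's unit for each row of df1. For example, producing something like this: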

Date_Start  Date_End    sum(unit)
8/24/1999   8/30/1999   282
8/30/1999   9/7/1999    269
9/13/1999   9/20/1999   464
9/20/1999   9/27/1999   308
9/27/1999   10/4/1999   408
10/4/1999   10/12/1999  353

I have tried both the window function:

    window(df2, start = df1$Date_Start, end = df1$Date_End)

and creating a sequence, then indexing:

    seq_a <- seq(as.Date(df1$Date_Start), as.Date(df1$Date_End), by = 1)
    test <- df2[seq_a]
    sum(test)

However, with seq you can only have one start point and one end point:

    Error in seq.Date(as.Date(df1$Date_Start), as.Date(df1$Date_End), by = 1) : 
      'from' must be of length 1

Help appreciated!
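The error arises because `seq()` accepts a single `from` and a single `to`. One way around it, sketched here under the assumption that the date columns are already of class Date and that both endpoints of each window should be counted, is to build one sequence per row and sum the units whose dates match. The names `starts`, `ends`, `days_per_window`, `window_sums` and the fixed seed are illustrative and do not come from the original post.

```r
# Small reproducible stand-in for the question's data (the seed is an assumption).
set.seed(1)
starts <- as.Date(c("1999-08-24", "1999-08-30", "1999-09-13"))
ends   <- as.Date(c("1999-08-30", "1999-09-07", "1999-09-20"))
dates  <- seq(as.Date("1999-08-24"), as.Date("1999-10-12"), by = 1)
df2    <- data.frame(unit = sample(100, length(dates), replace = TRUE),
                     date = dates)

# seq() takes one 'from' and one 'to', so call it once per (start, end) pair.
days_per_window <- lapply(seq_along(starts),
                          function(i) seq(starts[i], ends[i], by = 1))

# Sum the units whose date falls in each window (both endpoints included).
window_sums <- vapply(days_per_window,
                      function(days) sum(df2$unit[df2$date %in% days]),
                      numeric(1))
```

Because adjacent windows in the sample data share a boundary date (for example 1999-08-30), that day's units are counted in both windows under this inclusive reading.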

You should probably use a function rather than a loop, but for a quick and dirty solution you can do the following:

    Date_End   <- c("1999-08-30","1999-09-07","1999-09-20","1999-09-27","1999-10-04","1999-10-12")
    Date_Start <- c("1999-08-24","1999-08-30","1999-09-13","1999-09-20","1999-09-27","1999-10-04")
    Date_Start <- as.Date(Date_Start, "%Y-%m-%d")
    Date_End   <- as.Date(Date_End, "%Y-%m-%d")
    df1 <- data.frame(Date_Start, Date_End)
    c1 <- data.frame(seq(as.Date('1999-08-24'), as.Date('1999-10-12'), by = 1))
    c2 <- sample(100, size = nrow(c1), replace = TRUE)
    df2 <- data.frame(c2, c1)
    names(df2) <- c("unit", "date")

    # For each row of df1, sum the units dated strictly between its start and end.
    # Use >= and <= instead if the boundary dates should be included.
    for (i in seq_len(nrow(df1))) {
      df1$sum[i] <- sum(df2$unit[df2$date > df1$Date_Start[i] & df2$date < df1$Date_End[i]])
    }

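Note that I also modified lines 3 and 4 of your code, so that the results of as.Date() are assigned back to Date_Start and Date_End.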
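As the answer suggests, a function is arguably tidier than the loop. The following is a minimal sketch of that idea, not the answerer's code: `sum_between()` and its `inclusive` argument are hypothetical names introduced here, and the seed is only there to make the example reproducible. The loop above uses strict `>` and `<`, which leaves out the boundary dates; the helper makes that choice explicit.

```r
# Hypothetical helper: sum `values` whose `dates` lie between `start` and `end`.
# inclusive = TRUE counts both endpoints; FALSE mirrors the strict > and < of the loop.
sum_between <- function(start, end, dates, values, inclusive = TRUE) {
  in_window <- if (inclusive) {
    dates >= start & dates <= end
  } else {
    dates > start & dates < end
  }
  sum(values[in_window])
}

# Illustrative data; the seed is an assumption added for reproducibility.
set.seed(42)
df1 <- data.frame(
  Date_Start = as.Date(c("1999-08-24", "1999-08-30", "1999-09-13")),
  Date_End   = as.Date(c("1999-08-30", "1999-09-07", "1999-09-20"))
)
dates <- seq(as.Date("1999-08-24"), as.Date("1999-10-12"), by = 1)
df2 <- data.frame(unit = sample(100, length(dates), replace = TRUE), date = dates)

# One call per row of df1; seq_len() also behaves when df1 has zero rows.
df1$sum <- vapply(
  seq_len(nrow(df1)),
  function(i) sum_between(df1$Date_Start[i], df1$Date_End[i], df2$date, df2$unit),
  numeric(1)
)
```

Passing `inclusive = FALSE` reproduces the loop's behaviour for a single window, for example `sum_between(df1$Date_Start[1], df1$Date_End[1], df2$date, df2$unit, inclusive = FALSE)`.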

This solution does not work with df2 as a zoo object, but it may still be useful to you:

    Date_End   <- as.Date(c("1999-08-30","1999-09-07","1999-09-20","1999-09-27","1999-10-04","1999-10-12"))
    Date_Start <- as.Date(c("1999-08-24","1999-08-30","1999-09-13","1999-09-20","1999-09-27","1999-10-04"))
    df1 <- data.frame(Date_Start, Date_End)
    c1 <- seq(as.Date('1999-08-24'), as.Date('1999-10-12'), by = 1)
    c2 <- sample(100, size = length(c1), replace = TRUE)
    df2 <- data.frame(unit = c2, date = c1)

    library(sqldf)
    # Join every df2 row to each df1 window that contains its date, then sum per window.
    sqldf("select Date_Start, Date_End, sum(unit) as units
           from df1,
                df2
           where df1.Date_Start <= df2.date
           and df2.date <= df1.Date_End
           group by Date_Start")
    # Output from one run (the values depend on the random sample):
    #   Date_Start   Date_End units
    # 1 1999-08-24 1999-08-30   258
    # 2 1999-08-30 1999-09-07   493
    # 3 1999-09-13 1999-09-20   423
    # 4 1999-09-20 1999-09-27   432
    # 5 1999-09-27 1999-10-04   433
    # 6 1999-10-04 1999-10-12   584

I edited some of your code, including making Date_Start and Date_End Date objects and making c1 a vector rather than a data.frame.
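For readers who would rather not depend on sqldf, the same join-then-aggregate idea can be sketched in base R. This is an illustrative equivalent of the query above, not part of the original answer; the names `pairs` and `result` and the fixed seed are assumptions.

```r
# Illustrative data with Date columns, as in the edited code above (seed assumed).
set.seed(1)
df1 <- data.frame(
  Date_Start = as.Date(c("1999-08-24", "1999-08-30", "1999-09-13")),
  Date_End   = as.Date(c("1999-08-30", "1999-09-07", "1999-09-20"))
)
dates <- seq(as.Date("1999-08-24"), as.Date("1999-10-12"), by = 1)
df2 <- data.frame(unit = sample(100, length(dates), replace = TRUE), date = dates)

# by = NULL asks merge() for the Cartesian product, like "from df1, df2" in SQL.
pairs <- merge(df1, df2, by = NULL)

# Keep the rows that satisfy the WHERE clause (both endpoints included).
pairs <- pairs[pairs$Date_Start <= pairs$date & pairs$date <= pairs$Date_End, ]

# Sum unit within each window, like "sum(unit) ... group by".
result <- aggregate(unit ~ Date_Start + Date_End, data = pairs, FUN = sum)
```

The Cartesian product has `nrow(df1) * nrow(df2)` rows, so this approach is only comfortable for small inputs such as the sample data.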

P.S. Underscore-separated names are discouraged by the style guide that the original answer linked to.
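If df2 does need to remain a zoo series, a row-by-row call to `window()` is one possible route. The sketch below assumes the zoo package is installed and that the series is indexed by date through `order.by` (rather than wrapping the whole data frame in `zoo()`); the name `units_z` and the seed are illustrative.

```r
library(zoo)

# Illustrative data; the seed is an assumption added for reproducibility.
set.seed(1)
dates <- seq(as.Date("1999-08-24"), as.Date("1999-10-12"), by = 1)

# Index the series by date instead of keeping the date as an ordinary column.
units_z <- zoo(sample(100, length(dates), replace = TRUE), order.by = dates)

df1 <- data.frame(
  Date_Start = as.Date(c("1999-08-24", "1999-08-30", "1999-09-13")),
  Date_End   = as.Date(c("1999-08-30", "1999-09-07", "1999-09-20"))
)

# window() takes one start and one end, so apply it once per row of df1.
# coredata() extracts the plain values before summing.
df1$units <- vapply(
  seq_len(nrow(df1)),
  function(i) sum(coredata(window(units_z,
                                  start = df1$Date_Start[i],
                                  end   = df1$Date_End[i]))),
  numeric(1)
)
```

`window()` keeps observations whose index lies between `start` and `end`, both included, so the totals follow the same inclusive reading as the SQL query.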
