
R - Summarize data.frame on an interval

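I am trying to sum a variable in a data.frame over intervals that end on each Friday: for every ID, each Friday's total should include that Friday's value plus all rows since the previous Friday. For example, ID A's total for Friday 2017-09-15 is 2 + 3 + 4 = 9, and the Saturday row after B's last Friday is not included.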

Sample data frame:

mydf = data.frame( "ID"   = c( rep( "A" , 6) , rep( "B" , 5 ) ),
                   "Date" = c( "2017-09-08","2017-09-10","2017-09-13","2017-09-15","2017-09-20","2017-09-22","2017-08-03","2017-08-04","2017-08-10","2017-08-11","2017-08-12" ),
                   "Var"  = c( 1,2,3,4,5,6,7,8,NA,10,11 ) )

mydf$Date = as.Date( mydf$Date )

# weekdays() returns day names in the session's locale; "Friday" below assumes an English locale
mydf = cbind( mydf , "WeekDay" = weekdays( mydf$Date ) )
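Because weekdays() returns localized day names, comparing against the string "Friday" only works in an English locale. A minimal locale-independent sketch, assuming base R only; the IsFriday column is a hypothetical addition, not part of the question:

```r
# Sample data from the question
mydf <- data.frame(
  ID   = c(rep("A", 6), rep("B", 5)),
  Date = as.Date(c("2017-09-08", "2017-09-10", "2017-09-13", "2017-09-15",
                   "2017-09-20", "2017-09-22", "2017-08-03", "2017-08-04",
                   "2017-08-10", "2017-08-11", "2017-08-12")),
  Var  = c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10, 11)
)

# POSIXlt's wday field numbers days from 0 (Sunday) to 6 (Saturday), so
# Friday is 5 whatever language weekdays() would use.
# IsFriday is a hypothetical helper column.
mydf$IsFriday <- as.POSIXlt(mydf$Date)$wday == 5

# Cross-check with format(): "%u" gives 1 (Monday) to 7 (Sunday).
stopifnot(all(mydf$IsFriday == (format(mydf$Date, "%u") == "5")))

mydf[mydf$IsFriday, c("ID", "Date")]
```

The logical column can then stand in for WeekDay == "Friday" in any of the grouping code.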

What I want to get:

df_ToGet = 
data.frame( 
    "ID"   = c( rep( "A" , 3) , rep( "B" , 2 ) ),
    "Date" = c( "2017-09-08","2017-09-15","2017-09-22","2017-08-04","2017-08-11"  ),
    "Var_Sum"  = c( 1 , 9 , 11 , 15, 10 )
    )

What I tried:

I have considered using dplyr::summarize and aggregate, but I do not know how to set the by condition properly. My attempt:

mydf %>% group_by( ID ) %>% summarize( Var_Sum = aggregate( Var , sum ,  by=list ( (mydf$Weekday)=="Friday") )  )

I have seen a few similar questions solved with the cut function, but that seems to split the dates into standard calendar weeks rather than at each Friday? I'm not too familiar with it yet.
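On the cut question: cut() on Dates with breaks = "week" does split at calendar weeks, starting on Monday by default (start.on.monday = TRUE). A minimal base-R sketch, assuming Saturday-to-Friday weeks are acceptable, shifts every date forward by 2 days so that each shifted Monday-based week corresponds to an original Saturday-to-Friday interval. The names week_start, Friday, is_fri and res are hypothetical helpers introduced here:

```r
# Sample data from the question
mydf <- data.frame(
  ID   = c(rep("A", 6), rep("B", 5)),
  Date = as.Date(c("2017-09-08", "2017-09-10", "2017-09-13", "2017-09-15",
                   "2017-09-20", "2017-09-22", "2017-08-03", "2017-08-04",
                   "2017-08-10", "2017-08-11", "2017-08-12")),
  Var  = c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10, 11)
)

# Shifting by +2 days maps Saturday to Monday and Friday to Sunday, so each
# Monday-based week of the shifted dates is an original Saturday-Friday week.
week_start <- as.Date(as.character(cut(mydf$Date + 2, breaks = "week")))

# The original Friday closing that interval is (shifted Monday - 2) + 6.
# Stored as "YYYY-MM-DD" text to keep the grouping key simple.
mydf$Friday <- format(week_start + 4)

# Sum Var per ID and closing Friday; na.pass keeps the NA row so that
# sum(..., na.rm = TRUE) can skip it.
res <- aggregate(Var ~ ID + Friday, data = mydf, FUN = sum,
                 na.rm = TRUE, na.action = na.pass)

# Keep only intervals whose closing Friday actually appears for that ID
# (drops B's trailing Saturday, whose week would end on 2017-08-18).
is_fri <- as.POSIXlt(mydf$Date)$wday == 5
res <- res[paste(res$ID, res$Friday) %in%
             paste(mydf$ID[is_fri], format(mydf$Date[is_fri])), ]
res <- res[order(res$ID, res$Friday), ]
res
```

This matches the desired output here because every week in the sample has a Friday row. If a week had no Friday row, this version would drop that week's rows, whereas a "since the previous Friday" grouping would add them to the next Friday's total.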

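Answer

We need to create a grouping variable using cumsum. cumsum(WeekDay == "Friday") increases by one on every Friday; lagging it by one row (lag(..., default = 0)) moves each increase to the row after the Friday, so every Friday lands in the same group as the rows leading up to it. Rows after the last Friday are not closed by any Friday, so they are dropped before summarising.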

library(dplyr)

mydf %>%
    group_by(ID) %>%
    mutate(grp = lag(cumsum(WeekDay == "Friday"), default = 0)) %>%   # interval number within each ID
    filter(grp < sum(WeekDay == "Friday")) %>%                         # drop rows after each ID's last Friday
    group_by(ID, grp) %>%
    summarise(Date = Date[WeekDay == "Friday"], Var = sum(Var, na.rm = TRUE)) %>%
    ungroup() %>%
    select(-grp)
# A tibble: 5 x 3
#     ID       Date   Var
#   <fctr>     <date> <dbl>
#1      A 2017-09-08     1
#2      A 2017-09-15     9
#3      A 2017-09-22    11
#4      B 2017-08-04    15
#5      B 2017-08-11    10
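Since the question also mentioned aggregate, here is a base-R sketch of the same cumsum grouping without dplyr. It assumes that cumsum(f) - f is an acceptable stand-in for lag(cumsum(f), default = 0) (both give the number of Fridays strictly before each row); the names IsFriday, grp, n_fri, kept, sums and fridays are hypothetical helpers:

```r
# Sample data from the question
mydf <- data.frame(
  ID   = c(rep("A", 6), rep("B", 5)),
  Date = as.Date(c("2017-09-08", "2017-09-10", "2017-09-13", "2017-09-15",
                   "2017-09-20", "2017-09-22", "2017-08-03", "2017-08-04",
                   "2017-08-10", "2017-08-11", "2017-08-12")),
  Var  = c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10, 11)
)
mydf$IsFriday <- as.POSIXlt(mydf$Date)$wday == 5   # 5 = Friday, locale-independent

# Within each ID, count the Fridays seen before the current row:
# cumsum() counts the current Friday too, so subtract it back out.
mydf$grp <- ave(as.integer(mydf$IsFriday), mydf$ID,
                FUN = function(f) cumsum(f) - f)

# Rows after an ID's last Friday have grp equal to that ID's Friday count.
n_fri <- ave(as.integer(mydf$IsFriday), mydf$ID, FUN = sum)
kept  <- mydf[mydf$grp < n_fri, ]

# Sum each interval, then attach the Friday that closes it.
sums    <- aggregate(Var ~ ID + grp, data = kept, FUN = sum,
                     na.rm = TRUE, na.action = na.pass)
fridays <- kept[kept$IsFriday, c("ID", "grp", "Date")]
res <- merge(fridays, sums, by = c("ID", "grp"))
res <- res[order(res$ID, res$Date), c("ID", "Date", "Var")]
names(res)[names(res) == "Var"] <- "Var_Sum"
res
```

Unlike the cut-based grouping, this keeps the "since the previous Friday" semantics even when some calendar week has no Friday row.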


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM