
Using the last day in each month of my time series in R

I need to keep only the last day available in each month of my dataset so that I can aggregate later on, but I haven't had any success.

library(tibbletime)

dataset <- data.frame(
  timestamp = c("2010-01-01", "2010-01-03", "2010-01-23"),
  var       = c(1,            4,            11)
)

monthly_dataset <- as_tbl_time(dataset, index = timestamp) %>%
                   as_period("1 month") 

Is there a function or R package I can use to reduce my dataset to only the last available day of each month?

The answer from Julian is a nice start, but it won't work across multiple years because the grouping variable doesn't include information about the year.
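To make the multi-year problem concrete, here is a minimal sketch. The object names `multi_year` and `by_month_only` are made up for illustration; the data are the same six rows used in the example further down. Grouping on the month number alone merges February 2010 with February 2011, so one of the two month-ends is silently dropped.

```r
library(dplyr)
library(lubridate)

# Illustrative data spanning two years (names are hypothetical).
multi_year <- tibble(
  timestamp = ymd(c(
    "2010-01-01", "2010-01-03", "2010-01-23",
    "2010-02-01", "2010-02-03", "2011-02-23"
  )),
  var = c(1, 4, 11, 1, 4, 11)
)

# Group on the month number only, then keep the last row of each group.
# February 2010 and February 2011 end up in the same group.
by_month_only <- multi_year %>%
  group_by(month = month(timestamp)) %>%
  slice_tail(n = 1) %>%
  ungroup()

# Only two rows come back; the last observation of February 2010
# (2010-02-03) is missing from the result.
by_month_only$timestamp
```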

The typical way to do this is to group on year-month, and then filter to the max date per year-month group.

Also, as the creator of tibbletime I would highly suggest that you no longer use it. It is deprecated and is no longer being supported. You should just use clock/lubridate for date handling alongside the tidyverse packages like dplyr, or you should use tsibble if you really need to go all in on time series.

library(lubridate)
library(dplyr)

dataset <- tibble(
  timestamp = c(
    "2010-01-01", "2010-01-03", "2010-01-23", 
    "2010-02-01", "2010-02-03", "2011-02-23"
  ),
  var = c(1, 4, 11, 1, 4, 11)
)
dataset <- mutate(dataset, timestamp = ymd(timestamp))

dataset <- dataset %>%
  mutate(
    year_month = floor_date(timestamp, "month"),
    day = day(timestamp)
  )

dataset %>%
  group_by(year_month) %>%
  filter(day == max(day)) %>%
  ungroup()
#> # A tibble: 3 × 4
#>   timestamp    var year_month   day
#>   <date>     <dbl> <date>     <int>
#> 1 2010-01-23    11 2010-01-01    23
#> 2 2010-02-03     4 2010-02-01     3
#> 3 2011-02-23    11 2011-02-01    23

Created on 2022-05-18 by the reprex package (v2.0.1)
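A slightly more compact variant of the same idea, offered only as a sketch: it skips the helper `day` column and lets `slice_max()` pick the latest timestamp in each year-month group. The name `last_per_month` is made up for illustration.

```r
library(dplyr)
library(lubridate)

dataset <- tibble(
  timestamp = ymd(c(
    "2010-01-01", "2010-01-03", "2010-01-23",
    "2010-02-01", "2010-02-03", "2011-02-23"
  )),
  var = c(1, 4, 11, 1, 4, 11)
)

# Group on the first day of each month (which carries the year too),
# then keep the single row with the latest timestamp in each group.
last_per_month <- dataset %>%
  group_by(year_month = floor_date(timestamp, "month")) %>%
  slice_max(timestamp, n = 1, with_ties = FALSE) %>%
  ungroup()

last_per_month
```

Note that `with_ties = FALSE` keeps exactly one row per month. If the data can contain several rows on the same last day and all of them should be kept, drop that argument; this then matches the behaviour of the `filter(day == max(day))` approach.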

An option could be the lubridate package, e.g.

library(lubridate)
library(dplyr)

dataset <- data.frame(
  timestamp = c(
    "2010-01-01", "2010-01-03", "2010-01-23",
    "2010-02-01", "2010-02-03", "2010-02-23"
  ),
  var = c(1, 4, 11, 1, 4, 11)
)

dataset %>%
  mutate(month = timestamp %>% ymd() %>% month()) %>%
  group_by(month) %>%
  slice_tail()

Outcome:

# A tibble: 2 x 3
# Groups:   month [2]
  timestamp    var month
  <chr>      <dbl> <dbl>
1 2010-01-23    11     1
2 2010-02-23    11     2

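Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original source. For any questions, contact: yoyou2525@163.com.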

 