How to select the last day of the month in R

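How do I select the data for the last day of each month? For example, I have a dataset with daily data for V1 from 2000 to 2016. I only need to select the last day of each month, i.e. 31/01/2001, 28/02/2001, and so on for all years. The date format is DD/MM/YYYY.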

 V1         V2
4.59 29/12/2000
4.59 01/01/2001
4.58 02/01/2001
4.52 03/01/2001
4.54 04/01/2001
4.58 05/01/2001
......
4.09 26/01/2001
4.50 27/01/2001
4.18 28/01/2001
4.11 29/01/2001
3.54 30/01/2001
4.98 31/01/2001  <- Select this row!

library(data.table)
library(lubridate)

# assumes V2 is already of class Date
# for each unique year-month combination, get the last entry
setDT(df)[order(V2), .(V1 = V1[which.max(V2)], V2 = V2[which.max(V2)]), by = .(year(V2), month(V2))]
#   year month   V1         V2
#1: 2000    12 4.59 2000-12-29
#2: 2001     1 4.98 2001-01-31

This can also be done in base R.

df[df$V2 %in% unique(as.Date(format(df$V2 + 28, "%Y-%m-01")) - 1),]
    V1         V2
12 4.98 2001-01-31

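This uses a trick I learned from one of Dirk Eddelbuettel's SO answers. The idea is to move each date to the first day of the following month and then subtract 1, which gives the last day of the date's own month.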
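
To make the trick concrete, here is a minimal sketch that walks single dates through each step. The variable names (d, shifted, first_of_month, month_end, mid) are only illustrative and do not come from the answer.

```r
# Walk a single date through the "+ 28, first of month, - 1" trick.
d <- as.Date("2001-01-31")

# Step 1: adding 28 days to a month-end date always lands in the next month
shifted <- d + 28                                        # 2001-02-28

# Step 2: snap to the first day of whatever month we landed in
first_of_month <- as.Date(format(shifted, "%Y-%m-01"))   # 2001-02-01

# Step 3: one day earlier is the last day of the preceding month
month_end <- first_of_month - 1                          # 2001-01-31

# d is a month end, so it comes back unchanged
month_end == d

# A date that is not a month end maps to some month end, never to itself
mid <- as.Date("2001-01-15")
as.Date(format(mid + 28, "%Y-%m-01")) - 1                # 2001-01-31
```

Every value the expression produces is a true month end, which is why the %in% test keeps exactly the month-end rows.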

Data

df <- structure(list(V1 = c(4.59, 4.59, 4.58, 4.52, 4.54, 4.58, 4.09, 
4.5, 4.18, 4.11, 3.54, 4.98), V2 = structure(c(11320, 11323, 
11324, 11325, 11326, 11327, 11348, 11349, 11350, 11351, 11352, 
11353), class = "Date")), .Names = c("V1", "V2"), row.names = c(NA, 
-12L), class = "data.frame")

Proof of concept

# construct a vector of dates for 10 years, 2001 through 2010
myDates <- seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by="day")

# pull off the final days of the month
finalDays <-
       myDates[myDates %in% unique(as.Date(format(myDates + 28, "%Y-%m-01")) - 1)]

# Take a look at first 5 and last 5
c(head(finalDays, 5), tail(finalDays, 5))
 [1] "2001-01-31" "2001-02-28" "2001-03-31" "2001-04-30" "2001-05-31"
 [6] "2010-08-31" "2010-09-30" "2010-10-31" "2010-11-30" "2010-12-31"

# get length, 12 * 10 = 120
length(finalDays)
[1] 120

# make sure there are no repeated values
length(unique(finalDays))
[1] 120
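
One further check that may be worth running is whether leap years are handled. The sketch below rebuilds the same vectors so it runs on its own; febEnds is an illustrative name, not part of the original answer.

```r
# Rebuild the vectors from the proof of concept so this block is self-contained
myDates <- seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by = "day")
finalDays <- myDates[myDates %in% unique(as.Date(format(myDates + 28, "%Y-%m-01")) - 1)]

# Keep only the February month ends, one per year
febEnds <- finalDays[format(finalDays, "%m") == "02"]
length(febEnds)

# 2004 and 2008 are the leap years in this range, so they should end on the 29th
febEnds[format(febEnds, "%d") == "29"]
```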

We can use dplyr

library(dplyr)
library(lubridate)
library(zoo)

If we need only the actual last day of the calendar month, rather than the last day found in the dataset

 df %>% 
      filter(dmy(V2) == as.Date(as.yearmon(dmy(V2)), frac=1))
 #    V1         V2
 #1 4.98 31/01/2001

But if we need to filter for the last day of each month that is found in the dataset

df %>%
    mutate(V3 = dmy(V2))%>%
    group_by(month = month(V3), year = year(V3)) %>%
    slice(which.max(day(V3))) %>%
    ungroup() %>%
    select(-month, -year, -V3)
#     V1         V2
#   <dbl>      <chr>
#1  4.98 31/01/2001
#2  4.59 29/12/2000
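
The difference between the two filters shows up in December 2000, where the data stop on the 29th. Here is a base R sketch of both conditions, assuming string dates in DD/MM/YYYY form and using a shortened copy of the data; d, ym, is_calendar_end and is_last_observed are illustrative names.

```r
# Shortened copy of the example data, dates kept as DD/MM/YYYY strings
df <- data.frame(
  V1 = c(4.59, 4.59, 4.58, 4.11, 3.54, 4.98),
  V2 = c("29/12/2000", "01/01/2001", "02/01/2001",
         "29/01/2001", "30/01/2001", "31/01/2001"),
  stringsAsFactors = FALSE
)

d  <- as.Date(df$V2, format = "%d/%m/%Y")   # parse the strings into Dates
ym <- format(d, "%Y-%m")                    # year-month key, e.g. "2001-01"

# (a) true calendar month end: the following day is the 1st of a month
is_calendar_end <- format(d + 1, "%d") == "01"

# (b) last day present in the data within each year-month
is_last_observed <- as.numeric(d) == ave(as.numeric(d), ym, FUN = max)

df[is_calendar_end, ]    # keeps only 31/01/2001
df[is_last_observed, ]   # keeps 29/12/2000 and 31/01/2001
```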

If we want to group by month only, just remove year = year(V3) from the group_by, and we get

df %>%
    mutate(V3 = dmy(V2))%>%
    group_by(month = month(V3)) %>%
    slice(which.max(day(V3))) %>%
    ungroup() %>%
    select(-month,  -V3)
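
Note that grouping by month alone pools the same calendar month from different years, so multi-year data yield at most twelve rows. A small base R sketch of the effect, using made-up dates that are not from the question:

```r
# Made-up dates spanning two Januaries, for illustration only
d <- as.Date(c("2001-01-30", "2001-01-31", "2002-01-30", "2002-01-31"))

# Keyed by year and month: each January is its own group
table(format(d, "%Y-%m"))

# Keyed by month only: both Januaries fall into one group
table(format(d, "%m"))

# Base R counterpart of which.max(day(V3)) inside that single group:
# which.max returns the first position of the maximum, so only one row survives
d[which.max(as.integer(format(d, "%d")))]
```

For the 2000 to 2016 data in the question, keeping the year in the grouping is therefore the safer choice.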

Data

df <- structure(list(V1 = c(4.59, 4.59, 4.58, 4.52, 4.54, 4.58, 4.09, 
4.5, 4.18, 4.11, 3.54, 4.98), V2 = c("29/12/2000", "01/01/2001", 
"02/01/2001", "03/01/2001", "04/01/2001", "05/01/2001", "26/01/2001", 
"27/01/2001", "28/01/2001", "29/01/2001", "30/01/2001", "31/01/2001"
)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-12L))
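
# V2 must be of class Date for V2 + 1 to work; convert DD/MM/YYYY strings
# first, e.g. with as.Date(V2, format = "%d/%m/%Y")
# keep rows whose following day is the 1st of a month
subset(df, as.POSIXlt(V2 + 1)$mday == 1)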

## you don't have 31-Dec in your data
#     V1         V2
# 1 4.98 31/01/2001
