How to select the last day of the month in R
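How do I select the data for the last day of each month? For example, I have a dataset containing daily V1 data from 2000 to 2016. I only need to select the last day of each month, i.e. 31/01/2001, 28/02/2001, and so on for all years. The date format is DD/MM/YYYY.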
V1 V2
4.59 29/12/2000
4.59 01/01/2001
4.58 02/01/2001
4.52 03/01/2001
4.54 04/01/2001
4.58 05/01/2001
......
4.09 26/01/2001
4.50 27/01/2001
4.18 28/01/2001
4.11 29/01/2001
3.54 30/01/2001
4.98 31/01/2001 <- Select this row!
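Most of the answers below assume V2 has already been converted to class Date. A minimal base R sketch of that conversion for DD/MM/YYYY strings, assuming V2 is read in as character (the three-row df here is a hypothetical excerpt of the data above):

```r
# Hypothetical excerpt of the question's data, with V2 as DD/MM/YYYY text
df <- data.frame(V1 = c(4.59, 4.59, 4.98),
                 V2 = c("29/12/2000", "01/01/2001", "31/01/2001"))

# %d = day, %m = month, %Y = four-digit year
df$V2 <- as.Date(df$V2, format = "%d/%m/%Y")
str(df)
```

After this step V2 prints as YYYY-MM-DD, which is why some outputs below show that format.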
library(data.table)
library(lubridate)
# V2 must be of class Date; for each year-month combination, keep the row with the latest date
setDT(df)[order(V2), .(V1[which.max(V2)], V2[which.max(V2)]), by = .(year(V2), month(V2))]
# year month V1 V2
#1: 2000 12 4.59 2000-12-29
#2: 2001 1 4.98 2001-01-31
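Note that this returns the last date observed in each month (2000-12-29 for December 2000), not necessarily the calendar month end. For comparison, a base R sketch of the same "latest observed date per year-month" idea using ave(); this is an alternative technique, not the answer's own code, and the small df is hypothetical:

```r
df <- data.frame(V1 = c(4.59, 4.59, 4.98),
                 V2 = as.Date(c("2000-12-29", "2001-01-01", "2001-01-31")))

ym <- format(df$V2, "%Y-%m")                     # year-month key, e.g. "2001-01"
latest <- ave(as.numeric(df$V2), ym, FUN = max)  # latest date (as a number) in each row's group
last_obs <- df[as.numeric(df$V2) == latest, ]    # keep rows that match their group's latest date
last_obs
```

If a date is duplicated within a month, this keeps every tied row, whereas which.max() keeps only the first.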
This can also be done in base R.
df[df$V2 %in% unique(as.Date(format(df$V2 + 28, "%Y-%m-01")) - 1),]
V1 V2
12 4.98 2001-01-31
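This uses a trick I learned from one of Dirk Eddelbuettel's SO answers. The idea is to move each date to the first day of the next month and then subtract 1: adding 28 days pushes most dates into the following month, format(..., "%Y-%m-01") snaps the result to the 1st of that month, and subtracting one day gives the last day of the original month. %in% then keeps only the rows whose date is one of those month ends.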
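To make the three steps of the trick visible, here is a small sketch on a few hand-picked dates (hypothetical inputs, including a leap-year February):

```r
d <- as.Date(c("2001-01-15", "2001-02-10", "2000-02-20"))

shifted    <- d + 28                                 # moves these dates into the following month
first_next <- as.Date(format(shifted, "%Y-%m-01"))   # first day of that following month
month_end  <- first_next - 1                         # one day earlier = last day of the original month
month_end
# [1] "2001-01-31" "2001-02-28" "2000-02-29"
```

For the first few days of a month, adding 28 days stays in the same month and yields the previous month's end instead. The answer still works because it collects unique() month ends over the whole column, so any later date in the month supplies the correct value.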
Data
df <- structure(list(V1 = c(4.59, 4.59, 4.58, 4.52, 4.54, 4.58, 4.09,
4.5, 4.18, 4.11, 3.54, 4.98), V2 = structure(c(11320, 11323,
11324, 11325, 11326, 11327, 11348, 11349, 11350, 11351, 11352,
11353), class = "Date")), .Names = c("V1", "V2"), row.names = c(NA,
-12L), class = "data.frame")
Proof of concept
# construct a vector of dates for 10 years, 2001 through 2010
myDates <- seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by="day")
# pull off the final days of the month
finalDays <-
myDates[myDates %in% unique(as.Date(format(myDates + 28, "%Y-%m-01")) - 1)]
# Take a look at first 5 and last 5
c(head(finalDays, 5), tail(finalDays, 5))
[1] "2001-01-31" "2001-02-28" "2001-03-31" "2001-04-30" "2001-05-31"
[6] "2010-08-31" "2010-09-30" "2010-10-31" "2010-11-30" "2010-12-31"
# get length, 12 * 10 = 120
length(finalDays)
[1] 120
# make sure there are no repeated values
length(unique(finalDays))
[1] 120
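As an independent cross-check, the month ends can also be generated directly with seq() by month starting from the first of each following month and subtracting one day; this is a separate base R idiom, not part of the original answer:

```r
myDates <- seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by = "day")
finalDays <- myDates[myDates %in% unique(as.Date(format(myDates + 28, "%Y-%m-01")) - 1)]

# First of Feb 2001, Mar 2001, ..., Jan 2011, each minus one day
monthEnds <- seq(as.Date("2001-02-01"), by = "month", length.out = 120) - 1

length(monthEnds)               # 120
all(monthEnds == finalDays)     # both approaches agree
```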
We can use dplyr.
library(dplyr)
library(lubridate)
library(zoo)
If we only need the calendar last day of the month, rather than the last day found in the dataset:
df %>%
  filter(dmy(V2) == as.Date(as.yearmon(dmy(V2)), frac = 1))
# V1 V2
#1 4.98 31/01/2001
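Without zoo, the calendar month end that as.Date(as.yearmon(x), frac = 1) returns can be sketched in base R. The helper name month_end is hypothetical; it relies on the fact that the 1st of any month plus 31 days always lands in the next month:

```r
# Hypothetical helper: calendar last day of the month for each Date in d
month_end <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))                # first day of d's month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))  # 1st + 31 days is always in the next month
  next_first - 1                                         # the day before = calendar month end
}

df <- data.frame(V1 = c(4.59, 4.11, 4.98),
                 V2 = c("29/12/2000", "29/01/2001", "31/01/2001"))
d <- as.Date(df$V2, format = "%d/%m/%Y")
df[d == month_end(d), ]   # keeps only the 31/01/2001 row
```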
However, if we need to keep the last day found in the dataset for each month:
df %>%
  mutate(V3 = dmy(V2)) %>%
  group_by(month = month(V3), year = year(V3)) %>%
  slice(which.max(day(V3))) %>%
  ungroup() %>%
  select(-month, -year, -V3)
# V1 V2
# <dbl> <chr>
#1 4.98 31/01/2001
#2 4.59 29/12/2000
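If we group only by month, just remove year = year(V3) from group_by (and -year from select). Note that this pools the same calendar month across all years, so only one row per month is returned. The code becomes:
df %>%
  mutate(V3 = dmy(V2)) %>%
  group_by(month = month(V3)) %>%
  slice(which.max(day(V3))) %>%
  ungroup() %>%
  select(-month, -V3)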
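The difference between grouping by year-month and by month alone can be seen with a base R sketch that mirrors slice(which.max(day(V3))) using tapply(). The helper pick_last_day and the three-row df are hypothetical:

```r
df <- data.frame(V1 = c(1.1, 2.2, 3.3),
                 V2 = c("10/01/2001", "31/01/2001", "15/01/2002"))
d <- as.Date(df$V2, format = "%d/%m/%Y")

# Hypothetical helper: within each group, keep the row with the largest day of month
pick_last_day <- function(df, d, groups) {
  day <- as.integer(format(d, "%d"))
  idx <- tapply(seq_along(d), groups, function(i) i[which.max(day[i])])
  df[as.vector(idx), ]
}

pick_last_day(df, d, format(d, "%Y-%m"))  # per year-month: 31/01/2001 and 15/01/2002
pick_last_day(df, d, format(d, "%m"))     # month only: January is pooled, just 31/01/2001
```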
Data (V2 stored as DD/MM/YYYY character strings)
df <- structure(list(V1 = c(4.59, 4.59, 4.58, 4.52, 4.54, 4.58, 4.09,
4.5, 4.18, 4.11, 3.54, 4.98), V2 = c("29/12/2000", "01/01/2001",
"02/01/2001", "03/01/2001", "04/01/2001", "05/01/2001", "26/01/2001",
"27/01/2001", "28/01/2001", "29/01/2001", "30/01/2001", "31/01/2001"
)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-12L))
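Another base R option: a date is the last day of its month exactly when the following day is the 1st. If V2 is stored as DD/MM/YYYY text, convert it to Date first (as.Date leaves an existing Date unchanged).
subset(df, as.POSIXlt(as.Date(V2, format = "%d/%m/%Y") + 1)$mday == 1)
## 31/12/2000 is not in your data, so no row is returned for December 2000
#      V1         V2
# 12 4.98 31/01/2001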
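The "next day is the 1st" test handles leap years without any special casing, as this small sketch on hypothetical dates shows:

```r
d <- as.Date(c("2000-02-28", "2000-02-29", "2001-02-28", "2001-01-30"))

# $mday is the day of the month (1-31) of the following day
is_month_end <- as.POSIXlt(d + 1)$mday == 1
is_month_end
# [1] FALSE  TRUE  TRUE FALSE
```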