简体   繁体   中英

R: assign seasons to dates by day and month

I have a dataframe, which contains dates of many years. It looks like this (instead of three years, I have 40 years):

DATES<-c(seq(as.Date('2017-01-01'), as.Date('2019-12-31'), by = 'days'))
df<-data.frame(DATES)

for each day I want to add the season. Hereby, Spring should start at the 20th of March, Summer at the 21st of June, Autumn at the 23rd of September, Winter at the 21st of December. These dates stay unchanged over the years.

I came up with the following code, which works (at least, I think so). However, I was wondering, if there isn't are more elegant way, to get what I want.

df$MONTH<-month(df$DATES)
df$DAY<-mday(df$DATES)
df$DAY_PLUS_MONTH<-df$DAY+df$MONTH*100

df <- df %>%
  mutate(SEASON = case_when(
    DAY_PLUS_MONTH %in% 320:620 ~ 'SPRING',
    DAY_PLUS_MONTH %in% 621:922 ~ 'SUMMER',
    DAY_PLUS_MONTH %in% 923:1221 ~ 'AUTUMN',
    TRUE ~ 'WINTER'))

I think this should work for you:

cut(lubridate::yday(df$DATES - lubridate::days(79)), 
    breaks = c(0, 93, 187, 276, Inf), 
    labels = c("Spring", "Summer", "Autumn", "Winter"))

Using $yday (whether from lubridate or as.POSIXlt ) may give erroneous results for leap-years. I think a safer method is to create a vector of each of those dates across the years, adding one year in each direction (before/after).

I'm using findInterval , but it's close-enough to cut that you could use that method using the same variables here.

season_dates <- as.Date(sort(c(outer(
  do.call(seq.int, as.list(1900 + as.POSIXlt(range(df$DATES) + c(-365, 365))$year)), 
  c("-03-20", "-06-21", "-09-23", "-12-21"),
  paste0))))
season_dates
#  [1] "2016-03-20" "2016-06-21" "2016-09-23" "2016-12-21" "2017-03-20" "2017-06-21" "2017-09-23" "2017-12-21" "2018-03-20"
# [10] "2018-06-21" "2018-09-23" "2018-12-21" "2019-03-20" "2019-06-21" "2019-09-23" "2019-12-21" "2020-03-20" "2020-06-21"
# [19] "2020-09-23" "2020-12-21"
season_names <- rep(c("Spring", "Summer", "Autumn", "Winter"), length.out = length(season_dates))
season_names
#  [1] "Spring" "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter" "Spring"
# [14] "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter"

set.seed(42)
as_tibble(df) %>%
  mutate(SEASON = season_names[ findInterval(DATES, season_dates) ]) %>%
  sample_n(10) %>%
  arrange(DATES)
# # A tibble: 10 x 2
#    DATES      SEASON
#    <date>     <chr> 
#  1 2017-01-24 Winter
#  2 2017-02-18 Winter
#  3 2017-06-14 Spring
#  4 2017-11-17 Autumn
#  5 2017-12-22 Winter
#  6 2018-02-14 Winter
#  7 2018-07-15 Summer
#  8 2018-09-14 Summer
#  9 2018-09-26 Autumn
# 10 2019-06-18 Spring

I sampled the output just to show some variance, as otherwise the first 10 results were all winter. Also, I used as.POSIXlt(.)$year and then had to adjust it since it is 1900-based. lubridate::year would work here, too.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM