Summarize Dates into Varying Groups

I have a variable containing miscellaneous dates. I want to summarize them into groups so they can be converted to a factor before being used in a predictive model.

I would like to group the dates as follows:

  • This Year (this calendar year)
  • Last Year
  • Over 3 Years Ago

I'm pretty new to R, so any help on this would be much appreciated. Thank you.

As other commenters have noted, you haven't supplied any data or a reproducible example, but let's give this a go anyway.

I'll be using two tidyverse packages, dplyr and lubridate, to help us out.

For present purposes, let's start by generating some random dates and putting them into a data frame (a tibble). I'm assuming your dates are already in a data frame and stored as class Date, as Gregor pointed out above.

library(dplyr); library(lubridate)  # dplyr re-exports tibble()
data <- tibble(date = sample(seq(as.Date("2015-01-01"), as.Date("2020-12-31"), by = "day"), 50))
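If your dates are currently stored as text rather than as class Date, one way to convert them is lubridate's ymd(). This is a minimal sketch that assumes ISO-style "YYYY-MM-DD" strings; the vector raw is hypothetical example input.

```r
library(lubridate)

# Hypothetical character input in "YYYY-MM-DD" form
raw <- c("2019-07-04", "2021-11-30", "2024-02-29")

# ymd() parses year-month-day strings and returns a Date vector
dates <- ymd(raw)
class(dates)
```

Other orderings have matching parsers (for example dmy() or mdy()), so pick the one that fits how your strings are written.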

Let's now use dplyr and lubridate to recode the dates into a new variable, date_group:

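data <- data %>%
  mutate(date_group = factor(
    case_when(
      year(date) == year(today()) ~ "This Year",
      year(date) == year(today()) - 1 ~ "Last Year",
      date < today() %m-% years(3) ~ "Over 3 Years Ago",  # compare the date itself; %m-% avoids NA on 29 Feb
      TRUE ~ "Other"
    ),
    levels = c("This Year", "Last Year", "Other", "Over 3 Years Ago")
  ))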

For the first two groups, we use the lubridate function year() (which extracts the year from a date) on the date column of data, and compare the result with the year of today's date (obtained with today()).
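Because these two groups compare calendar years rather than elapsed time, a date only one day old can already count as "Last Year". A small sketch of that behaviour, using a fixed date as a stand-in for today() so the result does not depend on when it is run:

```r
library(lubridate)

ref <- as.Date("2024-01-01")  # stand-in for today()
d   <- as.Date("2023-12-31")  # just one day earlier

year(ref)                      # the calendar year of the reference date
year(d) == year(ref) - 1       # TRUE, so d falls into "Last Year"
```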

For dates over 3 years ago, we subtract 3 years from today's date using years(). Note that this is a rolling cutoff measured from today, unlike the calendar-year comparisons used for this year and last year.
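One edge case with years(): periods are applied in calendar units, so subtracting 3 years from 29 February targets a day that does not exist. This sketch, using a fixed leap day in place of today(), compares plain subtraction with lubridate's %m-% operator, which rolls back to the last valid day of the month instead:

```r
library(lubridate)

leap_day <- as.Date("2024-02-29")

# Plain subtraction: 2021-02-29 does not exist, so the result is NA
naive <- leap_day - years(3)

# %m-% rolls back to the last valid day of that month instead
safe <- leap_day %m-% years(3)

naive
safe
```

If today() happened to be 29 February, a cutoff built with plain subtraction would be NA and no date would match the "Over 3 Years Ago" condition.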

Of course, this leaves a gap: dates that are less than 3 years ago but earlier than last calendar year. The final TRUE ~ "Other" condition in case_when() acts as the default and puts these dates into an "Other" group.
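To see which dates land in that gap, it can help to wrap the same rules in a function that takes an explicit reference date. The helper classify_dates() and its ref argument are hypothetical names introduced for this sketch; the rules inside mirror the case_when() call above.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: same rules as above, with an explicit reference date
# so the grouping can be checked against a known "today"
classify_dates <- function(date, ref = today()) {
  case_when(
    year(date) == year(ref) ~ "This Year",
    year(date) == year(ref) - 1 ~ "Last Year",
    date < ref %m-% years(3) ~ "Over 3 Years Ago",
    TRUE ~ "Other"
  )
}

ref <- as.Date("2024-06-15")
dates <- as.Date(c("2024-01-10", "2023-12-31", "2022-05-01", "2021-06-16", "2021-06-14"))
groups <- classify_dates(dates, ref)
groups
```

With this reference date the cutoff is 2021-06-15, so 2022-05-01 and 2021-06-16 fall into the "Other" gap, while 2021-06-14 is just old enough for "Over 3 Years Ago".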

We wrap the result of case_when() in factor() so that the groups are stored as a factor rather than a character vector, ready for subsequent modelling.
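When a factor goes into a model formula, R's default treatment contrasts use the first level as the reference category and create one indicator column for each remaining level. Without an explicit levels argument, factor() sorts the levels alphabetically. This sketch uses base R's model.matrix() on a small hand-made vector to show both points:

```r
labels <- c("This Year", "Last Year", "Other", "Over 3 Years Ago", "This Year")

# Default: levels sorted alphabetically, so "Last Year" becomes the reference
default_levels <- levels(factor(labels))

# Explicit levels: "This Year" becomes the reference category
groups <- factor(labels, levels = c("This Year", "Last Year", "Other", "Over 3 Years Ago"))

# Indicator (dummy) columns a model would see for this factor
mm <- model.matrix(~ groups)
colnames(mm)
```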

case_when() is useful (and easy to read) when you have only a few categories. With many more, the chain of conditions quickly gets messy, and it is worth looking for another way to structure the recoding.
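As one possible alternative when the number of groups grows, you could compute how many calendar years ago each date falls and bin that number with base R's cut(). This is a sketch under a different assumption from the case_when() version: every bin, including the oldest, is based on calendar years rather than a rolling 3-year window, and the "2-3 Years Ago" label is illustrative only.

```r
library(lubridate)

ref <- as.Date("2024-06-15")  # stand-in for today()
dates <- as.Date(c("2024-01-10", "2023-12-31", "2022-05-01", "2021-06-14", "2019-03-01"))

# Number of calendar years between each date and the reference date
years_ago <- year(ref) - year(dates)

# Intervals are right-closed: (-Inf, 0], (0, 1], (1, 3], (3, Inf]
# (future dates, with negative years_ago, also land in the first bin)
groups <- cut(years_ago,
              breaks = c(-Inf, 0, 1, 3, Inf),
              labels = c("This Year", "Last Year", "2-3 Years Ago", "Over 3 Years Ago"))
groups
```

Adding a group then means adding one break and one label, and cut() already returns a factor whose levels follow the order of the breaks.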
