Summarize dates into different groups

Question

I have a variable that contains miscellaneous dates. I want to summarize these so they can be converted to a factor before being used in a predictive model.

I would like to group the dates as follows:

This Year (this calendar year)
Last Year
Over 3 Years Ago

I'm pretty new to R, so any help on this would be much appreciated. Thank you.

Answer 1

As other commenters have noted, you haven't supplied any data or a reproducible example, but let's give this a go anyway.

I'll be using two tidyverse packages, dplyr and lubridate, to help us out.

For present purposes, let's start by generating some random dates and putting them into a dataframe/tibble. I'm assuming your dates are already in a dataframe and in the right class, as Gregor pointed out above.

library(dplyr)
library(lubridate)
data <- tibble(date = sample(seq(as.Date('2015-01-01'), as.Date('2020-12-31'), by = "day"), 50))

Let's now use dplyr and lubridate to recode the dates into a new variable, date_group:

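data %>%
  mutate(date_group = factor(
    case_when(
      # Same calendar year as today
      year(date) == year(today()) ~ "This Year",
      # Previous calendar year
      year(date) == year(today()) - 1 ~ "Last Year",
      # More than 3 years before today (compare the date itself, not its year)
      date < today() - years(3) ~ "Over 3 Years Ago",
      # Everything in between
      TRUE ~ "Other"
    )
  ))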

For the first two groups, we apply the lubridate function year() (which extracts the year from a date) to the date column in data, and compare this against the year extracted from today's date (using today()).

For dates over 3 years ago, we subtract 3 years from today's date using years(). Note that this is a rolling cutoff, which is different from the calendar-year based calculations for this year and last year.

Of course, this leaves a gap for dates that are less than 3 years ago but earlier than last calendar year. The default option in the case_when function (TRUE) labels these as "Other".
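To see where the gap falls, here is a minimal sketch in base R (so it runs without extra packages) that applies the same three rules against a pinned reference date. The reference date and the sample dates are made up for illustration; in the answer's code the reference is simply today().

```r
# Pinned reference date (an assumed value, standing in for today())
ref_date <- as.Date("2020-10-27")

# Hypothetical sample dates, one or two per group
dates <- as.Date(c("2020-03-15", "2019-07-01", "2018-06-30",
                   "2017-10-26", "2015-01-01"))

ref_year  <- as.integer(format(ref_date, "%Y"))
date_year <- as.integer(format(dates, "%Y"))

# The date exactly three years before the reference date (rolling cutoff)
cutoff <- seq(ref_date, by = "-3 years", length.out = 2)[2]

# Rules are checked in order, like the branches of case_when()
date_group <- ifelse(date_year == ref_year, "This Year",
              ifelse(date_year == ref_year - 1, "Last Year",
              ifelse(dates < cutoff, "Over 3 Years Ago", "Other")))

print(data.frame(date = dates, date_group = date_group))
```

With these inputs, 2018-06-30 is neither in the current or previous calendar year nor older than the rolling cutoff, so it is the one that ends up in "Other".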

We wrap the result of the case_when function in factor() so that the resulting groups are treated as a factor rather than a string, ready for subsequent modelling.
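One thing worth knowing here: factor() sorts its levels alphabetically unless told otherwise, and many modelling functions treat the first level as the baseline. The sketch below shows how the levels could be set explicitly; the particular ordering is my assumption about what would be convenient, not something the question specifies.

```r
# Hypothetical group labels, as case_when() would return them
groups <- c("Last Year", "This Year", "Other", "Over 3 Years Ago", "This Year")

# Default behaviour: levels are sorted alphabetically
default_factor <- factor(groups)
print(levels(default_factor))

# Explicit levels, from most recent to oldest (an assumed ordering)
ordered_levels <- c("This Year", "Last Year", "Other", "Over 3 Years Ago")
explicit_factor <- factor(groups, levels = ordered_levels)
print(levels(explicit_factor))
```

In the answer's pipeline, the same levels argument could be passed to the factor() call that wraps case_when().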

The case_when function is useful (and easy to read) if you have just a few categories. With too many, it gets messy, and you should think about another way to restructure your data.
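As one possible alternative when the number of categories grows, base R's cut() can bin a numeric "years ago" value in a single call. This is only a sketch under my own assumptions: it bins by calendar-year difference (not the rolling 3-year cutoff used above), and the "2-3 Years Ago" label and the sample dates are invented for illustration.

```r
# Pinned reference year (an assumed value, standing in for the current year)
ref_year <- 2020L

# Hypothetical sample dates
dates <- as.Date(c("2020-03-15", "2019-07-01", "2018-06-30",
                   "2017-10-26", "2015-01-01"))

# Whole calendar years between each date and the reference year
years_ago <- ref_year - as.integer(format(dates, "%Y"))

# Bins are (-Inf, 0], (0, 1], (1, 3], (3, Inf]; cut() returns a factor
date_group <- cut(
  years_ago,
  breaks = c(-Inf, 0, 1, 3, Inf),
  labels = c("This Year", "Last Year", "2-3 Years Ago", "Over 3 Years Ago")
)

print(data.frame(date = dates, years_ago = years_ago, date_group = date_group))
```

Adding a category then means adding one break and one label, rather than another branch.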

Solution 1
1 Accepted 2020-10-27 04:16:02
