Finding the smallest and largest date in my dataset in R

I have a dataset with character and Date variables. I would like to find the smallest and largest date across the whole dataset.

I am trying to use the pmin function, but it does not seem to work. Once the min and max dates have been extracted, I want to create a dataset containing the sequence of dates between them. For example, if the oldest date is 2021-02-01 (from the New column) and the most recent is 2022-06-20 (from the Old column), I want to create a list of all dates between the two.

Table:

ID   Old         New         Tier
001  NA          2021-02-01  A
002  NA          2021-02-01  A
003  NA          2021-02-21  A
004  NA          2021-04-21  A
005  NA          2021-04-21  A
006  NA          2021-04-21  A
006  2022-06-20  2021-04-21  B
002  2021-08-10  2021-04-21  B
003  2022-06-20  2021-05-01  B
003  2022-06-20  2021-05-01  B
003  2021-08-10  2021-05-21  B
003  2021-08-10  2021-07-21  B

Format of the variables in the extended data, using str():

$ Old : Date, format: "2021-04-30"
$ Id  : chr
$ New : Date, format: "2021-02-03" "2021-02-03"
$ New1: Date, format: NA NA NA NA ...
$ New2: Date, format: "2021-01-10" "2021-01-10"
$ New3: Date, format: NA NA "2021-06-10" NA ...
$ New4: Date, format: NA NA NA NA ...
$ New5: Date, format: NA NA "2022-07-10" NA ...

In base R you can get the date range like this:

range(unlist(df[sapply(df, class) == "Date"]), na.rm = TRUE) |>
  as.Date(origin = "1970-01-01")
#> [1] "2021-02-01" "2022-06-20"

Explanation

To work with only the columns of class "Date" in your data frame, you can use df[sapply(df, class) == "Date"]. If you unlist these columns, they form a single vector from which you can get the range (i.e. min and max), making sure to exclude NA values with na.rm = TRUE.

Unfortunately, unlist removes the class attribute from the vector, leaving plain numbers (days since 1970-01-01), so you need to convert the result back to a date.
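The loss of the class attribute can be seen in a small example. This is only a sketch: the data frame `df` below is a hypothetical two-column stand-in for the asker's data, not the real dataset.

```r
# Hypothetical stand-in for the asker's data (illustration only)
df <- data.frame(
  Old = as.Date(c(NA, "2022-06-20")),
  New = as.Date(c("2021-02-01", "2021-04-21"))
)

# unlist() flattens the columns into one vector but drops the Date class,
# leaving the underlying day counts since 1970-01-01
flat <- unlist(df)
class(flat)

# Take the range of the numbers, then restore the Date class
rng <- as.Date(range(flat, na.rm = TRUE), origin = "1970-01-01")
class(rng)
rng
```

Here `flat` should come back as a plain numeric vector, while `rng` is a Date vector again after the conversion.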

Same basic idea as Allan's answer, but with an approach that preserves the class:

do.call(range, Filter(\(x) inherits(x, "Date"), dat))

[1] "2022-06-22" "2022-08-07"

Data:

dat <- data.frame(a = Sys.Date() + sample(50, 5),
                  b = letters[1:5],
                  c = Sys.Date() + sample(50, 5),
                  d = runif(5))
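Neither snippet above covers the second half of the question, building the sequence of dates between the two extremes. A minimal sketch of one way to do it follows, assuming a data frame shaped like the table in the question. The data frame `df`, the helper names `date_cols` and `all_dates`, and the output column name `date` are all made up for illustration. Because the question's Old column contains NA values, na.rm = TRUE is passed to range here, which the example with `dat` does not need.

```r
# Hypothetical subset of the table in the question (illustration only)
df <- data.frame(
  ID   = c("001", "006", "002"),
  Old  = as.Date(c(NA, "2022-06-20", "2021-08-10")),
  New  = as.Date(c("2021-02-01", "2021-04-21", "2021-04-21")),
  Tier = c("A", "B", "B")
)

# Keep only the Date columns, as in the answer above
date_cols <- Filter(function(x) inherits(x, "Date"), df)

# range() over all Date columns at once; na.rm = TRUE skips the NA entries
rng <- do.call(range, c(unname(as.list(date_cols)), list(na.rm = TRUE)))

# One row per calendar day from the earliest to the latest date, inclusive
all_dates <- data.frame(date = seq(rng[1], rng[2], by = "day"))

rng
head(all_dates)
```

Since range keeps the Date class in this form, `rng[1]` and `rng[2]` can be passed straight to seq without converting back from numbers.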
