Finding the smallest and largest date in my dataset in R
I have a dataset with character and Date variables. I would like to find the smallest and largest date in my dataset.
I am trying to use the pmin function, but it does not seem to be working. Once the max and min dates have been extracted, I want to create a dataset with a sequence of dates between them. For example, if the oldest date is 2021-02-01 (from the New column) and the most recent is 2022-06-20 (from the Old column), I want to create a list of dates between the two.
Table:
ID | Old | New | Tier |
---|---|---|---|
001 | NA | 2021-02-01 | A |
002 | NA | 2021-02-01 | A |
003 | NA | 2021-02-21 | A |
004 | NA | 2021-04-21 | A |
005 | NA | 2021-04-21 | A |
006 | NA | 2021-04-21 | A |
006 | 2022-06-20 | 2021-04-21 | B |
002 | 2021-08-10 | 2021-04-21 | B |
003 | 2022-06-20 | 2021-05-01 | B |
003 | 2022-06-20 | 2021-05-01 | B |
003 | 2021-08-10 | 2021-05-21 | B |
003 | 2021-08-10 | 2021-07-21 | B |
Variable formats in the extended data, from str():
$ Old : Date, format: "2021-04-30"
$ Id : chr
$ New : Date, format: "2021-02-03" "2021-02-03"
$ New1 : Date, format: NA NA NA NA ...
$ New2 : Date, format: "2021-01-10" "2021-01-10"
$ New3 : Date, format: NA NA "2021-06-10" NA ...
$ New4 : Date, format: NA NA NA NA ...
$ New5 : Date, format: NA NA "2022-07-10" NA ...
In base R you can get the date range like this:
range(unlist(df[sapply(df, class) == "Date"]), na.rm = TRUE) |>
  as.Date(origin = "1970-01-01")
#> [1] "2021-02-01" "2022-06-20"
Explanation
To work with just the columns of class "Date" in your data frame, you can do df[sapply(df, class) == "Date"]. If you unlist these columns, they form a single vector from which you can get the range (i.e. min / max), being sure to exclude NA values.
Unfortunately, unlist strips the class attribute from the vector, leaving only the underlying numbers (days since 1970-01-01), so you need to convert the result back to a date with as.Date.
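The steps above can be followed one at a time. This is only a minimal sketch: `df_demo` and its values are made up to mimic the layout of the table in the question, and `inherits()` is used here in place of the `class(x) == "Date"` comparison as a matter of preference.

```r
# Hypothetical data frame mimicking the question's layout (values are illustrative)
df_demo <- data.frame(
  ID   = c("001", "002", "003"),
  Old  = as.Date(c(NA, "2021-08-10", "2022-06-20")),
  New  = as.Date(c("2021-02-01", "2021-04-21", "2021-05-01")),
  Tier = c("A", "B", "B")
)

# Keep only the columns of class Date
date_cols <- df_demo[sapply(df_demo, inherits, what = "Date")]

# unlist() drops the Date class and leaves days since 1970-01-01
raw_days <- unlist(date_cols)
class(raw_days)
#> [1] "numeric"

# Take the range while ignoring NA, then restore the Date class
date_range <- as.Date(range(raw_days, na.rm = TRUE), origin = "1970-01-01")
date_range
#> [1] "2021-02-01" "2022-06-20"
```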
Same basic idea as Allan's answer, but with an approach that preserves the class:
do.call(range, Filter(\(x) inherits(x, "Date"), dat))
[1] "2022-06-22" "2022-08-07"
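The one-liner above does not pass na.rm, so with missing dates like those in the question's Old column it would return NA. One possible adjustment is to append `na.rm = TRUE` to the argument list handed to `do.call()`. This is a sketch only: `dat_na` and its values are made up for illustration.

```r
# Hypothetical data with a missing date, as in the question's Old column
dat_na <- data.frame(
  a = as.Date(c(NA, "2021-08-10", "2022-06-20")),
  b = c("x", "y", "z"),
  c = as.Date(c("2021-02-01", "2021-04-21", "2021-05-01"))
)

# Select the Date columns only
date_cols <- Filter(\(x) inherits(x, "Date"), dat_na)

# c() turns the data frame into a plain list, so na.rm = TRUE
# becomes one more named argument passed to range()
date_range <- do.call(range, c(date_cols, na.rm = TRUE))

date_range
#> [1] "2021-02-01" "2022-06-20"
class(date_range)
#> [1] "Date"
```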
Data:
dat <- data.frame(a = Sys.Date() + sample(50, 5),
b = letters[1:5],
c = Sys.Date() + sample(50, 5),
d = runif(5))
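The question also asks for a dataset holding every date between the two extremes. A minimal sketch of that last step, assuming daily steps are wanted: the endpoints are the ones from the question's example, and the names `first_date`, `last_date`, and `all_dates` are made up for illustration.

```r
# Assumed endpoints, taken from the example in the question
first_date <- as.Date("2021-02-01")
last_date  <- as.Date("2022-06-20")

# One row per calendar day, both endpoints included
all_dates <- data.frame(date = seq(first_date, last_date, by = "day"))

head(all_dates, 3)
#>         date
#> 1 2021-02-01
#> 2 2021-02-02
#> 3 2021-02-03
```

In practice the two endpoints would come from whichever range calculation is used, for example `date_range[1]` and `date_range[2]` if the result is stored under that (hypothetical) name.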