How can I convert this date format into a format accepted by lubridate?
I have imported data into R from an Excel sheet with the readxl package.
The sheet contains a column with dates. These dates behave like dates in Excel (I can change the date formatting in Excel).
Directly after importing into R with readxl, the format is this:
# A tibble: 1 x 1
`datum`
<dttm>
1 2010-01-20 21:00:00
My goal is to use the lubridate function days_in_month on the imported dates.
lubridate::days_in_month(df[2,1])
However, using this function gives this error:
Error in as.POSIXlt.default(x, tz = tz(x)) :
do not know how to convert 'x' to class “POSIXlt”
I ran several tests to identify the format:
is.Date(df[2,1])
is.POSIXt(df[2,1])
is.instant(df[2,1])
All of them return FALSE.
If I print one date, I get this result:
# A tibble: 1 x 1
`datum`
<dttm>
1 2010-01-20 21:00:00
I have tried several conversions:
df$datum <- as.Date(df$datum, origin = "1899-12-30")
df$datum <- as.Date(as.POSIXct(df$datum, 'GMT'))
df$datum <- as.Date(df$datum, format='%Y-%m-%d')
However, the results of the tests above are still all FALSE after conversion.
If I do the first conversion, as.Date(df$datum, origin = "1899-12-30"), the outcome of print is:
# A tibble: 1 x 1
`datum`
<date>
1 2010-01-20
df$datum + 60 gives:
1 2010-03-21
So it seems to behave as a date, since I can add 60.
However, all the tests still give FALSE, and days_in_month from lubridate still gives the error above.
How can I convert the date into a correct format that lubridate can process?
Thanks a lot!
You are being bitten by the differences in [ between data.frame and tbl_df. Reading your file (present in the comments), I ultimately see:
df <- readxl::read_excel("example dates.xlsx")
df
# # A tibble: 3 x 2
# datum datum2
# <dttm> <dttm>
# 1 2010-01-01 13:25:00 2010-12-22 23:53:40
# 2 2010-01-23 13:30:00 2011-01-07 23:09:10
# 3 2010-02-16 21:45:00 2011-03-19 01:00:52
# for everybody else
df <- structure(list(datum = structure(c(1262352300, 1264253400, 1266356700), class = c("POSIXct", "POSIXt"), tzone = "UTC"), datum2 = structure(c(1293062020.704, 1294441750.08, 1300496452.128), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
Can we agree that it does not make sense to try to convert a whole frame at once?
as.Date(df)
# Error in as.Date.default(df) :
# do not know how to convert 'df' to class "Date"
Dumb question. Well, let's see what happens with other variants.
df$datum
# [1] "2010-01-01 13:25:00 UTC" "2010-01-23 13:30:00 UTC" "2010-02-16 21:45:00 UTC"
as.Date(df$datum)
# [1] "2010-01-01" "2010-01-23" "2010-02-16"
df[2,1]
# # A tibble: 1 x 1
# datum
# <dttm>
# 1 2010-01-23 13:30:00
as.Date(df[2,1])
# Error in as.Date.default(df[2, 1]) :
# do not know how to convert 'df[2, 1]' to class "Date"
With a simple data.frame, [2,1] will return a scalar, not a frame, so that makes sense in base R:
as.data.frame(df)[2,1]
# [1] "2010-01-23 13:30:00 UTC"
as.Date(as.data.frame(df)[2,1])
# [1] "2010-01-23"
So the problem is that tibble is forcing you to be explicit that you want to drop from a frame to a scalar/vector.
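To make the difference concrete, here is a minimal sketch that compares what [2,1] returns for a tibble and for a base data.frame. It assumes the tibble package is installed; the names tb and bf and the two sample dates are made up for illustration and are not taken from the original file.

```r
# Illustrative data (hypothetical): two date-times in UTC
datum <- as.POSIXct(c("2010-01-01 13:25:00", "2010-01-23 13:30:00"), tz = "UTC")

tb <- tibble::tibble(datum = datum)  # tibble, like the result of readxl
bf <- data.frame(datum = datum)      # plain base-R data.frame

class(tb[2, 1])   # "tbl_df" "tbl" "data.frame": still a 1 x 1 frame
class(bf[2, 1])   # "POSIXct" "POSIXt": dropped to a length-1 vector

# Extracting the cell explicitly gives a POSIXct value in both cases
class(tb$datum[2])
class(tb[2, 1][[1]])
```

The first result is why as.Date and lubridate fail on df[2,1] from a tibble: they receive a frame, not a date-time vector.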
This is normally a good thing, frankly. When you are dealing with a "normal" (non-tibble) frame and want to look at a group of columns, as in as.data.frame(df)[,1:2], R returns a data.frame. Unfortunately, if you define the columns programmatically and the selection comes down to a single column, then [ by default reduces it from a frame to a vector: as.data.frame(df)[,1]. You can prevent this auto-coercion with drop=, as in as.data.frame(df)[,1,drop=FALSE]. Many (including myself) consider this default to be a mistake: df[,cols] should be relied on to always return the same type of object, regardless of whether it is 20 columns or just 1 column. (I recognize that there are reasons why it does this, and I'm not berating the original R developers.)
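The drop= behaviour described here can be checked with base R alone. The following is a small sketch; the frame bf and the variable cols are hypothetical names chosen for illustration.

```r
# A plain base-R data.frame (illustrative data)
bf <- data.frame(a = 1:3, b = c("x", "y", "z"))

cols <- c("a", "b")
class(bf[, cols])                # "data.frame": two columns stay a frame

cols <- "a"                      # the selection now has a single column
class(bf[, cols])                # "integer": silently reduced to a vector
class(bf[, cols, drop = FALSE])  # "data.frame": drop = FALSE keeps the frame
```

Code that builds cols programmatically therefore has to pass drop = FALSE if it relies on always getting a frame back from a base data.frame.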
So the problem causing your error is that tibble requires you to be explicit when subsetting your tbl_df into a single cell. If you want to work on a single cell, use df$datum[2] or df[2,1][[1]] to force it. If you want to work on a whole column, use df$datum. All of those work directly with as.Date, since it knows how to deal with vectors of POSIXt (natively) and numeric/integer (along with origin=). Unfortunately, df[,1] of a tibble will not return a vector, so as.Date does not know what to do with it.
Bottom line:
as.Date(df$datum[2])
# [1] "2010-01-23"
as.Date(df[2,1][[1]])
# [1] "2010-01-23"
as.Date(df$datum)
# [1] "2010-01-01" "2010-01-23" "2010-02-16"
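The same extraction should carry over to the original goal of calling days_in_month. This is a sketch under stated assumptions: it requires the lubridate and tibble packages, and it rebuilds the datum column by hand from the values printed above instead of reading the Excel file.

```r
# Rebuild the datum column from the printed values (assumption: UTC, as in the dput above)
datum <- as.POSIXct(
  c("2010-01-01 13:25:00", "2010-01-23 13:30:00", "2010-02-16 21:45:00"),
  tz = "UTC"
)
df <- tibble::tibble(datum = datum)

# The extracted cell is a real date-time, so lubridate recognises it
lubridate::is.POSIXt(df$datum[2])      # TRUE

# Single cell and whole column; no as.Date conversion is needed first
lubridate::days_in_month(df$datum[2])  # 31 days (January 2010)
lubridate::days_in_month(df$datum)     # 31, 31, 28 (2010 is not a leap year)
```

Note that days_in_month returns a named integer vector, so wrap it in unname() if only the numbers are wanted.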