How do you convert an integer into a date (format YYYY) in R?
I have a big data set filled with dates stored as integers, in several different layouts (YYYYMMDD, YYYYMM and YYYY).
I just want to reduce all of them to YYYY.
I first tried splitting the data frame into three data frames, one per date (integer) layout, and then tried to change the type from integer to date.
Below is the code that splits the data, followed by the various things I have tried and subsequently commented out.
Opera_A <- Opera_split$Date_Format_A
Opera_B <- Opera_split$Date_Format_B
Opera_C <- Opera_split$Date_Format_C
lubridate::dmy(Opera_C$Composer_Born)
#as.Date(Opera_C$Composer_Born, "%m/%d/%y")
#Composer_Born_C <- data.frame(Composer_Born)
#Opera_C <- Opera_C %>% mutate(df,dateTime=as.Date(Composer_Born, format = "%Y-%m-%d"))
#Opera_C <- transform(Opera_C,Composer_Born=as.Date(as.character(Integer),"%Y%m%d"))
# Opera_C$Composer_Born <- as.Date(Opera_C$Composer_Born, '%Y-%m-%d')
I keep getting: Error in Opera_C$Composer_Born: $ operator is invalid for atomic vectors
Do I have to turn these vectors into data frames, or can I just convert them directly?
Any help much appreciated. I'm an R beginner!
Thank you
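On the error itself: it usually means the object on the left of `$` is a plain (atomic) vector rather than a data frame, so there is no column to extract and the conversion can be applied to the vector directly. A minimal sketch, assuming the piece holds YYYYMMDD integers; `composer_born` and its values are made-up stand-ins, not the asker's data:

```r
# Hypothetical stand-in for one piece of the split data: a plain integer vector
composer_born <- c(17701216L, 18130522L, 17560127L)

# An atomic vector has no columns, which is why `$` fails on it
is.atomic(composer_born)

# Convert the vector itself: integer -> character -> Date
born_date <- as.Date(as.character(composer_born), format = "%Y%m%d")

# Keep only the year, as an integer
born_year <- as.integer(format(born_date, "%Y"))
born_year
```

If the piece really is a data frame, the same two conversion lines work on `df$Composer_Born` instead.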
> library(anytime)
> anydate(c(20220210, 202202, 2022))
[1] "2022-02-10" "2022-02-01" "2022-01-01"
>
However, YYYYMM is not really a date: it is indeterminate, as it could be any day of that month; ditto for YYYY. So anydate guesses for you here.
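To make the guess explicit rather than implicit, the missing parts can be padded by hand in base R before parsing. This is only a sketch: the helper name `to_date_padded` is hypothetical, and choosing "01" for a missing month or day is an assumption, not something the data tells you.

```r
# Hypothetical helper: pad short forms so the assumed month/day is visible
to_date_padded <- function(x) {
  s <- as.character(x)
  s <- ifelse(nchar(s) == 4, paste0(s, "0101"),      # YYYY   -> YYYY0101
       ifelse(nchar(s) == 6, paste0(s, "01"), s))    # YYYYMM -> YYYYMM01
  as.Date(s, format = "%Y%m%d")
}

padded <- to_date_padded(c(20220210, 202202, 2022))
padded
```

Anything that is not 4, 6 or 8 digits long is passed through unchanged and will come back as `NA` from `as.Date()`.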
As suggested by @caldwellst in the comments, you can just pull the first 4 characters if they're all in the formats you indicated. Then you can use as.Date(format = "%Y") to turn them into actual dates; the month and day that the string does not specify are filled in from the current date. Then use lubridate::year() to pull the year info.
library(tidyverse)
library(lubridate)
c(17701216, 177012, 1770) %>%
as.character() %>%
str_sub(1, 4) %>%
as.Date(format = "%Y") %>%
year()
#> [1] 1770 1770 1770
Created on 2022-02-10 by the reprex package (v2.0.1)
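If only the year is needed, the round trip through `Date` can be skipped. A base R sketch of the same "first four characters" idea, with no packages loaded; the variable names are illustrative only:

```r
dates_int <- c(17701216, 177012, 1770)

# Take the first four characters and turn them back into an integer year
years <- as.integer(substr(as.character(dates_int), 1, 4))
years
```

This relies on `as.character()` printing the numbers in plain (non-scientific) notation, which holds for values like the ones above.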
If you want them to be recognized as dates, you could do this:
date <- as.numeric(19351225)
as.Date(paste(substr(as.character(date), 1, 4),"-01-01",sep=""),format="%Y-%m-%d")
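The same idea applied to a whole column might look like the sketch below. The data frame `opera` and the new column `Born_Date` are hypothetical; only the column name `Composer_Born` comes from the question.

```r
# Hypothetical data frame mixing the three integer layouts
opera <- data.frame(Composer_Born = c(19351225, 193512, 1935))

# First four digits are the year in all three layouts
year_chr <- substr(as.character(opera$Composer_Born), 1, 4)

# Anchor every value at 1 January of its year so it becomes a proper Date
opera$Born_Date <- as.Date(paste0(year_chr, "-01-01"), format = "%Y-%m-%d")
opera
```

Note that this deliberately discards the month and day, even for rows where they were known.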
Assuming that you actually just want an integer indicating the year, and all your dates are after the year 1000, then this should work:
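years_from_date_int <- function(i) {
  # Count the decimal digits; floor(log10(i)) + 1 also handles exact powers of ten such as 1000
  digits <- floor(log10(i)) + 1
  # Dividing by 10^(digits - 4) shifts away everything after the first four digits
  divide_by <- 10^(digits - 4)
  floor(i / divide_by)
}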
print(years_from_date_int(10210104))
# [1] 1021
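The arithmetic is vectorized, so it can be applied to all three layouts at once. A sketch under the same assumption (four-digit years, 1000 or later); `year_from_int` is a hypothetical name, and it counts digits with `floor(log10(i)) + 1` so that an exact power of ten such as 1000 is still treated as four digits:

```r
# Hypothetical vectorized variant of the digit-shifting idea
year_from_int <- function(i) {
  digits <- floor(log10(i)) + 1   # number of decimal digits in each value
  i %/% 10^(digits - 4)           # integer-divide away all but the first four
}

res <- year_from_int(c(17701216, 177012, 1770, 1000))
res
```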
This approach also has the advantage that no libraries need to be imported, and it may be considerably more efficient, since it doesn't convert the integers to strings as many of the other answers do.
Quick benchmark:
set.seed(12345)
samples <- runif(1000000, min=1000, max = 20200101)
start <- Sys.time()
for (s in samples) {
years_from_date_int(s)
}
end <- Sys.time()
print("Direct int manipulation:")
print(end - start)
start <- Sys.time()
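# Note: years_from_date_int_string_manipulation() is not defined in this post;
# it stands for a variant that converts each value to a string first.
for (s in samples) {
  years_from_date_int_string_manipulation(s)
}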
end <- Sys.time()
print("With string conversion:")
print(end - start)
# [1] "Direct int manipulation:"
# Time difference of 3.928902 secs
# [1] "With string conversion:"
# Time difference of 16.86341 secs
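Since the string-based function is not shown above, here is a self-contained way to rerun the comparison. Both helpers are hypothetical stand-ins (`year_arith`, `year_string`), and both are called once on the whole vector instead of inside a `for` loop, so loop overhead does not dominate the measurement. Timings will differ by machine, so none are quoted here.

```r
set.seed(12345)
# Whole-number samples in the same range as the benchmark above
samples <- floor(runif(1e5, min = 1000, max = 20200101))

# Arithmetic variant: shift away everything after the first four digits
year_arith <- function(i) i %/% 10^(floor(log10(i)) - 3)

# String variant (assumed form): print without decimals, keep four characters
year_string <- function(i) as.integer(substr(sprintf("%.0f", i), 1, 4))

# Both variants should agree before their speed is compared
stopifnot(all(year_arith(samples) == year_string(samples)))

print(system.time(year_arith(samples)))
print(system.time(year_string(samples)))
```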