Converting a date 'year-month-day' to only 'year and month' in R with SQL data

I'm working on a problem where I need to merge two datasets. The first dataset comes from SQL and is imported using the RODBC library, while the second is imported from Excel. I want to merge the two data frames by month and year, but to do that I first need to convert the first data frame's date column from year-month-day to year-month.

I have tried as.Date(df$postingdate, format = '%Y %M') and strftime(df$postingdate, "%Y %m"), as I normally would, but the first doesn't work and the second changes the column to character. This has been a problem for days, and I have tried a number of things, mainly suggestions from the following link: https://stackoverflow.com/questions/6242955/converting-year-and-month-yyyy-mm-format-to-a-date

At the bottom I have created a data frame (df2) from the output I get when using `dput()`, and I noticed that under posting date the data is converted to a number rather than the actual date ("2020-05-28", "2020-10-09", "2021-10-19"). Therefore I'm also unsure whether my problem is that I'm using the wrong functions or that the data is of an "unknown" data type.

A sample of the first dataset, where I want to transform the date into year-month:

df <- data.frame(
  Posting_Date = c("2020-05-28", "2020-10-09", "2021-10-19"), Sales = c(2702.5, 369, 4134),
  Sales_person_code = c(6L, 10L, 10L), EDI = c(1L, 1L, 1L), 
  City = c(141L, 4L, 6L), Kæde = c(12L, 12L, 12L), 
  Vinter = c(0, 0, 0), Forår = c(1, 0, 0), Sommer = c(0, 0, 0), 
  Efterår = c(0, 1, 1), Fredag = c(0, 1, 0), Lørdag = c(0, 0, 0), 
  Mandag = c(0, 0, 0), Onsdag = c(0, 0, 0), Søndag = c(0, 0, 0), 
  Tirsdag = c(0, 0, 1), Torsdag = c(1, 0, 0), 
  year_month = c("2020-05-28", "2020-10-09", "2021-10-19"))

df2 <- data.frame(
  Posting_Date = c(18410, 18544, 18919), Sales = c(2702.5, 369, 4134), 
  Sales_person_code = c(6L, 10L, 10L),EDI = c(1L, 1L, 1L), 
  City = c(141L, 4L, 6L), Kæde = c(12L, 12L, 12L), 
  Vinter = c(0, 0, 0), Forår = c(1, 0, 0), Sommer = c(0, 0, 0), 
  Efterår = c(0, 1, 1), Fredag = c(0, 1, 0), Lørdag = c(0, 0, 0), 
  Mandag = c(0, 0, 0), Onsdag = c(0, 0, 0), Søndag = c(0, 0, 0), 
  Tirsdag = c(0, 0, 1), Torsdag = c(1, 0, 0), 
  year_month = c(18410, 18544, 18919))

Thanks in advance for any help. Please let me know if I can do anything to make it easier for you to help me.

Up front, your attempt of as.Date(df$Posting_Date, format="%Y %m") seems backwards: the function as.Date is for converting from a string to a Date-class object, and its format= argument identifies how to find the year/month/day components in the string, not how you want it displayed later. (Also note that %M in your first attempt means minutes; the month is %m.) Note that in R, a Date is shown as YYYY-MM-DD. Always. Telling R you want a date to be just year/month is saying that you want to convert it to a string, no longer date-like or number-like. (lubridate and perhaps other packages allow you to have objects that behave similarly to Date.)
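To make the parse-versus-format distinction concrete, here is a minimal base-R sketch. The input string "28/05/2020" is made up for illustration (it is not from the question's data); it is laid out day/month/year so that the format= argument actually has work to do.

```r
# Parsing: format= describes how the INPUT string is laid out.
d <- as.Date("28/05/2020", format = "%d/%m/%Y")
class(d)
# [1] "Date"
d
# [1] "2020-05-28"

# Formatting: format() turns a Date into a character string.
ym <- format(d, "%Y-%m")
class(ym)
# [1] "character"
ym
# [1] "2020-05"
```

The result of format() is a plain string, which is why strftime() changed the column to character in the question.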

For df, one can just subset the strings without parsing to Date-class:

substring(df$Posting_Date, 1, 7)
# [1] "2020-05" "2020-10" "2021-10"

If you want to do anything number-like to them, you can convert to Date-class first, and then use format(.) to convert to a string with a specific format.

as.Date(df$Posting_Date)
# [1] "2020-05-28" "2020-10-09" "2021-10-19"
format(as.Date(df$Posting_Date), format = "%Y-%m")
# [1] "2020-05" "2020-10" "2021-10"
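If the year-month key needs to stay a Date rather than become a string, one possible workaround (an assumption on my part, not something the question asked for explicitly) is to collapse every date to the first day of its month. A sketch in base R, with the sample dates retyped here so the block runs on its own:

```r
posting <- as.Date(c("2020-05-28", "2020-10-09", "2021-10-19"))

# Write the year and month, hard-code the day as "01", then parse again.
month_start <- as.Date(format(posting, "%Y-%m-01"))
month_start
# [1] "2020-05-01" "2020-10-01" "2021-10-01"
class(month_start)
# [1] "Date"
```

All dates in the same month then share one value, so the column can still serve as a grouping or merge key while supporting date arithmetic.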

For df2, though, since it is numeric you need to specify an origin= instead of a format=. I'm inferring that these are days since the epoch (1970-01-01), so

as.Date(df2$Posting_Date, origin = "1970-01-01")
# [1] "2020-05-28" "2020-10-09" "2021-10-19"
format(as.Date(df2$Posting_Date, origin = "1970-01-01"), format = "%Y-%m")
# [1] "2020-05" "2020-10" "2021-10"
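Since the end goal in the question is a merge by month and year, here is a rough sketch of how the "YYYY-MM" string could be used as the key. The `monthly` data frame and its `budget` column are hypothetical stand-ins for the Excel import, which the question does not show; `sales` keeps only two columns of the sample data.

```r
sales <- data.frame(
  Posting_Date = c("2020-05-28", "2020-10-09", "2021-10-19"),
  Sales = c(2702.5, 369, 4134)
)

# Hypothetical monthly data standing in for the Excel import.
monthly <- data.frame(
  year_month = c("2020-05", "2020-10", "2021-10"),
  budget = c(3000, 500, 4000)
)

# Build the same "YYYY-MM" key on the daily data, then join on it.
sales$year_month <- format(as.Date(sales$Posting_Date), "%Y-%m")
merged <- merge(sales, monthly, by = "year_month", all.x = TRUE)
merged
```

With all.x = TRUE, every row of `sales` is kept even if a month has no match in `monthly`. Both key columns must be the same type (character here) for the join to line up.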

Note that R stores Date (and POSIXct, incidentally) as numbers internally:

dput(as.Date(df2$Posting_Date, origin = "1970-01-01"))
# structure(c(18410, 18544, 18919), class = "Date")
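Because a Date is just a day count with a class attribute, the same column can show up as strings, bare numbers, or Dates depending on how it was imported or reconstructed. The helper below is a hypothetical convenience, not part of the original answer. It assumes that numeric input means days since 1970-01-01 and that character input is already in YYYY-MM-DD form.

```r
# Hypothetical helper: return "YYYY-MM" strings for character, numeric
# (assumed to be days since 1970-01-01), or Date input.
to_year_month <- function(x) {
  if (is.numeric(x)) {
    # Bare numbers carry no class, so the origin must be supplied.
    x <- as.Date(x, origin = "1970-01-01")
  }
  format(as.Date(x), "%Y-%m")
}

to_year_month(c("2020-05-28", "2020-10-09", "2021-10-19"))
# [1] "2020-05" "2020-10" "2021-10"
to_year_month(c(18410, 18544, 18919))
# [1] "2020-05" "2020-10" "2021-10"
to_year_month(as.Date("2020-05-28"))
# [1] "2020-05"
```

Checking class(df$Posting_Date) right after the RODBC import shows which of these cases applies to the real data.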
