
Can't use dplyr joins on two date columns

I'm encountering an error when I try to join two data frames on two Date columns using the dplyr join functions. This is the error I get:

Error: cannot join on columns 'DateInfo' x 'DateInfo': Can't join on 'DateInfo' x 'DateInfo' because of incompatible types (Date / Date)

The base merge function works fine, and I can't find an example of what could be causing this, either through Googling or in other Stack Overflow questions.

The problem is that I can't create a reproducible example, and I can't share the data I am using. For example, this works with no problems:

library(dplyr)

# tibble() replaces the deprecated data_frame(); both DateInfo columns are built the same way with as.Date()
d1 <- tibble(Num = 1:5, DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d2 <- tibble(Name = c("a", "b", "c", "d", "e"), DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d3 <- left_join(d1, d2, by = c("DateInfo" = "DateInfo"))

Has anyone had any experience with being unable to join on two columns that are, as far as the class function is concerned, the same type, yet still getting this error?

EDIT: Just to get this out of the way: I can work around this error by using merge, or by converting the dates to characters and then joining, so I'm really just interested in why dplyr would tell me I can't join on two columns with the same type.
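For reference, the character-conversion workaround mentioned in the edit could look roughly like the sketch below. The data frames and the DateKey column are hypothetical stand-ins (the real data cannot be shared), and base merge is used so the snippet has no package dependencies; a dplyr join on DateKey would follow the same pattern.

```r
# Hypothetical stand-ins for the two real data frames
df1 <- data.frame(Num = 1:2)
df1$DateInfo <- as.Date(c("2014-01-03", "2014-04-05"))
df2 <- data.frame(Name = c("a", "b"))
df2$DateInfo <- as.Date(c("2014-01-03", "2014-04-05"))

# Build a character key from each Date column (DateKey is an illustrative name)
df1$DateKey <- format(df1$DateInfo, "%Y-%m-%d")
df2$DateKey <- format(df2$DateInfo, "%Y-%m-%d")

# Join on the character key, then rebuild a Date column from it
joined <- merge(df1[c("Num", "DateKey")], df2[c("Name", "DateKey")], by = "DateKey")
joined$DateInfo <- as.Date(joined$DateKey)
```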

The reason I can't join is how the two Date objects are stored. Thanks to this issue I decided to check how the two objects are stored, and sure enough, one is stored as an integer and the other as a numeric (double):

> dput(df1$DateInfo[1])
structure(16373, class = "Date")
> dput(df2$DateInfo[1])
structure(16372L, class = "Date")

It appears that the data that was pulled from a DB through the dplyr sql functions is stored as a numeric, while the data from a csv is stored as an integer. I don't know why that stops dplyr from joining on them when merge can, or why it happens in the first place, but I think this specific question is answered.
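A minimal sketch of the diagnosis and one possible fix, using base R only. The two vectors are constructed by hand to mimic the dput output above; to_double_date is a hypothetical helper name, not part of dplyr or base R. Newer dplyr versions may no longer raise this error, so treat this as an illustration of the storage mismatch rather than a required step.

```r
# Two Date vectors with the same class but different underlying storage
d_dbl <- structure(16373, class = "Date")   # double-backed
d_int <- structure(16373L, class = "Date")  # integer-backed

# class() cannot tell them apart; typeof() reveals the storage type
same_class <- identical(class(d_dbl), class(d_int))
storage <- c(typeof(d_dbl), typeof(d_int))

# Hypothetical helper: rebuild a Date so that it is always double-backed
to_double_date <- function(x) as.Date(as.numeric(x), origin = "1970-01-01")

d_fixed <- to_double_date(d_int)
fixed_storage <- typeof(d_fixed)
```

Applying the same helper to the key column of both data frames before the join should give both sides the same storage type.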

I just had this exact same issue: two data frames, each with a POSIXct date_time column, and the dplyr join functions (by = "date_time") would not work because of incompatible types. Thanks to Matt Mills, I used the dput function to investigate the POSIXct columns and found that, even though both were POSIXct, one was stored as numeric and the other as character.

I fixed this by going back to where I created my POSIXct object and using this code:

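# tz_in is assumed to hold the time zone name and to be defined earlier
df_temp <- df_temp %>%
  # force the underlying storage to numeric before rebuilding the POSIXct column
  mutate(date_time = as.numeric(date_time)) %>%
  mutate(date_time = as.POSIXct(date_time, tz = tz_in, origin = "1970-01-01 00:00:00"))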

It's weird... it's like the POSIXct object remembers its original type. My added code forces the date_time fields in both data frames to be numeric before converting to POSIXct.
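To check for this kind of mismatch before joining, something like the following could be used. key_storage is a hypothetical helper, and the two data frames are artificial examples built by hand to mimic the numeric-versus-character case described above.

```r
# Hypothetical helper: report class and storage type of a join key in two data frames
key_storage <- function(x, y, by) {
  data.frame(
    side    = c("x", "y"),
    class   = c(class(x[[by]])[1], class(y[[by]])[1]),
    storage = c(typeof(x[[by]]), typeof(y[[by]]))
  )
}

# Artificial data: both columns carry the POSIXct class, but one is
# double-backed and the other character-backed
a <- data.frame(id = 1)
a$date_time <- structure(1400000000, class = c("POSIXct", "POSIXt"), tzone = "UTC")
b <- data.frame(id = 1)
b$date_time <- structure("1400000000", class = c("POSIXct", "POSIXt"), tzone = "UTC")

report <- key_storage(a, b, by = "date_time")
```

If the storage column differs between the two sides, rebuilding the key through as.numeric, as in the code above, is one way to align them.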

dplyr::inner_join now works. Thanks for this thread; saved my bacon. ;)
