Conditional JOIN on two data frames in R
Suppose there are two data frames like the following (taken from this post):
df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))
df1
# CustomerId Product
# 1 Toaster
# 2 Toaster
# 3 Toaster
# 4 Radio
# 5 Radio
# 6 Radio
df2
# CustomerId State
# 2 Alabama
# 4 Alabama
# 6 Ohio
The question is: how can I run the following SQL query in R?
SELECT * FROM df1 JOIN df2 on df1.CustomerId <= df2.CustomerId
I know that I can do an inner join using merge(df1, df2, by = "CustomerId"), but that only matches rows with equal CustomerId values, so it does not satisfy the <= condition of the join.
This is one somewhat confusing way to do it, but it works:
library(tidyverse)
df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))
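# For each row of df1, keep the df2 rows whose CustomerId is at least as large,
# then bind the per-row results into one data frame.
map2_df(
  df1$CustomerId, df1$Product,
  .f = ~ {
    # .x is the current df1 CustomerId, .y the matching Product
    temp <- df2 %>% filter(.x <= CustomerId)
    tibble(CustomerId.x = .x, Product = .y,
           CustomerId.y = temp$CustomerId, State = temp$State)
  }
)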
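The row-by-row filtering above amounts to a cross join followed by a filter, which can also be sketched in base R. This is only a minimal sketch: it relies on merge() returning the Cartesian product when by = NULL, and the names crossed and result are made up for illustration. The cross join materializes nrow(df1) * nrow(df2) rows, so it is only reasonable for small inputs like this one.

```r
# Same example data as in the question.
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# by = NULL makes merge() return the Cartesian product; the shared column
# name gets the default suffixes, giving CustomerId.x and CustomerId.y.
crossed <- merge(df1, df2, by = NULL)

# Keep only the pairs that satisfy the join condition.
result <- crossed[crossed$CustomerId.x <= crossed$CustomerId.y, ]

# Sort and reset row names so the output is easier to read.
result <- result[order(result$CustomerId.x, result$CustomerId.y), ]
rownames(result) <- NULL
result
```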
As Grothendieck pointed out in the comments, one straightforward solution is to use the sqldf package, which runs the SQL query as written and gives exactly the result I wanted:
library(sqldf)
sqldf("SELECT * FROM df1 JOIN df2 on df1.CustomerId <= df2.CustomerId")
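For comparison, the same inequality join can probably be written without SQL using dplyr. This is a hedged sketch that assumes dplyr 1.1.0 or later, where join_by() accepts inequality conditions; it is not part of the original answers, and the name result is just illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which introduced join_by()

# Same example data as in the question.
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# x$ and y$ make explicit which table each side of the condition refers to.
result <- inner_join(df1, df2,
                     by = join_by(x$CustomerId <= y$CustomerId))
result
```

Because the condition is an inequality, both key columns are kept in the result, so each df1 row appears once for every df2 row whose CustomerId is at least as large.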