

How to execute a left join in R?

Below is the sample data and one manipulation. The first data set is employment specific to an industry. The second data set is overall employment and the unemployment rate. I am trying to do a left join (or at least that's what I think it should be) to get the desired result below. When I do it, I get a one-to-many problem and the row count grows: in this example it goes from 14 to 18, and in the larger data set it goes from 228 to 4348. My primary question is whether this can be done with just a properly written join, or whether there is more to it.

 library(dplyr)  # provides %>%, mutate() and left_join()

 area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 month<-c(1,2,3,4,5,6,7,8,9,10,11,12,1,2)
 emp1 <-c(10,11,12,13,14,15,16,17,20,21,22,24,26,28)

 firstset<-data.frame(area1,periodyear,month,emp1)



 area2<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear1<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 period<-c(01,02,03,04,05,06,07,08,09,10,11,12,01,02)
 rate<-c(3.0,3.2,3.4,3.8,2.5,4.5,6.5,9.1,10.6,5.5,7.8,6.5,4.5,2.9)
 emp2<-c(1001,1002,1005,1105,1254,1025,1078,1106,1099,1188,1254,1250,1301,1188)

 secondset<-data.frame(area2,periodyear1,period,rate,emp2)

 secondset <- secondset%>%mutate(month = as.numeric(period))

 # joining on month alone: firstset has 14 rows, the result has 18
 result <- left_join(firstset,secondset, by=c("month"))
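To see where the 18 comes from, here is a small base R sketch (no dplyr needed). It rebuilds only the month column of the question's data, which is an assumption made to keep it short. A join pairs every row on the left with every row on the right that has the same key value, so a month that occurs n times in both sets produces n x n rows.

```r
# Jan 2020 - Feb 2021: months 1 and 2 occur twice, months 3-12 once
month <- c(1:12, 1:2)
counts <- table(month)

# Rows produced by a join on month alone: sum over months of n_x * n_y
sum(counts * counts)  # 2*2 + 2*2 + 10*1 = 18

# The same thing with an actual left join (all.x = TRUE keeps every x row)
x <- data.frame(month = month, emp1 = seq_along(month))
y <- data.frame(month = month, rate = seq_along(month))
nrow(merge(x, y, by = "month", all.x = TRUE))  # 18
```

The growth is therefore a property of the key, not of the join type: as long as 'month' is not unique in the second set, any join on it alone will duplicate rows.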

Desired result (14 rows; the first 3 are shown below)

 area1     periodyear   month     emp1    rate    emp2
000000         2020        1        10      3.0    1001
000000         2020        2        11      3.2    1002
000000         2020        3        12      3.4    1005

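We may have to add 'periodyear' (and the area column) to the by argument as well. Months 1 and 2 occur in both 2020 and 2021, so with 'month' as the only key each of the four January/February rows in firstset matches two rows in secondset, which adds 4 rows (14 to 18).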

library(dplyr)
left_join(firstset,secondset, by=c("periodyear" = "periodyear1", 
      "area1" = "area2", "month"))

-output

   area1 periodyear month emp1 period rate emp2
1      0       2020     1   10      1  3.0 1001
2      0       2020     2   11      2  3.2 1002
3      0       2020     3   12      3  3.4 1005
...
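For comparison, here is a sketch of the same left join in base R with merge(). It uses stand-in copies of firstset and secondset rebuilt from the question's values (the compact rep()/1:12 construction is an assumption for brevity), and it first checks that the join key is unique in the lookup table, which is the condition that keeps a left join from adding rows.

```r
# Stand-ins for the question's data: 14 monthly rows, Jan 2020 - Feb 2021, area 0
periodyear <- c(rep(2020, 12), 2021, 2021)
month <- c(1:12, 1:2)
firstset <- data.frame(area1 = 0, periodyear = periodyear, month = month,
                       emp1 = c(10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 24, 26, 28))
secondset <- data.frame(area2 = 0, periodyear1 = periodyear, month = month,
                        rate = c(3.0, 3.2, 3.4, 3.8, 2.5, 4.5, 6.5, 9.1, 10.6, 5.5, 7.8, 6.5, 4.5, 2.9),
                        emp2 = c(1001, 1002, 1005, 1105, 1254, 1025, 1078, 1106, 1099, 1188, 1254, 1250, 1301, 1188))

# Each (area, year, month) combination must identify one row of secondset;
# anyDuplicated() returns 0 when there are no duplicate key rows.
stopifnot(anyDuplicated(secondset[, c("area2", "periodyear1", "month")]) == 0)

# Base R left join: all.x = TRUE keeps every row of firstset
joined <- merge(firstset, secondset,
                by.x = c("area1", "periodyear", "month"),
                by.y = c("area2", "periodyear1", "month"),
                all.x = TRUE)
nrow(joined)  # 14
head(joined, 3)
```

Note that merge() sorts the result by the key columns (sort = TRUE by default), whereas left_join() keeps the row order of firstset. In dplyr 1.1.0 and later, left_join() also accepts relationship = "one-to-one", which raises an error instead of silently duplicating rows when the key is not unique.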

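Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.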

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM