Merge two datasets but one of them is year_month and the other is year_month_week
I have been practicing data merging in R. Here are two simple data frames, df1 and df2.
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                  year_month = c(202205, 202206, 202207, 202204, 202205, 202206, 202207),
                  points = c(65, 58, 47, 21, 25, 27, 43))
df2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  year_month_week = c(2022052, 2022053, 2022061, 2022043, 2022051, 2022052),
                  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))
In df1, the value 202205 in the year_month column means May 2022. In df2, the value 2022052 in the year_month_week column means the 2nd week of May 2022. I want to merge df1 and df2 with respect to year_month_week, so every row of df2 is kept and some values from df1 may be copied to more than one row. For example, 202205 in year_month covers both 2022052 and 2022053. There is no points column in df2, so in this case 65 is copied to both rows. My expected output looks like this:
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 year_month_week = c(2022052, 2022053, 2022061, 2022043, 2022051, 2022052),
                 temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3),
                 points = c(65, 65, 58, 21, 25, 25))
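To make the encoding concrete, here is a minimal sketch that splits a year_month_week value into its month and week parts with integer arithmetic. It assumes the week is always the single last digit, which holds for the sample data but is not stated as a general rule; the variable names are illustrative only.

```r
# A few year_month_week values taken from df2
year_month_week <- c(2022052, 2022053, 2022061)

# Integer division by 10 drops the last digit, leaving the year_month part
year_month <- year_month_week %/% 10

# The remainder is the week index within that month
# (assumption: the week is always one digit)
week <- year_month_week %% 10

data.frame(year_month_week, year_month, week)
```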
Create a temporary year_month column in df2 by taking the first six characters of year_month_week, then do a left join on df1 by year_month and id before removing the temporary column.
Using tidyverse, we could do this as follows:
library(tidyverse)

df2 %>%
  mutate(year_month = as.numeric(substr(year_month_week, 1, 6))) %>%
  left_join(df1, by = c('year_month', 'id')) %>%
  select(-year_month)
#>   id year_month_week temperature points
#> 1  1         2022052        36.1     65
#> 2  1         2022053        36.3     65
#> 3  1         2022061        36.6     58
#> 4  2         2022043        34.3     21
#> 5  2         2022051        34.9     25
#> 6  2         2022052        35.3     25
Or in base R using merge (note that merge sorts the result by the join columns, so the row order differs):
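# Temporary join key: the first six characters of year_month_week
df2$year_month <- substr(df2$year_month_week, 1, 6)
# all.x = TRUE keeps every row of df2 (left join); [-1] drops the key column from the result
merge(df2, df1, by = c('year_month', 'id'), all.x = TRUE)[-1]
#>   id year_month_week temperature points
#> 1  2         2022043        34.3     21
#> 2  1         2022052        36.1     65
#> 3  1         2022053        36.3     65
#> 4  2         2022051        34.9     25
#> 5  2         2022052        35.3     25
#> 6  1         2022061        36.6     58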
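If the original row order of df2 matters, a lookup with match is one possible alternative to merge. This is only a sketch, not part of the answer above: it assumes each (id, year_month) pair appears at most once in df1, and the names key1, key2, and result are made up for illustration.

```r
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                  year_month = c(202205, 202206, 202207, 202204, 202205, 202206, 202207),
                  points = c(65, 58, 47, 21, 25, 27, 43))
df2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  year_month_week = c(2022052, 2022053, 2022061, 2022043, 2022051, 2022052),
                  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))

# Composite "id year_month" keys on both sides (hypothetical helper vectors)
key1 <- paste(df1$id, df1$year_month)
key2 <- paste(df2$id, substr(df2$year_month_week, 1, 6))

# match() returns the position of the first matching key in df1, or NA if none,
# so rows of df2 without a match get NA points, as in a left join
result <- df2
result$points <- df1$points[match(key2, key1)]
result
```

Because nothing is sorted or added to df2 itself, the rows come back in the same order as the expected output in the question.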