Merge two datasets where one is keyed by year_month and the other by year_month_week

Question

I am practicing data merging in R. Here are two simple data frames, df1 and df2.

df1<-data.frame(id=c(1,1,1,2,2,2,2),
                year_month=c(202205,202206,202207,202204,202205,202206,202207),
                points=c(65,58,47,21,25,27,43))

df2<-data.frame(id=c(1,1,1,2,2,2),
                year_month_week=c(2022052,2022053,2022061,2022043,2022051,2022052),
                temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))

In df1, a year_month value such as 202205 means May 2022. In df2, a year_month_week value such as 2022052 means the 2nd week of May 2022. I want to merge df1 into df2 at the year_month_week level, so that every row of df2 is kept and values from df1 are repeated where needed. For example, for id 1 the month 202205 covers the weeks 2022052 and 2022053. df2 has no points column, so the points value 65 from df1 is copied to both of those rows. My expected output looks like this:

df<-data.frame(id=c(1,1,1,2,2,2),
               year_month_week=c(2022052,2022053,2022061,2022043,2022051,2022052),
               temperature=c(36.1,36.3,36.6,34.3,34.9,35.3),
               points=c(65,65,58,21,25,25))

Answer 1

Create a temporary year_month column in df2 by taking the first six characters of year_month_week, then left join df1 onto df2 by year_month and id, and finally drop the temporary column.

Using tidyverse, we could do this as follows:

library(tidyverse)

df2 %>%
  mutate(year_month = as.numeric(substr(year_month_week, 1, 6))) %>%
  left_join(df1, by = c('year_month', 'id')) %>%
  select(-year_month)
#>   id year_month_week temperature points
#> 1  1         2022052        36.1     65
#> 2  1         2022053        36.3     65
#> 3  1         2022061        36.6     58
#> 4  2         2022043        34.3     21
#> 5  2         2022051        34.9     25
#> 6  2         2022052        35.3     25
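
Because the week is the last digit in this encoding, the month key can also be derived arithmetically rather than through string handling. The following is a minimal base R sketch, assuming year_month_week is stored as a number with exactly one trailing week digit. The names month_key, idx and result are introduced here for illustration only.

```r
# Data from the question
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                  year_month = c(202205, 202206, 202207, 202204, 202205, 202206, 202207),
                  points = c(65, 58, 47, 21, 25, 27, 43))
df2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  year_month_week = c(2022052, 2022053, 2022061, 2022043, 2022051, 2022052),
                  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))

# Integer division by 10 drops the trailing week digit: 2022052 -> 202205
# (assumption: the week is always a single digit)
month_key <- df2$year_month_week %/% 10

# Look up each (id, month) pair of df2 in df1; match() gives NA when there is no match
idx <- match(paste(df2$id, month_key), paste(df1$id, df1$year_month))

# Attach the looked-up points; the rows stay in df2's original order
result <- cbind(df2, points = df1$points[idx])
result
```

This lookup assumes each (id, year_month) pair occurs at most once in df1, since match() returns only the first hit.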

Or in base R using merge:

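# Temporary month key, converted to numeric to match the type of df1$year_month
df2$year_month <- as.numeric(substr(df2$year_month_week, 1, 6))
# all.x = TRUE makes this a left join; [-1] drops the temporary key column
merge(df2, df1, by = c('year_month', 'id'), all.x = TRUE)[-1]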
#>   id year_month_week temperature points
#> 1  2         2022043        34.3     21
#> 2  1         2022052        36.1     65
#> 3  1         2022053        36.3     65
#> 4  2         2022051        34.9     25
#> 5  2         2022052        35.3     25
#> 6  1         2022061        36.6     58
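
Note that merge() performs an inner join unless all.x = TRUE is given. With the data in the question every row of df2 finds a match, so both variants return the same six rows. The difference only shows up when df2 contains a week whose month is missing from df1. A small sketch with made-up rows (the data frames pts and tmp and their values are hypothetical, not taken from the question):

```r
# Hypothetical data: id 3 appears in tmp but has no row in pts
pts <- data.frame(id = c(1, 2),
                  year_month = c(202205, 202205),
                  points = c(65, 25))
tmp <- data.frame(id = c(1, 2, 3),
                  year_month_week = c(2022052, 2022051, 2022054),
                  temperature = c(36.1, 34.9, 35.0))

# Month key: integer division by 10 drops the single trailing week digit
tmp$year_month <- tmp$year_month_week %/% 10

# Default merge is an inner join: the unmatched id 3 row is dropped
inner <- merge(tmp, pts, by = c("year_month", "id"))

# all.x = TRUE keeps every row of tmp; points is NA where nothing matched
left <- merge(tmp, pts, by = c("year_month", "id"), all.x = TRUE)

nrow(inner)
nrow(left)
```

If every row of df2 must survive the merge, as the question requires, the left join form is the safer choice.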

Answer 1: accepted, 2022-07-26 14:38:19
