How to left_join in R and repeat the joining value across multiple variables?
I'm confused about something here: either this isn't the right approach, or I'm missing part of how left_join works.
I'm looking to join the "gdp" column by country and by year, and repeat the value across all three "gender" categories, so that all three genders for the same year have the same associated gdp.
Here's what I have now:
library(tidyverse)
table_1 <- tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",
"Central and Southern Asia", "Afghanistan", 2011, "female", 0.186,
"Central and Southern Asia","Afghanistan", 2011, "male", 0.454,
"Central and Southern Asia", "Afghanistan", 2011, "total", 0.274,
"Central and Southern Asia", "Afghanistan", 2018, "female", 0.221,
"Central and Southern Asia", "Afghanistan" , 2018, "male", 0.504,
"Central and Southern Asia", "Afghanistan", 2018, "total", 0.367)
table_2 <- tribble(~"Country", ~"gdp", ~"Year",
"Afghanistan", 551., 2010,
"Afghanistan", 599.,2011,
"Afghanistan", 649., 2012,
"Afghanistan", 648., 2013,
"Afghanistan", 625., 2014,
"Afghanistan", 590., 2015,
"Afghanistan", 550., 2016,
"Afghanistan", 550., 2017)
table_1 %>% left_join(table_2, by = "Country")
# A tibble: 48 x 7
Region Country Year.x Gender median_rate gdp Year.y
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 Central and Southern Asia Afghanistan 2011 female 0.186 551 2010
2 Central and Southern Asia Afghanistan 2011 female 0.186 599 2011
3 Central and Southern Asia Afghanistan 2011 female 0.186 649 2012
4 Central and Southern Asia Afghanistan 2011 female 0.186 648 2013
5 Central and Southern Asia Afghanistan 2011 female 0.186 625 2014
6 Central and Southern Asia Afghanistan 2011 female 0.186 590 2015
7 Central and Southern Asia Afghanistan 2011 female 0.186 550 2016
8 Central and Southern Asia Afghanistan 2011 female 0.186 550 2017
9 Central and Southern Asia Afghanistan 2011 male 0.454 551 2010
10 Central and Southern Asia Afghanistan 2011 male 0.454 599 2011
# ... with 38 more rows
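The 48 rows can be accounted for directly. The sketch below is a base R illustration, not part of the original question. It rebuilds the same two tables as plain data frames, dropping the constant Region column, and joins on Country alone with merge(..., all.x = TRUE). Every row in both tables has Country "Afghanistan", so each of the 6 rows of table_1 is paired with all 8 rows of table_2.

```r
# Same data as the question, as base R data frames (Region omitted; it is constant)
table_1 <- data.frame(
  Country     = "Afghanistan",
  Year        = rep(c(2011, 2018), each = 3),
  Gender      = rep(c("female", "male", "total"), times = 2),
  median_rate = c(0.186, 0.454, 0.274, 0.221, 0.504, 0.367)
)
table_2 <- data.frame(
  Country = "Afghanistan",
  gdp     = c(551, 599, 649, 648, 625, 590, 550, 550),
  Year    = 2010:2017
)

# Join on Country only: every table_1 row matches every table_2 row
by_country <- merge(table_1, table_2, by = "Country", all.x = TRUE)

nrow(table_1) * nrow(table_2)  # 6 * 8
nrow(by_country)               # the same 48 rows
# Year exists in both tables but is not a key, so it is split into Year.x and Year.y
names(by_country)
```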
Expected output would be something like this: the gdp column from table 2 joined, but only for each matching year. For example, table 1 only has data for 2011 and 2018, so it should only match those years.
tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",~"gdp",
"Central and Southern Asia", "Afghanistan", 2011, "female",0.186, 550,
"Central and Southern Asia","Afghanistan", 2011, "male",0.454,550,
"Central and Southern Asia", "Afghanistan", 2011, "total",0.274,550,
"Central and Southern Asia", "Afghanistan", 2018, "female", 0.221,590,
"Central and Southern Asia", "Afghanistan" , 2018, "male", 0.504, 590,
"Central and Southern Asia", "Afghanistan", 2018, "total", 0.367, 590)
Thanks for your help.
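Joining by Country alone pairs each of the 6 rows in table_1 with all 8 Afghanistan rows in table_2, which is where the 48 rows come from. dplyr's join verbs accept more than one column in the by= argument, so join on both Country and Year: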
table_1 <- tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",
"Central and Southern Asia", "Afghanistan", 2011, "female", 0.186,
"Central and Southern Asia","Afghanistan", 2011, "male", 0.454,
"Central and Southern Asia", "Afghanistan", 2011, "total", 0.274,
"Central and Southern Asia", "Afghanistan", 2018, "female", 0.221,
"Central and Southern Asia", "Afghanistan" , 2018, "male", 0.504,
"Central and Southern Asia", "Afghanistan", 2018, "total", 0.367)
table_2 <- tribble(~"Country", ~"gdp", ~"Year",
"Afghanistan", 551., 2010,
"Afghanistan", 599.,2011,
"Afghanistan", 649., 2012,
"Afghanistan", 648., 2013,
"Afghanistan", 625., 2014,
"Afghanistan", 590., 2015,
"Afghanistan", 550., 2016,
"Afghanistan", 550., 2017)
table_1 %>% left_join(table_2, by = c("Country", "Year"))
# # A tibble: 6 x 6
# Region Country Year Gender median_rate gdp
# <chr> <chr> <dbl> <chr> <dbl> <dbl>
# 1 Central and Southern Asia Afghanistan 2011 female 0.186 599
# 2 Central and Southern Asia Afghanistan 2011 male 0.454 599
# 3 Central and Southern Asia Afghanistan 2011 total 0.274 599
# 4 Central and Southern Asia Afghanistan 2018 female 0.221 NA
# 5 Central and Southern Asia Afghanistan 2018 male 0.504 NA
# 6 Central and Southern Asia Afghanistan 2018 total 0.367 NA
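To see which keys produced those NA rows, you can keep only the table_1 rows whose Country-Year pair has no partner in table_2. The following is a base R sketch that reuses the same data as plain data frames. With dplyr loaded, anti_join(table_1, table_2, by = c("Country", "Year")) should return the same rows.

```r
table_1 <- data.frame(
  Country     = "Afghanistan",
  Year        = rep(c(2011, 2018), each = 3),
  Gender      = rep(c("female", "male", "total"), times = 2),
  median_rate = c(0.186, 0.454, 0.274, 0.221, 0.504, 0.367)
)
table_2 <- data.frame(
  Country = "Afghanistan",
  gdp     = c(551, 599, 649, 648, 625, 590, 550, 550),
  Year    = 2010:2017
)

# One composite key per row, so Country and Year must match together
key_1 <- paste(table_1$Country, table_1$Year, sep = "_")
key_2 <- paste(table_2$Country, table_2$Year, sep = "_")

# Rows of table_1 that get NA for gdp after the left join
unmatched <- table_1[!key_1 %in% key_2, ]
unmatched
unique(unmatched$Year)  # only 2018, which table_2 does not contain
```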
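The 2018 rows get NA because a left join keeps every row of table_1, and table_2 has no 2018 entry to match.
We can also use merge() from base R. all.x = TRUE makes it a left join: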
merge(table_1, table_2, by = c("Country", "Year"), all.x = TRUE)
# Country Year Region Gender median_rate gdp
#1 Afghanistan 2011 Central and Southern Asia female 0.186 599
#2 Afghanistan 2011 Central and Southern Asia male 0.454 599
#3 Afghanistan 2011 Central and Southern Asia total 0.274 599
#4 Afghanistan 2018 Central and Southern Asia female 0.221 NA
#5 Afghanistan 2018 Central and Southern Asia male 0.504 NA
#6 Afghanistan 2018 Central and Southern Asia total 0.367 NA
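Both answers look up gdp by the pair (Country, Year), so every Gender row for a given year picks up the same value. That lookup can also be written out explicitly with base R's match() on a composite key. This approach keeps table_1's row and column order, whereas merge() sorts by the key columns and moves them to the front. It is a sketch rather than one of the original answers, and the key_1, key_2 and idx names are only illustrative.

```r
table_1 <- data.frame(
  Country     = "Afghanistan",
  Year        = rep(c(2011, 2018), each = 3),
  Gender      = rep(c("female", "male", "total"), times = 2),
  median_rate = c(0.186, 0.454, 0.274, 0.221, 0.504, 0.367)
)
table_2 <- data.frame(
  Country = "Afghanistan",
  gdp     = c(551, 599, 649, 648, 625, 590, 550, 550),
  Year    = 2010:2017
)

# One key per row built from both join columns
key_1 <- paste(table_1$Country, table_1$Year, sep = "_")
key_2 <- paste(table_2$Country, table_2$Year, sep = "_")

# Position of each table_1 key in table_2 (NA when there is no match)
idx <- match(key_1, key_2)

# Copy gdp across: all rows sharing a Country-Year get the same value
table_1$gdp <- table_2$gdp[idx]
table_1
```

With this data, the three 2011 rows get 599 and the three 2018 rows get NA. One difference from a join: if table_2 contained duplicate Country-Year rows, match() would silently take the first one, whereas left_join() and merge() would return one row per match.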