How to left_join in R and repeat the joined value across multiple rows?

I'm confused about something here: either this isn't the right approach, or I'm missing part of how left_join works.

I'm looking to join the "gdp" column by country and by year, and repeat the value across all three "Gender" categories, so that all three genders for the same year have the same associated gdp.

Here's what I have now:

library(tidyverse)

table_1 <- tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",
 "Central and Southern Asia", "Afghanistan",  2011, "female",       0.186,
"Central and Southern Asia","Afghanistan",  2011, "male",         0.454,
 "Central and Southern Asia", "Afghanistan",  2011, "total",        0.274,
 "Central and Southern Asia", "Afghanistan",  2018, "female",       0.221,
 "Central and Southern Asia", "Afghanistan" , 2018, "male",         0.504,
 "Central and Southern Asia", "Afghanistan",  2018, "total",        0.367)

table_2 <- tribble(~"Country",    ~"gdp", ~"Year",
 "Afghanistan",  551.,  2010,
 "Afghanistan", 599.,2011,
 "Afghanistan",  649.,  2012,
 "Afghanistan",  648.,  2013,
 "Afghanistan",  625.,  2014,
 "Afghanistan",  590.,  2015,
 "Afghanistan",  550.,  2016,
 "Afghanistan",  550.,  2017)

table_1 %>% left_join(table_2, by = "Country")

# A tibble: 48 x 7
   Region                    Country     Year.x Gender median_rate   gdp Year.y
   <chr>                     <chr>        <dbl> <chr>        <dbl> <dbl>  <dbl>
 1 Central and Southern Asia Afghanistan   2011 female       0.186   551   2010
 2 Central and Southern Asia Afghanistan   2011 female       0.186   599   2011
 3 Central and Southern Asia Afghanistan   2011 female       0.186   649   2012
 4 Central and Southern Asia Afghanistan   2011 female       0.186   648   2013
 5 Central and Southern Asia Afghanistan   2011 female       0.186   625   2014
 6 Central and Southern Asia Afghanistan   2011 female       0.186   590   2015
 7 Central and Southern Asia Afghanistan   2011 female       0.186   550   2016
 8 Central and Southern Asia Afghanistan   2011 female       0.186   550   2017
 9 Central and Southern Asia Afghanistan   2011 male         0.454   551   2010
10 Central and Southern Asia Afghanistan   2011 male         0.454   599   2011
# ... with 38 more rows

The expected output would be something like this, with the gdp column from table_2 joined, but only for each matching year (e.g. table_1 only has data for 2011 and 2018, so only those years should be matched):

tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",~"gdp",
        "Central and Southern Asia", "Afghanistan",  2011, "female",0.186, 550,
        "Central and Southern Asia","Afghanistan",  2011, "male",0.454,550,
        "Central and Southern Asia", "Afghanistan",  2011, "total",0.274,550,
        "Central and Southern Asia", "Afghanistan",  2018, "female", 0.221,590,
        "Central and Southern Asia", "Afghanistan" , 2018, "male",         0.504, 590,
        "Central and Southern Asia", "Afghanistan",  2018, "total",        0.367, 590)


Thanks for your help.

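The by= argument of dplyr's join verbs accepts more than one column. Joining on both Country and Year matches each row of table_1 to the single table_2 row for that country and year. 2018 is absent from table_2, so those rows get NA for gdp: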

table_1 <- tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",
 "Central and Southern Asia", "Afghanistan",  2011, "female",       0.186,
 "Central and Southern Asia","Afghanistan",  2011, "male",         0.454,
 "Central and Southern Asia", "Afghanistan",  2011, "total",        0.274,
 "Central and Southern Asia", "Afghanistan",  2018, "female",       0.221,
 "Central and Southern Asia", "Afghanistan" , 2018, "male",         0.504,
 "Central and Southern Asia", "Afghanistan",  2018, "total",        0.367)

table_2 <- tribble(~"Country",    ~"gdp", ~"Year",
 "Afghanistan",  551.,  2010,
 "Afghanistan", 599.,2011,
 "Afghanistan",  649.,  2012,
 "Afghanistan",  648.,  2013,
 "Afghanistan",  625.,  2014,
 "Afghanistan",  590.,  2015,
 "Afghanistan",  550.,  2016,
 "Afghanistan",  550.,  2017)

table_1 %>% left_join(table_2, by = c("Country", "Year"))
# # A tibble: 6 x 6
#   Region                    Country      Year Gender median_rate   gdp
#   <chr>                     <chr>       <dbl> <chr>        <dbl> <dbl>
# 1 Central and Southern Asia Afghanistan  2011 female       0.186   599
# 2 Central and Southern Asia Afghanistan  2011 male         0.454   599
# 3 Central and Southern Asia Afghanistan  2011 total        0.274   599
# 4 Central and Southern Asia Afghanistan  2018 female       0.221    NA
# 5 Central and Southern Asia Afghanistan  2018 male         0.504    NA
# 6 Central and Southern Asia Afghanistan  2018 total        0.367    NA
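
To see which keys are responsible for the NA values, one option is anti_join, which keeps the rows of the left table that have no partner in the right table. This is only a minimal sketch: rates and gdp_by_year are hypothetical, cut-down stand-ins for table_1 and table_2, not the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical, shortened stand-ins for table_1 and table_2
rates <- tribble(
  ~Country,      ~Year, ~Gender,  ~median_rate,
  "Afghanistan",  2011, "female", 0.186,
  "Afghanistan",  2011, "male",   0.454,
  "Afghanistan",  2018, "female", 0.221,
  "Afghanistan",  2018, "male",   0.504
)

gdp_by_year <- tribble(
  ~Country,      ~gdp, ~Year,
  "Afghanistan",  551,  2010,
  "Afghanistan",  599,  2011
)

# Country/Year pairs in `rates` that have no matching row in `gdp_by_year`;
# these are the pairs that end up with NA after the left join
unmatched <- rates %>%
  distinct(Country, Year) %>%
  anti_join(gdp_by_year, by = c("Country", "Year"))

unmatched
```

With these stand-in tables the only unmatched pair is Afghanistan in 2018, which mirrors the NA rows in the output above.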

We can also use merge from base R:

merge(table_1, table_2, by = c("Country", "Year"), all.x = TRUE)
#      Country Year                    Region Gender median_rate gdp
#1 Afghanistan 2011 Central and Southern Asia female       0.186 599
#2 Afghanistan 2011 Central and Southern Asia   male       0.454 599
#3 Afghanistan 2011 Central and Southern Asia  total       0.274 599
#4 Afghanistan 2018 Central and Southern Asia female       0.221  NA
#5 Afghanistan 2018 Central and Southern Asia   male       0.504  NA
#6 Afghanistan 2018 Central and Southern Asia  total       0.367  NA
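
Both calls above return one row per row of table_1 because each Country/Year pair appears only once in table_2. Joining on Country alone pairs every rate row with every GDP year for that country, which is where the 48 rows in the question come from (6 rows times 8 years). The sketch below checks this on small stand-in tables. The names rates and gdp_by_year are hypothetical, and newer dplyr versions may print a many-to-many warning for the Country-only join.

```r
library(dplyr)
library(tibble)

# Hypothetical, shortened stand-ins for table_1 and table_2
rates <- tribble(
  ~Country,      ~Year, ~Gender,  ~median_rate,
  "Afghanistan",  2011, "female", 0.186,
  "Afghanistan",  2011, "male",   0.454,
  "Afghanistan",  2018, "female", 0.221,
  "Afghanistan",  2018, "male",   0.504
)

gdp_by_year <- tribble(
  ~Country,      ~gdp, ~Year,
  "Afghanistan",  551,  2010,
  "Afghanistan",  599,  2011
)

# Country/Year pairs that occur more than once in the lookup table;
# an empty result means the two-column join cannot duplicate rows
dup_keys <- gdp_by_year %>%
  count(Country, Year) %>%
  filter(n > 1)

# Joining on Country only: every rate row meets every GDP year
by_country <- left_join(rates, gdp_by_year, by = "Country")

# Joining on both keys: at most one GDP row per rate row
by_both <- left_join(rates, gdp_by_year, by = c("Country", "Year"))
by_merge <- merge(rates, gdp_by_year, by = c("Country", "Year"), all.x = TRUE)

nrow(by_country)  # rows in rates times GDP years for the country
nrow(by_both)     # same as nrow(rates)
nrow(by_merge)    # same as nrow(rates)
```

If dup_keys is not empty, the lookup table has repeated keys and would need to be deduplicated or summarised before joining.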
