
How to merge two dataframes in R based on a matching part of a string?

I have two data frames: one with economic information for various countries and the other with the proper names of the countries. The two data frames look like this:

country <- c("Afghanistan", "Afghanistan", "United States", "United States", "Congo, Dem. Rep.", "Congo, Dem. Rep.", "Middle East and North Africa", "Middle East and North Africa")
years <- c(2011, 2012, 2011, 2012, 2011, 2012, 2011, 2012)
gdp <- c(123, 442, 9451, 9999, 351, 664, 7531, 6634)
economic_data <- cbind.data.frame(country, years, gdp)

country_proper <- c("Afghanistan", "United States of America", "Congo DR")

I want to change the country names in economic_data to their proper names from country_proper, and then drop the countries in economic_data that do not appear in country_proper (like "Middle East and North Africa").

You need to use fuzzy matching. Try this:

country <- c("Afghanistan", "Afghanistan", "United States", "United States", "Congo, Dem. Rep.", "Congo, Dem. Rep.", "Middle East and North Africa", "Middle East and North Africa")
years <- c(2011, 2012, 2011, 2012, 2011, 2012, 2011, 2012)
gdp <- c(123, 442, 9451, 9999, 351, 664, 7531, 6634)
economic_data <- data.frame(country, years, gdp, stringsAsFactors = FALSE)

country_proper <- c("Afghanistan", "United States of America", "Congo DR")
country_proper <- data.frame(country = country_proper, stringsAsFactors = FALSE)

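library(fuzzyjoin)
# Join rows whose country names share the same Soundex (phonetic) code.
# mode = "inner" keeps only the rows of economic_data that find a match.
stringdist_join(economic_data,
                country_proper,
                method = "soundex",
                mode = "inner",
                by = "country")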
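The join above returns both spellings of each name side by side, so one more step is needed to get the table the question asks for. The following is a minimal sketch of that step, assuming the fuzzyjoin package is installed and that the shared key column comes back suffixed as country.x (original name) and country.y (proper name); the names matched and result are made up for illustration.

```r
library(fuzzyjoin)

economic_data <- data.frame(
  country = c("Afghanistan", "Afghanistan", "United States", "United States",
              "Congo, Dem. Rep.", "Congo, Dem. Rep.",
              "Middle East and North Africa", "Middle East and North Africa"),
  years = c(2011, 2012, 2011, 2012, 2011, 2012, 2011, 2012),
  gdp = c(123, 442, 9451, 9999, 351, 664, 7531, 6634),
  stringsAsFactors = FALSE
)
country_proper <- data.frame(
  country = c("Afghanistan", "United States of America", "Congo DR"),
  stringsAsFactors = FALSE
)

# Same inner join as in the answer: unmatched rows are dropped here.
matched <- stringdist_join(economic_data, country_proper,
                           by = "country", method = "soundex", mode = "inner")

# Assumption: the key columns are named country.x and country.y after the join.
# Keep the proper name (country.y) and discard the original spelling.
result <- data.frame(country = matched$country.y,
                     years = matched$years,
                     gdp = matched$gdp,
                     stringsAsFactors = FALSE)
print(result)
```

Soundex compares only a short phonetic code (the first letter plus three digits), so on a longer list of countries it may pair names that merely start alike. It is worth inspecting matched before discarding country.x.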

Here is an alternative: we could use str_replace_all from the stringr package:

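library(dplyr)
library(stringr)
# Named vector: regex pattern = replacement. The anchors ^ and $ force a
# whole-string match, and the dots are escaped so they match literal periods.
economic_data %>% 
    mutate(country = str_replace_all(country, c(
        "^United States$" = "United States of America",
        "^Congo, Dem\\. Rep\\.$" = "Congo DR")))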

Output:

                       country years  gdp
1                  Afghanistan  2011  123
2                  Afghanistan  2012  442
3     United States of America  2011 9451
4     United States of America  2012 9999
5                     Congo DR  2011  351
6                     Congo DR  2012  664
7 Middle East and North Africa  2011 7531
8 Middle East and North Africa  2012 6634
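The output above still contains the "Middle East and North Africa" rows, which the question wanted dropped. One way to finish the job, sketched here assuming dplyr and stringr are installed, is to filter on the vector of proper names after renaming; the name cleaned is made up for illustration.

```r
library(dplyr)
library(stringr)

economic_data <- data.frame(
  country = c("Afghanistan", "Afghanistan", "United States", "United States",
              "Congo, Dem. Rep.", "Congo, Dem. Rep.",
              "Middle East and North Africa", "Middle East and North Africa"),
  years = c(2011, 2012, 2011, 2012, 2011, 2012, 2011, 2012),
  gdp = c(123, 442, 9451, 9999, 351, 664, 7531, 6634),
  stringsAsFactors = FALSE
)
country_proper <- c("Afghanistan", "United States of America", "Congo DR")

cleaned <- economic_data %>%
  # Rename with anchored patterns so only whole-string matches are replaced.
  mutate(country = str_replace_all(country, c(
    "^United States$" = "United States of America",
    "^Congo, Dem\\. Rep\\.$" = "Congo DR"))) %>%
  # Keep only rows whose (renamed) country is in the list of proper names.
  filter(country %in% country_proper)

print(cleaned)
```

This relies on writing out every renaming rule by hand, which is manageable for a few countries but becomes tedious for a long list.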

