
How to match IDs with names in a different dataframe in R

I have two dataframes with IDs from the RePEc database.

In one dataframe, I have about 3,000 IDs and a list of numerical values. It looks like this:

df1 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872"),
  numbers = c("1.4", "3.5", "4.9")
)

I then have another dataframe with many more IDs (about 150,000). It includes the IDs from df1, along with author names and publication years. It looks like this:

df2 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872", "RePEc:sgc:wpaper:id:2926"),
  authors = c("Smith, John; Hope, Gill", "Robinson, Jill", "Chu, James", "Ravendran, Vikram"),
  year = c("2019", "2020", "2018", "2017")
)

I want to pull the authors' last names and the year of publication associated with each ID into df1 and combine them in a new column, new_id, so that the final output looks like this:

df1$repec_id <- c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872")
df1$numbers <- c("1.4", "3.5", "4.9")
df1$new_id <- c("Smith and Hope 2019", "Robinson 2020", "Chu 2018")

Does anyone know how I can do this? Thank you in advance for your help!

We can do a left_join, then extract the word before each comma (the last names), join them with ' and ', and paste the result with 'year'.

library(dplyr)
library(purrr)
library(stringr)

df1 %>%
  # bring authors and year into df1, matching on the shared ID column
  left_join(df2, by = "repec_id") %>%
  # "\\w+(?=,)" matches each word directly before a comma, i.e. each last name
  mutate(new_id = str_c(map_chr(str_extract_all(authors, "\\w+(?=,)"),
          str_c, collapse = ' and '), year, sep=" ")) %>%
  select(-authors, -year)
#                                     repec_id numbers              new_id
#1   RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68     1.4 Smith and Hope 2019
#2 RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122     3.5       Robinson 2020
#3                    RePEc:ess:wpaper:id:6872     4.9            Chu 2018

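Or, instead of extracting the last names, we can remove the first names with str_remove_all and then replace the remaining ';' separators with ' and ':

df1 %>% 
    left_join(df2, by = "repec_id") %>%
    transmute(repec_id, numbers, 
       new_id = str_c(str_replace_all(str_remove_all(authors, ',\\s*\\w+(?=;|$)'), ';\\s*', ' and '), year, sep=' '))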
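If you would rather avoid extra packages, the same lookup can be sketched in base R with match(). This is a swapped-in alternative rather than the approach above. It assumes every entry in authors has the form "Last, First; Last, First", and the helper name surnames is made up for this illustration. The toy data is copied from the question.

```r
# Toy data copied from the question.
df1 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68",
               "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122",
               "RePEc:ess:wpaper:id:6872"),
  numbers = c("1.4", "3.5", "4.9"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68",
               "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122",
               "RePEc:ess:wpaper:id:6872",
               "RePEc:sgc:wpaper:id:2926"),
  authors = c("Smith, John; Hope, Gill", "Robinson, Jill",
              "Chu, James", "Ravendran, Vikram"),
  year = c("2019", "2020", "2018", "2017"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package):
# "Smith, John; Hope, Gill" -> "Smith and Hope".
# Assumes authors are separated by ";" and written as "Last, First".
surnames <- function(authors) {
  vapply(strsplit(authors, ";\\s*"), function(people) {
    # keep everything before the first comma of each author
    paste(trimws(sub(",.*$", "", people)), collapse = " and ")
  }, character(1))
}

# Row of df2 that corresponds to each row of df1 (NA if the ID is absent).
idx <- match(df1$repec_id, df2$repec_id)

# Build the label only where a match exists; leave NA otherwise.
df1$new_id <- ifelse(is.na(idx), NA_character_,
                     paste(surnames(df2$authors[idx]), df2$year[idx]))
df1
```

Because the helper keeps everything before the first comma, a surname made of several words or containing a hyphen stays in one piece, whereas a \\w+ pattern would only pick up the last word-character run before the comma. IDs in df1 that are missing from df2 get NA in new_id instead of being dropped.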
