How to match IDs with names across different data frames in R

Question

I have two dataframes with IDs from the RePEc database.

In one dataframe, I have about 3,000 IDs and a list of numerical values. It looks like this:

df1 <- data.frame(repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872"),
                  numbers = c("1.4", "3.5", "4.9"),
                  stringsAsFactors = FALSE)

I then have another dataframe with many more IDs (about 150,000), which includes the IDs from df1, along with author names. It looks like this:

df2 <- data.frame(repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872", "RePEc:sgc:wpaper:id:2926"),
                  authors = c("Smith, John; Hope, Gill", "Robinson, Jill", "Chu, James", "Ravendran, Vikram"),
                  year = c("2019", "2020", "2018", "2017"),
                  stringsAsFactors = FALSE)

I want to pull the authors' last names and the year of publication associated with each ID into df1 and create a new column, new_id, so that the final output looks like this:

df1$repec_id <- c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872")
df1$numbers <- c("1.4", "3.5", "4.9")
df1$new_id <- c("Smith and Hope 2019", "Robinson 2020", "Chu 2018")

Does anyone know how I can do this? Thank you in advance for your help!

Answer 1

We can do a left_join on repec_id, extract the word before each comma in authors (the last name), and paste the result together with year.

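library(dplyr)
library(purrr)
library(stringr)
df1 %>%
  left_join(df2, by = "repec_id") %>%   # keep every row of df1
  # last names are the words directly before a comma; join them with ' and '
  mutate(new_id = str_c(map_chr(str_extract_all(authors, "\\w+(?=,)"),
          str_c, collapse = ' and '), year, sep = " ")) %>%
  select(-authors, -year)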
#                                     repec_id numbers              new_id
#1   RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68     1.4 Smith and Hope 2019
#2 RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122     3.5       Robinson 2020
#3                    RePEc:ess:wpaper:id:6872     4.9            Chu 2018
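To make the extraction step easier to follow, here is a minimal sketch of what the regular expression does on its own, using two of the author strings from the question. The variable names `authors`, `last_names` and `joined` are only illustrative, and `vapply` stands in for `map` so that only stringr is needed.

```r
library(stringr)

# Sample author strings copied from df2 in the question
authors <- c("Smith, John; Hope, Gill", "Robinson, Jill")

# "\\w+(?=,)" matches a run of word characters that is immediately
# followed by a comma; the lookahead keeps the comma out of the match
last_names <- str_extract_all(authors, "\\w+(?=,)")
# last_names is a list with one character vector per input string

# Collapse each vector of last names into a single string
joined <- vapply(last_names, str_c, character(1), collapse = " and ")
print(joined)
```

Note that `\\w` does not match hyphens or apostrophes, so a last name such as "Smith-Jones" would be cut down to "Jones" by this pattern. If such names occur in the data, the pattern would need adjusting.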

Or, instead of extracting the last names, we can remove everything else with str_remove_all.

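df1 %>% 
    left_join(df2, by = "repec_id") %>%
    transmute(repec_id, numbers, 
       # drop ", Firstname" after each last name, then turn "; " into " and "
       new_id = str_c(str_replace_all(str_remove_all(authors, ',\\s*\\w+'), ';\\s*', ' and '),
                      year, sep=' '))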
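If the tidyverse packages are not available, the same idea can be sketched in base R. This is not the approach used in the answer above: it swaps `left_join` for `match()` and the stringr calls for `regmatches()`/`gregexpr()`. It assumes, as the question states, that every ID in df1 also appears in df2; an unmatched ID would need separate handling. The names `idx`, `matched` and `out` are illustrative.

```r
# Sample data copied from the question
df1 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68",
               "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122",
               "RePEc:ess:wpaper:id:6872"),
  numbers = c("1.4", "3.5", "4.9"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68",
               "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122",
               "RePEc:ess:wpaper:id:6872",
               "RePEc:sgc:wpaper:id:2926"),
  authors = c("Smith, John; Hope, Gill", "Robinson, Jill",
              "Chu, James", "Ravendran, Vikram"),
  year = c("2019", "2020", "2018", "2017"),
  stringsAsFactors = FALSE
)

# Position of each df1 ID inside df2 (NA if an ID were missing)
idx <- match(df1$repec_id, df2$repec_id)
matched <- df2[idx, ]

# Pull out the word before each comma (the last names), one vector per row
last_names <- regmatches(matched$authors,
                         gregexpr("\\w+(?=,)", matched$authors, perl = TRUE))

# Join the last names with " and " and append the year
out <- df1
out$new_id <- paste(vapply(last_names, paste, character(1), collapse = " and "),
                    matched$year)
print(out)
```

Because `match()` returns positions in the order of df1, the rows of `out` stay in the same order as df1.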

Answer 1: accepted, 2020-02-07 23:30:24
