How to match IDs with names in a different dataframe in R
I have two dataframes with IDs from the RePEc database.
In one dataframe, I have about 3,000 IDs and a list of numerical values. It looks like this:
df1 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872"),
  numbers = c("1.4", "3.5", "4.9"))
I then have another dataframe with many more IDs (about 150,000), including the IDs from df1, together with author names and publication years. It looks like this:
df2 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872", "RePEc:sgc:wpaper:id:2926"),
  authors = c("Smith, John; Hope, Gill", "Robinson, Jill", "Chu, James", "Ravendran, Vikram"),
  year = c("2019", "2020", "2018", "2017"))
I want to pull the authors' last names and the year of publication associated with each ID into df1 and create a new column, new_id, so the final output looks like this:
df1$repec_id <- c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68", "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122", "RePEc:ess:wpaper:id:6872")
df1$numbers <- c("1.4", "3.5", "4.9")
df1$new_id <- c("Smith and Hope 2019", "Robinson 2020", "Chu 2018")
Does anyone know how I can do this? Thank you in advance for your help!
We can do a left_join, then extract the words before the "," and paste them with 'year':
library(dplyr)
library(purrr)
library(stringr)
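df1 %>%
  # bring authors and year into df1, matching rows on the shared ID column
  left_join(df2, by = "repec_id") %>%
  # surnames are the words directly before each comma; join them with " and "
  mutate(new_id = str_c(map_chr(str_extract_all(authors, "\\w+(?=,)"),
                                str_c, collapse = " and "), year, sep = " ")) %>%
  select(-authors, -year)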
# repec_id numbers new_id
#1 RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68 1.4 Smith and Hope 2019
#2 RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122 3.5 Robinson 2020
#3 RePEc:ess:wpaper:id:6872 4.9 Chu 2018
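For reference, the same join, extract, and paste steps can be sketched in base R without any packages. This is only an illustration, not part of the answer above: it rebuilds the example data with data.frame() so that it runs on its own, and it assumes that every ID in df1 appears exactly once in df2 (the question says df2 includes the IDs from df1; uniqueness is an assumption).

```r
# Example data mirroring the question
df1 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68",
               "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122",
               "RePEc:ess:wpaper:id:6872"),
  numbers = c("1.4", "3.5", "4.9"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  repec_id = c("RePEc:eee:ecolet:v:173:y:2018:i:c:p:65-68",
               "RePEc:eee:moneco:v:103:y:2019:i:c:p:105-122",
               "RePEc:ess:wpaper:id:6872",
               "RePEc:sgc:wpaper:id:2926"),
  authors = c("Smith, John; Hope, Gill", "Robinson, Jill",
              "Chu, James", "Ravendran, Vikram"),
  year = c("2019", "2020", "2018", "2017"),
  stringsAsFactors = FALSE
)

# Position of each df1 ID in df2; match() keeps df1's row order
# (assumption: each ID occurs once in df2 and none is missing)
idx <- match(df1$repec_id, df2$repec_id)
matched_authors <- df2$authors[idx]

# Surnames are the word characters immediately before each comma;
# the lookahead (?=,) needs perl = TRUE in base R
surnames <- regmatches(matched_authors,
                       gregexpr("\\w+(?=,)", matched_authors, perl = TRUE))

# Join each paper's surnames with " and ", then append the year
df1$new_id <- paste(vapply(surnames, paste, character(1), collapse = " and "),
                    df2$year[idx])
```

Using match() rather than merge() avoids any reordering of df1's rows, at the cost of silently taking only the first hit if an ID were duplicated in df2.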
Or, instead of extracting, we can remove the first names with str_remove_all:
df1 %>%
  left_join(df2, by = "repec_id") %>%
  transmute(repec_id, numbers,
            new_id = str_c(str_remove_all(authors, ',\\s*\\w+(?:;|$)'), year, sep = ' '))
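One caveat with the removal pattern as written: it also deletes the ";" between authors, so a two-author entry comes out as "Smith Hope 2019" rather than "Smith and Hope 2019". A minimal base R sketch of one way to keep the "and", shown on standalone vectors rather than the joined dataframe, and assuming single-word first names and "; " as the only separator between authors:

```r
# Standalone vectors copied from the question's example data
authors <- c("Smith, John; Hope, Gill", "Robinson, Jill", "Chu, James")
year <- c("2019", "2020", "2018")

# Drop ", Firstname" after each surname, leaving e.g. "Smith; Hope"
surnames_only <- gsub(",\\s*\\w+", "", authors)

# Turn the remaining ";" separators into " and ", then append the year
new_id <- paste(gsub(";\\s*", " and ", surnames_only), year)
```

The same two-step idea should carry over to the pipeline above by swapping gsub for str_remove_all and str_replace_all.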