Add a column value from one dataframe to another by partial string matching
I have two dataframes. DF1 is very large (millions of rows) and contains, among other columns, a speaker column in which the same person can appear under several name variations.
DF1 <- data.frame(speaker = c("Mr. Kirkwood", "Mr Churchill", "the joint under secretary of state (John Rosewood)", "William Cove", "Winston Churchill", "Mr. Archie Kirkwood"))  # plus other columns (column2, column3, ...)
speaker column2 column3
1 Mr. Kirkwood
2 Mr Churchill
3 The joint under secretary of state (John Rosewood)
4 William Cove
5 Winston Churchill
6 Mr. Archie Kirkwood
DF2 contains three columns: firstname, lastname, and party.
DF2 <- data.frame(firstname = c("Archie", "Winston", "John", "William"), lastname = c("Kirkwood", "Churchill", "Rosewood", "Cove"), party = c("Labour", "Conservative", "Conservative", "Labour"))
I want to add the party column to DF1 so that it matches the firstname and lastname from DF2. The final dataframe should look like this:
speaker column2 column3 party
1 Mr. Kirkwood labour
2 Mr Churchill conservative
3 The joint under secretary of state (John Rosewood) conservative
4 William Cove labour
5 Winston Churchill conservative
6 Mr. Archie Kirkwood labour
I have tried a nested for loop with grepl, but it takes a very long time:
for (i in seq_len(nrow(DF1))) {
  for (j in seq_len(nrow(DF2))) {
    if (grepl(DF2$lastname[j], DF1$speaker[i])) {
      if (grepl(DF2$firstname[j], DF1$speaker[i])) {
        # take the party from the matched DF2 row (j), not from row i
        DF1$party[i] <- DF2$party[j]
      } else {
        DF1$party[i] <- "missing first name"
      }
    }
  }
}
Is there a quicker and smarter way of doing it? Thanks.
Using ifelse with grepl:
DF1 <- data.frame(
speaker = c('Mr. Kirkwood', 'Mr Churchill', 'the joint under secretary of state (John Rosewood)', 'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'))
DF2 <- data.frame(
firstname = c('Archie', 'Winston', 'John', 'William'),
lastname = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
party = c('Labour', 'Conservative', 'Conservative', 'Labour'))
conservative <- paste(c('Churchill','Rosewood'), collapse = '|')
labour <- paste(c('Cove','Kirkwood'),collapse='|')
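conservative and labour are regular-expression patterns (surnames joined with '|') used to fill the party column in DF1. Only conservative is actually tested in the next line; every speaker that does not match it is labelled 'labour'.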
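The two patterns are typed by hand here. As a minimal sketch, assuming DF2 holds every surname of interest, the same patterns could instead be derived from DF2 so that they stay in sync with the lookup table:

```r
# Lookup table from the question
DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'),
  lastname  = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
  party     = c('Labour', 'Conservative', 'Conservative', 'Labour'),
  stringsAsFactors = FALSE)

# Join the surnames of each party into one alternation pattern
conservative <- paste(DF2$lastname[DF2$party == 'Conservative'], collapse = '|')
labour       <- paste(DF2$lastname[DF2$party == 'Labour'], collapse = '|')
```

The surnames are used as regular expressions, so a surname containing regex metacharacters would need escaping first.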
DF1$party <- ifelse(grepl(conservative, DF1$speaker),'conservative','labour')
DF1
#> speaker party
#> 1 Mr. Kirkwood labour
#> 2 Mr Churchill conservative
#> 3 the joint under secretary of state (John Rosewood) conservative
#> 4 William Cove labour
#> 5 Winston Churchill conservative
#> 6 Mr. Archie Kirkwood labour
Created on 2022-05-16 by the reprex package (v2.0.1)
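The ifelse call only distinguishes two parties and labels any unknown speaker as labour. A possible generalisation, offered as a sketch rather than a definitive solution, loops over the rows of DF2 (assumed to be small) and lets grepl work on the whole speaker column at once, so there is no loop over the millions of rows in DF1. It assumes that plain substring matching on the surname is good enough, and it uses the first name only to let a full-name match take precedence over a surname-only match.

```r
DF1 <- data.frame(
  speaker = c('Mr. Kirkwood', 'Mr Churchill',
              'the joint under secretary of state (John Rosewood)',
              'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'),
  stringsAsFactors = FALSE)
DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'),
  lastname  = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
  party     = c('Labour', 'Conservative', 'Conservative', 'Labour'),
  stringsAsFactors = FALSE)

# Start from NA so speakers with no surname in DF2 stay visibly unmatched
DF1$party <- NA_character_

# One pass per DF2 row; each grepl() call is vectorised over all of DF1$speaker
for (j in seq_len(nrow(DF2))) {
  has_last  <- grepl(DF2$lastname[j],  DF1$speaker, fixed = TRUE)
  has_first <- grepl(DF2$firstname[j], DF1$speaker, fixed = TRUE)

  # A full-name match always wins; a surname-only match fills empty cells only
  full_match   <- has_last & has_first
  surname_only <- has_last & !has_first & is.na(DF1$party)
  DF1$party[full_match | surname_only] <- DF2$party[j]
}
```

Because fixed = TRUE does literal, case-sensitive substring matching, a surname such as 'Cove' would also match inside a longer word; if that is a concern, a regular expression with word boundaries could be used instead. The party values keep the capitalisation used in DF2.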