I have two data frames. DF1 is very large (millions of rows) and contains, among other columns, a speaker column whose names repeat with some variation.
DF1 <- data.frame(speaker = c("Mr. Kirkwood", "Mr Churchill", "the joint under secretary of state (John Rosewood)", "William Cove", "Winston Churchill", "Mr. Archie Kirkwood"))  # plus column2, column3, etc.
speaker column2 column3
1 Mr. Kirkwood
2 Mr Churchill
3 The joint under secretary of state (John Rosewood)
4 William Cove
5 Winston Churchill
6 Mr. Archie Kirkwood
DF2 contains three columns: firstname, lastname, and party.
DF2 <- data.frame(firstname = c("Archie", "Winston", "John", "William"), lastname = c("Kirkwood", "Churchill", "Rosewood", "Cove"), party = c("Labour", "Conservative", "Conservative", "Labour"))
I want to add the party column to DF1 so that it matches the firstname and surname from DF2. The final dataframe should look like this:
speaker column2 column3 party
1 Mr. Kirkwood labour
2 Mr Churchill conservative
3 The joint under secretary of state (John Rosewood) conservative
4 William Cove labour
5 Winston Churchill conservative
6 Mr. Archie Kirkwood labour
I have tried a nested for loop with grepl, but it takes a very long time:
for (i in seq_len(nrow(DF1))) {
  for (j in seq_len(nrow(DF2))) {
    if (grepl(DF2$lastname[j], DF1$speaker[i])) {
      if (grepl(DF2$firstname[j], DF1$speaker[i])) {
        DF1$party[i] <- DF2$party[j]  # index DF2 by j, the matching row
      } else {
        DF1$party[i] <- "missing first name"
      }
    }
  }
}
Is there a quicker and smarter way of doing this? Thanks.
Using ifelse with grepl:
DF1 <- data.frame(
speaker = c('Mr. Kirkwood', 'Mr Churchill', 'the joint under secretary of state (John Rosewood)', 'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'))
DF2 <- data.frame(
firstname = c('Archie', 'Winston', 'John', 'William'),
lastname = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
party = c('Labour', 'Conservative', 'Conservative', 'Labour'))
conservative <- paste(c('Churchill','Rosewood'), collapse = '|')
labour <- paste(c('Cove','Kirkwood'),collapse='|')
conservative and labour are regex patterns used to fill in the party column in DF1.
DF1$party <- ifelse(grepl(conservative, DF1$speaker),'conservative','labour')
DF1
#> speaker party
#> 1 Mr. Kirkwood labour
#> 2 Mr Churchill conservative
#> 3 the joint under secretary of state (John Rosewood) conservative
#> 4 William Cove labour
#> 5 Winston Churchill conservative
#> 6 Mr. Archie Kirkwood labour
Created on 2022-05-16 by the reprex package (v2.0.1)
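The patterns above are typed in by hand, and any speaker that does not match the conservative pattern is labelled labour. For a DF2 with many members, one possible variation is to build the pattern from DF2$lastname and look the party up with match. The following is only a sketch in base R, under some assumptions that are not in the question: surnames are unique in DF2, they contain no regex metacharacters, and matching on surname alone is good enough (the first name is not checked). The names pattern, hit and surname are made up for illustration.

```r
DF1 <- data.frame(
  speaker = c('Mr. Kirkwood', 'Mr Churchill',
              'the joint under secretary of state (John Rosewood)',
              'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'),
  stringsAsFactors = FALSE)
DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'),
  lastname = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
  party = c('Labour', 'Conservative', 'Conservative', 'Labour'),
  stringsAsFactors = FALSE)

# One alternation pattern covering every surname in DF2 (whole words only).
# Assumption: surnames contain no regex metacharacters.
pattern <- paste0('\\b(', paste(DF2$lastname, collapse = '|'), ')\\b')

# Vectorised search: position of the first surname found in each speaker,
# or -1 where none is found.
hit <- regexpr(pattern, DF1$speaker, perl = TRUE)

# Extract the matched surname; speakers without a match stay NA.
surname <- rep(NA_character_, nrow(DF1))
surname[hit > 0] <- regmatches(DF1$speaker, hit)

# Look up the party by surname. Assumption: surnames are unique in DF2,
# otherwise match() silently takes the first one.
DF1$party <- DF2$party[match(surname, DF2$lastname)]
DF1
```

With this approach a speaker whose surname is not in DF2 gets NA instead of being labelled labour by default. If two members share a surname, the first name would have to be checked as well, which this sketch does not attempt.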