
Add a column value from one dataframe to another by partial string matching

I have two dataframes. DF1 is very large (millions of rows) and contains, among other columns, a speaker column whose values can repeat with some variation.

DF1 <- data.frame(speaker = c("Mr. Kirkwood", "Mr Churchill", "the joint under secretary of state (John Rosewood)", "William Cove", "Winston Churchill", "Mr. Archie Kirkwood"))  # plus other columns (column2, column3, ...)

 speaker        column2   column3
1 Mr. Kirkwood
2 Mr Churchill
3 The joint under secretary of state (John Rosewood)
4 William Cove
5 Winston Churchill
6 Mr. Archie Kirkwood

DF2 contains three columns: firstname, lastname, and party.

DF2 <- data.frame(firstname = c("Archie", "Winston", "John", "William"), lastname = c("Kirkwood", "Churchill", "Rosewood", "Cove"), party = c("Labour", "Conservative", "Conservative", "Labour"))

I want to add the party column to DF1 so that it matches the firstname and surname from DF2. The final dataframe should look like this:

 speaker        column2   column3   party
1 Mr. Kirkwood                      labour
2 Mr Churchill                      conservative
3 The joint under secretary of state (John Rosewood) conservative
4 William Cove                      labour
5 Winston Churchill                 conservative
6 Mr. Archie Kirkwood               labour

I have tried a nested for loop with grepl, but it takes a very long time:

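for (i in seq_len(nrow(DF1))) {
  for (j in seq_len(nrow(DF2))) {
    # Check the surname first, then the first name
    if (grepl(DF2$lastname[j], DF1$speaker[i])) {
      if (grepl(DF2$firstname[j], DF1$speaker[i])) {
        # Index DF2 with j (the matched person), not i
        DF1$party[i] <- DF2$party[j]
      } else {
        DF1$party[i] <- "missing first name"
      }
    }
  }
}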

And I wonder if there is a quicker and smarter way of doing it? Thanks.

Using ifelse with grepl:

DF1 <- data.frame(
  speaker = c('Mr. Kirkwood', 'Mr Churchill', 'the joint under secretary of state (John Rosewood)', 'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'))

DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'), 
  lastname = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'), 
  party = c('Labour', 'Conservative', 'Conservative', 'Labour')) 

conservative <- paste(c('Churchill','Rosewood'), collapse = '|')
labour <- paste(c('Cove','Kirkwood'),collapse='|')
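
The two patterns above are typed by hand. As a rough sketch, the same kind of pattern could instead be derived from DF2 by grouping surnames by party; the name patterns below is only illustrative and is not part of the original answer.

```r
# Sample DF2 from the question
DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'),
  lastname  = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
  party     = c('Labour', 'Conservative', 'Conservative', 'Labour'),
  stringsAsFactors = FALSE)

# Group surnames by party, then join each group with "|" to form a regex
patterns <- vapply(split(DF2$lastname, DF2$party),
                   paste, character(1), collapse = '|')

patterns[['Conservative']]  # surnames of Conservative members, joined by |
patterns[['Labour']]        # surnames of Labour members, joined by |
```

This assumes surnames contain no regex metacharacters; names with characters such as . or ( would need escaping first.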

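conservative and labour are regular-expression patterns (surnames joined with |) used to fill the party column in DF1. Speakers that match neither pattern get NA instead of defaulting to a party.

DF1$party <- ifelse(grepl(conservative, DF1$speaker), 'conservative',
                    ifelse(grepl(labour, DF1$speaker), 'labour', NA_character_))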
DF1
#>                                              speaker        party
#> 1                                       Mr. Kirkwood       labour
#> 2                                       Mr Churchill conservative
#> 3 the joint under secretary of state (John Rosewood) conservative
#> 4                                       William Cove       labour
#> 5                                  Winston Churchill conservative
#> 6                                Mr. Archie Kirkwood       labour

Created on 2022-05-16 by the reprex package (v2.0.1)
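
For a DF2 with many people, one possible generalisation is to loop over the rows of DF2 only and let grepl run vectorised over the whole speaker column, which avoids the row-by-row loop over DF1. The sketch below is an assumption about how that might look, not the answerer's method: match_party is a hypothetical helper, a first-name-plus-surname match takes priority, and a surname-only match is used as a fallback for rows that have no party yet.

```r
DF1 <- data.frame(
  speaker = c('Mr. Kirkwood', 'Mr Churchill',
              'the joint under secretary of state (John Rosewood)',
              'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'),
  stringsAsFactors = FALSE)

DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'),
  lastname  = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'),
  party     = c('Labour', 'Conservative', 'Conservative', 'Labour'),
  stringsAsFactors = FALSE)

# Hypothetical helper: returns one party per speaker, NA when nothing matches.
match_party <- function(speaker, people) {
  party <- rep(NA_character_, length(speaker))
  for (j in seq_len(nrow(people))) {
    # fixed = TRUE treats the names as literal text, not as regexes
    has_last  <- grepl(people$lastname[j],  speaker, fixed = TRUE)
    has_first <- grepl(people$firstname[j], speaker, fixed = TRUE)

    # First name and surname both present: strongest match, always assigned
    party[has_last & has_first] <- people$party[j]

    # Surname only: fill rows that have no party yet
    last_only <- has_last & !has_first & is.na(party)
    party[last_only] <- people$party[j]
  }
  party
}

DF1$party <- match_party(DF1$speaker, DF2)
DF1
```

Two caveats on this sketch: matching is by substring, so a first name such as John would also match inside a longer word like Johnson, and if two people in DF2 share a surname, a surname-only speaker is assigned to whichever of them comes first in DF2.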
