
Add a column value from one dataframe to another by partial string matching

I have two dataframes. DF1 is very large, with millions of rows, and contains, among other columns, a speaker column whose values can repeat with some variation.

# other columns (column2, column3, ...) omitted
DF1 <- data.frame(speaker = c("Mr. Kirkwood", "Mr Churchill", "the joint under secretary of state (John Rosewood)", "William Cove", "Winston Churchill", "Mr. Archie Kirkwood"))

 speaker        column2   column3
1 Mr. Kirkwood
2 Mr Churchill
3 The joint under secretary of state (John Rosewood)
4 William Cove
5 Winston Churchill
6 Mr. Archie Kirkwood

DF2 contains three columns: firstname, lastname, and party.

DF2 <- data.frame(firstname = c("Archie", "Winston", "John", "William"), lastname = c("Kirkwood", "Churchill", "Rosewood", "Cove"), party = c("Labour", "Conservative", "Conservative", "Labour"))

I want to add the party column to DF1 by matching each speaker against the firstname and lastname in DF2. The final dataframe should look like this:

 speaker        column2   column3   party
1 Mr. Kirkwood                      labour
2 Mr Churchill                      conservative
3 The joint under secretary of state (John Rosewood) conservative
4 William Cove                      labour
5 Winston Churchill                 conservative
6 Mr. Archie Kirkwood               labour

I have tried a nested for loop with grepl, but it takes a very long time:

for (i in seq_len(nrow(DF1))) {
  for (j in seq_len(nrow(DF2))) {
    if (grepl(DF2$lastname[j], DF1$speaker[i])) {
      if (grepl(DF2$firstname[j], DF1$speaker[i])) {
        DF1$party[i] <- DF2$party[j]  # index DF2 by j (the matched row), not i
      } else {
        DF1$party[i] <- "missing first name"
      }
    }
  }
}

Is there a quicker and smarter way of doing this? Thanks.

Using ifelse with grepl:

DF1 <- data.frame(
  speaker = c('Mr. Kirkwood', 'Mr Churchill', 'the joint under secretary of state (John Rosewood)', 'William Cove', 'Winston Churchill', 'Mr. Archie Kirkwood'))

DF2 <- data.frame(
  firstname = c('Archie', 'Winston', 'John', 'William'), 
  lastname = c('Kirkwood', 'Churchill', 'Rosewood', 'Cove'), 
  party = c('Labour', 'Conservative', 'Conservative', 'Labour')) 

conservative <- paste(c('Churchill','Rosewood'), collapse = '|')
labour <- paste(c('Cove','Kirkwood'),collapse='|')

conservative and labour are regular-expression patterns used to fill in the party column in DF1.

DF1$party <- ifelse(grepl(conservative, DF1$speaker), 'conservative',
             ifelse(grepl(labour, DF1$speaker), 'labour', NA))  # NA when neither pattern matches
DF1
#>                                              speaker        party
#> 1                                       Mr. Kirkwood       labour
#> 2                                       Mr Churchill conservative
#> 3 the joint under secretary of state (John Rosewood) conservative
#> 4                                       William Cove       labour
#> 5                                  Winston Churchill conservative
#> 6                                Mr. Archie Kirkwood       labour

Created on 2022-05-16 by the reprex package (v2.0.1)
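
The patterns above are typed in by hand, which does not scale if DF2 holds many names. As a rough sketch only, the same idea can be driven from DF2 directly: build one alternation pattern from DF2$lastname, extract the matched last name from each speaker in a single vectorized pass, and look the party up with match. This assumes last names are unique in DF2 and contain no regex metacharacters, and it ignores firstname entirely. The variable names pattern, hit and found are made up for this illustration.

```r
DF1 <- data.frame(
  speaker = c("Mr. Kirkwood", "Mr Churchill",
              "the joint under secretary of state (John Rosewood)",
              "William Cove", "Winston Churchill", "Mr. Archie Kirkwood"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  firstname = c("Archie", "Winston", "John", "William"),
  lastname  = c("Kirkwood", "Churchill", "Rosewood", "Cove"),
  party     = c("Labour", "Conservative", "Conservative", "Labour"),
  stringsAsFactors = FALSE
)

# One alternation pattern covering every last name, bounded by word boundaries.
# Assumption: last names contain no regex metacharacters.
pattern <- paste0("\\b(", paste(DF2$lastname, collapse = "|"), ")\\b")

# Vectorized search: position of the first last name found in each speaker
# (-1 where nothing matches).
hit <- regexpr(pattern, DF1$speaker, perl = TRUE)

# regmatches() returns only the successful matches, so fill them into an
# NA-initialized vector at the rows that matched.
found <- rep(NA_character_, nrow(DF1))
found[hit > 0] <- regmatches(DF1$speaker, hit)

# Look up the party of each extracted last name.
# Assumption: last names are unique in DF2; unmatched speakers get NA.
DF1$party <- DF2$party[match(found, DF2$lastname)]
DF1
```

On the sample data this assigns the same parties as the ifelse version, except that the values keep the capitalization stored in DF2 ("Labour", "Conservative"). If two people in DF2 share a last name, the first name would have to be checked as well, which this sketch does not attempt.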
