argument 'pattern' has length > 1 and only the first element will be used - gsub()

Question

I have the following problem.

table <- data.frame(col1 = c("cars1 gm", "cars2 gl"), col2 = c("cars1 motor mel", "cars2 prom del"))

      col1            col2
1 cars1 gm cars1 motor mel
2 cars2 gl  cars2 prom del

table$word <- gsub(table$col1, ' ', table$col2) 

Warning message:  In gsub(table$col1, " ", table$col2) :  argument
'pattern' has length > 1 and only the first element will be used

How can I create a new column named word that contains only the words of col2 that do not appear in col1?

      col1            col2       word
1 cars1 gm cars1 motor mel  motor mel
2 cars2 gl  cars2 prom del   prom del

Answer 1

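You can use gsub to build the lookup pattern, and then sapply over the rows to run the gsub of interest (note that the first line overwrites col1 with the pattern):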

table$col1 <- gsub(" ", "|", table$col1)
table$word <- sapply(1:nrow(table), function(x) gsub(table$col1[x], "", table$col2[x]))

table
#      col1            col2       word
#1 cars1|gm cars1 motor mel  motor mel
#2 cars2|gl  cars2 prom del   prom del

The same idea as above, but using mapply instead of sapply, so col1 is left unchanged:

table$word <- mapply(function(x, y) gsub( gsub(" ", "|", x), "", y),
                                    table$col1,
                                    table$col2)
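
One caveat with the alternation pattern above is that it also matches inside longer words (for example "gm" inside "segment"). The following is a minimal sketch of a whole-word variant, not part of the original answer; the helper name remove_words and the data frame name df are hypothetical, and it assumes the words in col1 contain no regex metacharacters.

```r
# Sample data from the question, kept as character columns
df <- data.frame(col1 = c("cars1 gm", "cars2 gl"),
                 col2 = c("cars1 motor mel", "cars2 prom del"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: remove from `text` every whole word listed in `words`
remove_words <- function(words, text) {
  # "cars1 gm" becomes "\\b(cars1|gm)\\b"; assumes no regex metacharacters in `words`
  pattern <- paste0("\\b(", gsub(" ", "|", words, fixed = TRUE), ")\\b")
  # Drop the matched words
  stripped <- gsub(pattern, "", text, perl = TRUE)
  # Collapse repeated whitespace and trim the ends
  trimws(gsub("\\s+", " ", stripped))
}

# Apply row by row; USE.NAMES = FALSE returns a plain character vector
df$word <- mapply(remove_words, df$col1, df$col2, USE.NAMES = FALSE)
df
```

Because the pattern is wrapped in \\b anchors, remove_words("gm", "segment gm x") leaves "segment" intact.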

Answer 2

You can use mapply:

#Make sure you read your data with stringsAsFactors = FALSE, 
table<-data.frame(col1=c("cars1 gm","cars2 gl"),
                  col2=c("cars1 motor mel", "cars2 prom del"), stringsAsFactors = FALSE)

table$word <- mapply(function(x, y) 
                     trimws(gsub(sapply(strsplit(x, ' '), paste, collapse = '|'), '', y)), 
                     table$col1, table$col2)
table
#      col1            col2      word
#1 cars1 gm cars1 motor mel motor mel
#2 cars2 gl  cars2 prom del  prom del

Answer 3

You can use mapply, paste, and strsplit like this:

table$word <- mapply(function(x, y) paste(y[!(y %in% x)], collapse=" "),
                     strsplit(as.character(table$col1), split=" "),
                     strsplit(as.character(table$col2), split=" "))

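Here, strsplit splits each character vector on " " and returns a list with one vector of words per row. The two lists are passed to mapply, which walks through the corresponding elements of each list and keeps the words from the second list that are not in the first. The resulting vector is pasted back into a single string with paste and its collapse argument.

This returns: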

table
      col1            col2      word
1 cars1 gm cars1 motor mel motor mel
2 cars2 gl  cars2 prom del  prom del
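
To make the intermediate steps concrete, here is a small sketch that walks through the first row by hand. The variable names (col1, col2, x, y, keep, first_word) are illustrative only; the values come from the question's data.

```r
col1 <- c("cars1 gm", "cars2 gl")
col2 <- c("cars1 motor mel", "cars2 prom del")

# strsplit returns a list with one character vector of words per element
x <- strsplit(col1, split = " ")
y <- strsplit(col2, split = " ")

# First row: y[[1]] is c("cars1", "motor", "mel"), x[[1]] is c("cars1", "gm")
keep <- !(y[[1]] %in% x[[1]])   # FALSE TRUE TRUE

# Keep only the words not found in col1, then glue them back together
first_word <- paste(y[[1]][keep], collapse = " ")
first_word
```

mapply simply repeats this per-row step for every pair of list elements.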

Answer 4

Since the order of the words may differ, you can split the strings of col1 and col2 into words and then use setdiff to select the words that appear only in col2:

table$word <- sapply(1:nrow(table), function(i)
  paste(setdiff(unlist(strsplit(table$col2[i], " ")),
                unlist(strsplit(table$col1[i], " "))),
        collapse = " "))

This returns:

      col1            col2      word
1 cars1 gm cars1 motor mel motor mel
2 cars2 gl  cars2 prom del  prom del
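
One difference from the %in% approach is worth keeping in mind: setdiff treats its inputs as sets, so repeated words in col2 are collapsed to one. The sketch below uses made-up strings (not from the question) to show this; the variable names are illustrative.

```r
# Illustrative values, not from the question: "new" appears twice
col1_words <- strsplit("car", " ")[[1]]
col2_words <- strsplit("new car new", " ")[[1]]

# setdiff de-duplicates its result
by_setdiff <- paste(setdiff(col2_words, col1_words), collapse = " ")

# Logical indexing with %in% keeps every occurrence
by_in <- paste(col2_words[!(col2_words %in% col1_words)], collapse = " ")

by_setdiff
by_in
```

Which behaviour is preferable depends on whether repeated words in col2 should survive in the new column.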

Answer 1
5 (accepted) 2017-05-23 14:46:00

Answer 2
2 2017-05-23 14:46:00

Answer 3
1 2017-05-23 14:46:03

Answer 4
0 2017-05-23 14:44:24
