How to replace "+" using the gsub() function in R

Question

I'm trying to remove the '+' character that appears inside one of the string elements of a data frame, but I can't find a way to do it.

Below is the data frame.

txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L, 
            5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment", 
            "poli+tician", "politician"), class = "factor")), .Names = c("ID", 
            "Var1"), class = "data.frame", row.names = c(NA, -9L))
#  ID   Var1
#  1    government
#  2    government
#  3    government
#  4    government
#  5    poli+tician
#  6    politician
#  7    politician
#  8    parliament
#  9    parliment

I tried two ways, and neither gave the expected result:

Way 1

txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia"  "oliiia" 
# [8] "arliame" "arlime"

I don't understand what's wrong here. I want the '+' character removed from the 5th element only, but every element was changed as shown above.

Way 2

txtdf<-gsub("*//+","",txtdf)
# [1] "government"  "government"  "government"  "government"  "poli+tician"
# [6] "politician"  "politician"  "parliament"  "parliment"

Here there is no change at all. What I was trying to do was escape the + character using double slashes.

Answer 1

Simply replace it with fixed = TRUE, which makes gsub() treat the pattern as a literal string, so there is no need to use a regular expression. Note, however, that you have to do the replacement for each "column" of the data.frame by specifying the column name:

txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf

gives:

          job
1  government
2 poli+tician
3  parliament

Now replace the "+":

txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf

The result is:

         job
1 government
2 politician
3 parliament
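As a sketch of the same fix applied to the data frame from the question: there, Var1 is a factor (it is rebuilt below with data.frame() instead of structure(), with the same values). gsub() coerces a factor with as.character() and returns a plain character vector, so assigning the result back to txtdf$Var1 (rather than to txtdf itself) keeps the rest of the data frame intact. Because only row 5 contains a "+", every other row comes back unchanged. The final factor() call is an assumption that later code expects a factor; drop it if a character column is fine.

```r
# Rebuild the question's data frame; Var1 is a factor, as in the original
txtdf <- data.frame(
  ID = 1:9,
  Var1 = factor(c("government", "government", "government", "government",
                  "poli+tician", "politician", "politician",
                  "parliament", "parliment"))
)

# gsub() coerces the factor via as.character() and returns a character vector.
# Assign to the column only, so the ID column and data frame structure survive.
txtdf$Var1 <- gsub("+", "", txtdf$Var1, fixed = TRUE)

# Optional (assumption): convert back to a factor if later code needs one.
# "poli+tician" is now "politician", so those two former levels merge.
txtdf$Var1 <- factor(txtdf$Var1)

print(txtdf)
print(levels(txtdf$Var1))
```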

Answer 2

You need to escape the plus sign. In a regular expression "+" has a special meaning (it is a quantifier), so it is not treated as a literal punctuation character. From the documentation (?regex):

"+" The preceding item will be matched one or more times.
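A small illustration of that quantifier behaviour, using made-up strings (the vector x is hypothetical and only for illustration): the unescaped pattern "l+" matches each run of one or more "l" characters, while the escaped pattern "\\+" matches a literal plus sign.

```r
# Hypothetical sample strings, for illustration only
x <- c("hello", "hello world", "poli+tician")

# Unescaped "+" is a quantifier: "l+" means "one or more l's in a row",
# so each run of l's is replaced by a single "L"
runs <- gsub("l+", "L", x)
print(runs)

# Escaped as "\\+", the pattern matches a literal plus sign instead
plain <- gsub("\\+", "", x)
print(plain)
```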

To match these special characters you need to escape them, so that they are taken literally instead of being interpreted as regex operators. In R you need two backslashes (\\) to escape: the R string literal "\\+" contains a single backslash followed by +, which the regex engine reads as a literal plus sign. So in your case this would be something like:

gsub("\\+","",df$job)

Running the above gives the desired result by removing all plus signs from your data.

So, assuming your df is:

df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))

then your output will be:

> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"
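For comparison, here is a sketch of other patterns that remove a literal "+" from the same values, plus the reason Way 1 in the question changed every element. Inside a bracket expression "+" has no special meaning, so "[+]" also works, and fixed = TRUE skips regex parsing altogether. A POSIX class such as [:punct:] is only recognized inside an outer pair of brackets; written as "[:punct:]" it is an ordinary bracket expression listing the characters :, p, u, n, c and t, which matches what Way 1 removed. "[[:punct:]]" is the punctuation class, but it strips every punctuation character, not just "+". Way 2 used forward slashes, which are ordinary characters in a regex, so its pattern can only match text containing "/"; none of the strings do, which is why nothing changed.

```r
job <- c("government", "poli+tician", "politician", "parliament")

# Three equivalent ways to delete a literal "+"
escaped   <- gsub("\\+", "", job)              # regex \+ (the R string "\\+" holds 2 chars)
bracketed <- gsub("[+]", "", job)              # "+" is literal inside [...]
literal   <- gsub("+", "", job, fixed = TRUE)  # no regex parsing at all
print(escaped)
stopifnot(identical(escaped, bracketed), identical(escaped, literal))

# Way 1 revisited: without outer brackets, [:punct:] is just the set : p u n c t
wrong <- gsub("[:punct:]", "", "poli+tician")
# With outer brackets it is the POSIX punctuation class, which includes "+"
right <- gsub("[[:punct:]]", "", "poli+tician")
print(c(wrong = wrong, right = right))
```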

2 answers

Answer 1: score 3, accepted, 2017-05-14 16:04:29

Answer 2: score 1, 2017-05-14 16:39:57
