
How to replace '+' using gsub() function in R

I'm trying to remove the '+' character from one of the string elements of a data frame, but I can't find a way to do it.

Below is the data frame.

txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L, 
            5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment", 
            "poli+tician", "politician"), class = "factor")), .Names = c("ID", 
            "Var1"), class = "data.frame", row.names = c(NA, -9L))
#  ID   Var1
#  1    government
#  2    government
#  3    government
#  4    government
#  5    poli+tician
#  6    politician
#  7    politician
#  8    parliament
#  9    parliment

I tried two ways, neither of them gave the expected results:

Way1

txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia"  "oliiia" 
# [8] "arliame" "arlime" 

I don't understand what's wrong here. I want the '+' character removed from the 5th element only, but instead every element was altered as shown above.

Way2

txtdf<-gsub("*//+","",txtdf)
# [1] "government"  "government"  "government"  "government"  "poli+tician"
# [6] "politician"  "politician"  "parliament"  "parliment" 

Here there is no change at all. My intention was to escape the + character using double slashes.

Simply do the replacement with fixed = TRUE (no regular expression is needed). You do have to apply it to each column of the data.frame by specifying the column name:

txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf

gives

          job
1  government
2 poli+tician
3  parliament

Now replace the "+":

txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf

The result is:

         job
1 government
2 politician
3 parliament
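
If more than one column might contain a "+", the same fixed = TRUE replacement can be applied column by column instead of naming each column by hand. The following is only a minimal sketch and not part of the original answer: the helper name strip_plus and the extra columns id and note are made up for illustration, and stringsAsFactors = FALSE is assumed so that the text columns are plain character vectors.

```r
# Hypothetical example data: one numeric column and two text columns.
txtdf <- data.frame(
  id   = 1:3,
  job  = c("government", "poli+tician", "parliament"),
  note = c("a+b", "none", "c++"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove every literal "+" from a character vector.
strip_plus <- function(x) gsub("+", "", x, fixed = TRUE)

# Apply the helper only to character columns; other columns stay untouched.
is_text <- vapply(txtdf, is.character, logical(1))
txtdf[is_text] <- lapply(txtdf[is_text], strip_plus)

txtdf
```

Restricting the loop to character columns avoids silently converting numeric columns to text, since gsub() always returns a character vector.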

You need to escape your plus sign. In a regular expression "+" has a special meaning (it is a quantifier), so an unescaped "+" in a pattern is not matched as a literal character. From the documentation (?regex):

"+" The preceding item will be matched one or more times.

To match such special characters literally, you need to escape them so that their special meaning is not applied. In an R string literal the backslash is itself an escape character, so you need two backslashes (\\) to pass a single backslash to the regex engine. In your case this would be something like:

gsub("\\+","",df$job)

Running the above gives the desired result by removing all the plus symbols from your data.

So assuming your df is:

df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))

then your output will be:

> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"
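
For comparison, here is a short sketch that puts the working variants next to the first attempt from the question. It is not taken from the original answers, and the variable names are arbitrary. The likely reason the first attempt failed is that a POSIX class such as [:punct:] only works inside a bracket expression ("[[:punct:]]"); on its own, "[:punct:]" is read as the plain character set : p u n c t. In the second attempt, forward slashes are ordinary characters rather than escapes, so the pattern requires slashes that the data does not contain.

```r
# Same example values as in the answer above.
job <- c("government", "poli+tician", "politician", "parliament")

# Three equivalent ways to remove a literal "+":
escaped   <- gsub("\\+", "", job)              # "+" escaped in the regex
bracketed <- gsub("[+]", "", job)              # "+" inside a bracket expression
literal   <- gsub("+", "", job, fixed = TRUE)  # no regex at all

# The R string "\\+" holds two characters: a backslash and a plus.
nchar("\\+")  # 2

# A POSIX class must sit inside a bracket expression.
# Note that this removes all punctuation, not only "+".
all_punct <- gsub("[[:punct:]]", "", job)

# Without the outer brackets, the pattern is just a set of letters and ":",
# which is what mangled the words in the question's first attempt.
mangled <- gsub("[:punct:]", "", job)

print(escaped)
print(mangled)
```

If only the plus sign should go, fixed = TRUE or "\\+" is the safer choice, because "[[:punct:]]" would also strip hyphens, apostrophes and other punctuation.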
