I'm trying to remove the '+' character from one of the string elements of a data frame, but I can't find a way to do it.
Below is the data frame.
txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L,
5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment",
"poli+tician", "politician"), class = "factor")), .Names = c("ID",
"Var1"), class = "data.frame", row.names = c(NA, -9L))
# ID Var1
# 1 government
# 2 government
# 3 government
# 4 government
# 5 poli+tician
# 6 politician
# 7 politician
# 8 parliament
# 9 parliment
I tried two ways, neither of them gave the expected results:
Way1
txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia" "oliiia"
# [8] "arliame" "arlime"
I don't understand what's wrong here. I want only the '+' character in the 5th element to be removed, but every element was modified as shown above.
Way2
txtdf<-gsub("*//+","",txtdf)
# [1] "government" "government" "government" "government" "poli+tician"
# [6] "politician" "politician" "parliament" "parliment"
Here there is no change at all. My intent was to escape the + character using double slashes.
Simply do the replacement with fixed = TRUE
(no need to use a regular expression), but you have to do the replacement for each column of the data.frame by specifying the column name:
txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf
gives
job
1 government
2 poli+tician
3 parliament
Now replace the "+":
txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf
The result is:
job
1 government
2 politician
3 parliament
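If the data frame has several text columns, the same fixed = TRUE replacement can be applied column by column with lapply. The following is only a sketch under some assumptions: strip_plus is a hypothetical helper name, the data frame is rebuilt from the values shown in the question, and factor columns are converted to character along the way.

```r
# Rebuild the question's data frame (Var1 is a factor, as in the original post)
txtdf <- data.frame(
  ID = 1:9,
  Var1 = factor(c("government", "government", "government", "government",
                  "poli+tician", "politician", "politician",
                  "parliament", "parliment"))
)

# Hypothetical helper: remove a literal "+" from factor/character columns
# and return every other column unchanged
strip_plus <- function(x) {
  if (is.factor(x) || is.character(x)) {
    gsub("+", "", as.character(x), fixed = TRUE)
  } else {
    x
  }
}

# txtdf[] keeps the data.frame structure while replacing each column
txtdf[] <- lapply(txtdf, strip_plus)
txtdf$Var1
```

Assigning into txtdf[] rather than txtdf keeps the result a data frame instead of a plain list, and the ID column is left untouched because it is neither a factor nor a character vector.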
You need to escape your plus sign. In a regular expression, "+" has a special meaning (it is a quantifier), so an unescaped "+" is not matched as a literal character. From the documentation (?regex):
"+" The preceding item will be matched one or more times.
To match such special characters literally, you need to escape them so that their special meaning is not applied. In R, escaping takes two backslashes (\\) inside a string, not the forward slashes used in the question. So in your case this would be something like:
gsub("\\+","",df$job)
Running the above gives you the desired result by removing all the plus symbols from the column.
Assuming your df is:
df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))
then your output will be:
> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"
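To connect this back to the two attempts in the question, here is a small sketch on a plain character vector (the vector x and the result names are made up for illustration). In Way1, "[:punct:]" without the outer brackets is an ordinary bracket expression containing the characters ':', 'p', 'u', 'n', 'c' and 't', which is why those letters disappeared. In Way2, the pattern contains forward slashes, which are matched literally and do not escape anything.

```r
x <- c("government", "poli+tician", "parliament")

# Way1 pattern: a character set of ':', 'p', 'u', 'n', 'c', 't',
# so letters are stripped and the "+" is left alone (as in the question)
way1 <- gsub("[:punct:]", "", x)

# POSIX classes must sit inside a bracket expression; ?regex lists "+"
# among the characters covered by [:punct:]
punct_class <- gsub("[[:punct:]]", "", x)

# Escaping with two backslashes in the R string
escaped <- gsub("\\+", "", x)

# A one-character bracket expression also matches "+" literally
bracketed <- gsub("[+]", "", x)

escaped
bracketed
```

Note that "[[:punct:]]" removes every punctuation character, not just "+", so the escaped or bracketed form is the narrower choice when only the plus sign should go.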