I have a data set that contains strings, and special characters like the one below appear in it.
How do I remove special characters like the above from my data set?
Use regular expressions to remove unwanted characters, for example:
dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl=TRUE)
This removes everything except word characters (letters, digits, and the underscore) and whitespace. For more complex replacements, see the help topic ?regexp.
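As a rough illustration of what that pattern keeps and drops, here is a minimal sketch. The sample strings are made up, since the actual data set is not shown; only the column name `textcolumn` comes from the call above. The second call is an alternative I am adding for comparison, using POSIX character classes, which also removes underscores.

```r
# Hypothetical sample data; the real data set is not shown in the question.
dataset <- data.frame(
  textcolumn = c("Hello, world!", "price: $5 (approx.)", "snake_case\ttabbed"),
  stringsAsFactors = FALSE
)

# Drop everything that is not a word character (\w) or whitespace (\s).
cleaned <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl = TRUE)

# Variant with POSIX classes: [:alnum:] excludes the underscore,
# so "_" is removed as well.
cleaned_posix <- gsub("[^[:alnum:][:space:]]", "", dataset$textcolumn)

print(cleaned)
print(cleaned_posix)
```

Note that the tab in the third string survives in both versions, because both `\s` and `[:space:]` match any whitespace, not only the space character.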
Also look into the encoding (Encoding and iconv are helpful here); maybe the text is correct but the wrong encoding is assumed.
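To make the encoding point concrete, here is a small sketch under the assumption that the text is really Latin-1 but was read without a declared encoding. The input bytes are a made-up example ("café" in Latin-1); your data may use a different encoding, in which case the `from` argument would need to change.

```r
# Hypothetical input: the bytes of "café" in Latin-1, with no encoding declared.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

Encoding(x)   # "unknown": R has no declared encoding for these bytes
validUTF8(x)  # FALSE: the bytes are not valid UTF-8

# Declare the encoding R should assume; the bytes themselves are unchanged.
Encoding(x) <- "latin1"

# Convert the bytes to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")

Encoding(y)
validUTF8(y)
```

If the characters look right after a step like this, no removal is needed at all; the "special characters" were an artifact of the assumed encoding.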