
R programming - How to remove special characters from a data set?

I have a data set that contains strings, and some of them include special characters like the one below.

[Image: special character]

How do I remove special characters like the above from my data set?

Use regular expressions to remove unwanted characters, for example:

dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl=TRUE)

This removes everything except word characters (letters, digits and the underscore) and whitespace. For more complex replacements, see the help topic ?regexp.
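As a rough illustration of what that pattern keeps and drops, here is a minimal sketch on made-up data. The data frame and its contents are hypothetical; only the column name and the gsub call come from the answer.

```r
# Hypothetical sample data; the column name mirrors the example above.
dataset <- data.frame(
  textcolumn = c("Hello, world!", "price: $100 (approx.)", "a_b-c"),
  stringsAsFactors = FALSE
)

# Remove every character that is neither a word character
# (letter, digit, underscore) nor whitespace.
cleaned <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl = TRUE)

print(cleaned)
# "Hello world" "price 100 approx" "a_bc"
```

Note that the underscore survives because \w counts it as a word character. If underscores should go too, a class such as "[^[:alnum:][:space:]]" may be a better fit.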

Also check the encoding (Encoding and iconv are helpful here): the text may be correct, but R may be assuming the wrong encoding for it.
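A small sketch of that encoding check, assuming the bytes are really Latin-1; the sample string is made up, and your data may well use a different encoding:

```r
# Hypothetical input: "cafe" with an accented e stored as the single
# Latin-1 byte 0xE9. The escape keeps the example independent of the
# encoding of this script file.
x <- "caf\xe9"

# Show which encoding R has marked for the string
# (often "unknown" in a UTF-8 locale).
print(Encoding(x))

# Declare the encoding: this only changes the mark, not the bytes.
Encoding(x) <- "latin1"

# Convert the bytes themselves from Latin-1 to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")

print(Encoding(y))   # "UTF-8"
print(charToRaw(y))  # 63 61 66 c3 a9
```

If the characters display correctly after the conversion, nothing needs to be removed; the problem was the assumed encoding rather than the data.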
