R programming - How to remove special characters from a data set?
I have a data set that contains strings, and special characters like the one below appear in it.
How do I remove special characters like the above from my data set?
Use regular expressions to remove unwanted characters, for example:
dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl=TRUE)
to remove everything except word characters and spaces. For more complex replacements, see the help topic ?regexp.
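As a rough illustration of the gsub approach, here is a minimal sketch on made-up data. The sample strings are hypothetical, and `dataset` and `textcolumn` simply reuse the names from the line above. The second call uses POSIX character classes as an alternative; unlike `\w`, `[:alnum:]` does not keep underscores.

```r
# Hypothetical sample data; the column name mirrors the one used in the answer.
dataset <- data.frame(
  textcolumn = c("Hello, world!", "price: $100 (approx.)", "tabs\tand spaces"),
  stringsAsFactors = FALSE
)

# Drop every character that is neither a word character (letter, digit,
# underscore) nor whitespace. perl = TRUE enables the \w and \s shorthands.
cleaned <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl = TRUE)

# Alternative with POSIX classes: keep only alphanumerics and whitespace.
cleaned_posix <- gsub("[^[:alnum:][:space:]]", "", dataset$textcolumn)

print(cleaned)
```

Whether accented or other non-ASCII letters count as word characters can depend on the string's encoding and the locale, so it is worth checking the result on a few real rows before overwriting the column.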
Also look into the encoding (Encoding and iconv are helpful here); the text may be correct but read under the wrong assumed encoding.
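A minimal sketch of the encoding check, assuming (purely for illustration) that the text was stored as Latin-1 while R treats it as unmarked bytes. The sample string is hypothetical; with real data the source encoding has to be found out first.

```r
# Build the Latin-1 bytes for "café" explicitly (0xE9 is "é" in Latin-1).
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Declare what the bytes actually are; this changes the mark, not the bytes.
Encoding(x) <- "latin1"

# Convert to UTF-8 so the string behaves consistently in later processing.
y <- iconv(x, from = "latin1", to = "UTF-8")

Encoding(y)  # reports the declared encoding of the converted string
nchar(y)     # counts characters, not bytes
```

If the odd characters disappear after a step like this, the data did not contain unwanted symbols at all, and stripping them with gsub would have removed legitimate letters.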