
R programming - How to remove special characters from a data set?

I have a data set that contains strings, and some of them include special characters like the one below.

[Image: special character]

How do I remove special characters like the one above from my data set?

Use regular expressions to remove unwanted characters, for example:

dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl=TRUE)

This removes everything except word characters (letters, digits and the underscore) and whitespace. For more complex replacements, see the help topic ?regexp.
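As a minimal sketch of what that call does, here it is applied to a small made-up data frame. The sample strings are hypothetical; only the column name `textcolumn` comes from the answer. The second variant, using POSIX character classes, is an alternative I am adding for comparison, not part of the original answer.

```r
# Hypothetical example data; only the column name follows the answer above.
dataset <- data.frame(
  textcolumn = c("Hello, world!", "price: $100 (approx.)", "a_b-c"),
  stringsAsFactors = FALSE
)

# Drop every character that is neither a word character (\w) nor whitespace (\s).
# perl = TRUE is needed so that \w and \s are understood inside the bracket.
cleaned <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl = TRUE)

# Alternative (an assumption, not from the answer): POSIX classes with the
# default regex engine. [:alnum:] does not include "_", so underscores go too.
cleaned_posix <- gsub("[^[:alnum:][:space:]]", "", dataset$textcolumn)

print(cleaned)
print(cleaned_posix)
```

Note that `\w` keeps the underscore, so "a_b-c" becomes "a_bc" with the first pattern and "abc" with the second. Pick whichever matches what you count as a "special" character.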

Also check the encoding (Encoding and iconv are helpful here). The text itself may be correct, but R may be assuming the wrong encoding when reading it.
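A rough sketch of the encoding check, assuming the data was actually written as Latin-1 while being treated as UTF-8. That is only one possible mismatch, and the sample string is made up. `Encoding`, `validUTF8` and `iconv` are base R functions.

```r
# Hypothetical input: the bytes of "café" as stored in a Latin-1 file.
raw_text <- "caf\xe9"

# See what encoding R has marked the string with, and whether the bytes
# form valid UTF-8 (they do not, because 0xE9 alone is not valid UTF-8).
Encoding(raw_text)
is_valid <- validUTF8(raw_text)

# Re-encode from the assumed source encoding (Latin-1 here) to UTF-8.
fixed <- iconv(raw_text, from = "latin1", to = "UTF-8")

print(is_valid)
print(fixed)
```

If the characters look right after converting, the problem was the encoding rather than the content, and nothing needs to be stripped with gsub at all.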
