Remove special characters from a data frame

Question

I have a matrix that contains the string "Energy per �m". Before the 'm' is a diamond-shaped symbol with a question mark in it; I don't know what it is.

I have tried to get rid of it by running this on the column of the matrix:

a=gsub('Energy per �m','',a)

(copying and pasting the string into the first argument of gsub), but it does not work; I get: unexpected symbol in "a=rep(5,Energy per". When I try to extract something from the original matrix with grepl, I get:

46: In grepl("ref. value", raw$parameter) :
input string 15318 is invalid in this locale

How can I get rid of all symbols of this sort? I would like to keep only 0-9, A-Z, a-z, / and '. The rest can be zapped.

Answer 1

There is probably a better way to do this than with a regex (e.g. by changing the Encoding).
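As a rough sketch of the encoding route: the question never establishes what the unknown symbol is, so the code below simply assumes it is a Latin-1 micro sign (byte 0xB5) being read in a UTF-8 locale. The sample value `x` is hypothetical. Under that assumption, declaring the source encoding with `iconv()` yields a valid string, after which pattern functions can be used on it normally.

```r
# Hypothetical input: "Energy per µm" stored as Latin-1 bytes
# (0xB5 is the micro sign in Latin-1). This is an assumption, not
# something the question confirms.
x <- "Energy per \xb5m"

# The raw bytes are not valid UTF-8, which is what makes functions
# such as grepl() complain in a UTF-8 locale.
validUTF8(x)

# Re-encode from the assumed source encoding to UTF-8.
fixed <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(fixed)

# With a valid string, the whitelist regex can be applied as usual.
cleaned <- gsub("[^0-9A-Za-z/' ]", "", fixed)
```

If the real source encoding is something other than Latin-1, the `from` argument would need to change accordingly.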

But here is your regex solution:

gsub("[^0-9A-Za-z/' ]", "", a)
[1] "Energy per m"

But, as pointed out by @JoshuaUlrich, you're better off using:

gsub("[^[:alnum:]/' ]", "", a)
[1] "Energy per m"
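Since the question is about a whole table rather than a single string, here is a minimal sketch of applying the same whitelist to every character column. The data frame `raw` is a hypothetical stand-in for the one in the question, and the helper name `clean` is made up for illustration. It also assumes the column holds bytes that are invalid in the current locale (as the grepl warning suggests), which is why `useBytes = TRUE` is passed so that matching is done byte by byte.

```r
# Hypothetical stand-in for the `raw` object from the question.
raw <- data.frame(
  parameter = c("Energy per \xb5m", "ref. value"),
  value = c(1.5, 2),
  stringsAsFactors = FALSE
)

# Keep only digits, ASCII letters, '/', "'" and spaces.
# useBytes = TRUE compares byte by byte, so bytes that are invalid in
# the current locale are dropped instead of raising a warning or error.
clean <- function(s) gsub("[^0-9A-Za-z/' ]", "", s, useBytes = TRUE)

# Apply the helper to character columns only; numeric columns are untouched.
is_chr <- vapply(raw, is.character, logical(1))
raw[is_chr] <- lapply(raw[is_chr], clean)
```

Note that this whitelist also strips punctuation such as the period in "ref. value"; add characters to the bracket expression if they should be kept.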

Answer 2

str_replace_all() is an option if you prefer to use the stringr package:

library(stringr)

x <- 'Energy per �m'

str_replace_all(x, "[^[:alnum:]/' ]", "")
[1] "Energy per m"

Answer 1 (23, accepted): 2012-08-15 14:25:36

Answer 2 (0): 2022-06-06 21:11:43
