R - gsub to remove punctuation & numbers from string
I'm trying to remove punctuation and digits from <U+200B>Chandler so that it becomes Chandler. This is what I'm currently trying:
df$city <- gsub("[[:punct:]]|[[:digit:]]", "", df$city)
However, it doesn't change the cell in column 'city' of 'df'. When I check typeof(df), I get 'list'. Might this have something to do with it?
Any help would be greatly appreciated.
Second question first: typeof() will always return list for a data frame, because data frames are really just lists of equal-length vectors.
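To see this for yourself, here is a small sketch. The data frame and its values are made up for illustration and are not from the question:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(city = c("Chandler", "Mesa"),
                 pop = c(1, 2),
                 stringsAsFactors = FALSE)

typeof(df)       # "list": the underlying storage type
class(df)        # "data.frame": what the object behaves as
typeof(df$city)  # "character": the column itself is an ordinary vector
```

So the 'list' result is expected. gsub works on the character vector df$city, not on the data frame as a whole.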
For the first question, it appears you have some Unicode-encoded characters in your data. One good way to take care of these is to convert them, perhaps like:
df$city <- iconv(df$city, 'utf-8', 'ascii', sub = '')
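As a rough check that the conversion does what is described, the sketch below builds a made-up vector containing the zero-width space (U+200B) and compares it before and after. The sample values are assumptions, and how iconv treats unconvertible input can vary with the platform's iconv implementation, so treat this as a sketch rather than a guarantee:

```r
# Hypothetical sample: the first value starts with an invisible U+200B
city <- c("\u200BChandler", "Mesa")

nchar(city)            # the first element counts one more character than you can see
utf8ToInt(city[1])[1]  # 8203, i.e. 0x200B, the hidden first code point

# Convert to ASCII, replacing anything that cannot be represented with ""
clean <- iconv(city, from = "UTF-8", to = "ASCII", sub = "")

nchar(clean)
clean
```

Keep in mind that converting to ASCII also drops every other non-ASCII character, accented letters included.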
It is also possible to gsub out characters by their hex code, like this:
df$city <- gsub('\u200B', '', df$city)
or even a range:
df$city <- gsub('[\u2000-\u20ff]', '', df$city)
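The range pattern can be combined with the cleanup from the question. This is only a sketch on made-up values (the vector below is hypothetical), applying the two substitutions one after the other:

```r
# Hypothetical sample values, for illustration only
city <- c("\u200BChandler", "Tempe#42")

# Step 1: strip anything in the U+2000 to U+20FF range (includes U+200B)
city <- gsub("[\u2000-\u20ff]", "", city)

# Step 2: the original cleanup of punctuation and digits
city <- gsub("[[:punct:]]|[[:digit:]]", "", city)

city
```

Unlike a blanket conversion to ASCII, a targeted pattern like this leaves other non-ASCII characters, such as accented letters, untouched.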
But really I think the iconv approach is the way to go. In this usage it will just remove the character rather than render it, but that seems to be what you want.