R: using gsub to remove punctuation and digits from a string

Question

I'm trying to remove punctuation and digits so that <U+200B>Chandler becomes Chandler. This is what I'm currently trying:

df$city <- gsub("[[:punct:]]|[[:digit:]]", "", df$city)

However, it doesn't change the cell in column 'city' of 'df'. When I run typeof(df), I get 'list'. Could that have something to do with it?

Any help would be greatly appreciated.

Answer 1

Second question first: typeof() will always return "list" for a data frame, because data frames are really just lists of equal-length vectors. So that is not the cause of the problem.
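A quick way to see this for yourself is the minimal sketch below. The data frame is made up for illustration and is not the asker's actual data:

```r
# Hypothetical data frame standing in for the asker's df
df <- data.frame(city = c("Chandler", "Phoenix"), stringsAsFactors = FALSE)

typeof(df)        # "list": the underlying storage type
class(df)         # "data.frame": the class is what marks it as a data frame
typeof(df$city)   # "character": the column itself is an ordinary vector
```

Since df$city is a plain character vector, gsub() can operate on it directly.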

For the first question, it appears you have some non-ASCII Unicode characters in your data. U+200B is a zero-width space, which is neither punctuation nor a digit, so your pattern never matches it. One good way to take care of such characters is to convert the encoding, perhaps like:

df$city <- iconv(df$city, 'utf-8', 'ascii', sub = '')
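To check what that call does, here is a small sketch on a single made-up string. The variable names city and clean are illustrative only:

```r
# Hypothetical input: a zero-width space (U+200B) in front of the city name
city <- paste0("\u200B", "Chandler")

nchar(city)   # the invisible character is counted as well

# Convert to ASCII; bytes that cannot be represented are replaced by sub = ""
clean <- iconv(city, "UTF-8", "ASCII", sub = "")

nchar(clean)
identical(clean, "Chandler")
```

One thing to keep in mind: with sub = "" this conversion drops every non-ASCII character, so accented letters in names such as "San José" would be removed too.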

It is also possible to gsub out characters by their Unicode code point, like this:

df$city <- gsub('\u200B', '', df$city)

or even a range:

df$city <- gsub('[\u2000-\u20ff]', '', df$city)
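If you go the gsub route, it can be combined with the punctuation and digit removal from the question. The following is only a sketch on made-up values. The single bracket expression and the trimws() call are additions of mine, not part of the original code:

```r
# Hypothetical values: zero-width space, digits, and punctuation mixed in
city <- c(paste0("\u200B", "Chandler"), "Phoenix, 85001", "St. Louis")

# Step 1: drop anything in the U+2000 to U+20FF range (this includes U+200B)
city <- gsub("[\u2000-\u20ff]", "", city)

# Step 2: drop punctuation and digits with one bracket expression
city <- gsub("[[:punct:][:digit:]]", "", city)

# Step 3: trim whitespace left behind at the ends
city <- trimws(city)

city
```

This keeps accented letters intact, because only the listed range and the two character classes are removed.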

But really I think the iconv approach is the way to go. In this usage it will just remove the character rather than render it, but that seems to be what you want.

Answer 1 (accepted 2019-02-16 03:33:45)
