Python unicode: replace backslash u with an empty string

Question

I'm sanitizing a pandas dataframe and have encountered a unicode string containing a backslash followed by a u, which I need to replace, e.g.

u'\u2014'.replace('\u','')
Result: u'\u2014'

I've tried encoding it as utf-8 and then decoding it, but that didn't work, and I feel there must be an easier way around this.

pandas code

merged['Rank World Bank'] = merged['Rank World Bank'].astype(str)

Error

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 0: ordinal not in range(128)

Answer 1

u'\u2014' is actually the em dash character (—). It's not a number; it's a single Unicode character. Try printing it with print and you will see.

This is the output in ipython:

In [4]: print("val = ", u'\u2014')
val =  —
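
To make the point above concrete, here is a minimal sketch that inspects the string with the standard library. The variable name dash is only illustrative and is not from the question.

```python
import unicodedata

dash = u'\u2014'  # one character, written with a \u escape

print(len(dash))               # 1
print(hex(ord(dash)))          # 0x2014
print(unicodedata.name(dash))  # EM DASH
```

The length of 1 shows there is no separate backslash or u in the string to replace.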

Based on your comment, here is what you are doing wrong: "-" (a hyphen) is not the same as the em dash Unicode character (u'\u2014').

So, you should do the following:

print(u'\u2014'.replace(u'\u2014', u''))

and that will work.

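EDIT: since you are using Python 2.x, make the search string a unicode literal too (note the u prefix). Encoding to utf-8 and decoding straight back returns the same string, and replacing "-" leaves the em dash untouched, so replace the em dash itself:

u'\u2014'.replace(u'\u2014', u'')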
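
For a whole column, the replacement could be wrapped in a small helper. This is only a sketch: strip_em_dash and EM_DASH are hypothetical names, and it assumes the cells hold unicode strings mixed with other values.

```python
EM_DASH = u'\u2014'  # hypothetical constant name, for illustration only


def strip_em_dash(value):
    """Return value with every em dash removed; non-text values pass through."""
    # type(u'') is unicode on Python 2 and str on Python 3
    if isinstance(value, type(u'')):
        return value.replace(EM_DASH, u'')
    return value


print(repr(strip_em_dash(u'12\u2014')))
print(repr(strip_em_dash(7)))
```

With pandas, such a helper could presumably be applied with merged['Rank World Bank'].map(strip_em_dash) before any conversion to str.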

Answer 2

Yes, because '\u' followed by '2014' is interpreted as a single Unicode escape sequence, not as a literal backslash, a u and four digits.
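
A short sketch of that difference, using illustrative variable names: an escape sequence produces one character, while an escaped backslash produces six literal characters, and only the latter contains a backslash and a u that can be replaced.

```python
escaped = u'\u2014'    # escape sequence: a single em dash character
literal = u'\\u2014'   # escaped backslash: six separate characters

print(len(escaped))                  # 1
print(len(literal))                  # 6
print(u'\\' in escaped)              # False
print(literal.replace(u'\\u', u''))  # 2014
```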

Things that can help:

Convert to ASCII using .encode('ascii', 'ignore')
As you are using pandas, you can use the 'encoding' parameter and pass 'ascii' there.
Do this instead: u'\u2014'.replace(u'\u2014', u'2014').encode('ascii', 'ignore')
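
As a rough comparison of the options above, the sketch below uses a made-up sample string (not from the question). It shows that .encode('ascii', 'ignore') silently drops every non-ASCII character, whereas a targeted replace removes only the em dash.

```python
# Made-up sample containing an accented letter, a curly apostrophe and an em dash
text = u'C\u00f4te d\u2019Ivoire \u2014 12'

# Option 1: drop everything outside ASCII, then decode back to text
ascii_only = text.encode('ascii', 'ignore').decode('ascii')

# Option 2: remove only the em dash and keep other non-ASCII characters
dash_only = text.replace(u'\u2014', u'')

print(ascii_only)                  # Cte dIvoire  12
print(len(text) - len(dash_only))  # 1
```

Which option fits depends on whether the other non-ASCII characters in the column matter.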

Hope this helps.
