
python unicode replace backslash u with an empty string

I'm sanitizing a pandas dataframe and have encountered a unicode string that contains a backslash followed by a u, which I need to replace, e.g.

u'\u2014'.replace('\u','')
Result: u'\u2014'

I've tried encoding it as utf-8 and then decoding it, but that didn't work, and I feel there must be an easier way around this.

pandas code

merged['Rank World Bank'] = merged['Rank World Bank'].astype(str)

Error

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 0: ordinal not in range(128)

u'\u2014' is actually an em dash. It's not a number, and there is no backslash or u stored in the string. It's a single Unicode character. Try printing it with print and you will see.

This is the output in ipython:

In [4]: print("val = ", u'\u2014')
val =  —

Based on your comment, here is what you are doing wrong: "-" is not the same as the "EM DASH" Unicode character (u'\u2014').
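To make the difference concrete, here is a small sketch using the standard-library unicodedata module. The variable names are only illustrative and are not from the original post.

```python
import unicodedata

hyphen = u'-'        # the ASCII character typed on a keyboard
em_dash = u'\u2014'  # the character from the question

# Look up the official Unicode name and code point of each character.
hyphen_info = (unicodedata.name(hyphen), ord(hyphen))
em_dash_info = (unicodedata.name(em_dash), ord(em_dash))

print(hyphen_info)        # ('HYPHEN-MINUS', 45)
print(em_dash_info)       # ('EM DASH', 8212)
print(hyphen == em_dash)  # False
```

Because the two characters have different code points, replacing "-" never touches the em dash.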

So, you should do the following (note the u prefix on both arguments, so that the escape is also interpreted on Python 2):

print(u'\u2014'.replace(u'\u2014', u''))

and that will work.

EDIT: since you are using Python 2.x, do the replace on the unicode string with unicode literals, and encode the result as utf-8 only if you need a byte string afterwards:

u'\u2014'.replace(u'\u2014', u'').encode('utf-8')

Yeah, because it is taking '\u' followed by '2014' as a single Unicode escape and not as a literal backslash, u and four digits.
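A quick way to check this is to compare string lengths. This is only an illustrative sketch; the variable names are made up for the example.

```python
dash = u'\u2014'      # one character: the escape is resolved when the literal is parsed
literal = u'\\u2014'  # six characters: backslash, u, 2, 0, 1, 4

print(len(dash))      # 1
print(len(literal))   # 6

# There is no backslash inside `dash`, so this replace finds nothing to remove.
unchanged = dash.replace(u'\\u', u'')

# Replacing the character itself does work.
removed = dash.replace(u'\u2014', u'')

print(unchanged == dash)  # True
print(removed == u'')     # True
```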

Things that can help:

  • Converting to ascii using .encode('ascii', 'ignore')
  • As you are using pandas, you can use the 'encoding' parameter and pass 'ascii' there.
  • Do this instead: u'\u2014'.replace(u'\u2014', u'2014').encode('ascii', 'ignore')
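The first suggestion could look roughly like this. strip_non_ascii is a hypothetical helper name, not something from pandas or the original answer, and the final decode is added here so that the result is text again rather than bytes on Python 3.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range and return text."""
    # encode(..., 'ignore') silently skips characters it cannot encode;
    # decoding the resulting bytes gives back a text string.
    return text.encode('ascii', 'ignore').decode('ascii')


print(strip_non_ascii(u'\u2014'))      # prints an empty line
print(strip_non_ascii(u'12\u201434'))  # prints 1234
print(strip_non_ascii(u'caf\u00e9'))   # prints caf
```

Keep in mind that this removes every non-ASCII character, not just the em dash, so accented letters and similar characters are lost as well. If only the em dash should go, replacing u'\u2014' directly is the narrower fix.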

Hope this helps.

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address.