ASCII decode error even though everything is unicode (Python 2.7)
I am running a script in Dataflow (Apache Beam). It runs on Python 2.7.12 and does some text processing with unicode strings.
As part of the processing I do the following, where noun and phrase are unicode (I think...):
# -*- coding: utf-8 -*-
...
key = u"{}_{}".format(
    noun, phrase.replace(u" ", u"_")
)
However, it yields ASCII decode errors:
'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
I can add debugging and get a repr of the strings used as noun and phrase, but I don't have them at the moment since my logging didn't output them.
I don't understand the ASCII decode error when I think I am being pretty specific that I want everything in unicode!
Can you give some hints, or should I come back with more info about the input strings?
OK, so you have a non-ASCII character in your string. You need to convert phrase to unicode directly with
phrase.decode('latin-1')
before manipulating it in unicode.format.
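To illustrate the suggestion above, here is a minimal sketch, assuming the offending value arrives as a byte string rather than unicode. The helper names to_text and make_key are hypothetical, and the UTF-8 default is an assumption: byte 0xe2 is a valid Latin-1 character (â), but it is also the lead byte of many three-byte UTF-8 sequences such as curly quotes, so check which encoding your input really uses.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # plain literals are unicode on Python 2.7


def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; pass text through unchanged.

    The default encoding is an assumption; pass 'latin-1' if that is
    what the input really is.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


def make_key(noun, phrase):
    """Build the key from values that may be bytes or text (hypothetical helper)."""
    noun = to_text(noun)
    phrase = to_text(phrase)
    return "{}_{}".format(noun, phrase.replace(" ", "_"))


# Decoding as ASCII is what Python 2 attempts implicitly when a byte
# string is mixed with unicode, and it fails on non-ASCII bytes.
raw_phrase = b"it\xe2\x80\x99s here"  # UTF-8 bytes containing a curly apostrophe
try:
    raw_phrase.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly before formatting avoids the implicit ASCII step.
key = make_key("shop", raw_phrase)
```

Decoding once at the point where data enters the script, and keeping everything as unicode afterwards, usually makes it easier to see where a stray byte string came from.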
A colleague reminded me that I could always just encode the whole output, in this case the key, to whatever encoding I chose.
key = u"{}_{}_{}_{}".format(
    business_unit_id, date, noun, phrase.replace(u" ", u"_")
).encode('ascii', 'ignore')
This works when I want ASCII output and don't care about losing characters like 💩.
I could also use ...).encode('utf-8') if I wanted the output as UTF-8 encoded bytes instead.
In my case I settled on ASCII output, as Apache Beam did not seem happy with unicode keys in its map-reduce pipelines.
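As a small sketch of the difference between the two encodings mentioned above (the sample values are made up for illustration), note that both calls return byte strings, and that 'ignore' silently drops any character that ASCII cannot represent:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # plain literals are unicode on Python 2.7

# Hypothetical inputs; "\u00e9" is an e with an acute accent.
noun = "shop"
phrase = "caf\u00e9 latte"

key = "{}_{}".format(noun, phrase.replace(" ", "_"))

# ASCII with 'ignore' discards the accented character entirely.
ascii_key = key.encode("ascii", "ignore")

# UTF-8 keeps it, encoded as the two bytes 0xc3 0xa9.
utf8_key = key.encode("utf-8")
```

Because dropped characters cannot be recovered, two different phrases can collapse to the same ASCII key, which matters if the key is used for grouping.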