
Python mmh3: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-14: ordinal not in range(128)

I'm querying a DB for jokes and am getting back Python str objects. I want to use them as Unicode objects, so I do:

joke = unicode(joke, 'utf-8')

This works for all my DB results and does not cause any issues.
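The unicode(joke, 'utf-8') call above is Python 2. As a minimal sketch of the same step in Python 3, assuming the DB driver hands back raw UTF-8 bytes (many Python 3 drivers already return str, in which case no decoding is needed), decoding looks like this. The sample joke text is made up for illustration.

```python
# Hypothetical raw value as a DB driver might return it: UTF-8 encoded bytes.
raw_joke = "猫が寝込んだ".encode("utf-8")

# Python 3 equivalent of Python 2's unicode(joke, 'utf-8'): bytes -> text (str).
joke = raw_joke.decode("utf-8")

# bytes length counts encoded bytes; str length counts characters.
print(type(raw_joke).__name__, len(raw_joke))
print(type(joke).__name__, len(joke))
```

Each of these Japanese characters takes 3 bytes in UTF-8, so the bytes object is three times as long as the decoded text.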

Then I try to hash each word in each joke like this:

result = mmh3.hash(joke)

and I get back:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-14: ordinal not in range(128)

I inspected the text and it's Japanese. Does this mean I should drop all non-ASCII characters before hashing, or is there a better way to handle this?

Thanks!

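The .hash(...) function appears to require either bytes or ASCII-convertible text. When it is given a unicode object in Python 2, the text is most likely being encoded implicitly with the default 'ascii' codec, which is exactly the step that fails on Japanese characters.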
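A small reproduction using only the standard library (Python 3; mmh3 itself is not needed to see the failure). It assumes the implicit conversion behaves like an explicit encode('ascii'), which matches the error message in the question. It also shows why dropping non-ASCII characters is a poor fix: a Japanese joke shrinks to empty bytes, so every such joke would hash to the same value.

```python
joke = "こんにちは世界"  # 7 non-ASCII characters (sample text)

# ASCII-only text converts without problems.
assert "hello".encode("ascii") == b"hello"

# Non-ASCII text fails with the same kind of error as in the question.
error_message = None
try:
    joke.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)

# "Dropping non-ASCII characters" leaves nothing of a Japanese joke.
stripped = joke.encode("ascii", errors="ignore")
print(stripped)  # b''

# UTF-8 keeps every character, so different jokes stay distinguishable.
utf8_bytes = joke.encode("utf-8")
print(len(utf8_bytes))
```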

The easiest way (if you're dealing entirely with unicode objects) is to convert them to bytes as you call mmh3.hash:

result = mmh3.hash(joke.encode('UTF-8'))
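To hash each word of a joke, as the question describes, the same encode step is applied per word. So that the sketch runs without mmh3 installed, it uses a small pure-Python MurmurHash3 (x86, 32-bit) stand-in called murmur3_32, written for this example and intended to mirror the signed 32-bit value mmh3.hash returns by default; in real code you would call mmh3.hash(word.encode('UTF-8')) instead. hash_words is likewise a hypothetical helper. Splitting on whitespace is an assumption: Japanese text usually has no spaces between words, so a proper tokenizer may be needed.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 x86_32 stand-in; returns a signed 32-bit int."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("murmur3_32 needs bytes; encode text first")
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4

    # Body: mix each little-endian 4-byte block into the hash state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix in the remaining 0-3 bytes.
    tail = data[nblocks * 4:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16

    # Reinterpret as a signed 32-bit value.
    return h - (1 << 32) if h & 0x80000000 else h


def hash_words(joke: str) -> list:
    """Hypothetical helper: encode each word to UTF-8 bytes, then hash it."""
    return [murmur3_32(word.encode("UTF-8")) for word in joke.split()]


print(hash_words("猫 が 寝込んだ"))
```

Passing a str straight to murmur3_32 raises TypeError on purpose, which makes the "encode first" rule explicit instead of relying on an implicit codec.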


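Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.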

Related questions:
- Python: UnicodeEncodeError: 'ascii' codec can't encode characters in position 34-39: ordinal not in range(128)
- Python2.7 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)
- UnicodeEncodeError: 'ascii' codec can't encode characters ordinal not in range(128)
- When I use Chinese.UnicodeEncodeError: 'ascii' codec can't encode characters in position 14-15: ordinal not in range(128)
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)?
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
- Canopy UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 4273-4279: ordinal not in range(128)
 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM