Python mmh3: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-14: ordinal not in range(128)

Question

I'm querying a DB for jokes and getting back Python str objects. I want to use them as Unicode objects, so I do:

joke = unicode(joke, 'utf-8')

This works for all my DB results and does not cause any issues.

Then I try to hash each word in each joke like this:

result = mmh3.hash(joke)

and I get back:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-14: ordinal not in range(128)

I inspected the text and it's Japanese. Does this mean I should drop all non-ASCII characters before hashing, or is there a better way to handle this?

Thanks!

Answer 1

The .hash(...) function appears to require either bytes or ASCII-convertible text.
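The error text is the same one an explicit ASCII encode produces, which is consistent with the function trying an implicit ASCII conversion. A minimal sketch that reproduces the message without mmh3 (the sample string is a hypothetical stand-in for one of the jokes):

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; any string with non-ASCII characters behaves the same.
text = u"これは日本語の冗談です"

error = None
try:
    # Encoding Japanese text as ASCII cannot succeed.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```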

The easiest way (if you're dealing entirely with unicode objects) is to convert them to bytes as you call mmh3.hash:

result = mmh3.hash(joke.encode('UTF-8'))
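Since the question hashes each word of each joke, the same idea can be wrapped in a small helper. This is only a sketch: to_bytes and hash_words are hypothetical names, words are assumed to be whitespace-separated, and zlib.crc32 is a standard-library stand-in so the example runs without mmh3 installed.

```python
# -*- coding: utf-8 -*-
import zlib


def to_bytes(text, encoding="utf-8"):
    """Return bytes for hashing: text is encoded, bytes pass through unchanged."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


def hash_words(joke, hash_func):
    """Hash each whitespace-separated word of joke with hash_func(bytes)."""
    return [hash_func(to_bytes(word)) for word in joke.split()]


# Hypothetical sample joke; zlib.crc32 is NOT MurmurHash, only a placeholder.
joke = u"これは 日本語の 冗談です"
hashes = hash_words(joke, zlib.crc32)
print(hashes)
```

With mmh3 installed, passing mmh3.hash as hash_func should give the MurmurHash3 values instead, since every word reaches it already encoded as UTF-8 bytes.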

Accepted answer, score 4, 2018-08-25 23:31:02
