Python - convert a UTF-8 encoded string with non-English characters to a list of characters

Question

I have a UTF-8 encoded string containing both English and non-English characters. I am trying to convert this string to a list of single characters. When I just use list(), some of the non-English letters are cut in the middle. For example:

In [200]: s = "abאב"

In [201]: print s
abאב

In [202]: l = list(s)

In [203]: print l
['a', 'b', '\xd7', '\x90', '\xd7', '\x91']

In [204]: print l[2]
�

In [205]: print l[2]+l[3]
א

l[2] prints gibberish because the encoding of the letter א is the two bytes \xd7\x90, not \xd7 alone. How can I split the string properly, so that each list item is a whole character?

Thanks.

Answer 1

I assume you are running Python 2.7.

If you are going to work with UTF-8 a lot, you should consider running Python 3. In Python 3, list() on a string works as you would expect:

>>> s = "abאב"
>>> l = list(s)
>>> print(l)
['a', 'b', 'א', 'ב']
>>> print(l[2])
א
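
To illustrate why Python 3 behaves differently, here is a minimal sketch (Python 3 only; the variable names are just for illustration). A Python 3 str is a sequence of code points, so list() yields whole characters, while the UTF-8 encoded bytes are the same six values seen in the question.

```python
# Python 3 sketch: characters versus UTF-8 bytes.
s = "ab\u05d0\u05d1"       # the same text as "abאב", written with escapes

chars = list(s)            # str iterates over code points
raw = s.encode("utf-8")    # bytes iterates over individual bytes

print(chars)               # ['a', 'b', 'א', 'ב']
print(list(raw))           # [97, 98, 215, 144, 215, 145]
print(raw.hex())           # 6162d790d791
```

The two Hebrew letters each take two bytes in UTF-8, which is why the byte-oriented Python 2 str in the question split them in the middle.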

Answer 2

I assume you are using Python 2. Decode the byte string to unicode first, then split it:

>>> list(s.decode('utf8'))
[u'a', u'b', u'\u05d0', u'\u05d1']
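
The same decode-then-split idea can be sketched in Python 3, where it matters when the data arrives as bytes (for example from a file opened in binary mode). The helper name split_chars and its encoding parameter are hypothetical, not part of any library.

```python
# Python 3 sketch of decode-then-split. split_chars is a hypothetical helper.
def split_chars(data, encoding="utf-8"):
    """Return a list of single characters (code points) from bytes or str."""
    if isinstance(data, bytes):
        # Decode first so that multi-byte sequences become single characters.
        data = data.decode(encoding)
    return list(data)


raw = b"ab\xd7\x90\xd7\x91"    # the UTF-8 bytes from the question
chars = split_chars(raw)

print(chars)                   # ['a', 'b', 'א', 'ב']
print(len(raw), len(chars))    # 6 4
```

Passing an already decoded str returns the same list, so the helper can be called without checking the input type first.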

2 answers

Answer 1
1 2017-08-23 09:26:36

Answer 2
1 Accepted 2017-08-23 09:26:40
