Unable to split a Unicode string without converting it to ASCII - Python 2.7

Question

I want to split the string "I have £300", but it seems that the split function first converts it to ASCII, and afterwards I can't convert it back to Unicode the way it was before.

Is there any other way to split such a Unicode string without breaking it, as happens in the snippet below?

# -*- coding: utf-8 -*-
mystring = 'I have £300.'
alist = mystring.split()
alist = [item.decode("utf-8") for item in alist]
print "alist",alist
print "mystring.split()",mystring.split()

#I want to get [I,have,£300]
#I get: ['I', 'have', '\xc2\xa3300.']

Answer 1

You are looking at a limitation of the way Python 2 displays data: when a list is displayed, each item is shown via its repr(), which escapes the non-ASCII bytes. The data itself is intact.

Using Python 2:

>>> mystring = 'I have £300.'
>>> mystring.split()
['I', 'have', '\xc2\xa3300.']

But observe that an individual item prints the way you want:

>>> print(mystring.split()[2])
£300.
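
To make the point concrete, here is a minimal sketch, written to run on both Python 2.7 and Python 3, that splits the UTF-8 bytes and then decodes each piece. The variable names (raw, parts, decoded) are illustrative and not from the question; the sketch assumes the source file is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The UTF-8 bytes that the Python 2 str literal 'I have £300.' holds
# when the source file is saved as UTF-8 (assumption stated above).
raw = u'I have £300.'.encode('utf-8')

parts = raw.split()                            # split on whitespace; items are still bytes
decoded = [p.decode('utf-8') for p in parts]   # bytes -> text, one item at a time

# repr() escapes the non-ASCII bytes; this is what the list display shows.
print(repr(parts[2]))

# Joining the text items and printing the result shows the real characters.
print(u' '.join(decoded))
```

Nothing is lost by split(): the escaped form only appears when an item is rendered through repr(), which is what displaying a list does.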

Python 3, by contrast, displays the list as you would like:

>>> mystring = 'I have £300.'
>>> mystring.split()
['I', 'have', '£300.']

A major reason to use Python 3 is its superior handling of Unicode.

Answer 2

The problem is not with split(). The real problem is that the handling of Unicode in Python 2 is confusing.

The first statement in your code produces a str, i.e. a sequence of bytes, which contains the UTF-8 encoding of the symbol £. You can confirm this by displaying the repr of your original string:

>>> mystring
'I have \xc2\xa3300.'
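
The two escaped bytes in that repr can be checked directly. The following is a small sketch (variable names are illustrative) that should run on both Python 2.7 and Python 3, assuming the source file is saved as UTF-8:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

pound = u'£'                       # one character (code point U+00A3)
encoded = pound.encode('utf-8')    # its UTF-8 form: two bytes

# bytearray yields integers on both Python 2 and Python 3.
byte_values = [hex(b) for b in bytearray(encoded)]

print(len(pound), len(encoded))    # character count vs byte count
print(byte_values)
```

The single character £ becomes the bytes 0xc2 and 0xa3, which is exactly the \xc2\xa3 sequence shown in the repr.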

The rest of the statements just do what you would expect them to do with such input. If you want to work with Unicode, create a unicode string to start with:

>>> mystring = u'I have £300.'
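
As a sketch of that advice, assuming the source file is saved as UTF-8 and carries the coding declaration used in the question, splitting a unicode literal keeps every item as text. Note that in Python 2 a list of unicode strings is still displayed with escapes, so this example prints the items one by one rather than printing the list. The name encoded is introduced here for illustration only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

mystring = u'I have £300.'   # unicode literal: a sequence of characters, not bytes
alist = mystring.split()     # every item is itself a unicode string

for item in alist:
    print(item)              # printing an item shows the pound sign itself

# Encode explicitly only at the boundary, e.g. when bytes must be written out.
encoded = [item.encode('utf-8') for item in alist]
```

Keeping text as unicode inside the program and encoding only at the edges avoids the per-item decode step from the question.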

A far better solution, however, is to switch to Python 3. Wrapping your head around the semantics of Unicode in Python 2 is not worth the effort when there is such a superior alternative.

2 answers

Answer 1
3 2016-08-29 23:04:54

Answer 2
1 2016-08-29 23:24:57
