Cannot split a unicode string without converting to ascii - python 2.7
I want to split the string I have £300, but it seems that the split function first converts it to ASCII, and afterwards I can't convert it back to unicode the way it was before.
Is there any other way to split such a unicode string without breaking it, as happens in the snippet below?
# -*- coding: utf-8 -*-
mystring = 'I have £300.'
alist = mystring.split()
alist = [item.decode("utf-8") for item in alist]
print "alist",alist
print "mystring.split()",mystring.split()
#I want to get [I,have,£300]
#I get: ['I', 'have', '\xc2\xa3300.']
You are looking at a limitation of the way Python 2 displays data.
Using Python 2:
>>> mystring = 'I have £300.'
>>> mystring.split()
['I', 'have', '\xc2\xa3300.']
But observe that an individual item prints as you want:
>>> print(mystring.split()[2])
£300.
Using Python 3, by contrast, the list displays as you would like:
>>> mystring = 'I have £300.'
>>> mystring.split()
['I', 'have', '£300.']
A major reason to use Python 3 is its superior handling of unicode.
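To make the display-versus-data point concrete, here is a minimal sketch written for Python 3, where a Python 2 str corresponds roughly to a bytes object. The variable names (raw, parts, shown, last_word) are made up for illustration. It suggests that split() leaves the bytes alone and that the \xc2\xa3 escapes come only from how a list is echoed:

```python
# Python 3 sketch: a Python 2 str behaves much like a Python 3 bytes object.
raw = 'I have £300.'.encode('utf-8')    # the bytes Python 2 stored for the literal

parts = raw.split()                     # splitting does not alter the bytes
shown = repr(parts)                     # a list is echoed using repr() of each item
last_word = parts[2].decode('utf-8')    # the same bytes, decoded back to text

print(shown)                            # the escaped form seen in the question
print(last_word == '£300.')             # the pound sign survived the split
```

The escapes appear because a list is shown using the repr of each item, whereas printing a single item writes the item itself.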
The problem is not with split(). The real problem is that the handling of unicode in Python 2 is confusing.
The first line in your code produces a string, i.e. a sequence of bytes, which contains the UTF-8 encoding of the symbol £. You can confirm this by displaying the repr of your original string:
>>> mystring
'I have \xc2\xa3300.'
The rest of the statements just do what you would expect them to with such input. If you want to work with unicode, create a unicode string to start with:
>>> mystring = u'I have £300.'
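As a rough sketch of that advice, the lines below keep the text as unicode from the start and decode bytes once, before splitting, rather than item by item afterwards. The names raw and decoded_words are illustrative only, and raw merely stands in for bytes that might arrive from a file or socket:

```python
# -*- coding: utf-8 -*-
# Sketch: work with unicode text throughout; decode bytes once at the boundary.
mystring = u'I have £300.'            # the u'' prefix is accepted by Python 2 and Python 3.3+
words = mystring.split()              # every item is text, not a byte string

raw = mystring.encode('utf-8')        # stand-in for bytes coming from outside the program
decoded_words = raw.decode('utf-8').split()   # decode first, then split

print(len(mystring), len(raw))        # 12 characters, but 13 bytes (the pound sign takes two)
print(decoded_words == words)         # both routes give the same list of text items
```

Note that under Python 2 the interpreter would still echo such a list with escapes, as [u'I', u'have', u'\xa3300.'], because a list is shown using the repr of each item; the data itself is intact, and printing an individual item shows the pound sign.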
A far better solution, however, is to switch to Python 3. Wrapping your head around the semantics of unicode in Python 2 is not worth the effort when there's such a superior alternative.