Getting the same Unicode string length in both Python 2 and 3?
Uhh, Python 2 / 3 is so frustrating... Consider this example, test.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
    text_type = unicode
    binary_type = str
    def b(x):
        return x
    def u(x):
        return unicode(x)
else:
    text_type = str
    binary_type = bytes
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = " ▲ "
sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")
Running it:
$ python2.7 test.py
▲
5
$ python3.2 test.py
▲
3
Great, I get two differing string sizes. Hopefully wrapping the string in one of these wrappers I found around the net will help?
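The gap between 5 and 3 is bytes versus code points: a plain literal in Python 2 is a byte string holding the UTF-8 bytes from the source file, while in Python 3 it is already Unicode text. A minimal sketch that reproduces both numbers on either version, assuming the file is saved as UTF-8 (the names `raw`, `text`, `byte_count` and `char_count` are only for illustration):

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes of " ▲ ", spelled out so the literal means the same thing
# on Python 2 and Python 3: space, U+25B2 (three bytes), space.
raw = b" \xe2\x96\xb2 "

# Decoding turns the bytes into text (unicode on Python 2, str on Python 3).
text = raw.decode("utf-8")

byte_count = len(raw)    # what Python 2 reports for a plain " ▲ " literal
char_count = len(text)   # what Python 3 reports for the same literal
```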
For tstr = text_type(" ▲ "):
$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = text_type(" ▲ ")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py
▲
3
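The message mentions 'ascii' because Python 2 falls back to its default ASCII codec when a byte string is converted to unicode without an explicit encoding. A small sketch that triggers the same failure on purpose; it decodes explicitly so it behaves the same on Python 3, and `raw`, `failure`, `bad_position` and `bad_bytes` are illustrative names:

```python
# UTF-8 bytes of " ▲ " (space, U+25B2, space).
raw = b" \xe2\x96\xb2 "

failure = None
try:
    # ASCII only covers bytes 0x00-0x7F, so the first byte of the triangle fails.
    raw.decode("ascii")
except UnicodeDecodeError as err:
    failure = err

# Offset and value of the offending byte, matching "byte 0xe2 in position 1".
bad_position = failure.start
bad_bytes = failure.object[failure.start:failure.end]
```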
For tstr = u(" ▲ "):
$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = u(" ▲ ")
  File "test.py", line 11, in u
    return unicode(x)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py
▲
3
For tstr = b(" ▲ "):
$ python2.7 test.py
▲
5
$ python3.2 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = b(" ▲ ")
  File "test.py", line 17, in b
    return codecs.latin_1_encode(x)[0]
UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)
For tstr = binary_type(" ▲ "):
$ python2.7 test.py
▲
5
$ python3.2 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = binary_type(" ▲ ")
TypeError: string argument without an encoding
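Both Python 3 failures come from going the other way, from text to bytes: Latin-1 only covers code points up to U+00FF, and bytes() refuses a str unless it is told which encoding to use. A sketch meant for Python 3 only, with illustrative variable names:

```python
text = " \u25b2 "  # same as " ▲ ", written with an escape

# Latin-1 cannot represent U+25B2, so this is the error b() ran into.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# bytes() needs an explicit encoding for a str argument.
try:
    bytes(text)
    bare_bytes_ok = True
except TypeError:
    bare_bytes_ok = False

# With an encoding the conversion works and gives the five UTF-8 bytes.
encoded = bytes(text, "utf-8")
```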
Well, that certainly makes things easy.
So, how to get the same string length (in this case, 3) in both Python 2.7 and 3.2?
Well, turns out unicode() in Python 2.7 has an encoding argument, and that apparently helps:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
    text_type = unicode
    binary_type = str
    def b(x):
        return x
    def u(x):
        return unicode(x, "utf-8")
else:
    text_type = str
    binary_type = bytes
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = u(" ▲ ")
sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")
Running this, I get what I needed:
$ python2.7 test.py
▲
3
$ python3.2 test.py
▲
3
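An alternative sketch, not part of the script above: `from __future__ import unicode_literals` makes plain literals Unicode on Python 2.6+ and has no effect on Python 3, so no wrapper function is needed. This assumes the file is saved as UTF-8 and keeps its coding line; `char_count` and `byte_count` are illustrative names:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

tstr = " ▲ "

# Text length in code points, identical on Python 2 and 3.
char_count = len(tstr)
# Encode explicitly whenever the byte length is what you need.
byte_count = len(tstr.encode("utf-8"))

print(char_count, byte_count)
```

The trade-off is that every literal in the module becomes text on Python 2, so any place that really needs bytes has to use a b"..." literal or an explicit encode.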