Getting the same Unicode string length in both Python 2 and 3?

Uhh, Python 2 / 3 is so frustrating... Consider this example, test.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
  text_type = unicode
  binary_type = str
  def b(x):
    return x
  def u(x):
    return unicode(x)
else:
  text_type = str
  binary_type = bytes
  import codecs
  def b(x):
    return codecs.latin_1_encode(x)[0]
  def u(x):
    return x

tstr = " ▲ "

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

Running it:

$ python2.7 test.py 
 ▲ 
5
$ python3.2 test.py 
 ▲ 
3
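
The 5 versus 3 looks like bytes versus characters: in Python 2 a plain "..." literal is a byte string, and in a UTF-8 source file ▲ (U+25B2) takes three bytes, so two spaces plus three bytes gives 5. Python 3 treats the same literal as text and counts 3 characters. A minimal sketch that runs on both versions (the names raw and text are only for illustration, and the bytes are spelled out so the result does not depend on the file encoding):

```python
# UTF-8 encoding of U+25B2 (the triangle) is the three bytes E2 96 B2.
triangle_bytes = b"\xe2\x96\xb2"

# The same content as " ▲ ", written as explicit bytes.
raw = b" " + triangle_bytes + b" "

# Decoding turns the byte string into text on both Python 2 and 3.
text = raw.decode("utf-8")

byte_len = len(raw)    # counts bytes: 1 + 3 + 1
char_len = len(text)   # counts characters: 3

print(byte_len)  # 5
print(char_len)  # 3
```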

Great, I get two different string lengths. Hopefully wrapping the string in one of these wrappers I found around the net will help?

For tstr = text_type(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = text_type(" ▲ ")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py 
 ▲ 
3

For tstr = u(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = u(" ▲ ")
  File "test.py", line 11, in u
    return unicode(x)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py 
 ▲ 
3

For tstr = b(" ▲ "):

$ python2.7 test.py 
 ▲ 
5
$ python3.2 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = b(" ▲ ")
  File "test.py", line 17, in b
    return codecs.latin_1_encode(x)[0]
UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)

For tstr = binary_type(" ▲ "):

$ python2.7 test.py 
 ▲ 
5
$ python3.2 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = binary_type(" ▲ ")
TypeError: string argument without an encoding

Well, that certainly makes things easy.
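
For what it's worth, the tracebacks above seem to come down to implicit codecs: unicode(x) on Python 2 decodes with the default ASCII codec, latin-1 has no mapping for U+25B2, and bytes(" ▲ ") on Python 3 refuses to guess an encoding at all. A small sketch of the first two cases, assuming the text is UTF-8 (the flag names are made up for illustration):

```python
raw = b" \xe2\x96\xb2 "        # " ▲ " as UTF-8 bytes
text = raw.decode("utf-8")     # the same content as text

# ASCII cannot decode byte 0xe2, which matches the UnicodeDecodeError above.
try:
    raw.decode("ascii")
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

# latin-1 only covers code points below 256, so U+25B2 cannot be encoded.
try:
    text.encode("latin-1")
    latin1_encode_ok = True
except UnicodeEncodeError:
    latin1_encode_ok = False

# Naming UTF-8 explicitly works in both directions.
utf8_roundtrip = text.encode("utf-8") == raw

print(ascii_decode_ok, latin1_encode_ok, utf8_roundtrip)
```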

So, how do I get the same string length (in this case, 3) in both Python 2.7 and 3.2?

Well, it turns out unicode() in Python 2.7 has an encoding argument, and that apparently helps:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
  text_type = unicode
  binary_type = str
  def b(x):
    return x
  def u(x):
    # Decode the UTF-8 byte string from the source file into unicode text
    return unicode(x, "utf-8")
else:
  text_type = str
  binary_type = bytes
  import codecs
  def b(x):
    return codecs.latin_1_encode(x)[0]
  def u(x):
    return x

tstr = u(" ▲ ")

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

Running this, I get what I needed:

$ python2.7 test.py 
 ▲ 
3
$ python3.2 test.py 
 ▲ 
3
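
A variation on the same idea, offered as a sketch rather than a definitive fix: check the type of the value instead of the interpreter version, so one helper accepts both byte strings and text. to_text is a hypothetical name, not something from a library, and UTF-8 is assumed as the input encoding.

```python
# -*- coding: utf-8 -*-

try:
    text_type = unicode  # Python 2: the text type is called unicode
except NameError:
    text_type = str      # Python 3: str is already text


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string.

    to_text is a hypothetical helper; UTF-8 is an assumed default.
    """
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


tstr = to_text(" ▲ ")                     # bytes on Python 2, text on Python 3
from_bytes = to_text(b" \xe2\x96\xb2 ")   # explicit UTF-8 bytes on either version

print(len(tstr))
print(len(from_bytes))
```

Both lengths should come out as 3, provided the file is saved as UTF-8 and, on Python 2, keeps the coding line at the top.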
