How to get the size of a UTF-8 string in bytes with Python

Question

Given a UTF-8 string like this:

mystring = "işğüı"

Is it possible to get its (in-memory) size in bytes with Python (2.5)?

Answer 1

Assuming you mean the number of UTF-8 bytes (and not the extra bytes that Python requires to store the object), you get it the same way as the length of any other string: with len(). A string literal in Python 2.x is a string of encoded bytes, not Unicode characters, so its length is already a byte count in whatever encoding the source file or terminal uses (UTF-8 here).

Byte strings:

>>> mystring = "işğüı"
>>> print "length of %r is %d" % (mystring, len(mystring))  # % formatting; str.format needs 2.6+
length of 'i\xc5\x9f\xc4\x9f\xc3\xbc\xc4\xb1' is 9

Unicode strings:

>>> myunicode = u"işğüı"
>>> print "length of %r is %d" % (myunicode, len(myunicode))  # % formatting; str.format needs 2.6+
length of u'i\u015f\u011f\xfc\u0131' is 5

It's good practice to keep all of your strings in Unicode and only encode when communicating with the outside world. In this case, you could use len(myunicode.encode('utf-8')) to find the size the string would have after encoding.
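As a minimal sketch of that last suggestion, the encode-then-measure step can be wrapped in a small helper. The name utf8_size is hypothetical and used only for illustration. The string is written with escapes so the result does not depend on the source file's encoding, and the u"" prefix is accepted by Python 2 and by Python 3.3 and later.

```python
# Hypothetical helper: the name utf8_size is an assumption for illustration.
def utf8_size(text):
    """Return the number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Same string as in the examples above, spelled with escapes so the
# source file's own encoding does not matter.
myunicode = u"i\u015f\u011f\xfc\u0131"

char_count = len(myunicode)        # number of characters: 5
byte_count = utf8_size(myunicode)  # number of UTF-8 bytes: 9
```

The two counts differ because "i" takes one byte in UTF-8 while each of the other four characters takes two.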

Answer 1
7, accepted, 2010-10-01 19:53:32