How to get the size of a UTF-8 string in bytes with Python

Question

I have a UTF-8 string like this:

mystring = "işğüı"

Is it possible to get its (in-memory) size in bytes with Python (2.5)?

Answer 1

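Assuming you mean the number of UTF-8 bytes (and not the extra bytes Python needs to store the object), you get it the same way as the length of any other string: with len(). A string literal in Python 2.x is a sequence of encoded bytes, not Unicode characters, so len() already counts bytes. Those bytes are UTF-8 only if the source file or terminal uses UTF-8, as it does here.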

Byte string:

>>> mystring = "işğüı"
>>> print "length of {0} is {1}".format(repr(mystring), len(mystring))
length of 'i\xc5\x9f\xc4\x9f\xc3\xbc\xc4\xb1' is 9

Unicode string:

>>> myunicode = u"işğüı"
>>> print "length of {0} is {1}".format(repr(myunicode), len(myunicode))
length of u'i\u015f\u011f\xfc\u0131' is 5

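It is best to keep all strings as Unicode and to encode them only when communicating with the outside world. In that case, you can use len(myunicode.encode('utf-8')) to find the size after encoding.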
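As a minimal sketch of that last suggestion, the encode-then-measure step can be wrapped in a small helper. The name utf8_size is hypothetical and not part of any library. The snippet assumes the source file is saved as UTF-8, and it is written to run on both Python 2 and Python 3.3+.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.


def utf8_size(text):
    """Return the number of bytes `text` occupies once encoded as UTF-8.

    `text` is expected to be a Unicode string (unicode in Python 2,
    str in Python 3). utf8_size is a hypothetical helper name.
    """
    return len(text.encode("utf-8"))


myunicode = u"işğüı"
print(len(myunicode))        # number of characters: 5
print(utf8_size(myunicode))  # number of UTF-8 bytes: 9
```

The two numbers differ because "i" takes one byte in UTF-8 while each of the other four characters takes two.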