
How do I get the size of a UTF-8 string in bytes with Python?

Given a UTF-8 string like this:

mystring = "işğüı"

Is it possible to get its (in-memory) size in bytes with Python (2.5)?

Assuming you mean the number of UTF-8 bytes (and not the extra bytes that Python needs to store the object), you get it the same way as the length of any other string: with len(). A string literal in Python 2.x is a string of encoded bytes, not Unicode characters.

Byte strings:

>>> mystring = "işğüı"
>>> print "length of %r is %d" % (mystring, len(mystring))
length of 'i\xc5\x9f\xc4\x9f\xc3\xbc\xc4\xb1' is 9

Unicode strings:

>>> myunicode = u"işğüı"
>>> print "length of %r is %d" % (myunicode, len(myunicode))
length of u'i\u015f\u011f\xfc\u0131' is 5

It's good practice to keep all of your strings as Unicode and to encode only when communicating with the outside world. In this case, you can use len(myunicode.encode('utf-8')) to find the size the string would have after encoding.
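The encode-then-measure approach can be wrapped in a small helper. This is only a minimal sketch: the name utf8_byte_length is made up for illustration and is not part of any library. The string is written with escape sequences so the snippet does not depend on the source file's encoding.

```python
def utf8_byte_length(text):
    """Return the number of bytes `text` occupies once encoded as UTF-8.

    Hypothetical helper for illustration; expects a Unicode string.
    """
    return len(text.encode("utf-8"))


# Same text as u"işğüı", spelled with escapes to avoid source-encoding issues.
myunicode = u"i\u015f\u011f\xfc\u0131"

char_count = len(myunicode)               # number of Unicode characters
byte_count = utf8_byte_length(myunicode)  # number of bytes after UTF-8 encoding
```

Here char_count is 5 and byte_count is 9, matching the two interpreter sessions above: 'i' encodes to one byte and each of the other four characters to two.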

