
Why does an empty string in Python sometimes take up 49 bytes and sometimes 51?

I tested sys.getsizeof('') and sys.getsizeof(' ') in three environments, and in two of them sys.getsizeof('') gives me 51 bytes (one byte more than sys.getsizeof(' ')) instead of 49 bytes:

Screenshots:

Win8 + Spyder + CPython 3.6:

sys.getsizeof('') == 49 and sys.getsizeof(' ') == 50

Win8 + Spyder + IPython 3.6:

sys.getsizeof('') == 51 and sys.getsizeof(' ') == 50

Win10 (VPN remote) + PyCharm + CPython 3.7:

sys.getsizeof('') == 51 and sys.getsizeof(' ') == 50

First edit

I ran a second test in python.exe instead of Spyder and PyCharm (those two still show 51), and everything looks fine there. Apparently I don't have the expertise to solve this problem, so I'll leave it to you guys :)

Win10 + Python 3.7 console versus PyCharm using the same interpreter:

[screenshot]

Win8 + IPython 3.6 + Spyder using the same interpreter:

[screenshot]

This sounds like something is retrieving the wchar representation of the string object. As of CPython 3.7, the way the CPython Unicode representation works out, an empty string is normally stored in "compact ASCII" representation, and the base data and padding for a compact ASCII string on a 64-bit build works out to 48 bytes, plus one byte of string data (just the null terminator). You can see the relevant header file here.
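The "fixed header plus one byte per character" layout can be checked with a minimal sketch like the one below. It assumes CPython and ASCII-only strings, and it compares differences rather than absolute sizes, because the size of the fixed header depends on the interpreter version and build (the 48-byte figure above is for 64-bit CPython 3.7).

```python
import sys

# Absolute sizes vary between CPython versions and builds, so they are only
# printed, not compared against fixed numbers.
empty = sys.getsizeof('')
one_space = sys.getsizeof(' ')
print("'' ->", empty, "bytes;  ' ' ->", one_space, "bytes")

# For compact ASCII strings, each additional character should add one byte.
# Strings of length 2..6 are built at runtime to avoid interned singletons.
sizes = [sys.getsizeof('x' * n) for n in range(2, 7)]
per_char = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
print("bytes added per extra ASCII character:", per_char)
```

If the empty string reports a size that is not exactly one byte less than the one-character string, as in the IDE screenshots in the question, something extra has been attached to that particular string object.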

For now (this is scheduled for removal in 4.0), there is also an option to retrieve a wchar_t representation of a string. On a platform with 2-byte wchar_t, the wchar representation of an empty string is 2 bytes (just the null terminator again). The wchar representation is cached on the string on first access, and str.__sizeof__ accounts for this extra data when it exists, resulting in a 51-byte total.
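The arithmetic behind 49 versus 51 can be written out as a small sketch. The constants are the figures quoted in this answer (64-bit CPython 3.7, 2-byte wchar_t) and are not measured by the code; `expected_size` is a hypothetical helper name, and extending the formula to non-empty ASCII strings is an assumption based on the cached wchar copy holding one wchar_t per character plus a terminator.

```python
# Figures quoted in the answer for 64-bit CPython 3.7 on a platform with a
# 2-byte wchar_t. They are assumptions of this sketch, not measurements.
BASE_COMPACT_ASCII = 48   # object header and padding
NULL_TERMINATOR = 1       # one byte terminating the ASCII data
WCHAR_SIZE = 2            # sizeof(wchar_t) on Windows


def expected_size(length, wchar_cached):
    """Estimate sys.getsizeof() for an ASCII str of the given length."""
    size = BASE_COMPACT_ASCII + length + NULL_TERMINATOR
    if wchar_cached:
        # The cached wchar_t copy stores every character plus a terminator.
        size += (length + 1) * WCHAR_SIZE
    return size


print(expected_size(0, wchar_cached=False))  # empty string, no cached copy
print(expected_size(0, wchar_cached=True))   # empty string with cached copy
print(expected_size(1, wchar_cached=False))  # ' ' with no cached copy
```

Under these assumptions the three calls give 49, 51, and 50, which matches the numbers in the question: the IDEs touched the wchar_t form of the shared empty string, but not of the one-character string.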

https://docs.python.org/3.5/library/sys.html#sys.getsizeof

sys is system specific, so it can easily differ. This is often overlooked. All system-specific stuff in Python has been dumped in the sys module for years. For example, sys.getwindowsversion() is not portable by definition, but it's there. It's like the bottomless pit of rejects in the perfect world of cross-platform coding. What you see is one of the interesting nuggets of Python.

From the getsizeof docs:

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to. getsizeof() calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
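The quoted behaviour can be observed with a short sketch, assuming CPython. It compares sys.getsizeof() with a direct call to __sizeof__() for a str and for a list; the variable names are illustrative only, and the size of the garbage collector overhead differs between versions and builds, so it is printed rather than stated.

```python
import gc
import sys

text = ''
items = []

# Difference between getsizeof() and the object's own __sizeof__();
# this is the garbage collector overhead mentioned in the docs.
str_overhead = sys.getsizeof(text) - text.__sizeof__()
list_overhead = sys.getsizeof(items) - items.__sizeof__()

print("str  tracked by GC:", gc.is_tracked(text), " overhead:", str_overhead)
print("list tracked by GC:", gc.is_tracked(items), " overhead:", list_overhead)
```

Strings are not container objects and are not managed by the cyclic garbage collector, so for them getsizeof() reports exactly what str.__sizeof__() returns.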

When garbage collection is in use, the OS will add those extra bits. If you read the Python and GC Q&A "When are objects garbage collected in python?", the folks there have gone into excruciating detail expounding the GC and how it affects memory, refcounts, and bits.

I hope that explains where this is coming from. If you don't use system-level attributes but more Pythonic attributes, you will get consistent sizes.
