简体繁体 English

RAM中的Python字符串

[英]Python strings in RAM

原文 2016-01-19 13:36:00 3 1 python/ unicode

How are string variables saved in the RAM? 字符串变量如何保存在RAM中？

for example: foo = 'abcÇ¶' and s = u'abc\ℙ' . 例如： foo = 'abcÇ¶'和s = u'abc\ℙ' 。

The strings contains UNICODE characters, so should it be and how is it encoded before stored in memory? 字符串包含UNICODE字符，应该这样吗？在存储到内存之前如何编码？

1 个解决方案

I'm assuming you're asking about CPython, the standard Python implementation. 我假设您要问的是标准Python实现CPython。

The Unicode string representation was changed beginning with Python 3.3 as described in PEP 0393 . 如PEP 0393中所述，从Python 3.3开始更改了Unicode字符串表示形式。 Since then, strings use the same number of bytes for all characters, either 1, 2 or 4, choosing the smallest possible for each string depending on its contents. 从那时起，字符串对所有字符（1、2或4）使用相同数量的字节，请根据其内容为每个字符串选择尽可能小的字节。 The specific encodings used are: 使用的特定编码为：

1 byte per char: Latin-1 每个字符1个字节：拉丁文1
2 bytes per char: UCS-2 每个字符2个字节：UCS-2
4 bytes per char: UCS-4 每个字符4个字节：UCS-4

Before version 3.3, the Unicode string representation depended on the system, and was usually either UTF-16, UCS-4 or UCS-2, to the best of my understanding. 根据我的理解，在版本3.3之前，Unicode字符串表示取决于系统，通常为UTF-16，UCS-4或UCS-2。 See the above-mentioned PEP 0393 and its references for more details. 有关更多详细信息，请参见上述PEP 0393及其参考。