In-place Python string concatenation in a for loop?

Question

I know that Python strings are immutable, which means that

letters = "world"
letters += "sth"

gives me a different string object after the concatenation:

begin: id(letters): 1828275686960
end: id(letters): 1828278265776

However, when I run a for loop that appends to the string, the string object appears to stay the same throughout the loop:

letters = "helloworld"
print("before for-loop:")
print(id(letters))
print("in for-loop")

for i in range(5):
    letters += str(i)
    print(id(letters))

Output:

before for-loop:
2101555236144
in for-loop
2101557044464
2101557044464
2101557044464
2101557044464
2101557044464

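Apparently, the underlying string object that letters points to does not change during the for loop, which seems to contradict the idea that strings are immutable.

Is this some kind of optimization that Python performs behind the scenes?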

Answer 1

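From the documentation:

id(object)

Return the "identity" of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.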
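To make the lifetime rule concrete, here is a minimal sketch. The function names ids_of_live_objects and ids_of_temporaries are illustrative and not part of the original post. Only the first result is guaranteed by the language; the second depends on the implementation.

```python
def ids_of_live_objects():
    # Both objects are alive at the same time, so their ids must differ.
    first = object()
    second = object()
    return id(first), id(second)


def ids_of_temporaries():
    # Each temporary is destroyed before the next one is created, so the
    # lifetimes do not overlap and the two ids are allowed to coincide.
    # Whether they actually do is an implementation detail.
    first_id = id(object())
    second_id = id(object())
    return first_id, second_id


live_a, live_b = ids_of_live_objects()
print(live_a != live_b)   # True
temp_a, temp_b = ids_of_temporaries()
print(temp_a == temp_b)   # often True on CPython, but not guaranteed
```

An equal id() is therefore only evidence of "same object" when both objects are known to be alive at the moment of comparison.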

In this case, id() returns the memory address where the string is stored, as the source code shows:

static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
/*[clinic end generated code: output=0aa640785f697f65 input=5a534136419631f4]*/
{
    PyObject *id = PyLong_FromVoidPtr(v);

    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }

    return id;
}

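What happens is that the old string's lifetime ends just as the new one's begins, so the two lifetimes do not overlap and the same id() value can be reused. Python only guarantees that a string is immutable for as long as it is alive. As the article suggested by @kris shows: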

import _ctypes
    
a = "abcd"
a += "e"

before_f_id = id(a)

a += "f"

print(a)
print(_ctypes.PyObj_FromPtr(before_f_id))  # prints "abcdef" when CPython reuses the address (implementation detail)

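The old string a has reached the end of its life, and nothing guarantees that it can still be retrieved from a given memory location. In fact, the example above shows that the location was reused for the new value.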
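The flip side can be sketched as follows: if another reference keeps each old string alive, its address cannot be reused, so every append has to produce a distinct object. The function name append_keeping_references and the kept list are illustrative assumptions, not code from the original post.

```python
def append_keeping_references(count):
    """Append digits while keeping every intermediate string alive."""
    letters = "helloworld"
    kept = [letters]            # extra references keep the old strings alive
    ids = [id(letters)]
    for i in range(count):
        letters += str(i)       # the old string is still referenced by `kept`
        kept.append(letters)
        ids.append(id(letters))
    return letters, ids, kept


letters, ids, kept = append_keeping_references(5)
print(letters)                  # helloworld01234
print(len(set(ids)))            # 6: one distinct id per live string
print(kept[0])                  # helloworld (the original was never modified)
```

Because every string in kept is alive at the same time, their ids must all differ, and none of the earlier strings is ever observed to change. Immutability holds for any string that something can still see.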

We can see how this is implemented under the hood in unicode_concatenate by looking at the last few lines of the function:

res = v;
PyUnicode_Append(&res, w);
return res;

where v and w are the operands of the expression v += w.

PyUnicode_Append tries to reuse the same memory location for the new object, as its source shows in more detail:

PyUnicode_Append(PyObject **p_left, PyObject *right):

...

new_len = left_len + right_len;

if (unicode_modifiable(left)
    ...
{
    /* append inplace */
    if (unicode_resize(p_left, new_len) != 0)
        goto error;

    /* copy 'right' into the newly allocated area of 'left' */
    _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
}
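The branch above can be mimicked with a toy model in Python. This is only a sketch: ToyString and toy_append are hypothetical names, the explicit refcount field stands in for CPython's real reference counting, and treating "refcount is 1" as the test is an assumption standing in for unicode_modifiable(left) and the elided conditions.

```python
class ToyString:
    """Hypothetical stand-in for a str object; not a real CPython API."""

    def __init__(self, text, refcount=1):
        self.buffer = list(text)
        self.refcount = refcount

    def value(self):
        return "".join(self.buffer)


def toy_append(left, right):
    """Grow `left` in place only when nobody else can observe it."""
    if left.refcount == 1:           # assumed stand-in for unicode_modifiable(left)
        left.buffer.extend(right)    # "append inplace": the same object grows
        return left
    # Another reference can still see `left`, so build a fresh object instead.
    left.refcount -= 1
    return ToyString(left.value() + right)


sole = ToyString("abc")
print(toy_append(sole, "d") is sole)       # True: reused in place

shared = ToyString("abc", refcount=2)
result = toy_append(shared, "d")
print(result is shared, shared.value())    # False abc
```

In the question's loop, letters is the only reference to the string, which corresponds to the first case and would explain why the id stays constant after the first iteration.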
