Why does python memory allocation behave like this?

Question

Consider the following code which is used for finding all valid parentheses placements with n parentheses.

def paren(n):
    ans = []
    def helper(string, left, right,n,foo):
        if len(string)==2*n:
            ans.append(string)
        if left > 0:
            helper(string+'(', left-1,right,n)
        if right > left:
            helper(string+')', left,right-1,n)
    helper('',n,n,n)

If we add a list (that has no practical use in the function) we get

def paren(n):
    ans = []
    def helper(string, left, right,n,foo):
        print(hex(id(foo)), foo)
        if len(string)==2*n:
            ans.append(string)
        if left > 0:
            helper(string+'(', left-1,right,n,[])
        if right > left:
            helper(string+')', left,right-1,n,[])
    helper('',n,n,n,[])
    
paren(2)

OUTPUT:

0x2e5e2446288 []
0x2e5e28e3508 []
0x2e5e28e3688 []
0x2e5e26036c8 []
0x2e5e27bafc8 []
0x2e5e28e3688 []
0x2e5e26036c8 []
0x2e5e27bafc8 []

Whereas if we explicitly pass foo each time then we get

def paren(n):
    ans = []
    def helper(string, left, right,n,foo):
        print(hex(id(foo)), foo)
        if len(string)==2*n:
            ans.append(string)
        if left > 0:
            helper(string+'(', left-1,right,n,foo)
        if right > left:
            helper(string+')', left,right-1,n,foo)
    helper('',n,n,n,[])
    
paren(2)

OUTPUT:

0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []

In the first case we get a different object in memory, why is this compared to the second case when we don't I think it is to do with with the fact that we create a new list rather than passing the function argument?

However when we add something to foo we get the same behaviour as with the first case:

def paren(n):
    ans = []
    def helper(string, left, right,n,foo):
        print(hex(id(foo)), foo)
        if len(string)==2*n:
            ans.append(string)
        if left > 0:
            helper(string+'(', left-1,right,n,foo+['bar'])
        if right > left:
            helper(string+')', left,right-1,n,foo+['bar'])
    helper('',n,n,n,[])
    
paren(2)

OUTPUT:

0x269572e6288 []
0x26959283548 ['bar']
0x26957363688 ['bar', 'bar']
0x2695925ae88 ['bar', 'bar', 'bar']
0x26957363408 ['bar', 'bar', 'bar', 'bar']
0x2695925ae88 ['bar', 'bar']
0x26957363408 ['bar', 'bar', 'bar']
0x269592833c8 ['bar', 'bar', 'bar', 'bar']

But strangely if we pass some int for foo, I will take 5 for demonstration we get:

def paren(n):
    ans = []
    def helper(string, left, right,n,foo):
        print(hex(id(foo)), foo)
        if len(string)==2*n:
            ans.append(string)
        if left > 0:
            helper(string+'(', left-1,right,n,5)
        if right > left:
            helper(string+')', left,right-1,n,5)
    helper('',n,n,n,5)
    
paren(2)

OUTPUT:

0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5

ie the same point in memory.

However if I replace the 5 in the above code with a larger int for instance 2550 I get the following:

0x2519f6d4790 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550

So initially it is stored at a different memory address but each subsequent call is at the same address. Why is this changing from the case foo=5 what is going on here?

Also in examples where the memory address is changing between calls I do see the same memory addresses being used on more than one occasion for example:

...
0x2695925ae88 ['bar', 'bar', 'bar']
...
0x2695925ae88 ['bar', 'bar']
...

Why is this the case? Is python using previously used memory addresses to store new variables once the old ones are no longer on the recursion call stack?

My mind is really fuzzy on these behaviours so if anyone could help me out that would be great!

I have heard of things like pass by reference and pass by value, but I am not too sure what they mean and if it relates to this python example.

Thank you.

Answer 1

First code: the call stack keeps a reference to each foo , so you have many lists in memory at once, each with a unique ID.

Second code: you pass the same list (that was initially empty) to each recursive call.

Third code: Cpython, as an implementation-specific optimization, caches small int constants for reuse

Fourth code Cpython does not cache large (ie, greater than 256), so each occurrence of 2550 creates a new int object.

Answer 2

I will try to explain this behavior to you with a generic explanation.

Python differentiates between reference data types and primitive data types. Reference data types save references in the memory to a stored value and not the value them self. On the other hand primitive (or value) data types save the value itself.

If you pass an array to a method it will create a new instance of this array every time. So doing some_function([]) generates a new array and passes it to the function. This means calling some_function([]) and then some_function([]) again will create one array on each call. However, if you create an array like foo = [] then foo will reference to the array and calling some_function(foo) and then some_function(foo) will execute some_function on this array.

This is not happening for values like float, int, boolean, but for values like objects and arrays.

Follow up research can be done by following keywords: reference data type, value data type, built in types.

I hope my answer was helpful in understanding the stated problem.

Why does python memory allocation behave like this?

Question

2 answers

solution1
1 2020-07-06 18:10:55

solution2
0 2020-07-06 18:17:24

Why does python memory allocation behave like this?

Question

2 answers

solution1 1 2020-07-06 18:10:55

solution2 0 2020-07-06 18:17:24

solution1
1 2020-07-06 18:10:55

solution2
0 2020-07-06 18:17:24