
Garbage collection in Python for Linux

I'm a little puzzled by how Python allocates memory and garbage-collects, and by how platform-specific that is. For example, compare the following two code snippets:

Snippet A:

>>> id('x' * 10000000) == id('x' * 10000000)
True

Snippet B:

>>> x = "x"*10000000
>>> y = "x"*10000000
>>> id(x) == id(y)
False

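Snippet A returns True because the two temporary strings happen to be allocated at the same address: the first one is no longer referenced once its id() has been taken, so its memory can be reused for the second. In snippet B both strings are alive at the same time, so they must occupy different locations, which is why their ids differ.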

But apparently system performance or platform impacts this, because when I try this on a larger scale:

for i in xrange(1, 1000000000):
    if id('x' * i) != id('x' * i):
        print i
        break

A friend on a Mac tried this, and it ran until the end. When I ran it on a bunch of Linux VMs, it invariably hit the break, but at a different point on each VM. Is this because of how garbage collection is scheduled in Python? Is it because my Linux VMs are slower than the Mac, or does the Linux Python implementation garbage-collect differently?

The garbage collector just uses whatever space is convenient. There are lots of different garbage collection strategies, and the outcome is also affected by parameters, platform, memory usage, phase of the moon, etc. Trying to guess where the interpreter will happen to allocate particular objects is just a waste of time.
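A minimal sketch of why comparing ids of temporaries says little, written in Python 3 syntax. The function names are hypothetical, and the behaviour of the second function is an allocator detail that may differ between runs and platforms. Only the first function has a guaranteed result: two objects that are alive at the same time never share an id.

```python
def ids_of_live_objects(n):
    """Both strings are alive at once, so their ids are guaranteed to differ."""
    a = 'x' * n
    b = 'x' * n
    return id(a), id(b)


def ids_of_temporaries(n):
    """Each string is dropped right after id() returns.

    The second string MAY land at the address the first one just vacated,
    but nothing guarantees it; the result depends on the allocator.
    """
    return id('x' * n), id('x' * n)


if __name__ == "__main__":
    first, second = ids_of_live_objects(10_000_000)
    print("live objects share an id:", first == second)

    first, second = ids_of_temporaries(10_000_000)
    print("temporaries share an id:", first == second)  # may be True or False
```

The second line of output is the one that varied between the Mac and the Linux VMs in the question; it reflects where the allocator placed the block, not whether the two strings are the same object.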

It happens because Python caches small integers and small strings:

Large strings stored in variables: not cached

In [32]: x = "x"*10000000

In [33]: y = "x"*10000000

In [34]: x is y
Out[34]: False

Large strings not stored in variables: look cached (the first temporary is freed before the second is created, so its address can be reused)

In [35]: id('x' * 10000000) == id('x' * 10000000)
Out[35]: True

Small strings: cached

In [36]: x="abcd"

In [37]: y="abcd"

In [38]: x is y
Out[38]: True

Small integers: cached

In [39]: x=3

In [40]: y=3

In [41]: x is y
Out[41]: True

Large integers stored in variables: not cached

In [49]: x=12345678

In [50]: y=12345678

In [51]: x is y
Out[51]: False

Large integers not stored in variables: look cached

In [52]: id(12345678)==id(12345678)
Out[52]: True
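To separate real caching from accidental address reuse, the integers can be built at run time so that the compiler cannot merge equal literals. This is a sketch in Python 3 syntax with a hypothetical helper name; the small-integer cache (values from -5 to 256) is a CPython implementation detail, so other interpreters may behave differently.

```python
def same_object(text):
    """Build the same integer twice at run time and check object identity.

    Both results are kept alive in local variables, so a True result means
    genuine sharing (a cache), not reuse of a freed address.
    """
    a = int(text)
    b = int(text)
    return a is b


if __name__ == "__main__":
    print(same_object("3"))         # small value, served from CPython's cache
    print(same_object("12345678"))  # large value, two separate objects
```

For value comparisons, `==` is the right operator in either case; `is` only answers whether two names refer to one object.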

CPython uses two strategies for memory management:

  1. Reference Counting
  2. A generational, cycle-detecting garbage collector

Allocation is in general done via the platform's malloc/free functions and inherits the performance characteristics of the underlying runtime. Whether a given piece of memory is reused is decided by the C library's allocator and the operating system, not by Python. (Some objects are pooled by the Python VM itself.)

Your example does not, however, trigger the 'real' GC algorithm (that is only used to collect reference cycles). Your long string is deallocated as soon as the last reference to it is dropped.
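The difference between the two mechanisms can be observed directly. The sketch below is Python 3 and CPython-specific. `Payload` and both function names are hypothetical; a small class is used because `str` objects cannot be weakly referenced. With the cyclic collector switched off, an ordinary object still disappears the moment its last reference goes away, while a reference cycle survives until `gc.collect()` is called.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for any heap object."""


def freed_without_gc():
    """Return True if reference counting alone frees the object."""
    was_enabled = gc.isenabled()
    gc.disable()  # turn off the automatic cycle collector
    try:
        obj = Payload()
        ref = weakref.ref(obj)  # does not keep obj alive
        del obj                 # last strong reference dropped
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


def cycle_needs_gc():
    """Return (alive_before_collect, alive_after_collect) for a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Payload(), Payload()
        a.other, b.other = b, a  # a and b now reference each other
        ref = weakref.ref(a)
        del a, b                 # refcounts never reach zero
        alive_before = ref() is not None
        gc.collect()             # explicit run of the cycle collector
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(freed_without_gc())
    print(cycle_needs_gc())
```

The strings in the question contain no references to other objects, so they fall in the first category: their memory goes back to the allocator immediately, and scheduling of the cycle collector plays no part.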

