Let's assume that I want to find n**2 for all numbers smaller than 20000000.
import time, psutil, gc
gc.collect()
mem_before = psutil.virtual_memory()[3]
time1 = time.time()
# (comprehension, generator, function)-code comes here
time2 = time.time()
mem_after = psutil.virtual_memory()[3]
print "Used Mem = ", (mem_after - mem_before)/(1024**2) # convert Byte to Megabyte
print "Calculation time = ", time2 - time1
1. Creating a list via comprehension:
x = [i**2 for i in range(20000000)]
It is really slow and uses a lot of memory:
Used Mem = 1270 # Megabytes
Calculation time = 33.9309999943 # Seconds
2. Creating a generator using '()':
x = (i**2 for i in range(20000000))
It is much faster than option 1, but still uses a lot of memory:
Used Mem = 611
Calculation time = 0.278000116348
3. Defining a generator function (most efficient):
def f(n):
    i = 0
    while i < n:
        yield i**2
        i += 1
x = f(20000000)
Its consumption:
Used Mem = 0
Calculation time = 0.0
() creates a generator, so why does it need a lot of memory?
As others have pointed out in the comments, range creates a list in Python 2. Hence, it is not the generator per se that uses up the memory, but the range that the generator uses:
x = (i**2 for i in range(20000000))
# builds a 2*10**7 element list, not for the squares, but for the bases
>>> import sys
>>> sys.getsizeof(range(100))
872
>>> sys.getsizeof(xrange(100))
40
>>> sys.getsizeof(range(1000))
8720
>>> sys.getsizeof(xrange(1000))
40
>>> sys.getsizeof(range(20000000))
160000072
>>> sys.getsizeof(xrange(20000000))
40
This also explains why your second version (the generator expression) uses around half the memory of the first version (the list comprehension) as the first one builds two lists (for the bases and the squares) while the second only builds one list for the bases.
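The reason the list of bases exists before anything is consumed is that a generator expression evaluates the iterable of its first for clause immediately, at creation time, while the body is evaluated lazily. Here is a small sketch of that behaviour, written so that it runs on Python 3. The names eager_bases, square and calls are made up for illustration, and eager_bases is only a stand-in for what range does in Python 2:

```python
calls = []  # records what has actually been executed, in order

def eager_bases(n):
    # Hypothetical stand-in for Python 2's range(): builds the whole list up front.
    calls.append("bases built")
    return list(range(n))

def square(i):
    calls.append("square(%d)" % i)
    return i ** 2

gen = (square(i) for i in eager_bases(3))
# The list of bases already exists here, but no square has been computed yet.
state_after_creation = list(calls)

first = next(gen)  # only now is the first square computed
state_after_first = list(calls)
```

After the generator is created, calls holds only "bases built"; the first "square(0)" entry appears after the call to next().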
Using xrange(20000000) instead greatly improves memory usage, as it returns a lazy iterable. This is essentially the built-in, memory-efficient way to iterate over a range of numbers that mirrors your third version (with the added flexibility of start, stop and step):
x = (i**2 for i in xrange(20000000))
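If you want to check the difference per process rather than through system-wide numbers, something like the following rough sketch may help. It assumes Python 3, where range is already lazy like xrange and tracemalloc is in the standard library; it is not the psutil setup from the question. The helper peak_bytes and the smaller N are my own choices for illustration:

```python
import tracemalloc

def peak_bytes(build):
    """Return the peak number of bytes allocated while build() runs."""
    tracemalloc.start()
    result = build()  # keep the result alive until the peak has been read
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    del result
    return peak

N = 200000  # much smaller than the 20000000 in the question, so this runs quickly

# List comprehension: every square is materialised.
list_peak = peak_bytes(lambda: [i**2 for i in range(N)])

# Generator expression over a lazy range: nothing is computed yet.
gen_peak = peak_bytes(lambda: (i**2 for i in range(N)))

print("list comprehension peak:", list_peak, "bytes")
print("generator expression peak:", gen_peak, "bytes")
```

The exact figures depend on the interpreter version and platform, but the generator expression's peak should be tiny compared to the list's.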
In Python 3, range is essentially what xrange used to be in Python 2. However, the Python 3 range object has some nice features that Python 2's xrange doesn't have, like O(1) slicing, membership tests (contains), etc.
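To make those Python 3 features concrete, here is a short sketch (Python 3 only; the variable names are just for illustration). None of these operations builds a 20000000-element list:

```python
import sys

r = range(20000000)

size = sys.getsizeof(r)       # small and independent of the length of the range
sliced = r[10:20]             # slicing returns another range: range(10, 20)
stepped = r[::5000000]        # slices with a step work as well
has_last = 19999999 in r      # membership for ints is computed arithmetically
last = r[-1]                  # indexing works without iterating

print(size, sliced, list(stepped), has_last, last)
```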
1.- The generator object itself must be created in memory, so in your second solution the generator exists but has not computed anything yet. It still takes up memory: Python may reserve some for the computation (that part is interpreter-internal), but more importantly the range function creates the full list from 0 to 20000000, so in fact you are still building that list in memory.
2.- You can use itertools.imap:
import itertools
squares = itertools.imap(lambda x: x**2, xrange(20000000))
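Note that itertools.imap only exists in Python 2. In Python 3 the built-in map is already lazy, so a roughly equivalent sketch (assuming Python 3; first_five is just an illustrative name) would be:

```python
import itertools

# Lazy in Python 3: no squares are computed until they are requested.
squares = map(lambda x: x**2, range(20000000))

# Pull only the first five values; the rest of the range is never touched.
first_five = list(itertools.islice(squares, 5))
print(first_five)
```

Only the values you actually consume are ever computed, which is the same behaviour as the imap/xrange combination above.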