When comparing Python generators vs. lists for performance, I read that generators are faster to create than lists, but that iterating over a list is faster than iterating over a generator. However, when I coded an example to test this with small and large samples of data, the results contradicted each other.
When I test iteration speed over a generator and a list using range(1_000_000_000) (so the generator actually yields 500,000,000 numbers), generator iteration comes out faster than list iteration:
from time import time

my_generator = (i for i in range(1_000_000_000) if i % 2 == 0)

start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)

my_list = [i for i in range(1_000_000_000) if i % 2 == 0]

start = time()
for i in my_list:
    pass
print("Time for List iteration - ", time() - start)
And the output is:
Time for Generator iteration -  67.49345350265503
Time for List iteration -  89.21837282180786
But if I use a smaller amount of data, 10_000_000 instead of 1_000_000_000, list iteration is faster than generator iteration:
from time import time

my_generator = (i for i in range(10_000_000) if i % 2 == 0)

start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)

my_list = [i for i in range(10_000_000) if i % 2 == 0]

start = time()
for i in my_list:
    pass
print("Time for list iteration - ", time() - start)
The output is:
Time for Generator iteration -  1.0233261585235596
Time for list iteration -  0.11701655387878418
Why is this behaviour happening?
After understanding the points made by @gimix and @Dani Mesejo, I found the answer. Indeed, list iteration is faster than generator iteration.
With a generator, each iteration resumes the generator (much like a function call), and the remainder operation (modulus) is also evaluated on every step, which slows each call down further. With a list, all of that work is done once, during creation, so iteration is faster. Thus creating the list may be slower than creating the generator, but iterating over the list is definitely faster than iterating over the generator.
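This deferred-work effect can be checked directly with timeit: a rough sketch comparing a generator with no per-item work, the same generator with the modulus filter, and a freshly built filtered list. The size and number of runs here are arbitrary choices for illustration.

```python
import timeit

# Each statement builds its iterable and consumes it, so repeated runs are valid.
plain_gen = "for i in (i for i in range(1_000_000)): pass"
filtered_gen = "for i in (i for i in range(1_000_000) if i % 2 == 0): pass"
filtered_list = "for i in [i for i in range(1_000_000) if i % 2 == 0]: pass"

print("plain generator:   ", timeit.timeit(plain_gen, number=5))
print("filtered generator:", timeit.timeit(filtered_gen, number=5))
print("list (incl. build):", timeit.timeit(filtered_list, number=5))
```

On a typical CPython build, the filtered generator is measurably slower per run than the plain one, which shows that the `if i % 2 == 0` test is being re-evaluated on every `next()` call.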
The above code uses the time module, which is not reliable for benchmarking! So I used timeit instead, for 1_000_000 and for 1_000_000_000 elements, and in both cases list iteration was faster:
import timeit

mysetup = '''my_generator = (i for i in range(10_000_000) if i % 2 == 0)'''
mycode = '''
for i in my_generator:
    pass
'''

mysetup1 = '''my_list = [i for i in range(10_000_000) if i % 2 == 0]'''
mycode1 = '''
for i in my_list:
    pass
'''

print(timeit.timeit(setup=mysetup, stmt=mycode, number=1))
print(timeit.timeit(setup=mysetup1, stmt=mycode1, number=1))
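One caveat with this setup: the generator built in `setup` is exhausted after the first pass, so `number=1` is the only valid choice there. A variation that rebuilds the iterable inside `stmt` allows `number > 1` and averages out noise, at the cost of including creation time in both measurements. A minimal sketch:

```python
import timeit

# Rebuild the iterable inside stmt so every run starts fresh.
gen_stmt = """
g = (i for i in range(1_000_000) if i % 2 == 0)
for i in g:
    pass
"""
list_stmt = """
lst = [i for i in range(1_000_000) if i % 2 == 0]
for i in lst:
    pass
"""

print("generator:", timeit.timeit(gen_stmt, number=10))
print("list:     ", timeit.timeit(list_stmt, number=10))
```

Because creation is now timed too, this no longer isolates iteration cost, but it does give a fair creation-plus-consumption comparison.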
For a better understanding of the efficiency benefit of generators, suppose you want to read a file with 10M rows. First, read it with a regular method like the one below:
from time import time

first_ts = time()

def regular_file_reader(filename):
    with open(filename, "r") as file_:
        data = file_.readlines()  # reads the entire file into memory
    return data

for row in regular_file_reader("sample_file.csv"):
    print(row)
    second_time = time()
    break

print(second_time - first_ts)
As you can see, after reading the first line of the file we break out of the loop. That is where generators make the difference: they can produce just the first element on demand. For iterating over all the remaining elements, a generator may even be less efficient.
def generator_file_reader(filename):
    with open(filename, "r") as f:
        for row in f:
            yield row  # produce one line at a time

first_ts = time()
for row in generator_file_reader("sample_file.csv"):
    print(row)
    second_time = time()
    break

print(second_time - first_ts)
In this case, since the generator only reads the first line rather than the whole file, using the generator is much faster.
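The same laziness also pays off in memory, which is worth mentioning alongside speed. As a small illustration (my addition, not part of the timing discussion above), `sys.getsizeof` shows that a generator object stays tiny no matter how many items it can yield, while a list must hold them all:

```python
import sys

gen = (i for i in range(1_000_000) if i % 2 == 0)
lst = [i for i in range(1_000_000) if i % 2 == 0]

# The generator object has a small, constant footprint;
# the list's size grows with its 500,000 elements.
print("generator:", sys.getsizeof(gen), "bytes")
print("list:     ", sys.getsizeof(lst), "bytes")
```

Note that `sys.getsizeof` on a list counts only the list object and its pointer array, not the integer objects themselves, so the true gap is even larger.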