
Speed comparison for iterating over a list and a generator in Python

When comparing Python generators and lists for performance, I read that generators are faster to create than lists, but that iterating over a list is faster than iterating over a generator. However, I wrote an example to test this with a small and a large data sample, and the two results contradict each other.

When I test iteration speed with range(1_000_000_000), so that the generator actually yields 500,000,000 numbers, generator iteration comes out faster than list iteration:

from time import time

# Creating the generator does no filtering work yet
my_generator = (i for i in range(1_000_000_000) if i % 2 == 0)

start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)

# Creating the list runs the filter for every element up front
my_list = [i for i in range(1_000_000_000) if i % 2 == 0]

start = time()
for i in my_list:
    pass
print("Time for List iteration - ", time() - start)

And the output is:

Time for Generator iteration - 67.49345350265503
Time for List iteration - 89.21837282180786

But if I use a smaller input, 10_000_000 instead of 1_000_000_000, list iteration is faster than generator iteration:

from time import time

# Creating the generator does no filtering work yet
my_generator = (i for i in range(10_000_000) if i % 2 == 0)

start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)

# Creating the list runs the filter for every element up front
my_list = [i for i in range(10_000_000) if i % 2 == 0]

start = time()
for i in my_list:
    pass
print("Time for list iteration - ", time() - start)

The output is:

Time for Generator iteration - 1.0233261585235596
Time for list iteration - 0.11701655387878418

Why is this behaviour happening?

After understanding the points made by @gimix and @Dani Mesejo, I found the answer. List iteration is indeed faster than generator iteration.

With a generator, every iteration resumes the generator, much like a function call, and the remainder (modulus) operation is also evaluated on every iteration, which makes each step slower. With a list, the filtering is done once, during creation, so the iteration itself is faster. Creating a list may therefore be slower than creating a generator, but iterating over a list is definitely faster than iterating over a generator.
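The split between creation cost and iteration cost described above can be measured separately. The following is only a minimal sketch: the helper names `timed` and `consume` and the sample size `N` are assumptions made for illustration, and the absolute timings will differ from machine to machine.

```python
from time import perf_counter

N = 1_000_000  # hypothetical sample size, kept small so the sketch runs quickly


def timed(fn):
    """Call a zero-argument function and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = fn()
    return result, perf_counter() - start


def consume(iterable):
    """Loop over every item and return how many items were seen."""
    count = 0
    for _ in iterable:
        count += 1
    return count


# List: range() and the modulus filter both run here, during creation.
my_list, list_create = timed(lambda: [i for i in range(N) if i % 2 == 0])
list_count, list_iterate = timed(lambda: consume(my_list))

# Generator: creation only builds the generator object; nothing is filtered yet.
my_generator, gen_create = timed(lambda: (i for i in range(N) if i % 2 == 0))
# The filtering work happens now, one item at a time, inside the loop.
gen_count, gen_iterate = timed(lambda: consume(my_generator))

print(f"list:      create {list_create:.4f}s, iterate {list_iterate:.4f}s")
print(f"generator: create {gen_create:.4f}s, iterate {gen_iterate:.4f}s")
```

Both variants visit the same number of items; what changes is whether the filtering is paid for at creation time or during the loop.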

The code in the question uses the time module, which is not reliable for benchmarking. I then used timeit for 1_000_000 and for 1_000_000_000 items, and in both cases list iteration was faster:

import timeit

# Setup code runs once and is not included in the measured time
mysetup = '''my_generator = (i for i in range(10_000_000) if i % 2 == 0)'''

mycode = '''
for i in my_generator:
    pass
'''

mysetup1 = '''my_list = [i for i in range(10_000_000) if i % 2 == 0]'''

mycode1 = '''
for i in my_list:
    pass
'''

print(timeit.timeit(setup=mysetup,
                    stmt=mycode,
                    number=1))
print(timeit.timeit(setup=mysetup1,
                    stmt=mycode1,
                    number=1))
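One detail about the timeit calls above: the setup code runs once per timeit() call, and a generator can only be consumed once. With number greater than 1, every run after the first would loop over an already exhausted generator and look misleadingly fast, so number = 1 matters here. A minimal sketch of that behaviour, where `count_items` is a hypothetical helper:

```python
def count_items(iterable):
    """Loop over every item and return how many items were seen."""
    count = 0
    for _ in iterable:
        count += 1
    return count


my_generator = (i for i in range(10) if i % 2 == 0)
my_list = [i for i in range(10) if i % 2 == 0]

# A generator is exhausted after one full pass; a list can be re-iterated.
first_gen_pass = count_items(my_generator)
second_gen_pass = count_items(my_generator)
first_list_pass = count_items(my_list)
second_list_pass = count_items(my_list)

print(first_gen_pass, second_gen_pass)    # prints: 5 0
print(first_list_pass, second_list_pass)  # prints: 5 5
```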

For a better understanding of the efficiency benefit of generators, suppose that you want to read a file with 10M rows. First, you read it with a regular method like the one below:

from time import time

first_ts = time()

def regular_file_reader(filename):
    # readlines() loads every row of the file into a list before returning
    with open(filename, "r") as file_:
        return file_.readlines()

for row in regular_file_reader("sample_file.csv"):
    second_time = time()
    break  # stop after the first row

print(second_time - first_ts)

As you can see, we break out of the loop after reading the first line of the file, because that is where generators make a difference: reading just the first element. For iterating over the remaining elements, a generator may even be less efficient.

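from time import time

first_ts = time()  # restart the timer for the generator version

def generator_file_reader(filename):
    # Yields one row at a time instead of loading the whole file
    with open(filename, "r") as f:
        for row in f:
            yield row

for row in generator_file_reader("sample_file.csv"):
    second_time = time()
    break  # stop after the first row

print(second_time - first_ts)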

In this case, because the generator reads only the first line rather than the whole file, using the generator is much faster.
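To try the comparison without a real 10M-row file, the sketch below writes a throwaway file and times how long each reader takes to hand back the first row. The file contents, the row count, and the helper `time_first_row` are assumptions made for illustration; only the two reader functions mirror the ones above.

```python
import os
import tempfile
from time import perf_counter


def regular_file_reader(filename):
    # Loads every row into a list before returning anything
    with open(filename, "r") as f:
        return f.readlines()


def generator_file_reader(filename):
    # Yields rows one at a time, reading the file lazily
    with open(filename, "r") as f:
        for row in f:
            yield row


def time_first_row(reader, filename):
    """Return (first_row, seconds needed to obtain it)."""
    start = perf_counter()
    rows = iter(reader(filename))
    first = next(rows)
    elapsed = perf_counter() - start
    # Close the generator, if it is one, so the file handle is released
    close = getattr(rows, "close", None)
    if close is not None:
        close()
    return first, elapsed


# Build a hypothetical sample file with 200,000 short rows
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    for n in range(200_000):
        tmp.write(f"{n},row\n")
    path = tmp.name

try:
    eager_row, eager_time = time_first_row(regular_file_reader, path)
    lazy_row, lazy_time = time_first_row(generator_file_reader, path)
finally:
    os.remove(path)

print(f"readlines(): first row after {eager_time:.6f}s")
print(f"generator:   first row after {lazy_time:.6f}s")
```

Both readers return the same first row; the difference is how much of the file has to be read before that row is available.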

