
Python - List conversion of sliced iterable increases with every slice

I have a generator from read_sql, and I take slices of it using itertools.islice with start and stop arguments. This process runs in a loop: each pass takes the next slice of 3 chunks from the generator and converts it to a list.

The first time it runs, iterable_slice = list(it.islice(generator_df, 0, 3)) takes 2.99 seconds. The second time, iterable_slice = list(it.islice(generator_df, 4, 6)) takes 5.3 seconds, and with every new loop (i.e. each subsequent set of slices), the list conversion takes longer.

Why does this happen, and where am I making a mistake? Thoughts please. Thank you.

# function to take a slice of the generator and materialize it as a list
import itertools as it
import time
import pandas as pd

def gen_to_itr(generator_df, slice_start, slice_end):
    iterable_slice = list(it.islice(generator_df, slice_start, slice_end))
    return iterable_slice  # note: the original was missing this return

# main function
slices = 3
slice_start = 0
slice_end = slices
flg_cnt = 0
while slice_end <= bcnt and flg_cnt <= 1:
    # the generator is recreated from scratch on every pass of the loop
    generator_df = pd.read_sql(query2, test_connection_forbankcv_connection, chunksize=1800)
    first = time.perf_counter()
    iterable_slice = gen_to_itr(generator_df, slice_start, slice_end)
    end = time.perf_counter()
    print(f'Chunk list created in {round(end-first, 2)} second(s)')
    slice_start = slice_start + slices
    .....

it.islice() has to skip over the first slice_start elements of the generator when creating the new iterator. This takes time proportional to slice_start.
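You can see this skipping behavior directly with a plain generator: when islice is given a start offset, it still pulls every skipped element from the underlying iterator and throws it away. A minimal sketch, using a counting generator in place of the read_sql one:

```python
import itertools as it

def counting_gen(n, consumed):
    """Yield 0..n-1, recording every item actually pulled."""
    for i in range(n):
        consumed.append(i)
        yield i

consumed = []
gen = counting_gen(100, consumed)

# islice(gen, 90, 93) must first pull and discard elements 0..89
chunk = list(it.islice(gen, 90, 93))

assert chunk == [90, 91, 92]
assert len(consumed) == 93  # 93 elements produced, 90 of them thrown away
```

So the cost of each slice is proportional to its start offset, not to its length, whenever the generator has to be skipped from the beginning.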

However, I find it hard to believe that skipping each element of a pandas series would take about 1 second. If the chunksize were smaller than the slice sizes, it might need to do another fetch from the database to get the next chunk. But as long as you're in the same chunk, I think it should have the same speed as iterating through a static series.
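The growth in your timings most likely comes from recreating the generator with pd.read_sql inside the loop, so every slice has to skip from the very beginning again. A minimal sketch of the usual fix: create the generator once, then take successive slices with no start offset, so nothing is ever skipped (here iter(range(10)) stands in for the read_sql generator):

```python
import itertools as it

def take_chunks(gen, n):
    """Pull the next n items from an already-advanced generator."""
    return list(it.islice(gen, n))

gen = iter(range(10))        # stand-in for the read_sql chunk generator

first = take_chunks(gen, 3)  # consumes items 0, 1, 2
second = take_chunks(gen, 3) # consumes items 3, 4, 5 -- no skipping needed

assert first == [0, 1, 2]
assert second == [3, 4, 5]
```

Each call resumes where the previous one left off, so every slice costs the same regardless of how far into the generator you are. (Note also that slicing 0..3 and then 4..6 on a fresh generator skips element 3 entirely; consuming incrementally avoids that off-by-one as well.)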
