python - huge for loop is too slow, is there a faster way to handle it?
I'm processing a lot of data, roughly 10-15 million rows, multiple times in a for loop, and I can tell that my for loop is what's slow. I discovered the problem when I tested with 100 rows, then 1,000, then 25,000: every time I increase the number of rows, the run takes disproportionately longer.

I'm using numpy today to calculate a lot of prices and it works perfectly, but now I've hit a wall: with this much data I have to run the loop multiple times to restructure the data before I'm ready to return it as a final array/dict.

Another issue is that a for loop nested inside another for loop takes much more time than a single loop, and of course I need to run "for 1 rows * for 2 rows".

I hope you understand what I want and why this takes so much time, and I hope someone can help me find a faster way to crunch this much data.

I have considered multithreading, but I think I need to fix the main for-loop issue before moving on to that kind of performance tuning.

Here is a very basic sample of my case, showing how I end up in this loop-in-a-loop hell:
import datetime

start_time = datetime.datetime.now()
product = []
group = []
final_collect = []

for test_product in range(25000):
    product.append({'title': test_product})

for (inx, item) in enumerate(product):
    group.append({
        'product': item,
        'group-data': []
    })
    for test_group in range(10):
        group[inx]['group-data'].append({'group': test_group, 'price': 100.0})
        print(inx, test_group)
As you can see, this takes around 2-3 seconds to run 250,000 iterations for 25,000 products. If we run it on 1,200,000 products * 10 groups, that's 12,000,000 iterations each time, and I do it multiple times, so it will take a very long time. But surely there is a faster way around this issue?
run2 below gives roughly a 30% improvement over the equivalent run1 (which you provided). The output is identical. While this may not be plug-and-play for your use case, it demonstrates some of the tricks you can use to improve performance.
import datetime

def run1(n):
    start_time = datetime.datetime.now()
    product = []
    group = []
    final_collect = []
    for test_product in range(n):
        product.append({'title': test_product})
    for (inx, item) in enumerate(product):
        group.append({'product': item,
                      'group-data': []})
        for test_group in range(10):
            group[inx]['group-data'].append({'group': test_group, 'price': 100.0})
    return group

def run2(n):
    start_time = datetime.datetime.now()
    group = [{'product': {'title': i},
              'group-data': [{'group': test_group, 'price': 100.0}
                             for test_group in range(10)]}
             for i in range(n)]
    return group
assert run1(10) == run2(10)
%timeit run1(50000) # 1 loop, best of 3: 372 ms per loop
%timeit run2(50000) # 1 loop, best of 3: 260 ms per loop
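Since you say the prices ultimately feed into numpy calculations anyway, one further sketch may help. This is an assumption about your use case: if the nested dicts are only an intermediate format and the per-group prices are numeric, you can skip building millions of small dicts entirely and hold everything in one 2-D array (rows = products, columns = groups). One allocation replaces 12 million appends, and bulk operations replace the inner loops.

```python
import numpy as np

# Hypothetical restructuring: rows are products, columns are groups.
n_products = 25000
n_groups = 10

# One allocation instead of n_products * n_groups dict appends.
prices = np.full((n_products, n_groups), 100.0)

# Whole-column operations replace the inner for loop, e.g. a 10%
# markup on every product's price in group 3:
prices[:, 3] *= 1.10

print(prices.shape)  # (25000, 10)
```

You can still convert back to the dict-of-lists shape at the very end if some consumer needs it; the point is to do the heavy numeric work on the array, where numpy's vectorized operations scale far better than Python-level loops.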