
python - huge for loop is too slow, is there a faster way to handle it

I'm processing a lot of data, roughly 10-15 million rows, multiple times in for loops, and I can see that it's my for loops that are slow.

I discovered the problem when I tested 100 rows, then 1,000 and then 25,000 rows: every time I increase the number of rows it takes much longer to run.

I'm using numpy today to calculate a lot of prices and it works perfectly. Now I've hit the wall with this much data: the code has to run multiple times to restructure the data before I'm ready to return it in a final array/dict.

Another issue is that when I run a for loop inside a for loop it takes much more time than a single loop, since of course the work becomes "rows of loop 1 * rows of loop 2".

My case is:

  1. 1,200,000 products
    1. price rules that can be hit (25-30 different price rules)
    2. select the right price rule and do stuff based on this rule
    3. restructure the data using a for loop before using numpy
    4. calculate a price against the main product price using numpy
  2. run the product's price-rule group against 10-15 price groups for each product (see the sketch after this list)
    1. restructure the data and prepare it in a for loop before using numpy
    2. append the price to the main product price array in a for loop
  3. restructure all data and prepare to return it in my array/dict so I can use it later (single / multi product calc)
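
To make step 2 concrete, the per-product price-group calculation looks roughly like this; it is only a simplified sketch, all names and factors are made up and not my real code:

import numpy as np

# simplified sketch: one product's base price hit by its price groups
# (the factors here are invented for illustration)
base_price = 100.0
group_factors = np.array([0.95, 0.90, 1.00, 1.05, 0.85])

# numpy does the per-group math in one vectorized step,
# but today this runs once per product inside a Python for loop
group_prices = base_price * group_factors
print(group_prices)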

I hope you understand what I want and why this takes so much time, and I hope someone can help me find a faster way to calculate this much data.

I have been thinking about a multithreading option, but I think I need to fix the main for-loop issue before I move on to the next performance setup.

A very basic sample of my case, showing how I end up in this loop-in-a-loop hell:

import datetime
start_time = datetime.datetime.now()

product = []
group = []
final_collect = []

# build 25,000 dummy products
for test_product in range(25000):
    product.append({'title': test_product})

# for every product, attach 10 price-group entries
for (inx, item) in enumerate(product):
    group.append({
        'product' : item,
        'group-data' : []
    })

    for test_group in range(10):
        group[inx]['group-data'].append({'group' : test_group, 'price' : 100.0})
        # printing on every iteration is itself a big part of the runtime
        print(inx, test_group)

print(datetime.datetime.now() - start_time)

As you can see, this takes around 2-3 seconds to run 250,000 loop iterations for 25,000 products. If I run it on 1,200,000 products * 10 groups, that is 12,000,000 iterations per pass, and I do it multiple times, so it will take a long time. But there should be a faster way around this issue?

run2 below generates a 30% improvement versus the equivalent run1 (which you provided). The output is identical.

While this may not be "plug and play" for your use case, it demonstrates some of the tricks you can use to improve performance.

import datetime

def run1(n):
    # original approach: append to lists inside nested for loops
    start_time = datetime.datetime.now()

    product = []
    group = []
    final_collect = []

    for test_product in range(n):
        product.append({'title': test_product})

    for (inx, item) in enumerate(product):
        group.append({'product': item,
                      'group-data': []})

        for test_group in range(10):
            group[inx]['group-data'].append({'group': test_group, 'price': 100.0})

    return group

def run2(n):
    # same structure built in one nested list comprehension,
    # which avoids the repeated list.append calls of run1
    start_time = datetime.datetime.now()

    group = [{'product': {'title': i},
              'group-data': [{'group': test_group, 'price': 100.0} for test_group in range(10)]}
             for i in range(n)]

    return group

assert run1(10) == run2(10)

%timeit run1(50000)  # 1 loop, best of 3: 372 ms per loop
%timeit run2(50000)  # 1 loop, best of 3: 260 ms per loop
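
Beyond the list comprehension, since every value in the structure is numeric (product index, group index, price), a further idea, sketched below, is to skip the per-row dicts entirely and hold the prices in one numpy array. run3 and the rule applied to it are made-up illustrations, not part of your existing code, and the result is an array rather than the dict layout above.

import numpy as np

def run3(n, n_groups=10):
    # hypothetical alternative: one (n, n_groups) float array instead of
    # n * n_groups small dicts; row = product index, column = price group
    return np.full((n, n_groups), 100.0)

prices = run3(50000)
# price rules can then be applied as whole-array operations, e.g. a
# made-up rule giving a 10% discount in price group 3:
prices[:, 3] *= 0.9

The trade-off is that you lose the labelled dict layout, so this only pays off if the later single/multi product calculations can work from plain indices.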
