
python - huge for loop is too slow, is there a faster way to handle it

I'm processing a lot of data, roughly 10-15 million rows, multiple times in for loops, and I can see that it's my for loops that are slow.

I discovered the problem when I tested 100 rows, then 1,000 and then 25,000 rows: every time I increase the number of rows it takes much longer to run.

I'm using numpy today to calculate a lot of prices and it works perfectly. Now I've hit the wall with this much data: the code has to run multiple times to restructure the data before I'm ready to return it in a final array/dict.

Another issue is that when I run a for loop inside a for loop it takes much more time than a single loop, since of course the work becomes "rows of loop 1 * rows of loop 2".

My case is:

  1. 1,200,000 products
    1. price rules that can be hit (25-30 different price rules)
    2. select the right price rule and do stuff based on this rule
    3. restructure the data using a for loop before using numpy
    4. calculate a price against the main product price using numpy
  2. run the product's price-rule group against 10-15 price groups for each product (see the sketch after this list)
    1. restructure the data and prepare it in a for loop before using numpy
    2. append the price to the main product price array in a for loop
  3. restructure all data and prepare to return it in my array/dict so I can use it later (single / multi product calc)
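
To make step 2 concrete, the per-product price-group calculation looks roughly like this; it is only a simplified sketch, all names and factors are made up and not my real code:

import numpy as np

# simplified sketch: one product's base price hit by its price groups
# (the factors here are invented for illustration)
base_price = 100.0
group_factors = np.array([0.95, 0.90, 1.00, 1.05, 0.85])

# numpy does the per-group math in one vectorized step,
# but today this runs once per product inside a Python for loop
group_prices = base_price * group_factors
print(group_prices)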

I hope you understand what I want and why this takes so much time, and I hope someone can help me find a faster way to calculate this much data.

I have been thinking about a multithreading option, but I think I need to fix the main for-loop issue before I move on to the next performance setup.

A very basic sample of my case, showing how I end up in this loop-in-a-loop hell:

import datetime
start_time = datetime.datetime.now()

product = []
group = []
final_collect = []

# build 25,000 dummy products
for test_product in range(25000):
    product.append({'title': test_product})

# for every product, attach 10 price-group entries
for (inx, item) in enumerate(product):
    group.append({
        'product' : item,
        'group-data' : []
    })

    for test_group in range(10):
        group[inx]['group-data'].append({'group' : test_group, 'price' : 100.0})
        # printing on every iteration is itself a big part of the runtime
        print(inx, test_group)

print(datetime.datetime.now() - start_time)

As you can see, this takes around 2-3 seconds to run 250,000 loop iterations for 25,000 products. If I run it on 1,200,000 products * 10 groups, that is 12,000,000 iterations per pass, and I do it multiple times, so it will take a long time. But there should be a faster way around this issue?

run2 below generates a 30% improvement versus the equivalent run1 (which you provided). The output is identical.

While this may not be "plug and play" for your use case, it demonstrates some of the tricks you can use to improve performance.

import datetime

def run1(n):
    # original approach: append to lists inside nested for loops
    start_time = datetime.datetime.now()

    product = []
    group = []
    final_collect = []

    for test_product in range(n):
        product.append({'title': test_product})

    for (inx, item) in enumerate(product):
        group.append({'product': item,
                      'group-data': []})

        for test_group in range(10):
            group[inx]['group-data'].append({'group': test_group, 'price': 100.0})

    return group

def run2(n):
    # same structure built in one nested list comprehension,
    # which avoids the repeated list.append calls of run1
    start_time = datetime.datetime.now()

    group = [{'product': {'title': i},
              'group-data': [{'group': test_group, 'price': 100.0} for test_group in range(10)]}
             for i in range(n)]

    return group

assert run1(10) == run2(10)

%timeit run1(50000)  # 1 loop, best of 3: 372 ms per loop
%timeit run2(50000)  # 1 loop, best of 3: 260 ms per loop
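
Beyond the list comprehension, since every value in the structure is numeric (product index, group index, price), a further idea, sketched below, is to skip the per-row dicts entirely and hold the prices in one numpy array. run3 and the rule applied to it are made-up illustrations, not part of your existing code, and the result is an array rather than the dict layout above.

import numpy as np

def run3(n, n_groups=10):
    # hypothetical alternative: one (n, n_groups) float array instead of
    # n * n_groups small dicts; row = product index, column = price group
    return np.full((n, n_groups), 100.0)

prices = run3(50000)
# price rules can then be applied as whole-array operations, e.g. a
# made-up rule giving a 10% discount in price group 3:
prices[:, 3] *= 0.9

The trade-off is that you lose the labelled dict layout, so this only pays off if the later single/multi product calculations can work from plain indices.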
