How to make my Python script faster?

I have a Python script that takes more than 20 hours to run to completion.

Since my code is pretty big, I will post a simplified version.

The first part of the code:

flag = 1
mydic = {}
for i in mylist:
    mydic[flag] = myfunction(i)
    flag += 1

mylist has more than 700 entries, and each call to myfunction runs for about 20 seconds.

So I was wondering whether I can use parallel programming to split the iteration into two groups and run them simultaneously. Is that possible, and would it take half the time it does now?

The second part of the code:

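mymatrix = []
# mydic is keyed 1..flag-1, so iterate over those keys rather than from 0.
for n1 in range(1, flag):
    mat = []
    for n2 in range(1, flag):
        if n1 >= n2:
            # On or below the diagonal: fill with zero.
            mat.append(0)
        else:
            # mydic is a dict, so index it with [] instead of calling it.
            res = myfunction2(mydic[n1], mydic[n2])
            mat.append(res)
    mymatrix.append(mat)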

So, if mylist has 700 entries, I want to create a 700x700 upper triangular matrix. But myfunction2() takes about 30 seconds per call. I don't know whether I can use parallel programming here too.

I cannot simplify myfunction() and myfunction2(), since they are functions that call an external API and return the results.

Do you have any suggestions for how I can change this to make it faster?

Based on your comments, I think it's very likely that the 30 seconds is mostly due to the external API calls. I would add some timing code to test which portions of your code are actually responsible for the slowness.
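As a rough sketch of what such timing code could look like, the decorator below records the wall-clock duration of every call. The `timed` helper and the stand-in `myfunction` (which only sleeps to imitate a network wait) are assumptions made for illustration, not part of the original script.

```python
import functools
import time


def timed(fn):
    """Wrap fn so that every call records its wall-clock duration."""
    durations = []

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            durations.append(time.perf_counter() - start)

    wrapper.durations = durations
    return wrapper


# Hypothetical stand-in for the real myfunction, which calls an external API.
@timed
def myfunction(item):
    time.sleep(0.05)  # imitates waiting on the network
    return item * 2


results = [myfunction(i) for i in range(3)]
total = sum(myfunction.durations)
print(f"{len(myfunction.durations)} calls took {total:.2f}s in total")
```

If nearly all of the measured time is spent inside the API-bound functions, the code is waiting rather than computing, which is the case where running calls concurrently tends to help.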

If it is from the external API calls, there are some easy fixes. The external API calls block, so you'll get a speedup if you can move to a parallel model (though 30 seconds of blocking sounds huge to me).
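One possible parallel model for the first loop is a thread pool from the standard library's `concurrent.futures`. This is only a sketch: `myfunction` and `mylist` below are small stand-ins, and the worker count of 4 is an arbitrary assumption (the external API may impose its own rate limits).

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for the real myfunction, which blocks on an API call.
def myfunction(item):
    time.sleep(0.05)
    return item.upper()


mylist = ["a", "b", "c", "d"]  # stand-in for the 700-entry list

# Threads suit this case because the work is waiting on I/O, not computing.
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the inputs.
    results = list(pool.map(myfunction, mylist))

# Same keys as the original loop: 1, 2, 3, ...
mydic = dict(enumerate(results, start=1))
flag = len(mydic) + 1  # matches the value flag has after the original loop
print(mydic)
```

With more workers than two, the speedup is not limited to halving the run time, as long as the API accepts that many simultaneous requests.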

I think it would be easiest to create a quick "task list" by having the output of the two loops be a matrix of arguments to pass into a function. Then I'd pipe them into Celery to run the tasks. That should give you a decent speedup with a minimal amount of work.
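The task-list idea can be sketched as follows. Note that this example swaps Celery for a standard-library thread pool so that it runs without a message broker; the task list itself would be built the same way. `mydic` and `myfunction2` are small stand-ins, and `run_task` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical stand-ins for the question's data and API-bound function.
mydic = {k: k * 10 for k in range(1, 6)}


def myfunction2(a, b):
    time.sleep(0.01)  # imitates the slow external call
    return a + b


keys = sorted(mydic)
# Task list: one (n1, n2) pair for every cell above the diagonal.
tasks = list(combinations(keys, 2))


def run_task(pair):
    n1, n2 = pair
    return myfunction2(mydic[n1], mydic[n2])


with ThreadPoolExecutor(max_workers=8) as pool:
    values = list(pool.map(run_task, tasks))

# Rebuild the upper triangular matrix; untouched cells stay 0.
position = {key: i for i, key in enumerate(keys)}
mymatrix = [[0] * len(keys) for _ in keys]
for (n1, n2), value in zip(tasks, values):
    mymatrix[position[n1]][position[n2]] = value
```

Building the pairs up front also makes the amount of work explicit: 700 entries give 700 * 699 / 2 = 244,650 calls to myfunction2.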

You would probably save a lot more time by using the threading or multiprocessing modules to run tasks (or sections), or even by writing it all in Twisted - but that usually takes longer to implement than a simple Celery function.

The one caveat with the Celery approach is that you'll be dispatching a lot of work - so you'll need some functionality to poll for results. That could be a while loop that just calls sleep(10) and repeats until Celery has a result for every task. If you do it in Twisted, you can access/track results as they finish. I've never had to do something like this with multiprocessing, so I don't know how that would fit in.
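The polling loop described above might look roughly like this. The sketch uses `concurrent.futures` futures instead of Celery results so that it runs on its own; with Celery, the analogous check would be `AsyncResult.ready()`. `slow_call` is a hypothetical stand-in, and the short sleep interval is chosen only to keep the example quick.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for a dispatched task that waits on an external API.
def slow_call(x):
    time.sleep(0.02 * x)
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    pending = {x: pool.submit(slow_call, x) for x in range(1, 6)}
    results = {}
    # Poll until every dispatched task has produced a result.
    while pending:
        finished = [key for key, future in pending.items() if future.done()]
        for key in finished:
            results[key] = pending.pop(key).result()
        if pending:
            time.sleep(0.01)  # the answer suggests sleep(10) for real workloads

print(results)
```

When everything runs in one process, `concurrent.futures.as_completed` can replace the manual loop, since it yields each future as soon as it finishes.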

How about using a generator for the second part in place of one of the for loops?

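def fn():
    # Yield the row indices one at a time (mydic is keyed 1..flag-1).
    for n1 in range(1, flag):
        yield n1

generate = fn()

mymatrix = []
# Iterating with for stops cleanly when the generator is exhausted.
for a in generate:
    mat = []
    for n2 in range(1, flag):
        if a >= n2:
            mat.append(0)
        else:
            # mydic is a dict, so index it with [] instead of calling it.
            mat.append(myfunction2(mydic[a], mydic[n2]))
    # Append each row once, after it is complete.
    mymatrix.append(mat)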
