
Improve the performance of code that uses for-loops

I am trying to create a list based on some data, but the code I am using is very slow when I run it on large data. I suspect I am not using the full power of Python for this task. Is there a more efficient and faster way of doing this in Python?

Here is an explanation of the code:

You can think of this problem as a list of games (list_type), each with a list of participating teams, and the scores for each team in the game (list_xx). For each team in the current game, the code first calculates the sum of the score differences from previous games (win_comp_past_difs), including only the pairs that team forms with the other teams in the current game. It then updates each pair in the current game with the difference in scores. A defaultdict keeps track of the accumulated score difference for each pair and is updated as each game is played.
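To make the pairing concrete, here is a minimal sketch of how the ordered pairs and their score differences line up for a single game. The names teams, scores, pairs and diffs are illustrative only and do not appear in the original code; the values are those of the fifth game in the data.

```python
from itertools import permutations

# One game: three teams and their scores (fifth game in the data)
teams = ['A', 'B', 'C']
scores = [5.0, 2.0, 3.0]

# permutations() yields ordered pairs in the same order for both lists,
# so pairs[k] and diffs[k] always describe the same two teams.
pairs = list(permutations(teams, 2))
diffs = [a - b for a, b in permutations(scores, 2)]

for pair, diff in zip(pairs, diffs):
    print(pair, diff)
# ('A', 'B') 3.0
# ('A', 'C') 2.0
# ('B', 'A') -3.0
# ('B', 'C') -1.0
# ('C', 'A') -2.0
# ('C', 'B') 1.0
```

Each ordered pair appears twice with opposite signs, which is why the defaultdict stores (A, B) and (B, A) as separate keys.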

In the example below, based on some data, for-loops are used to create a new variable, list_zz.

The data and the for-loop code:

import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import permutations

list_type = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'], ['C', 'A', 'B'], ['A'], ['B', 'C']]

list_xx = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0], [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]

list_zz = []
# Running score difference for each ordered pair of teams over all games so far
wd = defaultdict(float)
for i, x in zip(list_type, list_xx):
    # A game with a single team has no pairs, so record NaN
    if len(i) == 1:
        list_zz.append(np.nan)
        continue
    # Pairs and difference generator for the current game (i)
    pairs = list(permutations(i, 2))
    dgen = (value[0] - value[1] for value in permutations(x, 2))
    # Sum of differences from previous games, including only pairs of teams in the current game
    for team, result in zip(i, x):
        win_comp_past_difs = sum(wd[key] for key in pairs if key[0] == team)
        list_zz.append(win_comp_past_difs)
    # Update pair differences for the current game
    for pair, diff in zip(pairs, dgen):
        wd[pair] += diff
print(list_zz)

The output looks like this:

[0.0,
 0.0,
 nan,
 -4.0,
 4.0,
 0.0,
 0.0,
 0.0,
 nan,
 -10.0,
 13.0,
 -3.0,
 nan,
 3.0,
 -3.0,
 -6.0,
 6.0,
 -10.0,
 -10.0,
 20.0,
 nan,
 14.0,
 -14.0]

If you could show how to make the code more efficient and faster to execute, I would really appreciate it.

Without reviewing the overall design of your code, one improvement pops out at me: move your code to a function.

As currently written, all of the variables you use are global variables. Due to the dynamic nature of the global namespace, Python must look up each global variable every time you access it. (1) In CPython, this corresponds to a hash table lookup, which can be expensive, particularly if hash collisions are present.

In contrast, local variables are known at compile time, and so are stored in a fixed-size array. Accessing these variables therefore only involves dereferencing a pointer, which is comparatively much faster.
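One way to see this difference is to inspect the bytecode with the standard dis module. The sketch below is only an illustration: the functions read_global and read_local and the variable counter are made up for this example, and the exact opcode names depend on the CPython version. On CPython, the global read is expected to compile to a LOAD_GLOBAL instruction, while the local read uses one of the "FAST" family of instructions.

```python
import dis

counter = 0  # module-level (global) name


def read_global():
    # 'counter' is resolved in the module namespace at run time
    return counter + 1


def read_local():
    counter = 0
    # 'counter' lives in the function's fixed-size local slots
    return counter + 1


# Collect the distinct opcode names used by each function
global_ops = {ins.opname for ins in dis.get_instructions(read_global)}
local_ops = {ins.opname for ins in dis.get_instructions(read_local)}

print(sorted(global_ops))
print(sorted(local_ops))
```

Calling dis.dis(read_global) instead prints the full, human-readable listing.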

With this principle in mind, you should be able to boost your performance (somewhere around a 40% drop in run time) by moving all of your code into a "main" function:

def main():
    ...
    # Your code here

if __name__ == '__main__':
    main()
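Applied to the code in the question, this might look like the sketch below. The function name compute_list_zz, the loop variable names teams and scores, and the upper-case constants are my own choices, not part of the original code. As a further assumption, math.nan stands in for np.nan so that the sketch needs only the standard library. The algorithm itself is unchanged.

```python
import math
from collections import defaultdict
from itertools import permutations

LIST_TYPE = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'],
             ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'],
             ['C', 'A', 'B'], ['A'], ['B', 'C']]
LIST_XX = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0],
           [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0],
           [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]


def compute_list_zz(list_type, list_xx):
    """Same loop as in the question, but every name used here is local."""
    list_zz = []
    wd = defaultdict(float)  # running difference per ordered pair of teams
    for teams, scores in zip(list_type, list_xx):
        if len(teams) == 1:
            # A single team has no opponents in this game
            list_zz.append(math.nan)
            continue
        pairs = list(permutations(teams, 2))
        diffs = [a - b for a, b in permutations(scores, 2)]
        # Past differences, restricted to pairs present in this game
        for team in teams:
            list_zz.append(sum(wd[key] for key in pairs if key[0] == team))
        # Record this game's differences for future games
        for pair, diff in zip(pairs, diffs):
            wd[pair] += diff
    return list_zz


def main():
    print(compute_list_zz(LIST_TYPE, LIST_XX))


if __name__ == '__main__':
    main()
```

Passing the data in as arguments means the hot loop touches only local names. The actual speed-up depends on the data and the Python version, so it is worth timing both variants on your own input.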

(1) Source


 