
Faster iteration than for loop

I have a for loop for a new project, but this form of the code is too slow. I am trying to find the fastest way to resolve it. Maybe as a vectorized operation?

I have tried a def (function) approach, but it did not execute properly.

%%time
for x in df2.index:   
    if x > 0: 
        if (
            (df2.loc[x,'DEF RANK'] == df2.loc[x,'OFF RANK']) 
            & (df2.loc[x,'W']=='nan')
            & (pd.isnull(df2.loc[(x-1),'Event2']) == False)
            & ((df2.loc[(x-1),'Event2'] == 'nan') == False)
        ):
            df2.loc[x,'W'] = df2.loc[(x-1),'W']
        else: # if the above isn't true - pass
            pass
    else: 
        pass

Wall time: 6.76 ms

In Python, the bitwise & operator does not short circuit. This means that all of your comparisons are evaluated every time, regardless of what the comparisons before them evaluated to. Try this for a demonstration:

bool(print('a')) & bool(print('b')) & bool(print('c'))

Outputs:

a
b
c

Compare that to the logical and operator, which does short circuit the chain of comparisons:

bool(print('a')) and bool(print('b')) and bool(print('c'))

Outputs:

a

Try swapping out your &s for ands to limit the number of comparisons being done.

Once you've done that, you can try fiddling with which comparisons should come first. You'll want to order them by which ones are most likely to evaluate to False, which ones are cheapest to evaluate, or both.
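To make the effect of operator choice and ordering visible, here is a minimal sketch. The helper check and the list calls are hypothetical names introduced only for this illustration; the True/False results stand in for the real comparisons from the question.

```python
calls = []  # hypothetical log of which checks actually ran

def check(name, result):
    """Record that a check ran, then return its (pretend) result."""
    calls.append(name)
    return result

# Bitwise &: every operand is evaluated, even after a False.
calls.clear()
_ = check('rank', True) & check('w', False) & check('event', True)
with_amp = list(calls)

# Logical and: evaluation stops at the first False.
calls.clear()
_ = check('rank', True) and check('w', False) and check('event', True)
with_and = list(calls)

# Logical and, with the check most likely to be False moved to the front.
calls.clear()
_ = check('w', False) and check('rank', True) and check('event', True)
reordered = list(calls)

print(with_amp)   # ['rank', 'w', 'event']
print(with_and)   # ['rank', 'w']
print(reordered)  # ['w']
```

This only works for scalar values such as the ones returned by df2.loc[x, 'col']; on whole pandas Series, and raises an error and & is the operator to use.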

The first thing you want to learn when dealing with a pandas DataFrame is to view the data as a whole, and try to deal with it as a whole. So let's dig into your code and see how we can improve it.

for x in df2.index:   
    # the next if means you essentially want to look at 
    # df2.index > 0 only
    if x > 0: 
        # this if clause chains several 'and' conditions:
        if (
            # this is df2['DEF RANK'].eq(df2['OFF RANK'])
            (df2.loc[x,'DEF RANK'] == df2.loc[x,'OFF RANK']) 

            # this is df2['W'].eq('nan')
            & (df2.loc[x,'W']=='nan')

            # this is df2.loc[df2.index - 1, 'Event2'].notnull()
            & (pd.isnull(df2.loc[(x-1),'Event2']) == False)

            # this is df2.loc[df2.index - 1, 'Event2'].ne('nan')
            & ((df2.loc[(x-1),'Event2'] == 'nan') == False)
        ):
            # here you copy some position to other position
            df2.loc[x,'W'] = df2.loc[(x-1),'W']

        # if you don't do anything after else, why not delete it?
        else: # if the above isn't true - pass
            pass
    else: 
        pass

So with all the comments, how do we write code that runs faster? Translating plainly from your code:

idx_gt0 = (df2.index > 0)
rank_filters = df2['DEF RANK'].eq(df2['OFF RANK'])
w_isnan = df2['W'].eq('nan')

# the next two conditions are more challenging:
# we need the 'Event2' value from the previous row; shift() provides it,
# assuming the index is the default 0, 1, 2, ... so label x-1 is the previous row
df2['event2_shifted'] = df2['Event2'].shift()

event2_notnull = df2['event2_shifted'].notnull()
event2_notnan = df2['event2_shifted'].ne('nan')

# now we can merge all filters:
filters = (idx_gt0 & rank_filters 
           & w_isnan & event2_notnull & event2_notnan
          ) 

# last assign: take 'W' from the previous row
# (unlike the loop, this reads the original 'W' values, so a value filled
# into one row is not carried forward into the next row)
df2.loc[filters, 'W'] = df2['W'].shift()

Of course, this is a literal translation of your code. But as you said, your code does not work properly. So it would help if you gave sample input data and the expected output.
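In the absence of the asker's data, the following is a minimal sketch on made-up values that runs the loop and a vectorized version side by side. The column names come from the question; the sample values and the function names fill_w_loop and fill_w_vectorized are assumptions for illustration only, and the sketch assumes the default 0, 1, 2, ... index.

```python
import numpy as np
import pandas as pd

# Made-up sample data: only the column names are taken from the question.
df2 = pd.DataFrame({
    'DEF RANK': [1, 2, 3, 4, 5],
    'OFF RANK': [1, 2, 9, 4, 5],
    'W':        ['A', 'nan', 'nan', 'nan', 'B'],
    'Event2':   ['x', 'nan', 'y', np.nan, 'z'],
})

def fill_w_loop(df):
    """Row-by-row version, using 'and' so the checks short circuit."""
    out = df.copy()
    for x in out.index:
        if x > 0:
            prev_event = out.loc[x - 1, 'Event2']
            if (out.loc[x, 'DEF RANK'] == out.loc[x, 'OFF RANK']
                    and out.loc[x, 'W'] == 'nan'
                    and not pd.isnull(prev_event)
                    and prev_event != 'nan'):
                out.loc[x, 'W'] = out.loc[x - 1, 'W']
    return out

def fill_w_vectorized(df):
    """Whole-column version built from boolean masks and shift()."""
    out = df.copy()
    prev_event = out['Event2'].shift()  # 'Event2' of the previous row
    filters = (
        out['DEF RANK'].eq(out['OFF RANK'])
        & (out.index > 0)
        & out['W'].eq('nan')
        & prev_event.notnull()
        & prev_event.ne('nan')
    )
    # Assignment aligns on the index, so row x receives 'W' from row x-1.
    out.loc[filters, 'W'] = out['W'].shift()
    return out

loop_result = fill_w_loop(df2)['W'].tolist()
vectorized_result = fill_w_vectorized(df2)['W'].tolist()
print(loop_result)        # ['A', 'A', 'nan', 'nan', 'B']
print(vectorized_result)  # ['A', 'A', 'nan', 'nan', 'B']
```

The two versions agree on this sample, but they are not equivalent in general: the loop sees values it filled in earlier iterations, while the vectorized version reads only the original 'W' column. If consecutive rows all qualify, the loop carries a value forward through the run and the vectorized version does not.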

Unfortunately, there is no good faster way.

Here is one recommendation: change all the &s in your code to ands.

It may not help that much, though.

You need to use a different programming language, like C, C++, or Java, as compiled code can run faster than interpreted code. If you are willing to get really frustrated, you can try assembly language, but I'm not sure that it is worth it for just a for loop.

