Improving code performance and getting rid of loops

Question

I have written code that works by using a standard for loop over a dataframe. I want to check whether performance can be improved by making the code more pythonic, using groupby, apply, lambda, etc.

The code is designed to check for a particular pattern in a stock's price data. The stock's close price is used along with its 20-period EMA (20EMA). The pandas dataframe is loaded from a MySQL database and has the structure below.

df[['eod_dt','bull_bear','open','high','low','close','ema20']]
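If the 'ema20' column ever has to be recomputed from 'close' instead of being loaded from MySQL, pandas' ewm can produce it. This is a sketch under an assumption: that the stored value is the standard recursive EMA with smoothing factor 2/(20+1), seeded with the first close. The database may compute it differently (for example, seeded with a 20-bar simple average), in which case the early values will differ. The helper name add_ema20 and the sample prices are hypothetical.

```python
import pandas as pd

def add_ema20(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with an 'ema20' column.

    Assumes ema[t] = a*close[t] + (1 - a)*ema[t-1] with a = 2/21,
    seeded with the first close; adjust=False gives exactly this recursion.
    """
    out = df.copy()
    out["ema20"] = out["close"].ewm(span=20, adjust=False).mean()
    return out

# Made-up closes, only to show the call
prices = pd.DataFrame({"close": [100.0, 101.0, 99.5, 102.0]})
print(add_ema20(prices)["ema20"].round(4).tolist())
```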

For the bullish case:

Condition 1: the 'low' has to be above 'ema20' at least once.

Condition 2: if this is true, there need to be at least 2 'bull' candles following the event. The first 'bull' candle should be followed by a candle whose 'high' is above the 'high' of the first 'bull' candle. The second bull candle will be my buy signal.
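Each piece of these conditions can be expressed as a boolean mask over whole columns, which is the building block for removing the loop. A minimal sketch on a made-up five-bar frame; the data and the mask names are hypothetical, and the column names follow the question's code ('EMA20').

```python
import pandas as pd

# Made-up bars; column names follow the question's code ('EMA20')
df = pd.DataFrame({
    "bull_bear": ["bear", "bull", "bull", "bear", "bull"],
    "high":  [10.5, 9.0, 9.4, 9.6, 9.2],
    "low":   [10.1, 8.5, 8.8, 9.0, 8.7],
    "EMA20": [10.0, 9.8, 9.7, 9.7, 9.6],
})

# Condition 1 event: the candle's low is above the EMA
low_above = df["low"] > df["EMA20"]

# Bull candle whose high stays below the EMA (the filter used in the question's code)
bull_below = df["bull_bear"].eq("bull") & (df["high"] < df["EMA20"])

# Next candle's high is above this candle's high; the last bar has no next bar,
# so shift(-1) yields NaN there and the comparison evaluates to False
higher_next = df["high"].shift(-1) > df["high"]

# Candidate "first bull" candles for condition 2
first_bull = bull_below & higher_next
print(first_bull.tolist())  # [False, True, True, False, False]
```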

Currently I have done it like this:

#note: the EMA column is referenced as 'EMA20' here (listed as 'ema20' above)
df_bull=df[(df['bull_bear']=='bull') & (df['high']<df['EMA20']) & (df['eod_dt']>start_dt)] #start_dt is start of analysis period
signals=[] #placeholder: collects the 'index' of each signal bar

for index,row in df_bull.iterrows():

    #10 day look back, the index field actually exists and acts as a proxy to actual trading days
    #copy() avoids SettingWithCopyWarning when 'high_1' is added below
    df_temp1=df[(df['index']>(row['index']-10)) & (df['index']<row['index'])].copy()
    df_temp2=df_temp1[df_temp1['low']>df_temp1['EMA20']]

    if not df_temp2.empty: #condition1 satisfied
        df_temp1['high_1']=df_temp1['high'].shift(-1) #next bar's high within the window
        df_temp2=df_temp1[(df_temp1['bull_bear']=='bull') & (df_temp1['high']<df_temp1['EMA20']) & (df_temp1['high_1']>df_temp1['high'])]

        if not df_temp2.empty and len(df_temp2)<4:
            signals.append(row['index']) #entry above signal bar high
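The look-back filter uses two strict inequalities, so for a candidate bar whose 'index' is i it keeps bars i-9 through i-1: nine earlier bars, excluding the candidate itself. This matters when rewriting the loop with rolling windows, which need the exact window length. A quick check on a dummy frame (the 20-row range and i = 15 are made up):

```python
import pandas as pd

# Hypothetical 'index' column: one consecutive integer per trading day
df = pd.DataFrame({"index": range(20)})
i = 15  # 'index' of a candidate bull bar

# Same filter as in the loop above
window = df[(df["index"] > i - 10) & (df["index"] < i)]
print(window["index"].tolist())  # [6, 7, 8, 9, 10, 11, 12, 13, 14]
```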

The result of the 'better' code should be the same as above, but I would like to get the run time to a minimum. Shorter code would also be good.

Answer 1

To get rid of the for loop, you can use pandas.DataFrame.apply.

apply calls a function once for every row (axis=1) or every column (axis=0) of the dataframe.

It works as follows (simple example):

def transform(row):
    # put the code that processes each row here; row is a Series holding one row
    return row

result = df1.apply(transform, axis=1)
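A concrete, runnable version of the pattern above, assuming a made-up df1 with 'high' and 'low' columns: transform receives one row at a time and returns one value, so apply produces a Series with one result per row.

```python
import pandas as pd

# Made-up data for illustration
df1 = pd.DataFrame({"high": [10.0, 11.0, 12.0], "low": [9.0, 10.5, 11.0]})

def transform(row):
    # row is a Series; return one value per row (here the candle's range)
    return row["high"] - row["low"]

result = df1.apply(transform, axis=1)
print(result.tolist())  # [1.0, 0.5, 1.0]
```

For simple arithmetic like this, the column expression df1["high"] - df1["low"] gives the same Series without calling a Python function per row.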

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

In your case:

df_bull=df[(df['bull_bear']=='bull') & (df['high']<df['EMA20']) & (df['eod_dt']>start_dt)]

def transform(row):
    #look-back window: rows whose 'index' lies between row['index']-9 and row['index']-1
    df_temp1=df[(df['index']>(row['index']-10)) & (df['index']<row['index'])].copy()

    df_temp2=df_temp1[df_temp1['low']>df_temp1['EMA20']]
    if not df_temp2.empty: #condition1 satisfied
        df_temp1['high_1']=df_temp1['high'].shift(-1)
        df_temp2=df_temp1[(df_temp1['bull_bear']=='bull') & (df_temp1['high']<df_temp1['EMA20']) & (df_temp1['high_1']>df_temp1['high'])]
        if not df_temp2.empty and len(df_temp2)<4:
            return True #buy signal: entry above signal bar high
    return False


#True on signal bars; rows that are not in df_bull are filled with False
df['result'] = df_bull.apply(transform, axis=1).reindex(df.index, fill_value=False)
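With axis=1, apply still calls transform once per candidate row in Python and still filters df for every window, so it tidies the loop but does not vectorize it. A route that is usually much faster is to replace the per-row windows with shifted rolling sums over whole columns. This sketch is not part of the original answer: it assumes df can be sorted by 'index' and that 'index' holds consecutive integers with no gaps, and it uses the column names from the question's code ('EMA20'). The function name find_signals is hypothetical. Because the loop's window is bars i-9 to i-1, condition 1 becomes a 9-bar rolling sum shifted by one bar; the setup count becomes an 8-bar rolling sum shifted by two bars, since the window's last bar has no following bar inside the window (the loop's shift(-1) leaves it NaN).

```python
import pandas as pd

def find_signals(df: pd.DataFrame, start_dt) -> pd.Series:
    """Hypothetical vectorized version of the loop in the question.

    Assumes 'index' holds consecutive integers (one per trading day), so a
    row position equals a trading day once the frame is sorted by 'index'.
    Returns a boolean Series (aligned to df's labels), True on buy-signal bars.
    """
    d = df.sort_values("index")
    bull_below = d["bull_bear"].eq("bull") & (d["high"] < d["EMA20"])

    # Condition 1: 'low' above 'EMA20' on at least one of bars i-9 .. i-1
    low_above = (d["low"] > d["EMA20"]).astype(int)
    cond1 = low_above.shift(1, fill_value=0).rolling(9, min_periods=1).sum() > 0

    # Setup bar: bull, below EMA20, and the next bar makes a higher high
    higher_next = d["high"].shift(-1) > d["high"]
    setup = (bull_below & higher_next).astype(int)

    # Count setups on bars i-9 .. i-2 (bar i-1 has no next bar inside the window)
    n_setups = setup.shift(2, fill_value=0).rolling(8, min_periods=1).sum()

    candidate = bull_below & (d["eod_dt"] > start_dt)
    return candidate & cond1 & n_setups.between(1, 3)

# Usage: df['result'] = find_signals(df, start_dt)
```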

I am on mobile, so it is hard for me to write this post. Be gentle.

Answer 1, score 0, posted 2019-08-09 06:59:28

提高代码性能并摆脱循环

问题描述

1 个解决方案

解决方案1 0 2019-08-09 06:59:28

解决方案1
0 2019-08-09 06:59:28