
Python - faster way to run a for loop in a dataframe

I am running the following code to calculate, for every dataframe row, the number of positive days in the previous rows and the number of days on which the stock has beaten the S&P 500 index:

        for offset in [1,5,15,30,45,60,75,90,120,150,
                       200,250,500,750,1000,1250,1500]:
            asset['return_stock'] = (asset.Close - asset.Close.shift(1)) / (asset.Close.shift(1))
            merged_data = pd.merge(asset, sp_500, on='Date')
            total_positive_days=0
            total_beating_sp_days=0
            for index, row in merged_data.iterrows():
                print(offset, index)
                for i in range(0,offset):
                    if index-i-1>0:
                        if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']:
                            total_positive_days+=1
                        if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']:
                            total_beating_sp_days+=1

but it is quite slow. Is there a way to speed it up (possibly by somehow getting rid of the for loop)?

My dataset looks like this (merged_data follows):

Date     Open_x     High_x      Low_x    Close_x  Adj Close_x   Volume_x  return_stock  Pct_positive_1  Pct_beating_1  Pct_change_1  Pct_change_plus_1  Pct_positive_5  Pct_beating_5  Pct_change_5  Pct_change_plus_5  Pct_positive_15  Pct_beating_15  Pct_change_15  Pct_change_plus_15  Pct_positive_30  Pct_beating_30  Pct_change_30  Pct_change_plus_30       Open_y       High_y        Low_y      Close_y  Adj Close_y    Volume_y  return_sp
0  2010-01-04  30.490000  30.642857  30.340000  30.572857    26.601469  123432400           NaN          1311.0         1261.0           NaN          -0.001726          1310.4         1260.8           NaN           0.018562           1307.2          1257.6            NaN            0.039186      1302.066667     1252.633333            NaN            0.056579  1116.560059  1133.869995  1116.560059  1132.989990  1132.989990  3991400000   0.016043
1  2010-01-05  30.657143  30.798571  30.464285  30.625713    26.647457  150476200      0.001729          1311.0         1261.0      0.001729           0.016163          1310.4         1260.8           NaN           0.032062           1307.2          1257.6            NaN            0.031268      1302.066667     1252.633333            NaN            0.056423  1132.660034  1136.630005  1129.660034  1136.520020  1136.520020  2491020000   0.003116
2  2010-01-06  30.625713  30.747143  30.107143  30.138571    26.223597  138040000     -0.015906          1311.0         1261.0     -0.015906           0.001852          1310.4         1260.8           NaN           0.001519           1307.2          1257.6            NaN            0.058608      1302.066667     1252.633333            NaN            0.046115  1135.709961  1139.189941  1133.949951  1137.140015  1137.140015  4972660000   0.000546
3  2010-01-07  30.250000  30.285715  29.864286  30.082857    26.175119  119282800     -0.001849          1311.0         1261.0     -0.001849          -0.006604          1310.4         1260.8           NaN           0.005491           1307.2          1257.6            NaN            0.096428      1302.066667     1252.633333            NaN            0.050694  1136.270020  1142.459961  1131.319946  1141.689941  1141.689941  5270680000   0.004001
4  2010-01-08  30.042856  30.285715  29.865715  30.282858    26.349140  111902700      0.006648          1311.0         1261.0      0.006648           0.008900          1310.4         1260.8           NaN           0.029379           1307.2          1257.6            NaN            0.088584      1302.066667     1252.633333            NaN            0.075713  1140.520020  1145.390015  1136.219971  1144.979980  1144.979980  4389590000   0.002882

asset follows:

         Date       Open       High        Low      Close  Adj Close     Volume  return_stock  Pct_positive_1  Pct_beating_1  Pct_change_1  Pct_change_plus_1  Pct_positive_5  Pct_beating_5  Pct_change_5  Pct_change_plus_5
0  2010-01-04  30.490000  30.642857  30.340000  30.572857  26.601469  123432400           NaN          1311.0         1261.0           NaN          -0.001726          1310.4         1260.8           NaN           0.018562
1  2010-01-05  30.657143  30.798571  30.464285  30.625713  26.647457  150476200      0.001729          1311.0         1261.0      0.001729           0.016163          1310.4         1260.8           NaN           0.032062
2  2010-01-06  30.625713  30.747143  30.107143  30.138571  26.223597  138040000     -0.015906          1311.0         1261.0     -0.015906           0.001852          1310.4         1260.8           NaN           0.001519
3  2010-01-07  30.250000  30.285715  29.864286  30.082857  26.175119  119282800     -0.001849          1311.0         1261.0     -0.001849          -0.006604          1310.4         1260.8           NaN           0.005491
4  2010-01-08  30.042856  30.285715  29.865715  30.282858  26.349140  111902700      0.006648          1311.0         1261.0      0.006648           0.008900          1310.4         1260.8           NaN           0.029379

sp_500 follows:

         Date         Open         High          Low        Close    Adj Close      Volume  return_sp
0  1999-12-31  1464.469971  1472.420044  1458.189941  1469.250000  1469.250000   374050000        NaN
1  2000-01-03  1469.250000  1478.000000  1438.359985  1455.219971  1455.219971   931800000  -0.009549
2  2000-01-04  1455.219971  1455.219971  1397.430054  1399.420044  1399.420044  1009000000  -0.038345
3  2000-01-05  1399.420044  1413.270020  1377.680054  1402.109985  1402.109985  1085500000   0.001922
4  2000-01-06  1402.109985  1411.900024  1392.099976  1403.449951  1403.449951  1092300000   0.000956

This is a partial answer.

I think the way you do

asset.Close - asset.Close.shift(1)

at the top is key to how you might do this. Instead of

if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']

create a column with the change in Close_x:

merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)

Similarly,

if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']

becomes

merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)

Then you can iterate over i and use subsets like

merged_data[(merged_data['Delta_Close_x'] > 0) & (merged_data['vs_sp'] > 0)]
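
To make the two helper columns and the combined subset concrete, here is a minimal sketch on a toy frame. The data values are invented for illustration and are not taken from the question. Note that pandas combines boolean Series element-wise with `&` and parenthesized comparisons; Python's `and` raises a ValueError when applied to a Series.

```python
import pandas as pd

# Toy stand-in for merged_data; the values are invented for illustration.
merged_data = pd.DataFrame({
    "Close_x": [30.0, 30.5, 30.2, 30.9],
    "return_stock": [float("nan"), 0.0167, -0.0098, 0.0232],
    "return_sp": [0.016, 0.003, 0.001, 0.004],
})

# Change in close versus the previous row (NaN on the first row).
merged_data["Delta_Close_x"] = merged_data.Close_x - merged_data.Close_x.shift(1)
# Stock return minus the previous row's S&P return, as in the original comparison.
merged_data["vs_sp"] = merged_data.return_stock - merged_data.return_sp.shift(1)

# Element-wise AND of the two conditions; comparisons against NaN are False.
mask = (merged_data["Delta_Close_x"] > 0) & (merged_data["vs_sp"] > 0)
both = merged_data[mask]

print(both)
print("rows meeting both conditions:", int(mask.sum()))
```

Since `mask` is a boolean Series, `mask.sum()` gives the number of matching rows directly, without selecting the subset first.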

There are a lot of additional details to work out, but I hope this gets you started.
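
One possible way to work out those details, sketched under assumptions rather than offered as a finished solution: turn each comparison into a 0/1 flag and let a rolling window sum the flags over the last `offset` rows, which replaces both inner loops. This assumes the goal is a per-row count over the current row and the `offset - 1` rows before it. The loop in the question never resets its counters and skips the very first pair of rows, so its totals would differ. The output column names (`positive_days_*`, `beating_sp_days_*`) and the data values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for merged_data; the values are invented for illustration.
merged_data = pd.DataFrame({
    "Close_x": [10.0, 11.0, 10.5, 10.8, 11.2, 11.1],
    "return_sp": [0.010, 0.002, -0.010, 0.030, 0.001, 0.004],
})
# Same return formula as in the question.
merged_data["return_stock"] = merged_data["Close_x"].diff() / merged_data["Close_x"].shift(1)

# 1 where the close rose versus the previous row, else 0 (row 0 has no previous row).
positive = (merged_data["Close_x"].diff() > 0).astype(int)
# 1 where the stock return exceeds the previous row's S&P return, as in the question.
beating = (merged_data["return_stock"] > merged_data["return_sp"].shift(1)).astype(int)

for offset in [1, 3]:
    # Sum of the flags over the current row and the offset - 1 rows before it;
    # min_periods=1 lets the first rows use a shorter window instead of NaN.
    merged_data[f"positive_days_{offset}"] = positive.rolling(offset, min_periods=1).sum()
    merged_data[f"beating_sp_days_{offset}"] = beating.rolling(offset, min_periods=1).sum()

print(merged_data)
```

With the real data, the same loop would run over the full offset list from the question; each offset costs one vectorized pass instead of a Python-level loop over every row.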
