Python - faster way to run a for loop in a dataframe

I am running the following code to calculate, for every row of a dataframe, the number of positive days in the preceding rows and the number of days on which the stock beat the S&P 500 index:

        for offset in [1,5,15,30,45,60,75,90,120,150,
                       200,250,500,750,1000,1250,1500]:
            asset['return_stock'] = (asset.Close - asset.Close.shift(1)) / (asset.Close.shift(1))
            merged_data = pd.merge(asset, sp_500, on='Date')
            total_positive_days=0
            total_beating_sp_days=0
            for index, row in merged_data.iterrows():
                print(offset, index)
                for i in range(0,offset):
                    if index-i-1>0:
                        if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']:
                            total_positive_days+=1
                        if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']:
                            total_beating_sp_days+=1

This is quite slow. Is there a way to speed it up, possibly by getting rid of the for loops?

My dataset looks like this (merged_data follows):

Date     Open_x     High_x      Low_x    Close_x  Adj Close_x   Volume_x  return_stock  Pct_positive_1  Pct_beating_1  Pct_change_1  Pct_change_plus_1  Pct_positive_5  Pct_beating_5  Pct_change_5  Pct_change_plus_5  Pct_positive_15  Pct_beating_15  Pct_change_15  Pct_change_plus_15  Pct_positive_30  Pct_beating_30  Pct_change_30  Pct_change_plus_30       Open_y       High_y        Low_y      Close_y  Adj Close_y    Volume_y  return_sp
0  2010-01-04  30.490000  30.642857  30.340000  30.572857    26.601469  123432400           NaN          1311.0         1261.0           NaN          -0.001726          1310.4         1260.8           NaN           0.018562           1307.2          1257.6            NaN            0.039186      1302.066667     1252.633333            NaN            0.056579  1116.560059  1133.869995  1116.560059  1132.989990  1132.989990  3991400000   0.016043
1  2010-01-05  30.657143  30.798571  30.464285  30.625713    26.647457  150476200      0.001729          1311.0         1261.0      0.001729           0.016163          1310.4         1260.8           NaN           0.032062           1307.2          1257.6            NaN            0.031268      1302.066667     1252.633333            NaN            0.056423  1132.660034  1136.630005  1129.660034  1136.520020  1136.520020  2491020000   0.003116
2  2010-01-06  30.625713  30.747143  30.107143  30.138571    26.223597  138040000     -0.015906          1311.0         1261.0     -0.015906           0.001852          1310.4         1260.8           NaN           0.001519           1307.2          1257.6            NaN            0.058608      1302.066667     1252.633333            NaN            0.046115  1135.709961  1139.189941  1133.949951  1137.140015  1137.140015  4972660000   0.000546
3  2010-01-07  30.250000  30.285715  29.864286  30.082857    26.175119  119282800     -0.001849          1311.0         1261.0     -0.001849          -0.006604          1310.4         1260.8           NaN           0.005491           1307.2          1257.6            NaN            0.096428      1302.066667     1252.633333            NaN            0.050694  1136.270020  1142.459961  1131.319946  1141.689941  1141.689941  5270680000   0.004001
4  2010-01-08  30.042856  30.285715  29.865715  30.282858    26.349140  111902700      0.006648          1311.0         1261.0      0.006648           0.008900          1310.4         1260.8           NaN           0.029379           1307.2          1257.6            NaN            0.088584      1302.066667     1252.633333            NaN            0.075713  1140.520020  1145.390015  1136.219971  1144.979980  1144.979980  4389590000   0.002882

asset follows:

         Date       Open       High        Low      Close  Adj Close     Volume  return_stock  Pct_positive_1  Pct_beating_1  Pct_change_1  Pct_change_plus_1  Pct_positive_5  Pct_beating_5  Pct_change_5  Pct_change_plus_5
0  2010-01-04  30.490000  30.642857  30.340000  30.572857  26.601469  123432400           NaN          1311.0         1261.0           NaN          -0.001726          1310.4         1260.8           NaN           0.018562
1  2010-01-05  30.657143  30.798571  30.464285  30.625713  26.647457  150476200      0.001729          1311.0         1261.0      0.001729           0.016163          1310.4         1260.8           NaN           0.032062
2  2010-01-06  30.625713  30.747143  30.107143  30.138571  26.223597  138040000     -0.015906          1311.0         1261.0     -0.015906           0.001852          1310.4         1260.8           NaN           0.001519
3  2010-01-07  30.250000  30.285715  29.864286  30.082857  26.175119  119282800     -0.001849          1311.0         1261.0     -0.001849          -0.006604          1310.4         1260.8           NaN           0.005491
4  2010-01-08  30.042856  30.285715  29.865715  30.282858  26.349140  111902700      0.006648          1311.0         1261.0      0.006648           0.008900          1310.4         1260.8           NaN           0.029379

sp_500 follows:

         Date         Open         High          Low        Close    Adj Close      Volume  return_sp
0  1999-12-31  1464.469971  1472.420044  1458.189941  1469.250000  1469.250000   374050000        NaN
1  2000-01-03  1469.250000  1478.000000  1438.359985  1455.219971  1455.219971   931800000  -0.009549
2  2000-01-04  1455.219971  1455.219971  1397.430054  1399.420044  1399.420044  1009000000  -0.038345
3  2000-01-05  1399.420044  1413.270020  1377.680054  1402.109985  1402.109985  1085500000   0.001922
4  2000-01-06  1402.109985  1411.900024  1392.099976  1403.449951  1403.449951  1092300000   0.000956

This is a partial answer.

I think the way you compute

asset.Close - asset.Close.shift(1)

at the top is the key to vectorizing the rest. Instead of

if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']

create a column with the change in Close_x:

merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)

Similarly,

if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']

becomes

merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)
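
As a quick sanity check, here is a minimal sketch of those two columns on a toy frame. The values are invented for illustration and are not taken from the question's data; `positive_day` and `beating_sp` are hypothetical names for the resulting boolean flags.

```python
import pandas as pd

# Toy stand-in for merged_data; the numbers are made up for illustration.
merged_data = pd.DataFrame({
    "Close_x": [10.0, 11.0, 10.5, 10.8],
    "return_stock": [float("nan"), 0.10, -0.045, 0.029],
    "return_sp": [0.01, -0.06, 0.05, 0.03],
})

# Change in close versus the previous row (NaN on the first row).
merged_data["Delta_Close_x"] = merged_data["Close_x"] - merged_data["Close_x"].shift(1)
# Stock return minus the previous row's index return, mirroring the loop's comparison.
merged_data["vs_sp"] = merged_data["return_stock"] - merged_data["return_sp"].shift(1)

# Comparisons against NaN evaluate to False, so the first row is never counted.
positive_day = merged_data["Delta_Close_x"] > 0
beating_sp = merged_data["vs_sp"] > 0
print(pd.DataFrame({"positive_day": positive_day, "beating_sp": beating_sp}))
```

Each flag is computed for the whole column at once, which is what removes the need for the innermost `.loc` lookups.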

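Then you can iterate over i and select subsets with a boolean mask. Combine the conditions with the element-wise & operator and parenthesize each comparison, because Python's `and` raises a ValueError on a Series:

merged_data[(merged_data['Delta_Close_x'] > 0) & (merged_data['vs_sp'] > 0)]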

There are a lot of additional details to work out, but I hope this gets you started.
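
One way those remaining details might be filled in is with rolling sums over the boolean flags. This is only a sketch. It assumes the intent is a per-row count over the trailing `offset` rows, current row included, which differs from the loop in the question (its totals are never reset between rows). The function name `add_window_counts` and the output column names are hypothetical.

```python
import pandas as pd

def add_window_counts(merged_data, offsets):
    """Return a copy with, for each offset, per-row counts of positive days
    and index-beating days over the trailing `offset` rows (current row included)."""
    out = merged_data.copy()
    # 1 where the close rose versus the previous row, else 0 (first row is 0).
    positive = (out["Close_x"].diff() > 0).astype(int)
    # 1 where the stock return exceeds the previous row's index return, else 0.
    beating = (out["return_stock"] > out["return_sp"].shift(1)).astype(int)
    for offset in offsets:
        # min_periods=1 lets early rows use however many rows exist so far.
        out[f"positive_days_{offset}"] = positive.rolling(offset, min_periods=1).sum()
        out[f"beating_sp_days_{offset}"] = beating.rolling(offset, min_periods=1).sum()
    return out

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Close_x": [10.0, 11.0, 10.5, 10.8, 11.2],
    "return_stock": [float("nan"), 0.10, -0.045, 0.029, 0.037],
    "return_sp": [0.01, -0.06, 0.05, 0.03, 0.01],
})
result = add_window_counts(toy, [2, 3])
print(result.filter(like="_days_"))
```

With this shape, the only Python-level loop left is the one over the offsets; the per-row work is done inside pandas.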

 