简体   繁体   中英

Speed up finding the average of top 5 numbers from a rolling window in python

I would like to create a column of the average of the 5 highest values from a rolling window of 30. Using a for loop is very slow with a large DataFrame. I tried using rolling() with nlargest(), but it wouldn't work. Any suggestions for speeding this up?

def top_values(df, column, days):
    top5 = df.nlargest(days, column)
    top = top5[column].sum() / days

x = 0
w = 0
for i in df.index:
    if x > 30:
        df['tops'][x] = top_values(df[w:x], 'column', 5)
        w += 1
        x += 1

One method would be to use a lambda function in your rolling , and for instance get the mean of the first 5 elements of a sorted list:

df['column'].rolling(30).apply(lambda x: np.mean(sorted(x,reverse=True)[:5]))

Minimal Example :

On a dataframe of 15 elements, we can get the mean value of the top 3 values in a window of 5 to demonstrate:

>>> df
    column
0       48
1        9
2       36
3       71
4       59
5       16
6        9
7       18
8       43
9        3
10      54
11      23
12      12
13      38
14      54

>>> df['column'].rolling(5).apply(lambda x: np.mean(sorted(x,reverse=True)[:3]))
0           NaN
1           NaN
2           NaN
3           NaN
4     59.333333
5     55.333333
6     55.333333
7     49.333333
8     40.000000
9     25.666667
10    38.333333
11    40.000000
12    40.000000
13    38.333333
14    48.666667
Name: column, dtype: float64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM