I want to compute a rolling rank over a series, that is, the rank of each value within its trailing window.
Assume I have a pandas Series:
In [18]: s = pd.Series(np.random.rand(10))
In [19]: s
Out[19]:
0 0.340396
1 0.664459
2 0.647212
3 0.529363
4 0.535349
5 0.781628
6 0.313549
7 0.933539
8 0.618337
9 0.013442
dtype: float64
I can use pandas' built-in rank method like this:
In [20]: s.rolling(4).apply(lambda x: pd.Series(x).rank().iloc[-1])
<ipython-input-20-41df4deb36f8>:1: FutureWarning: Currently, 'apply' passes the values as ndarrays to the applied function. In the future, this will change to passing it as Series objects. You need to specify 'raw=True' to keep the current behaviour, and you can pass 'raw=False' to silence this warning
s.rolling(4).apply(lambda x: pd.Series(x).rank().iloc[-1])
Out[20]:
0 NaN
1 NaN
2 NaN
3 2.0
4 2.0
5 4.0
6 1.0
7 4.0
8 2.0
9 1.0
dtype: float64
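The FutureWarning above asks for an explicit `raw` argument. A minimal sketch of the same call with `raw=True`, assuming the values printed for `s` above (hard-coded here so the snippet runs on its own):

```python
import pandas as pd

# Same values as the example series above
s = pd.Series([0.340396, 0.664459, 0.647212, 0.529363, 0.535349,
               0.781628, 0.313549, 0.933539, 0.618337, 0.013442])

# raw=True hands each window to the function as a NumPy array,
# so it is wrapped in a Series before calling rank()
ranks = s.rolling(4).apply(lambda x: pd.Series(x).rank().iloc[-1], raw=True)
print(ranks)
```

This only silences the warning; each window is still ranked in Python, so it should not be expected to change the speed much.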
This works, but it is quite slow. Here is a timing test:
In [24]: %timeit pd.Series(np.random.rand(100000)).rolling(100).apply(lambda x: pd.Series(x).rank().iloc[-1])
<magic-timeit>:1: FutureWarning: Currently, 'apply' passes the values as ndarrays to the applied function. In the future, this will change to passing it as Series objects. You need to specify 'raw=True' to keep the current behaviour, and you can pass 'raw=False' to silence this warning
22.5 s ± 292 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Is there a faster way to do this? I suspect the per-window loop in rolling can be improved. Thanks.
It's faster with SciPy/NumPy (sliding_window_view requires NumPy 1.20 or later):
import pandas as pd
import numpy as np
from time import time
from scipy.stats import rankdata
from numpy.lib.stride_tricks import sliding_window_view
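np.random.seed()  # no fixed seed, so the data (and exact timings) differ between runs
array = np.random.rand(100000)
# Baseline: pandas rolling apply, ranking each window and keeping the last rank
t0 = time()
ranks = pd.Series(array).rolling(100).apply(lambda x: x.rank().iloc[-1])
t1 = time()
print(f'With pandas: {t1-t0} sec.')
# SciPy/NumPy: rank each window view directly; unlike the pandas result,
# this list has no leading NaN entries for the incomplete windows
t0 = time()
ranks = [rankdata(x)[-1] for x in sliding_window_view(array, window_shape=100)]
t1 = time()
print(f'With numpy: {t1-t0} sec.')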
Output:
With pandas: 11.682222127914429 sec.
With numpy: 3.9317219257354736 sec.
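The list comprehension above still calls rankdata once per window. Since only the rank of the last element is needed, it can probably be computed by counting instead: with the default "average" method, the rank of the last value equals the number of smaller values in the window plus half of (number of equal values + 1). Below is a rough sketch of that idea. The function name `rolling_rank_last` is made up for illustration, and it assumes the input contains no NaN values. I have not timed it against the numbers above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_rank_last(values, window):
    """Average rank of the last element within each trailing window.

    Hypothetical helper; assumes `values` contains no NaN.
    """
    values = np.asarray(values, dtype=float)
    # Leading positions without a full window stay NaN, like pandas rolling
    out = np.full(values.shape[0], np.nan)
    if values.shape[0] < window:
        return out
    # View of shape (n - window + 1, window); no data is copied here
    windows = sliding_window_view(values, window_shape=window)
    last = windows[:, -1:]
    less = (windows < last).sum(axis=1)    # values strictly below the last one
    equal = (windows == last).sum(axis=1)  # ties, including the last one itself
    out[window - 1:] = less + (equal + 1) / 2
    return out

print(rolling_rank_last([0.340396, 0.664459, 0.647212, 0.529363, 0.535349,
                         0.781628, 0.313549, 0.933539, 0.618337, 0.013442], 4))
```

The comparisons build temporary boolean arrays with one entry per window element (about 10 million for 100000 values and a window of 100), so memory grows with the window size. Newer pandas releases (1.4 and later, as far as I know) also have a built-in `Rolling.rank` method, which may be worth trying first if it is available.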