I create a Pandas dataframe df:
df.head()
Out[1]:
A B DateTime
2010-01-01 50.662365 101.035099 2010-01-01
2010-01-02 47.652424 99.274288 2010-01-02
2010-01-03 51.387459 99.747135 2010-01-03
2010-01-04 52.344788 99.621896 2010-01-04
2010-01-05 47.106364 98.286224 2010-01-05
I can add a moving average of column A:
df['A_moving_average'] = df.A.rolling(window=50, axis="rows") \
.apply(lambda x: np.mean(x))
Question: how do I add a moving average of columns A and B?
This should work, but it gives an error:
df['A_B_moving_average'] = df.rolling(window=50, axis="rows") \
.apply(lambda row: (np.mean(row.A) + np.mean(row.B)) / 2)
The error is:
NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented
Here is how I created the test Pandas dataframe df:
import numpy.random as rnd
import pandas as pd
import numpy as np
count = 1000
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count),  # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count)  # Mean 100, standard deviation 4
    }, index=dates
)
I couldn't find a direct solution to the general problem of using multiple columns in rolling, but in your specific case you can just take the mean of columns A and B first and then apply rolling to the result:
df['A_B_moving_average'] = ((df.A + df.B) / 2).rolling(window=50, axis='rows').mean()
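This works because the mean is linear: averaging A and B first and then rolling gives the same numbers as rolling each column and averaging the two rolling means. A minimal sketch to check this, assuming a small frame built like the one in the question (the seed, the row count of 200, and the names combined_first and rolled_first are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# Illustrative test data, similar in shape to the frame in the question.
rng = np.random.default_rng(0)
count = 200
dates = pd.date_range('2010-01-01', periods=count, freq='D')
df = pd.DataFrame(
    {
        'A': rng.normal(50, 2, count),
        'B': rng.normal(100, 4, count),
    },
    index=dates,
)

# Option 1: combine the columns first, then take the rolling mean.
combined_first = ((df['A'] + df['B']) / 2).rolling(window=50).mean()

# Option 2: rolling mean per numeric column, then average across columns.
rolled_first = df[['A', 'B']].rolling(window=50).mean().mean(axis=1)

# Both give the same values; the first 49 rows are NaN in either case.
same = np.allclose(combined_first, rolled_first, equal_nan=True)
```

Selecting only the numeric columns, as in Option 2, also keeps the DateTime column out of the rolling computation.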
As an explanation: if you call rolling on the whole DataFrame with axis='rows', each column is processed separately. So:
df['A_B_moving_average'] = df.rolling(window=5, axis='rows').mean()
will first evaluate the rolling window for A (works), then for B (works), and then for DateTime (doesn't work, hence the error). Each rolling window is passed to the function as a plain NumPy array (in the pandas version used here), so you can't access the column names. As a demonstration using print:
import numpy.random as rnd
import pandas as pd
import numpy as np
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count),  # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count)  # Mean 100, standard deviation 4
    }, index=dates
)
df[['A', 'B']].rolling(window=6, axis='rows').apply(lambda row: print(row) or np.max(row))
prints:
[ 47.32327354 48.12322447 50.86806381 49.3676319 47.81335338
49.66915104]
[ 48.12322447 50.86806381 49.3676319 47.81335338 49.66915104
48.01520798]
[ 50.86806381 49.3676319 47.81335338 49.66915104 48.01520798
48.14089864]
[ 49.3676319 47.81335338 49.66915104 48.01520798 48.14089864
51.89999973]
[ 47.81335338 49.66915104 48.01520798 48.14089864 51.89999973
48.76838054]
[ 100.10662696 96.72411985 103.24600664 95.03841539 95.23430836
102.30955102]
[ 96.72411985 103.24600664 95.03841539 95.23430836 102.30955102
95.18273088]
[ 103.24600664 95.03841539 95.23430836 102.30955102 95.18273088
97.36751546]
[ 95.03841539 95.23430836 102.30955102 95.18273088 97.36751546
99.25325622]
[ 95.23430836 102.30955102 95.18273088 97.36751546 99.25325622
105.16747544]
The first ones are from column A and the last ones from column B, and all of them are plain arrays.
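For the general case, where the function really needs several columns of the same window at once, one possible workaround is to slice the DataFrame by position yourself. The helper below is only a sketch under that assumption: rolling_multi is a hypothetical name, not a pandas function, and the explicit Python loop will be slow on large frames.

```python
import numpy as np
import pandas as pd

def rolling_multi(frame, window, func):
    """Apply func to each full window of rows; func receives a DataFrame."""
    out = pd.Series(np.nan, index=frame.index, dtype=float)
    for end in range(window, len(frame) + 1):
        # Rows end-window .. end-1 form one window with all columns available.
        out.iloc[end - 1] = func(frame.iloc[end - window:end])
    return out

# Illustrative data (seed and size are arbitrary choices).
rng = np.random.default_rng(1)
count = 30
dates = pd.date_range('2010-01-01', periods=count, freq='D')
df = pd.DataFrame(
    {
        'A': rng.normal(50, 2, count),
        'B': rng.normal(100, 4, count),
    },
    index=dates,
)

# Inside func the window is a DataFrame, so columns are accessible by name.
result = rolling_multi(df, 6, lambda w: (w['A'].mean() + w['B'].mean()) / 2)
expected = ((df['A'] + df['B']) / 2).rolling(window=6).mean()
```

Here result and expected should agree, with the first five rows left as NaN because those windows are incomplete.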