How to use .rolling() on each row of a Pandas dataframe?

I created a pandas DataFrame df:

df.head()
Out[1]: 
                    A           B   DateTime 
2010-01-01  50.662365  101.035099 2010-01-01             
2010-01-02  47.652424   99.274288 2010-01-02            
2010-01-03  51.387459   99.747135 2010-01-03               
2010-01-04  52.344788   99.621896 2010-01-04               
2010-01-05  47.106364   98.286224 2010-01-05               

I can add a moving average of column A:

df['A_moving_average'] = df.A.rolling(window=50, axis="rows") \
                             .apply(lambda x: np.mean(x))

Question: how do I add a single moving average that combines columns A and B?

I expected this to work, but it raises an error:

df['A_B_moving_average'] = df.rolling(window=50, axis="rows") \
                             .apply(lambda row: (np.mean(row.A) + np.mean(row.B)) / 2)

The error is:

NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented

Appendix A: Code to create Pandas dataframe

Here is how I created the test pandas DataFrame df:

import numpy.random as rnd
import pandas as pd
import numpy as np

count = 1000

dates = pd.date_range('1/1/2010', periods=count, freq='D')

df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
    }, index=dates
)

I couldn't find a direct way to use multiple columns inside a single rolling call in general, but in your specific case you can average columns A and B first and then apply rolling to the result:

df['A_B_moving_average'] = ((df.A + df.B) / 2).rolling(window=50, axis='rows').mean()
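
As a quick sanity check of the idea above, here is a minimal sketch (assuming a recent pandas and NumPy; the seed, the shortened row count, and the names combined and per_column are my own choices for illustration). It also shows an alternative: select only the numeric columns before rolling, which keeps DateTime out of the computation, and then average across columns. Because the mean is linear, both routes should agree:

```python
import numpy as np
import pandas as pd

# Small reproducible frame shaped like the one in the question
# (seed and row count are arbitrary choices for this sketch).
rng = np.random.default_rng(0)
count = 200
dates = pd.date_range("2010-01-01", periods=count, freq="D")
df = pd.DataFrame(
    {
        "DateTime": dates,
        "A": rng.normal(50, 2, count),
        "B": rng.normal(100, 4, count),
    },
    index=dates,
)

window = 50

# Route 1: average the two columns row by row, then take the rolling mean.
combined = ((df["A"] + df["B"]) / 2).rolling(window=window).mean()

# Route 2: rolling mean of each numeric column, then average across columns.
per_column = df[["A", "B"]].rolling(window=window).mean().mean(axis=1)

# Both routes give the same values, including the leading NaNs.
print(np.allclose(combined, per_column, equal_nan=True))
```

Either result can be assigned to df['A_B_moving_average']. The first window - 1 rows are NaN in both cases because the window is not yet full.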

As an explanation: if you call rolling on the whole DataFrame with axis='rows', each column is processed separately. So:

df['A_B_moving_average'] = df.rolling(window=5, axis='rows').mean()

will first evaluate the rolling window for A (works), then for B (works), and then for DateTime (doesn't work, hence the error). Each rolling window is passed as a plain NumPy array, so you can't access the column names. As a demonstration using print:

import numpy.random as rnd
import pandas as pd
import numpy as np
count = 10
dates = pd.date_range('1/1/2010', periods=count, freq='D')
df = pd.DataFrame(
    {
        'DateTime': dates,
        'A': rnd.normal(50, 2, count), # Mean 50, standard deviation 2
        'B': rnd.normal(100, 4, count) # Mean 100, standard deviation 4
    }, index=dates
)
df[['A', 'B']].rolling(window=6, axis='rows').apply(lambda row: print(row) or np.max(row))

prints:

[ 47.32327354  48.12322447  50.86806381  49.3676319   47.81335338
  49.66915104]
[ 48.12322447  50.86806381  49.3676319   47.81335338  49.66915104
  48.01520798]
[ 50.86806381  49.3676319   47.81335338  49.66915104  48.01520798
  48.14089864]
[ 49.3676319   47.81335338  49.66915104  48.01520798  48.14089864
  51.89999973]
[ 47.81335338  49.66915104  48.01520798  48.14089864  51.89999973
  48.76838054]
[ 100.10662696   96.72411985  103.24600664   95.03841539   95.23430836
  102.30955102]
[  96.72411985  103.24600664   95.03841539   95.23430836  102.30955102
   95.18273088]
[ 103.24600664   95.03841539   95.23430836  102.30955102   95.18273088
   97.36751546]
[  95.03841539   95.23430836  102.30955102   95.18273088   97.36751546
   99.25325622]
[  95.23430836  102.30955102   95.18273088   97.36751546   99.25325622
  105.16747544]

The first five arrays come from column A and the last five from column B, and all of them are plain arrays.
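
For the general case, where a function really needs several columns of the same window at once, one workaround is to slice the windows yourself. This is only a sketch: rolling_apply_frame is a hypothetical helper written for this example, not a pandas function, and the Python-level loop will be slow on large frames.

```python
import numpy as np
import pandas as pd


def rolling_apply_frame(frame, window, func):
    """Hypothetical helper: call func on each full window of rows.

    func receives a DataFrame slice (so column names are available)
    and must return a single number. Rows without a full window stay NaN.
    """
    out = pd.Series(np.nan, index=frame.index, dtype=float)
    for end in range(window, len(frame) + 1):
        out.iloc[end - 1] = func(frame.iloc[end - window:end])
    return out


# Small reproducible frame (seed and sizes are arbitrary for this sketch).
rng = np.random.default_rng(1)
count = 10
dates = pd.date_range("2010-01-01", periods=count, freq="D")
df = pd.DataFrame(
    {"A": rng.normal(50, 2, count), "B": rng.normal(100, 4, count)},
    index=dates,
)

# The function from the question, now with access to both columns.
result = rolling_apply_frame(
    df[["A", "B"]], 6, lambda w: (w["A"].mean() + w["B"].mean()) / 2
)

# For this particular function the vectorised form gives the same values.
expected = ((df["A"] + df["B"]) / 2).rolling(window=6).mean()
print(np.allclose(result, expected, equal_nan=True))
```

When the computation can be expressed column-wise, as it can for a mean, the vectorised form is the better choice. The explicit loop is only worth it when the function cannot be decomposed that way.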
