Say I have some data in a DataFrame df. In particular, df.columns is a MultiIndex where the first level indicates "what kind of data" we are dealing with, and the second level indicates some sort of ID. To begin with, there is only a single unique value in the outermost column level:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(400, 5), columns=list('abcde'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns],
                                       names=['datum', 'id'])
Say I want to compute a 10-period moving average of this chunk of data. I can easily do that with
df['raw'].rolling(window=10, min_periods=10).mean()
I'd like to assign this to a new section of the existing data frame. I wish the syntax were simply:
df['avg_10'] = df['raw'].rolling(window=10, min_periods=10).mean()
But that doesn't work. Instead, to get the equivalent, I need to do something clunky like:
a = df['raw'].rolling(window=10, min_periods=10).mean()
a.columns = pd.MultiIndex.from_tuples([('avg_10', c) for c in a.columns],
                                      names=['datum', 'id'])
df = pd.concat([df, a], axis=1)
Is there a concise way to do this?
You can add the new columns in one shot like this:
df[df.columns.get_level_values(1)] = df['raw'].rolling(window=10, min_periods=10).mean()
Now tidy up the column levels so the new columns sit under 'avg_10':
df.columns = pd.MultiIndex.from_tuples(
    [t if t[0]=='raw' else ('avg_10', t[0]) for t in df.columns.tolist()]
)
Output:
In [121]: df.tail()
Out[121]:
         raw                                            avg_10            \
           a         b         c         d         e         a         b
35 -0.036381 -0.202369  0.728408 -1.149906 -0.888169  0.174578  0.244956
36  1.700182 -0.957104 -0.005931 -1.035258  0.916398  0.304429  0.025519
37  1.142203  0.198508 -0.568147  0.006620  1.912575  0.408570  0.029939
38 -1.360093  0.638533 -0.899154  1.120311  1.702436  0.109886  0.155383
39 -1.860319  0.863798  0.876608  1.292301  0.547762 -0.069686  0.141820

           c         d         e
35 -0.046456 -0.291078  0.176360
36  0.128143 -0.670730  0.213351
37  0.041724 -0.542027  0.301774
38 -0.147804 -0.363713  0.400007
39  0.005854 -0.164190  0.483140
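The relabeling step can be checked in isolation on a toy index. This is only a minimal sketch: it assumes the freshly assigned columns show up as tuples such as ('a', ''), which is what the list comprehension above implies, and the names cols and relabeled are made up for the illustration.

```python
import pandas as pd

# Hypothetical column index: two 'raw' columns plus two new columns whose
# first level holds the id and whose second level is empty (assumed layout).
cols = pd.MultiIndex.from_tuples(
    [('raw', 'a'), ('raw', 'b'), ('a', ''), ('b', '')]
)

# Keep 'raw' tuples as they are; move every other id under 'avg_10'.
relabeled = pd.MultiIndex.from_tuples(
    [t if t[0] == 'raw' else ('avg_10', t[0]) for t in cols.tolist()]
)

print(relabeled.tolist())
```

Note that rebuilding the index this way drops the level names ('datum', 'id') unless they are passed again through the names argument of from_tuples.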
Because it uses df.rolling, as in your example, this solution only works with pandas 0.18.0+.
# Create sample data with three columns.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(400, 3), columns=list('abc'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns],
                                       names=['datum', 'id'])
# Have two window periods (e.g. 10, 30).
windows = [10, 30]
cols = df.columns.get_level_values(1)
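for window in windows:
    for col in cols:
        # Read from the 'raw' block only, so averages added on an earlier
        # pass are not picked up as extra input columns.
        df.loc[:, ('avg_{0}'.format(window), col)] = \
            df[('raw', col)].rolling(window=window, min_periods=window).mean()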
>>> df.tail()
datum       raw                        avg_10                        avg_30
id            a         b         c         a         b         c         a         b         c
395   -0.177813  0.250998  1.054758  0.528226  0.266558  0.123020  0.046781  0.365069  0.233943
396    0.960048 -0.416499 -0.276823  0.459380  0.379910  0.140920  0.067177  0.329077  0.261536
397    1.123905 -0.173464 -0.510030  0.429155  0.268950  0.022079  0.105671  0.270666  0.271052
398    1.392518  1.037586  0.018792  0.485142  0.340002 -0.139202  0.170970  0.315509  0.262711
399   -0.593777 -2.011880  0.589704  0.387988  0.114828 -0.096127  0.133680  0.206199  0.265718
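For comparison, here is a sketch of a different route that avoids the per-column loop. It is not taken from either answer: it relies on pd.concat accepting a dict, whose keys become a new outer column level. The helper name add_rolling_means and its source parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def add_rolling_means(df, windows, source='raw'):
    """Return df with one 'avg_<window>' block per window appended (hypothetical helper)."""
    raw = df[source]
    # A dict passed to pd.concat becomes a new outer column level keyed by the dict keys.
    averages = pd.concat(
        {'avg_{0}'.format(w): raw.rolling(window=w, min_periods=w).mean()
         for w in windows},
        axis=1,
        names=[df.columns.names[0]],
    )
    return pd.concat([df, averages], axis=1)


# Same sample data as above.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(400, 3), columns=list('abc'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns],
                                       names=['datum', 'id'])

result = add_rolling_means(df, [10, 30])
print(result.columns.get_level_values(0).unique().tolist())
```

Each rolling mean is computed once for the whole 'raw' block, and the original frame is left unmodified because the helper returns a new one.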