Assign to *new* subset of a pandas DataFrame

Say I have some data in a DataFrame df. In particular, df.columns is a MultiIndex where the first level indicates what kind of data we are dealing with, and the second level indicates some sort of ID. To begin with, there is only a single unique value in the outermost column level:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(400, 5), columns=list('abcde'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns], 
                                       names=['datum', 'id'])

Say I want to compute a 10-period moving average of this chunk of data. I can easily do that with:

df['raw'].rolling(window=10, min_periods=10).mean()

I'd like to assign this to a new section of the existing data frame. I wish the syntax were simply:

df['avg_10'] = df['raw'].rolling(window=10, min_periods=10).mean()

But that doesn't work. Instead, to get the equivalent, I need to do something clunky like:

a = df['raw'].rolling(window=10, min_periods=10).mean()
a.columns = pd.MultiIndex.from_tuples([('avg_10', c) for c in a.columns],
                                      names=['datum', 'id'])
df = pd.concat([df, a], axis=1)

Is there a concise way to do this?

You can add the new columns in one shot like this:

df[df.columns.get_level_values(1)] = df['raw'].rolling(window=10, min_periods=10).mean()

Then relabel the column levels so that the new columns sit under 'avg_10':

df.columns = pd.MultiIndex.from_tuples(
  [t if t[0]=='raw' else ('avg_10', t[0]) for t in df.columns.tolist()]
)

Output:

In [121]: df.tail()
Out[121]:
         raw                                            avg_10            \
           a         b         c         d         e         a         b
35 -0.036381 -0.202369  0.728408 -1.149906 -0.888169  0.174578  0.244956
36  1.700182 -0.957104 -0.005931 -1.035258  0.916398  0.304429  0.025519
37  1.142203  0.198508 -0.568147  0.006620  1.912575  0.408570  0.029939
38 -1.360093  0.638533 -0.899154  1.120311  1.702436  0.109886  0.155383
39 -1.860319  0.863798  0.876608  1.292301  0.547762 -0.069686  0.141820


           c         d         e
35 -0.046456 -0.291078  0.176360
36  0.128143 -0.670730  0.213351
37  0.041724 -0.542027  0.301774
38 -0.147804 -0.363713  0.400007
39  0.005854 -0.164190  0.483140
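
A possible variant that avoids the separate relabelling step, offered as a sketch rather than as part of the answer above: pd.concat accepts keys= and names=, which prepend an outer column level to the rolling result before it is appended to df. The 40-row frame, the seed, and the variable names avg and result are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only so the example is reproducible
df = pd.DataFrame(np.random.randn(40, 5), columns=list('abcde'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns],
                                       names=['datum', 'id'])

# Rolling mean of the raw block; its columns are the plain ids a..e.
avg = df['raw'].rolling(window=10, min_periods=10).mean()

# keys= prepends an outer column level holding 'avg_10';
# names= labels the two resulting levels.
avg = pd.concat([avg], axis=1, keys=['avg_10'], names=['datum', 'id'])

# Append the labelled block next to the raw data.
result = pd.concat([df, avg], axis=1)
print(result.tail())
```

The first nine rows of the avg_10 block are NaN because min_periods=10 requires a full window.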

Because this solution uses df.rolling, as in your example, it requires pandas 0.18.0 or later.

import numpy as np
import pandas as pd

# Create sample data with three columns.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(400, 3), columns=list('abc'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns], 
                                       names=['datum', 'id'])

# Have two window periods (e.g. 10, 30).
windows = [10, 30]
cols = df.columns.get_level_values(1)
for window in windows:
    for col in cols:
        # Select the raw series explicitly: df.xs(col, axis=1, level=1) would
        # also pick up the avg_* columns added on earlier passes of the loop.
        df.loc[:, ('avg_{0}'.format(window), col)] = \
            df[('raw', col)].rolling(window=window, min_periods=window).mean()

>>> df.tail()

datum       raw                        avg_10                        avg_30                    
id            a         b         c         a         b         c         a         b         c
395   -0.177813  0.250998  1.054758  0.528226  0.266558  0.123020  0.046781  0.365069  0.233943
396    0.960048 -0.416499 -0.276823  0.459380  0.379910  0.140920  0.067177  0.329077  0.261536
397    1.123905 -0.173464 -0.510030  0.429155  0.268950  0.022079  0.105671  0.270666  0.271052
398    1.392518  1.037586  0.018792  0.485142  0.340002 -0.139202  0.170970  0.315509  0.262711
399   -0.593777 -2.011880  0.589704  0.387988  0.114828 -0.096127  0.133680  0.206199  0.265718
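
As a loop-free alternative for several windows, here is a minimal sketch that builds every averaged block at once. It relies on pd.concat accepting a dict, whose keys become the outer column level. The names avgs and result are assumptions for illustration, and the approach is a swapped-in technique, not the method used in the answer above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(400, 3), columns=list('abc'))
df.columns = pd.MultiIndex.from_tuples([('raw', c) for c in df.columns],
                                       names=['datum', 'id'])

windows = [10, 30]

# One rolling-mean frame per window; each dict key becomes the outer
# ('datum') level of the corresponding block of columns.
avgs = pd.concat(
    {'avg_{0}'.format(w): df['raw'].rolling(window=w, min_periods=w).mean()
     for w in windows},
    axis=1, names=['datum', 'id'])

# Append all averaged blocks to the raw data in a single step.
result = pd.concat([df, avgs], axis=1)
print(result.tail())
```

Building the blocks first and concatenating once also avoids growing the frame one column at a time.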
