diff on multi-index dataframe

I have a multi-index time series dataset, built as follows:

import numpy as np
import pandas as pd

arr = np.array([12, 12, 12, 72, 72, 72, 26, 26, 26, 22, 22, 22, 46, 46, 46, 32, 32, 32])
df = pd.DataFrame({'date': ['1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000',
                            '1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000'],
                   'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                             'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'lags': ['31/12/1999', '30/12/1999', '29/12/1999',
                              '1/1/2000', '31/12/1999', '30/12/1999',
                              '2/1/2000', '1/1/2000', '31/12/1999',
                              '31/12/1999', '30/12/1999', '29/12/1999',
                              '1/1/2000', '31/12/1999', '30/12/1999',
                              '2/1/2000', '1/1/2000', '31/12/1999']})
df["target"] = arr
df.set_index(['date', 'type', 'lags'], inplace=True)

With the index levels ['date', 'type', 'lags'], it prints as:

                          target
date     type  lags            
1/1/2000 A     31/12/1999      12
               30/12/1999      12
               29/12/1999      12
2/1/2000 A     1/1/2000        72
               31/12/1999      72
               30/12/1999      72
3/1/2000 A     2/1/2000        26
               1/1/2000        26
               31/12/1999      26
1/1/2000 B     31/12/1999      22
               30/12/1999      22
               29/12/1999      22
2/1/2000 B     1/1/2000        46
               31/12/1999      46
               30/12/1999      46
3/1/2000 B     2/1/2000        32
               1/1/2000        32
               31/12/1999      32

I am trying to difference the dataset with the following code:

# Differencing the data
for g_name, g_df in df.groupby("type"):
    df.loc[g_df.index, 'target'] = df.loc[g_df.index, 'target'].diff()

I expected to see the result below, but my code printed something different:

date      type  lags      
1/1/2000  A      31/12/1999     NaN
                 30/12/1999     NaN
                 29/12/1999     NaN
2/1/2000  A      1/1/2000      60.0
                 31/12/1999    60.0
                 30/12/1999    60.0
3/1/2000  A      2/1/2000     -46.0
                 1/1/2000     -46.0
                 31/12/1999   -46.0
1/1/2000  B      31/12/1999     NaN
                 30/12/1999     NaN
                 29/12/1999     NaN
2/1/2000  B      1/1/2000      24.0
                 31/12/1999    24.0
                 30/12/1999    24.0
3/1/2000  B      2/1/2000     -14.0
                 1/1/2000     -14.0
                 31/12/1999   -14.0

Is there a way to make diff() handle a multi-index DataFrame like this?

# Difference between consecutive rows within each 'type' group
df['target'] = df.groupby('type')['target'].diff()

Note that diff() compares each row with the one before it, so the first row of each group has nothing to subtract and comes out as NaN.
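
A row-by-row diff within each type group gives 0 between rows that share a date, because the target value is repeated across the lags rows. The sketch below shows two possible ways to get the per-date differences the question asks for. It rebuilds the question's data in compact form and writes the results to two new columns, diff_periods and diff_by_date (names chosen here for illustration). The first variant assumes every date has exactly three lags rows; the second assumes only that rows appear in chronological order within each type.

```python
import pandas as pd

# Rebuild the question's frame in compact form.
dates = ['1/1/2000', '2/1/2000', '3/1/2000']
lags = [['31/12/1999', '30/12/1999', '29/12/1999'],
        ['1/1/2000', '31/12/1999', '30/12/1999'],
        ['2/1/2000', '1/1/2000', '31/12/1999']]
targets = {'A': [12, 72, 26], 'B': [22, 46, 32]}

rows = [(d, t, lag, v)
        for t, values in targets.items()
        for d, d_lags, v in zip(dates, lags, values)
        for lag in d_lags]
df = pd.DataFrame(rows, columns=['date', 'type', 'lags', 'target'])
df = df.set_index(['date', 'type', 'lags'])

# Variant 1: shift by the number of rows per date.
# Assumption: every date has exactly three 'lags' rows.
rows_per_date = 3
df['diff_periods'] = df.groupby(level='type')['target'].diff(periods=rows_per_date)

# Variant 2: collapse to one value per (date, type), diff across dates
# within each type, then broadcast back onto the original rows.
# Assumption: rows are already in chronological order within each type.
per_date = df.groupby(level=['date', 'type'], sort=False)['target'].first()
per_date_diff = per_date.groupby(level='type', sort=False).diff()
keys = df.index.droplevel('lags')  # (date, type) key for every original row
df['diff_by_date'] = per_date_diff.reindex(keys).to_numpy()

print(df)
```

Variant 1 is shorter, but it silently gives wrong numbers if a date has a different number of lags rows. Variant 2 does not depend on that count.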
