简体   繁体   中英

Pandas datetime multiindex changed to date index and time columns (with reindex)

Setup: I have a multiindex dataframe data like this;

                                                     value
date                      date                               
2015-08-13 00:00:00+10:00 2015-08-13 06:30:00+10:00  0.812689
                          2015-08-13 15:30:00+10:00  0.054290
                          2015-08-13 16:00:00+10:00  0.206277
                          2015-08-13 16:30:00+10:00  0.082520
                          2015-08-13 17:00:00+10:00  0.009448
                          2015-08-13 17:30:00+10:00  0.000000
2015-08-14 00:00:00+10:00 2015-08-14 06:30:00+10:00  0.000000
                          2015-08-14 07:00:00+10:00  0.000280
                          2015-08-14 07:30:00+10:00  0.034119
                          2015-08-14 08:00:00+10:00  0.168524
                          2015-08-14 08:30:00+10:00  0.471783
                          2015-08-14 09:00:00+10:00  0.522409

As an interim step I make the first index level to just be dates and the second index level to just be times, which I have done with,

# set index level 0 to dates
day_start=[i.date() for i in data.index.levels[0]]
data.index.set_levels(day_start, level=0, inplace=True)

# set index level 1 to times
interval_start=[i.time() for i in data.index.levels[1]]
data_interval.index.set_levels(interval_start, level=1, inplace=True)

# rename time index
data.index.set_names('time', level=1, inplace=True)

Maybe not the best way to do it but it gives,

                        value
date       time              
2015-08-13 06:30:00  0.812689
           15:30:00  0.054290
           16:00:00  0.206277
           16:30:00  0.082520
           17:00:00  0.009448
           17:30:00  0.000000
2015-08-14 06:30:00  0.000000
           07:00:00  0.000280
           07:30:00  0.034119
           08:00:00  0.168524
           08:30:00  0.471783
           09:00:00  0.522409

Problem: What I haven't been able to do next is reindex the time so there's an index every 30 minutes from 00:00 to 23:30, with zeros filled in for missing data. This would make it consistent for every day, which may have different start/end times with data. ie

                     value
date       time              
2015-08-13 00:00:00  0.0
           00:30:00  0.0
              :
           06:30:00  0.812689
           07:00:00  0.0
           07:30:00  0.0
              :
           15:30:00  0.054290
           16:00:00  0.206277
           16:30:00  0.082520
              :
           23:30:00  0.0

And so on for each day. Trying to reindex on level=1 seems to have no effect when passing in an array of 30 minute spaced times. Not sure this is even the right approach.

Next step: What I'd like to do after that is data.unstack(level=1) so all the time indices become column headers. If I unstack it as is I get a weird mash up of columns with repeating times (which is mainly why I'm trying to make them consistent between days in the first place). Something like;

            value                                                          
time        06:30:00 15:30:00  16:00:00 16:30:00  17:00:00 17:30:00 06:30:00   
date                                                                           
2015-08-13  0.812689  0.05429  0.206277  0.08252  0.009448      0.0      0.0  
2015-08-14  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0   
2015-08-15  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0
2015-08-16  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0   
2015-08-17  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0

There's lots of missing data on those days so it didn't go into the correct columns I'm guessing. I'm probably fundamentally missing something in the reindexing and maybe my whole approach is not the way to get the end result.

First, just discard the "date" column. It is redundant and hurts more than it helps. That's df.index = df.index.droplevel(0) .

Now you have this:

                        value
time                         
2015-08-13 06:30:00  0.812689
2015-08-13 15:30:00  0.054290
2015-08-13 16:00:00  0.206277
2015-08-13 16:30:00  0.082520
2015-08-13 17:00:00  0.009448
2015-08-13 17:30:00  0.000000
2015-08-14 06:30:00  0.000000
2015-08-14 07:00:00  0.000280
2015-08-14 07:30:00  0.034119
2015-08-14 08:00:00  0.168524
2015-08-14 08:30:00  0.471783
2015-08-14 09:00:00  0.522409

Then, df.resample('30min').first().fillna(0) :

                        value
time                         
2015-08-13 06:30:00  0.812689
2015-08-13 07:00:00  0.000000
2015-08-13 07:30:00  0.000000
2015-08-13 08:00:00  0.000000
...

Now split the index into separate date and time parts:

df['date'] = df.index.date
df['time'] = df.index.time

And finally, pivot:

df.pivot(values='value', index='date', columns='time')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM