Pandas datetime multiindex changed to date index and time columns (with reindex)

Setup: I have a multiindex dataframe, data, like this:

                                                     value
date                      date                               
2015-08-13 00:00:00+10:00 2015-08-13 06:30:00+10:00  0.812689
                          2015-08-13 15:30:00+10:00  0.054290
                          2015-08-13 16:00:00+10:00  0.206277
                          2015-08-13 16:30:00+10:00  0.082520
                          2015-08-13 17:00:00+10:00  0.009448
                          2015-08-13 17:30:00+10:00  0.000000
2015-08-14 00:00:00+10:00 2015-08-14 06:30:00+10:00  0.000000
                          2015-08-14 07:00:00+10:00  0.000280
                          2015-08-14 07:30:00+10:00  0.034119
                          2015-08-14 08:00:00+10:00  0.168524
                          2015-08-14 08:30:00+10:00  0.471783
                          2015-08-14 09:00:00+10:00  0.522409

As an interim step, I make the first index level hold just dates and the second index level hold just times, which I have done with:

# set index level 0 to dates
# (the inplace argument of set_levels exists only in older pandas releases)
day_start = [i.date() for i in data.index.levels[0]]
data.index.set_levels(day_start, level=0, inplace=True)

# set index level 1 to times
interval_start = [i.time() for i in data.index.levels[1]]
data.index.set_levels(interval_start, level=1, inplace=True)

# rename the second index level to 'time'
data.index.set_names('time', level=1, inplace=True)

Maybe not the best way to do it, but it gives:

                        value
date       time              
2015-08-13 06:30:00  0.812689
           15:30:00  0.054290
           16:00:00  0.206277
           16:30:00  0.082520
           17:00:00  0.009448
           17:30:00  0.000000
2015-08-14 06:30:00  0.000000
           07:00:00  0.000280
           07:30:00  0.034119
           08:00:00  0.168524
           08:30:00  0.471783
           09:00:00  0.522409
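
As an alternative to replacing the level values, the (date, time) index can be rebuilt from the full timestamps in one step. This is only a sketch on a made-up four-row sample; the level names "day" and "timestamp" and the fixed +10:00 offset are assumptions chosen for illustration (the question shows both levels named "date").

```python
import datetime as dt

import pandas as pd

# Hypothetical sample: a fixed +10:00 offset, as in the question's printout
tz = dt.timezone(dt.timedelta(hours=10))
stamps = pd.DatetimeIndex(
    ["2015-08-13 06:30", "2015-08-13 15:30", "2015-08-14 06:30", "2015-08-14 07:00"]
).tz_localize(tz)

# Two-level index: midnight of each day, then the full timestamp
data = pd.DataFrame(
    {"value": [0.812689, 0.054290, 0.0, 0.000280]},
    index=pd.MultiIndex.from_arrays(
        [stamps.normalize(), stamps], names=["day", "timestamp"]
    ),
)

# Rebuild the index from the full timestamps held in the second level
full = data.index.get_level_values(1)
data.index = pd.MultiIndex.from_arrays([full.date, full.time], names=["date", "time"])

print(data)
```

Building the index from arrays gives one label per distinct time of day, so 06:30 on both days shares a single label.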

Problem: What I haven't been able to do next is reindex the time so that there's an index entry every 30 minutes from 00:00 to 23:30, with zeros filled in for missing data. This would make every day consistent, since days may have different start/end times with data, i.e.

                     value
date       time              
2015-08-13 00:00:00  0.0
           00:30:00  0.0
              :
           06:30:00  0.812689
           07:00:00  0.0
           07:30:00  0.0
              :
           15:30:00  0.054290
           16:00:00  0.206277
           16:30:00  0.082520
              :
           23:30:00  0.0

And so on for each day. Trying to reindex on level=1 seems to have no effect when passing in an array of 30-minute-spaced times. I'm not sure this is even the right approach.
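
One way to get a row for every half hour of every day is to reindex against the full product of dates and time slots, rather than against level 1 alone. The following is a minimal sketch on a made-up sample already in (date, time) form; the names slots, days, full_index, and filled are hypothetical.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample in the (date, time) shape shown above
idx = pd.MultiIndex.from_tuples(
    [
        (dt.date(2015, 8, 13), dt.time(6, 30)),
        (dt.date(2015, 8, 13), dt.time(15, 30)),
        (dt.date(2015, 8, 14), dt.time(6, 30)),
        (dt.date(2015, 8, 14), dt.time(7, 0)),
    ],
    names=["date", "time"],
)
data = pd.DataFrame({"value": [0.812689, 0.054290, 0.0, 0.000280]}, index=idx)

# The 48 half-hour slots from 00:00 to 23:30
base = dt.datetime(2000, 1, 1)
slots = [(base + dt.timedelta(minutes=30 * i)).time() for i in range(48)]

# Every (date, slot) pair for the dates present in the data
days = data.index.get_level_values("date").unique()
full_index = pd.MultiIndex.from_product([days, slots], names=["date", "time"])

# Rows missing from the data are created and filled with zero
filled = data.reindex(full_index, fill_value=0.0)

print(filled.head())
```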

Next step: What I'd like to do after that is data.unstack(level=1) so that all the time indices become column headers. If I unstack it as is, I get a weird mash-up of columns with repeating times (which is mainly why I'm trying to make them consistent between days in the first place). Something like:

            value                                                          
time        06:30:00 15:30:00  16:00:00 16:30:00  17:00:00 17:30:00 06:30:00   
date                                                                           
2015-08-13  0.812689  0.05429  0.206277  0.08252  0.009448      0.0      0.0  
2015-08-14  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0   
2015-08-15  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0
2015-08-16  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0   
2015-08-17  0.000000  0.00000  0.000000  0.00000  0.000000      0.0      0.0

There's lots of missing data on those days, so I'm guessing it didn't go into the correct columns. I'm probably missing something fundamental about reindexing, and maybe my whole approach is not the way to get the end result.
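
A likely cause of the repeated times is that set_levels swaps in new level values while keeping one entry per original timestamp, so 06:30 on two different days remains two distinct labels. When the index is instead built from (date, time) pairs, unstack lines the days up under shared columns. This is a sketch on a made-up sample, not the asker's data; slots and wide are hypothetical names.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample with one label per distinct time of day
idx = pd.MultiIndex.from_tuples(
    [
        (dt.date(2015, 8, 13), dt.time(6, 30)),
        (dt.date(2015, 8, 13), dt.time(15, 30)),
        (dt.date(2015, 8, 14), dt.time(6, 30)),
        (dt.date(2015, 8, 14), dt.time(7, 0)),
    ],
    names=["date", "time"],
)
data = pd.DataFrame({"value": [0.812689, 0.054290, 0.0, 0.000280]}, index=idx)

# The 48 half-hour slots from 00:00 to 23:30
base = dt.datetime(2000, 1, 1)
slots = [(base + dt.timedelta(minutes=30 * i)).time() for i in range(48)]

# Times become columns; gaps within the observed times are filled with zero,
# then the columns are extended to all 48 slots
wide = data["value"].unstack("time", fill_value=0.0)
wide = wide.reindex(columns=slots, fill_value=0.0)

print(wide.shape)
```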

First, just discard the first index level ("date"). It is redundant and hurts more than it helps. That's df.index = df.index.droplevel(0). This assumes the second level still holds the full timestamps, as in the original data.

Now you have this:

                        value
time                         
2015-08-13 06:30:00  0.812689
2015-08-13 15:30:00  0.054290
2015-08-13 16:00:00  0.206277
2015-08-13 16:30:00  0.082520
2015-08-13 17:00:00  0.009448
2015-08-13 17:30:00  0.000000
2015-08-14 06:30:00  0.000000
2015-08-14 07:00:00  0.000280
2015-08-14 07:30:00  0.034119
2015-08-14 08:00:00  0.168524
2015-08-14 08:30:00  0.471783
2015-08-14 09:00:00  0.522409
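
The droplevel step can be sketched on a small made-up sample as follows. The level names "day" and "time" and the fixed +10:00 offset are assumptions for illustration.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample: a fixed +10:00 offset, as in the question's printout
tz = dt.timezone(dt.timedelta(hours=10))
stamps = pd.DatetimeIndex(
    ["2015-08-13 06:30", "2015-08-13 15:30", "2015-08-14 06:30", "2015-08-14 07:00"]
).tz_localize(tz)

# Two-level index: midnight of each day, then the full timestamp
df = pd.DataFrame(
    {"value": [0.812689, 0.054290, 0.0, 0.000280]},
    index=pd.MultiIndex.from_arrays([stamps.normalize(), stamps], names=["day", "time"]),
)

# Dropping the first level leaves a plain DatetimeIndex of full timestamps
df.index = df.index.droplevel(0)

print(type(df.index).__name__)
```

A single DatetimeIndex is what makes resample available in the next step.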

Then, df.resample('30min').first().fillna(0):

                        value
time                         
2015-08-13 06:30:00  0.812689
2015-08-13 07:00:00  0.000000
2015-08-13 07:30:00  0.000000
2015-08-13 08:00:00  0.000000
...
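
Note that resample only spans the range from the first to the last timestamp, so the hours before the first reading and after the last one are not created. If whole days are wanted at this stage, one option is to reindex against an explicit half-hour grid. This is a sketch on a made-up sample; start, end, grid, and padded are hypothetical names.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample: a fixed +10:00 offset, as in the question's printout
tz = dt.timezone(dt.timedelta(hours=10))
stamps = pd.DatetimeIndex(
    ["2015-08-13 06:30", "2015-08-13 15:30", "2015-08-14 06:30", "2015-08-14 07:00"]
).tz_localize(tz)
df = pd.DataFrame({"value": [0.812689, 0.054290, 0.0, 0.000280]}, index=stamps)

# Resampling fills gaps only between the first and last timestamp
resampled = df.resample("30min").first().fillna(0)

# An explicit grid from midnight of the first day to 23:30 of the last day
start = df.index.min().normalize()
end = df.index.max().normalize() + pd.Timedelta(hours=23, minutes=30)
grid = pd.date_range(start, end, freq="30min")

# Reindexing onto the grid creates the missing edge slots as zeros
padded = df.reindex(grid, fill_value=0)

print(len(resampled), len(padded))
```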

Now split the index into separate date and time parts:

df['date'] = df.index.date
df['time'] = df.index.time

And finally, pivot:

df.pivot(values='value', index='date', columns='time')
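
Putting the steps together on a small made-up sample gives the sketch below. The final reindex and fillna are an addition not in the answer: they cover time slots that fall outside the resampled range, which the pivot would otherwise leave as NaN or omit. The names slots and wide are hypothetical.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample with a plain DatetimeIndex (first level already dropped)
tz = dt.timezone(dt.timedelta(hours=10))
stamps = pd.DatetimeIndex(
    ["2015-08-13 06:30", "2015-08-13 15:30", "2015-08-14 06:30", "2015-08-14 07:00"]
).tz_localize(tz)
df = pd.DataFrame({"value": [0.812689, 0.054290, 0.0, 0.000280]}, index=stamps)

# Fill the gaps between the first and last timestamp
df = df.resample("30min").first().fillna(0)

# Split the index into date and time parts, then pivot times into columns
df["date"] = df.index.date
df["time"] = df.index.time
wide = df.pivot(values="value", index="date", columns="time")

# Make sure all 48 half-hour slots exist and edge slots hold zeros
base = dt.datetime(2000, 1, 1)
slots = [(base + dt.timedelta(minutes=30 * i)).time() for i in range(48)]
wide = wide.reindex(columns=slots).fillna(0)

print(wide.shape)
```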
