unstack date/hour dataframe into single column with datetime index - python, pandas

I have a dataframe like:

                  0       1       2       3       4       5       6       7       8       9       10    11       12      13      14      15      16      17      18      19      20      21      22      23
    16.01.2018  25.45   24.99   24.68   25.00   26.19   28.96   35.78   44.66   41.75   41.58   41.48   41.66   40.66   40.39   40.33   40.73   41.58   45.06   45.84   42.69   39.56   35.4    33.27   29.49
    17.01.2018  28.78   27.71   26.55   25.76   25.97   26.97   30.89   36.06   41.24   40.67   39.86   39.42   38.17   37.31   36.58   36.78   37.8    40.78   40.8    38.95   34.34   31.95   31.56   29.26

where the index is the date on which a value was recorded and the columns (0 to 23) give the hour. I would like to unstack the dataframe so that it has a datetime index and a single column with the corresponding values:

    16.01.2018 00:00:00  25.45
    16.01.2018 01:00:00  24.99
    16.01.2018 02:00:00  24.68
    16.01.2018 03:00:00  25.00
....

At the moment I am doing:

index = pd.date_range(start = df.index[0], periods=len(df.unstack()), freq='H')
new_df = pd.DataFrame(index=index)
for d in new_df.index.date:
    for h in new_df.index.hour:
        new_df['value'] = df.unstack()[h][d]

but the for loop is taking ages...do you have a better (faster) solution?

Convert the index to a DatetimeIndex and the columns to timedeltas. After reshaping with DataFrame.stack and Series.reset_index, you only need to add the two new columns together:

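import pandas as pd

# Dates are day-first strings such as '16.01.2018', so state the format explicitly
df.index = pd.to_datetime(df.index, format='%d.%m.%Y')
# Hour labels 0..23 (int or str) become timedeltas of 0..23 hours
df.columns = pd.to_timedelta(df.columns.astype(int), unit='h')
# stack() goes row by row, so the result is already in chronological order
df = df.stack().reset_index(name='data')
df.index = df.pop('level_0') + df.pop('level_1')
print(df.head())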
                      data
2018-01-16 00:00:00  25.45
2018-01-16 01:00:00  24.99
2018-01-16 02:00:00  24.68
2018-01-16 03:00:00  25.00
2018-01-16 04:00:00  26.19
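
To try the stack approach without the original data, here is a minimal self-contained sketch. The tiny two-day, three-hour frame is a made-up stand-in for the one in the question, and it assumes the hour columns are integers and the index holds day-first date strings. Instead of reset_index and pop, it adds the two MultiIndex levels directly, which is just an alternative way to build the same index:

```python
import pandas as pd

# Hypothetical miniature of the frame in the question: two days, three hours.
df = pd.DataFrame(
    [[25.45, 24.99, 24.68], [28.78, 27.71, 26.55]],
    index=["16.01.2018", "17.01.2018"],
    columns=[0, 1, 2],
)

# Parse the day-first date strings with an explicit format.
df.index = pd.to_datetime(df.index, format="%d.%m.%Y")
# Turn the integer hour labels into timedeltas of that many hours.
df.columns = pd.to_timedelta(df.columns, unit="h")

# stack() returns a Series with a (date, hour offset) MultiIndex, row by row.
stacked = df.stack()
# Adding the two levels gives one timestamp per value.
stacked.index = stacked.index.get_level_values(0) + stacked.index.get_level_values(1)

result = stacked.to_frame("data")
print(result)
```

Because stack walks the frame row by row, the values come out day by day, hour by hour, so no sorting is needed afterwards.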

The solution with unstack is similar; only the order of the output differs:

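# Dates are day-first strings such as '16.01.2018', so state the format explicitly
df.index = pd.to_datetime(df.index, format='%d.%m.%Y')
# Hour labels 0..23 (int or str) become timedeltas of 0..23 hours
df.columns = pd.to_timedelta(df.columns.astype(int), unit='h')
# unstack() goes column by column: level_0 is the hour offset, level_1 the date
df = df.unstack().reset_index(name='data')
df.index = df.pop('level_1') + df.pop('level_0')
print(df.head())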
                      data
2018-01-16 00:00:00  25.45
2018-01-17 00:00:00  28.78
2018-01-16 01:00:00  24.99
2018-01-17 01:00:00  27.71
2018-01-16 02:00:00  24.68
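
If the unstack variant is used but chronological order is wanted, sorting the combined index afterwards should restore it. A minimal sketch, again on a made-up two-day, two-hour frame (the variable names are illustrative only):

```python
import pandas as pd

# Hypothetical miniature frame with the index and columns already converted.
df = pd.DataFrame(
    [[25.45, 24.99], [28.78, 27.71]],
    index=pd.to_datetime(["16.01.2018", "17.01.2018"], format="%d.%m.%Y"),
    columns=pd.to_timedelta([0, 1], unit="h"),
)

# unstack() walks column by column: level 0 is the hour offset, level 1 the date.
s = df.unstack()
s.index = s.index.get_level_values(1) + s.index.get_level_values(0)

# The timestamps are interleaved by hour here, so sort them into time order.
chronological = s.sort_index().to_frame("data")
print(chronological)
```

With sort_index applied, the result should match what the stack version produces directly, at the cost of one extra sort.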
