How to rearrange a python pandas dataframe?

I have the following dataframe, read in from a .csv file with the "Date" column as the index. Each row is a day, and the columns hold the values for each hour of that day.

> Date           h1 h2  h3  h4 ... h24
> 14.03.2013    60  50  52  49 ... 73

I would like to rearrange it like this, so that there is one index column with the date/time and one column with the values in sequence:

>Date/Time            Value
>14.03.2013 00:00:00  60
>14.03.2013 01:00:00  50
>14.03.2013 02:00:00  52
>14.03.2013 03:00:00  49
>.
>.
>.
>14.03.2013 23:00:00  73

I tried doing this with two loops over the dataframe. Is there an easier way to do this in pandas?

I'm not the best at date manipulations, but maybe something like this:

import pandas as pd
from datetime import timedelta

df = pd.read_csv("hourmelt.csv", sep=r"\s+")

df = pd.melt(df, id_vars=["Date"])
df = df.rename(columns={'variable': 'hour'})
df['hour'] = df['hour'].apply(lambda x: int(x.lstrip('h'))-1)

combined = df.apply(lambda x: 
                    pd.to_datetime(x['Date'], dayfirst=True) + 
                    timedelta(hours=int(x['hour'])), axis=1)

df['Date'] = combined
del df['hour']

df = df.sort_values("Date")  # DataFrame.sort no longer exists in current pandas

Some explanation follows.

Starting from

>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> 
>>> df = pd.read_csv("hourmelt.csv", sep=r"\s+")
>>> df
         Date  h1  h2  h3  h4  h24
0  14.03.2013  60  50  52  49   73
1  14.04.2013   5   6   7   8    9

We can use pd.melt to collapse the hour columns into two columns, one holding the original column name and one holding its value:

>>> df = pd.melt(df, id_vars=["Date"])
>>> df = df.rename(columns={'variable': 'hour'})
>>> df
         Date hour  value
0  14.03.2013   h1     60
1  14.04.2013   h1      5
2  14.03.2013   h2     50
3  14.04.2013   h2      6
4  14.03.2013   h3     52
5  14.04.2013   h3      7
6  14.03.2013   h4     49
7  14.04.2013   h4      8
8  14.03.2013  h24     73
9  14.04.2013  h24      9
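
As a side note, pd.melt also accepts var_name and value_name, so the rename step can be folded into the melt call. The following is a minimal sketch; the small frame stands in for the CSV and the variable names wide and long_df are illustrative, not from the original answer.

```python
import pandas as pd

# Small stand-in for the CSV in the answer (only h1, h2 and h24 shown).
wide = pd.DataFrame({
    "Date": ["14.03.2013", "14.04.2013"],
    "h1": [60, 5],
    "h2": [50, 6],
    "h24": [73, 9],
})

# var_name / value_name label the melted columns directly,
# so no separate rename is needed afterwards.
long_df = pd.melt(wide, id_vars=["Date"], var_name="hour", value_name="value")
print(long_df)
```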

Get rid of those h prefixes and shift to a zero-based hour:

>>> df['hour'] = df['hour'].apply(lambda x: int(x.lstrip('h'))-1)
>>> df
         Date  hour  value
0  14.03.2013     0     60
1  14.04.2013     0      5
2  14.03.2013     1     50
3  14.04.2013     1      6
4  14.03.2013     2     52
5  14.04.2013     2      7
6  14.03.2013     3     49
7  14.04.2013     3      8
8  14.03.2013    23     73
9  14.04.2013    23      9

Combine the two columns into a single datetime:

>>> combined = df.apply(lambda x: pd.to_datetime(x['Date'], dayfirst=True) + timedelta(hours=int(x['hour'])), axis=1)
>>> combined
0    2013-03-14 00:00:00
1    2013-04-14 00:00:00
2    2013-03-14 01:00:00
3    2013-04-14 01:00:00
4    2013-03-14 02:00:00
5    2013-04-14 02:00:00
6    2013-03-14 03:00:00
7    2013-04-14 03:00:00
8    2013-03-14 23:00:00
9    2013-04-14 23:00:00
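
The row-wise apply above works, but it parses the date once per row. A possible vectorized variant, sketched here under the assumption that every date follows the DD.MM.YYYY pattern shown in the question, parses the whole column once and adds the hours with pd.to_timedelta. The frame long_df and the names dates, stamps and result are made up for this illustration.

```python
import pandas as pd

# Stand-in for the melted frame after the "h" prefix has been stripped.
long_df = pd.DataFrame({
    "Date": ["14.03.2013", "14.04.2013", "14.03.2013", "14.04.2013"],
    "hour": [0, 0, 23, 23],
    "value": [60, 5, 73, 9],
})

# Parse all dates at once; the explicit format avoids day/month ambiguity.
dates = pd.to_datetime(long_df["Date"], format="%d.%m.%Y")
# Convert the zero-based hour numbers to timedeltas and add them column-wise.
stamps = dates + pd.to_timedelta(long_df["hour"], unit="h")

result = (
    long_df.assign(Date=stamps)   # overwrite the string dates with timestamps
    .drop(columns="hour")         # the hour is now part of the timestamp
    .sort_values("Date")
    .set_index("Date")            # date/time index, as the question asked
)
print(result)
```

Setting the index at the end gives the single date/time index column plus one value column that the question describes.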

Reassemble and clean up:

>>> df['Date'] = combined
>>> del df['hour']
>>> df = df.sort_values("Date")
>>> df
                 Date  value
0 2013-03-14 00:00:00     60
2 2013-03-14 01:00:00     50
4 2013-03-14 02:00:00     52
6 2013-03-14 03:00:00     49
8 2013-03-14 23:00:00     73
1 2013-04-14 00:00:00      5
3 2013-04-14 01:00:00      6
5 2013-04-14 02:00:00      7
7 2013-04-14 03:00:00      8
9 2013-04-14 23:00:00      9

You could always grab the hourly data array and flatten it, then generate a new DatetimeIndex with hourly frequency.

df = df.asfreq('D')
hourly_data = df.values[:, :]
new_ind = pd.date_range(start=df.index[0], freq="H", periods=len(df) * 24)
# create Series.
s = pd.Series(hourly_data.flatten(), index=new_ind)

I'm assuming that read_csv is parsing the 'Date' column and making it the index. We change to a frequency of 'D' so that new_ind lines up correctly if you have missing days. The missing days will be filled with np.nan, which you can drop with s.dropna().
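
To make the missing-day behaviour concrete, here is a small self-contained sketch of the same idea. The data is invented (two days with a one-day gap, 24 columns named h1 to h24), and the hourly step is passed as a pd.Timedelta rather than the "H" alias, since recent pandas releases warn about the uppercase alias.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two days with a one-day gap, 24 hourly columns each.
cols = [f"h{i}" for i in range(1, 25)]
idx = pd.to_datetime(["14.03.2013", "16.03.2013"], format="%d.%m.%Y")
df = pd.DataFrame([np.arange(24), np.arange(100, 124)], index=idx, columns=cols)

# Reindex to daily frequency; the missing 15.03.2013 becomes a row of NaN.
df = df.asfreq("D")

# One timestamp per hour, starting at midnight of the first day.
new_ind = pd.date_range(
    start=df.index[0], periods=len(df) * 24, freq=pd.Timedelta(hours=1)
)

# Row-major flattening keeps each day's 24 values contiguous and in order.
s = pd.Series(df.to_numpy().ravel(), index=new_ind).dropna()
print(s.head())
```

This relies on the columns already being in hour order, and on each row holding exactly 24 values.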

notebook link
