Data grouping into weekly, monthly and yearly for large datasets using python?
I have a dataset recording 20 years of "X" values in a DataFrame. X is recorded as a 3-hour average; a sample of the data is shown below.
Time_stamp X
1992-01-01 03:00:00 10.2
1992-01-01 06:00:00 10.4
1992-01-01 09:00:00 11.8
1992-01-01 12:00:00 12.0
1992-01-01 15:00:00 10.4
1992-01-01 18:00:00 9.4
1992-01-01 21:00:00 10.4
1992-01-02 00:00:00 13.6
1992-01-02 03:00:00 13.2
1992-01-02 06:00:00 11.8
1992-01-02 09:00:00 12.0
1992-01-02 12:00:00 12.8
1992-01-02 15:00:00 12.6
1992-01-02 18:00:00 11.0
1992-01-02 21:00:00 12.2
1992-01-03 00:00:00 13.8
1992-01-03 03:00:00 14.0
1992-01-03 06:00:00 13.4
1992-01-03 09:00:00 14.2
1992-01-03 12:00:00 16.2
1992-01-03 15:00:00 13.2
1992-01-03 18:00:00 13.4
1992-01-03 21:00:00 13.8
1992-01-04 00:00:00 14.8
1992-01-04 03:00:00 13.8
1992-01-04 06:00:00 7.6
1992-01-04 09:00:00 5.8
1992-01-04 12:00:00 4.4
1992-01-04 15:00:00 5.6
1992-01-04 18:00:00 6.0
1992-01-04 21:00:00 7.0
1992-01-05 00:00:00 6.8
1992-01-05 03:00:00 3.4
1992-01-05 06:00:00 5.8
1992-01-05 09:00:00 10.6
1992-01-05 12:00:00 9.2
1992-01-05 15:00:00 10.6
1992-01-05 18:00:00 9.8
1992-01-05 21:00:00 11.2
1992-01-06 00:00:00 12.0
1992-01-06 03:00:00 10.2
1992-01-06 06:00:00 9.0
1992-01-06 09:00:00 9.0
1992-01-06 12:00:00 8.6
1992-01-06 15:00:00 8.4
1992-01-06 18:00:00 8.2
1992-01-06 21:00:00 8.8
1992-01-07 00:00:00 10.0
1992-01-07 03:00:00 9.6
1992-01-07 06:00:00 8.0
1992-01-07 09:00:00 9.6
1992-01-07 12:00:00 10.8
1992-01-07 15:00:00 10.2
1992-01-07 18:00:00 9.8
1992-01-07 21:00:00 10.2
1992-01-08 00:00:00 9.4
1992-01-08 03:00:00 11.4
1992-01-08 06:00:00 12.6
1992-01-08 09:00:00 12.8
1992-01-08 12:00:00 10.4
1992-01-08 15:00:00 11.2
1992-01-08 18:00:00 9.0
1992-01-08 21:00:00 10.2
1992-01-09 00:00:00 8.2
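I want to create separate dataframes to calculate and record the yearly, weekly and daily averages of the given dataset. I am new to Python and have just started working with time series data. I found a few related questions on Stack Overflow, but none with a suitable answer, and I don't know how to get started. Any help with this? This is the code I have written so far: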
import datetime
import pandas as pd
import numpy as np
# Strip the time of day so that only the calendar date remains
df['date_minus_time'] = df["Time_stamp"].apply(lambda ts:
    datetime.datetime(year=ts.year, month=ts.month, day=ts.day))
df.set_index(df["date_minus_time"], inplace=True)
df['count'].resample('D').sum()
df['count'].resample('W').sum()
df['count'].resample('M').sum()
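But I don't know how to account for the data being recorded every 3 hours, or what to do next to get the result I want.
Use to_datetime to convert the column to datetimes for better performance, then use DataFrame.resample with the on parameter to specify the datetime column: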
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'])
df_daily = df.resample('D', on='Time_stamp').mean()
df_monthly = df.resample('M', on='Time_stamp').mean()
df_weekly = df.resample('W', on='Time_stamp').mean()
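As a minimal, self-contained sketch of the approach above, the snippet below rebuilds the first 15 rows of the sample from the question and resamples them with the on parameter. The yearly average, which the question also asks for, is computed here with a groupby on the year; that is an assumption of this sketch and not part of the original answer. It is used because the yearly frequency alias for resample differs between pandas versions.

```python
import pandas as pd

# First 15 readings from the question's sample (1992-01-01 03:00 to 1992-01-02 21:00).
values = [10.2, 10.4, 11.8, 12.0, 10.4, 9.4, 10.4,
          13.6, 13.2, 11.8, 12.0, 12.8, 12.6, 11.0, 12.2]
stamps = pd.date_range("1992-01-01 03:00:00", periods=len(values),
                       freq=pd.Timedelta(hours=3))
# Store the timestamps as strings first, to mimic data read from a text file.
df = pd.DataFrame({"Time_stamp": stamps.astype(str), "X": values})

# Convert the string column to real datetimes before resampling.
df["Time_stamp"] = pd.to_datetime(df["Time_stamp"])

# Daily and weekly means, using the column named in `on` as the time axis.
df_daily = df.resample("D", on="Time_stamp").mean()
df_weekly = df.resample("W", on="Time_stamp").mean()

# Yearly mean via groupby on the calendar year (assumption of this sketch).
df_yearly = df.groupby(df["Time_stamp"].dt.year)["X"].mean().to_frame()

print(df_daily)
print(df_weekly)
print(df_yearly)
```

With the full 20-year dataset the same calls should produce one row per day, week and year respectively.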
You can use:
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'], format='%Y-%m-%d %H:%M:%S')
df.set_index('Time_stamp',inplace=True)
df_monthly = df.resample('M').mean()
df_monthly
Output:
X
Time_stamp
1992-01-31 10.403125
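One caveat on the 'M' alias used above: recent pandas releases (2.2 and later, as far as I am aware) warn that 'M' is deprecated for resample in favour of 'ME'. A hedged alternative that should not depend on that alias is to group by calendar month as a period. The sketch below assumes a datetime index, as set up with set_index above, and uses only the first 15 rows of the sample; the labels it produces are months such as 1992-01, not month-end dates.

```python
import pandas as pd

# First 15 readings from the question's sample, indexed by timestamp.
values = [10.2, 10.4, 11.8, 12.0, 10.4, 9.4, 10.4,
          13.6, 13.2, 11.8, 12.0, 12.8, 12.6, 11.0, 12.2]
index = pd.date_range("1992-01-01 03:00:00", periods=len(values),
                      freq=pd.Timedelta(hours=3), name="Time_stamp")
df = pd.DataFrame({"X": values}, index=index)

# Map every timestamp to its calendar month, then average within each month.
df_monthly = df.groupby(df.index.to_period("M")).mean()

print(df_monthly)
```

The result is indexed by period, which can be convenient when the month itself is the label of interest.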
For the daily average, use: df_daily = df.resample('D').mean()
Output:
X
Time_stamp
1992-01-01 10.657143
1992-01-02 12.400000
1992-01-03 14.000000
1992-01-04 8.125000
1992-01-05 8.425000
1992-01-06 9.275000
1992-01-07 9.775000
1992-01-08 10.875000
1992-01-09 8.200000
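On the question of how the 3-hourly records are taken into account: resample assigns each timestamp to the bin it falls in, so no extra step should be needed. A rough way to check this is to aggregate with both mean and count, which shows how many readings went into each daily average. The sketch below assumes the first 15 rows of the sample, where the first day has only 7 readings because the data starts at 03:00.

```python
import pandas as pd

# First 15 readings from the question's sample, indexed by timestamp.
values = [10.2, 10.4, 11.8, 12.0, 10.4, 9.4, 10.4,
          13.6, 13.2, 11.8, 12.0, 12.8, 12.6, 11.0, 12.2]
index = pd.date_range("1992-01-01 03:00:00", periods=len(values),
                      freq=pd.Timedelta(hours=3), name="Time_stamp")
df = pd.DataFrame({"X": values}, index=index)

# For each calendar day: the mean of X and the number of 3-hourly readings used.
daily = df["X"].resample("D").agg(["mean", "count"])

print(daily)
```

A day with fewer than 8 readings, such as 1992-01-09 in the output above with a single value, is worth flagging before comparing its average with full days.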