Resample time-series data at the end of the month and at the end of the day
I have time-series data in the following format.
DateShort (%d/%m/%Y) | TimeFrom | TimeTo | Value
---|---|---|---
1/1/2018 | 0:00 | 1:00 | 6414
1/1/2018 | 1:00 | 2:00 | 6153
... | ... | ... | ...
1/1/2018 | 23:00 | 0:00 | 6317
2/1/2018 | 0:00 | 1:00 | 6046
... | ... | ... | ...
I would like to resample the data so that each month is represented by its last record, that is, the last hour of the last day of the month.
The dataset can be retrieved from https://pastebin.com/raw/NWdigN97
pandas.DataFrame.resample() provides the 'M' rule, which retrieves data at the end of the month but at the beginning of that day (the 0:00 row rather than the 23:00 row).
See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
Is there a better way to accomplish this?
I have the following sample code:
import numpy as np
import pandas as pd
ds_url = 'https://pastebin.com/raw/NWdigN97'
df = pd.read_csv(ds_url, header=0)
df['DateTime'] = pd.to_datetime(
    df['DateShort'] + ' ' + df['TimeFrom'],
    format='%d/%m/%Y %H:%M'
)
df.drop('DateShort', axis=1, inplace=True)
df.set_index('DateTime', inplace=True)
df.resample('M').asfreq()
The output is:
TimeFrom TimeTo Value
DateTime
2018-01-31 0:00 1:00 7215
2018-02-28 0:00 1:00 8580
2018-03-31 0:00 1:00 6202
2018-04-30 0:00 1:00 5369
2018-05-31 0:00 1:00 5840
2018-06-30 0:00 1:00 5730
2018-07-31 0:00 1:00 5979
2018-08-31 0:00 1:00 6009
2018-09-30 0:00 1:00 5430
2018-10-31 0:00 1:00 6587
2018-11-30 0:00 1:00 7948
2018-12-31 0:00 1:00 6193
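For what it's worth, the 0:00 rows seem to follow from how asfreq() works: it is essentially a reindex onto the month-end labels, and those labels are stamped at midnight, so only the row whose timestamp equals the label exactly is returned. A minimal sketch on synthetic hourly data, assuming a made-up frame named demo (not the pastebin dataset) and using pd.offsets.MonthEnd() and pd.offsets.Hour() in place of string aliases only to sidestep alias differences between pandas versions:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for Jan-Feb 2018 (illustrative values, NOT the real dataset).
# Value is simply the running hour number, so rows are easy to identify.
idx = pd.date_range("2018-01-01", periods=(31 + 28) * 24, freq=pd.offsets.Hour())
demo = pd.DataFrame({"Value": np.arange(len(idx))}, index=idx)

month_end = pd.offsets.MonthEnd()

# asfreq() reindexes onto the month-end labels, which carry a 00:00 time,
# so it selects the rows stamped exactly at midnight of the last day.
picked = demo.resample(month_end).asfreq()

# The same rows, looked up directly by their midnight timestamps.
direct = demo.loc[[pd.Timestamp("2018-01-31 00:00"), pd.Timestamp("2018-02-28 00:00")]]

print(picked)
print(direct)
```

Both frames should hold the midnight rows of 31 January and 28 February, which matches the pattern in the output above.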
However, the correct output should be:
TimeFrom TimeTo Value
DateTime
2018-01-31 23:00 0:00 7605
2018-02-28 23:00 0:00 8790
2018-03-31 23:00 0:00 5967
2018-04-30 23:00 0:00 5595
2018-05-31 23:00 0:00 5558
2018-06-30 23:00 0:00 5153
2018-07-31 23:00 0:00 5996
2018-08-31 23:00 0:00 5757
2018-09-30 23:00 0:00 5785
2018-10-31 23:00 0:00 6437
2018-11-30 23:00 0:00 7830
2018-12-31 23:00 0:00 6767
Try this:
df.groupby(pd.Grouper(freq='M')).last()
Output:
TimeFrom TimeTo Value
DateTime
2018-01-31 23:00 0:00 7605
2018-02-28 23:00 0:00 8790
2018-03-31 23:00 0:00 5967
2018-04-30 23:00 0:00 5595
2018-05-31 23:00 0:00 5558
2018-06-30 23:00 0:00 5153
2018-07-31 23:00 0:00 5996
2018-08-31 23:00 0:00 5757
2018-09-30 23:00 0:00 5785
2018-10-31 23:00 0:00 6437
2018-11-30 23:00 0:00 7830
2018-12-31 23:00 0:00 6707
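As a rough illustration of why this works: last() aggregates every row that falls inside each monthly bin and keeps the final one, rather than looking up a single midnight timestamp. The sketch below uses synthetic hourly data in a made-up frame named demo (not the pastebin dataset) and passes pd.offsets objects instead of string aliases, which is an assumption made here only to keep it independent of the pandas version. It also shows that resample(...).last() appears to give the same result, and that the same idea applies per day:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for Jan-Feb 2018 (illustrative values, NOT the real dataset).
# Value is the running hour number: 0 for 2018-01-01 00:00, 1 for 01:00, and so on.
idx = pd.date_range("2018-01-01", periods=(31 + 28) * 24, freq=pd.offsets.Hour())
demo = pd.DataFrame({"Value": np.arange(len(idx))}, index=idx)

month_end = pd.offsets.MonthEnd()

# Last record of each month via Grouper, as in the answer above.
by_grouper = demo.groupby(pd.Grouper(freq=month_end)).last()

# The same aggregation expressed with resample().
by_resample = demo.resample(month_end).last()

# Last record of each calendar day (the 23:00 row for complete days).
daily_last = demo.resample(pd.offsets.Day()).last()

print(by_grouper)
print(by_resample)
print(daily_last.head())
```

Note that the resulting index is still labelled with the month-end (or day) date at midnight; only the values come from the last row in each bin.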