Pandas resampling irregular time series
I have a time series that looks something like this:
2018-10-12 00:00:00 1
2018-10-12 01:00:00 0
2018-10-12 02:00:00 0
2018-10-12 06:00:00 7
2018-10-12 07:00:00 22
2018-10-12 08:00:00 8
2018-10-12 09:00:00 18
2018-10-12 10:00:00 24
2018-10-12 11:00:00 8
2018-10-12 11:15:00 5
2018-10-12 11:30:00 4
2018-10-12 11:45:00 25
2018-10-12 12:00:00 29
2018-10-12 12:15:00 19
2018-10-12 12:30:00 24
2018-10-12 12:45:00 16
2018-10-12 13:00:00 49
2018-10-12 14:00:00 36
2018-10-12 15:00:00 27
2018-10-12 16:00:00 20
2018-10-12 17:00:00 8
2018-10-12 17:15:00 7
2018-10-12 17:30:00 8
2018-10-12 17:45:00 9
2018-10-12 18:00:00 10
I would like to resample it so that it has 15-minute intervals.
import pandas as pd
data = pd.read_csv("data.csv", sep=",", index_col=0, parse_dates=True)
data_resampled = data.resample("900s").sum()
That yields this result:
2018-10-12 07:00:00 22
2018-10-12 07:15:00 0
2018-10-12 07:30:00 0
2018-10-12 07:45:00 0
2018-10-12 08:00:00 8
2018-10-12 08:15:00 0
2018-10-12 08:30:00 0
2018-10-12 08:45:00 0
But the result I want is:
2018-10-12 07:00:00 5.5
2018-10-12 07:15:00 5.5
2018-10-12 07:30:00 5.5
2018-10-12 07:45:00 5.5
2018-10-12 08:00:00 2
2018-10-12 08:15:00 2
2018-10-12 08:30:00 2
2018-10-12 08:45:00 2
Or ideally something like this:
2018-10-12 07:00:00 6
2018-10-12 07:15:00 5
2018-10-12 07:30:00 6
2018-10-12 07:45:00 5
2018-10-12 08:00:00 2
2018-10-12 08:15:00 2
2018-10-12 08:30:00 2
2018-10-12 08:45:00 2
But I will settle for something like this:
2018-10-12 07:00:00 5
2018-10-12 07:15:00 5
2018-10-12 07:30:00 5
2018-10-12 07:45:00 5
2018-10-12 08:00:00 2
2018-10-12 08:15:00 2
2018-10-12 08:30:00 2
2018-10-12 08:45:00 2
How do I resample so that a value covering several of the new, smaller intervals is divided equally, or close to equally, across them?
What you can do is pass min_count=1 to resample(...).sum(), so that any 15-minute interval that had no value before becomes NaN instead of 0. Then use groupby.transform, with the groups built from notna and cumsum: each existing value starts a new group, and the NaN rows that follow it are grouped together with it. Fill the NaN with 0 using fillna first, and use mean in the transform.
s_ = s.resample('15min').sum(min_count=1)
s_ = s_.fillna(0).groupby(s_.notna().cumsum()).transform('mean')
print (s_)
2018-10-12 00:00:00 0.25 #here it is 1 divided by 4
2018-10-12 00:15:00 0.25
2018-10-12 00:30:00 0.25
2018-10-12 00:45:00 0.25
2018-10-12 01:00:00 0.00
...
2018-10-12 07:00:00 5.50 #same here
2018-10-12 07:15:00 5.50
2018-10-12 07:30:00 5.50
2018-10-12 07:45:00 5.50
2018-10-12 08:00:00 2.00
...
2018-10-12 17:00:00 8.00 # here you keep the original value as existed before
2018-10-12 17:15:00 7.00
2018-10-12 17:30:00 8.00
2018-10-12 17:45:00 9.00
2018-10-12 18:00:00 10.00
Freq: 15T, Name: val, dtype: float64
where s is the series to resample, e.g. s = data['name_col_to_resample']
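The steps above can be tried without a CSV file. This is only a minimal sketch: the sample values are made up, and the names binned, groups and spread are illustrative rather than taken from the answer.

```python
import pandas as pd

# Made-up irregular sample: two hourly points, then 15-minute points.
idx = pd.to_datetime([
    "2018-10-12 07:00", "2018-10-12 08:00",
    "2018-10-12 09:00", "2018-10-12 09:15",
    "2018-10-12 09:30", "2018-10-12 09:45", "2018-10-12 10:00",
])
s = pd.Series([22, 8, 4, 5, 6, 7, 10], index=idx, name="val")

# min_count=1 leaves empty 15-minute bins as NaN instead of 0.
binned = s.resample("15min").sum(min_count=1)

# Each observed value starts a new group; the NaN bins after it join that group.
groups = binned.notna().cumsum()

# Spread each group's total evenly over all bins in the group.
spread = binned.fillna(0).groupby(groups).transform("mean")
print(spread)
```

The 07:00 value of 22 should come out as four bins of 5.5, while the bins from 09:00 onward keep their original values because each one forms its own group. The overall sum is unchanged.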
I would do resample('H').sum(), then an asfreq('15Min'), followed by a groupby:
s = df.resample('H').sum().asfreq('15Min').fillna(0)
s.groupby(s.index.floor('H')).transform('mean')
Output (head):
1
0
2018-10-12 00:00:00 0.25
2018-10-12 00:15:00 0.25
2018-10-12 00:30:00 0.25
2018-10-12 00:45:00 0.25
2018-10-12 01:00:00 0.00
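One thing to check with this approach: because it sums to hourly totals first, hours that already had 15-minute readings are averaged as well. Here is a minimal sketch on made-up values. The names hourly, grid and even are illustrative, and "60min" stands in for 'H' only because newer pandas releases flag the uppercase alias as deprecated.

```python
import pandas as pd

# Made-up sample: one hourly point, one hour with 15-minute detail, one closing point.
idx = pd.to_datetime([
    "2018-10-12 16:00",
    "2018-10-12 17:00", "2018-10-12 17:15",
    "2018-10-12 17:30", "2018-10-12 17:45",
    "2018-10-12 18:00",
])
s = pd.Series([20, 8, 7, 8, 9, 10], index=idx, name="val")

# Hourly totals, then a 15-minute grid with zeros in the newly created slots.
hourly = s.resample("60min").sum()
grid = hourly.asfreq("15min").fillna(0)

# Average within each hour: every bin gets an equal share of the hourly total.
even = grid.groupby(grid.index.floor("60min")).transform("mean")
print(even)
```

In this sample the 17:00 hour totals 32, so its four bins all become 8.0 and the original 7 and 9 are lost. If that detail matters, the min_count=1 variant keeps it.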
Try this:
import numpy as np
import pandas as pd
df = pd.read_csv("data.csv", sep=",", index_col=0, parse_dates=True)
# just changing the column names
df.index.name = 'Datetime'
df.columns = ['values']
# resample to 15-minute bins; bins without data sum to 0
df = df.resample('15min').sum().reset_index()
# key for the groupby: a new group starts at every full hour or non-zero value
df['key'] = np.cumsum((df['Datetime'].dt.minute == 0) | (df['values'] > 0))
df['new_values'] = df.groupby(['key'])['values'].transform('mean')
df = df.drop(columns=['key'])
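To see what the key column does, here is a small self-contained run. It is only a sketch on made-up values that are already on a 15-minute grid, and the intermediate name starts is illustrative.

```python
import pandas as pd

# Made-up data, already resampled: two hourly values with empty (0) bins after each.
idx = pd.date_range("2018-10-12 07:00", periods=8, freq="15min")
df = pd.DataFrame({"Datetime": idx, "values": [22, 0, 0, 0, 8, 0, 0, 0]})

# A new group starts at every full hour or at every non-zero value.
starts = (df["Datetime"].dt.minute == 0) | (df["values"] > 0)
df["key"] = starts.cumsum()

# Replace each value by the mean of its group.
df["new_values"] = df.groupby("key")["values"].transform("mean")
print(df)
```

Here the key is 1 for the four 07:xx rows and 2 for the four 08:xx rows, so new_values is 5.5 and then 2.0.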
Notice that when you have the following case
2018-10-12 08:00:00 10
2018-10-12 08:15:00 9
2018-10-12 08:30:00 0
2018-10-12 08:45:00 0
it will become
2018-10-12 08:00:00 10
2018-10-12 08:15:00 3
2018-10-12 08:30:00 3
2018-10-12 08:45:00 3
I don't know if this is what you want.
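None of the approaches above returns the whole-number split the question calls ideal (22 becoming 6, 5, 6, 5). One possible sketch follows, assuming the values are non-negative integer counts. The helper split_evenly is hypothetical, not a pandas function, and it puts the remainder at the front (6, 6, 5, 5) rather than interleaving it.

```python
import numpy as np
import pandas as pd

def split_evenly(total: int, parts: int) -> np.ndarray:
    """Split an integer total into `parts` integers that differ by at most 1."""
    base, extra = divmod(total, parts)
    out = np.full(parts, base, dtype=int)
    out[:extra] += 1  # the first `extra` slots absorb the remainder
    return out

# Made-up 15-minute grid where NaN marks bins that had no observation.
idx = pd.date_range("2018-10-12 07:00", periods=8, freq="15min")
binned = pd.Series([22, np.nan, np.nan, np.nan, 8, np.nan, np.nan, np.nan], index=idx)

# Same grouping idea as before: each observed value starts a new group.
groups = binned.notna().cumsum()

# Split each group's total into whole numbers, one per bin in the group.
pieces = []
for _, g in binned.fillna(0).groupby(groups):
    pieces.append(pd.Series(split_evenly(int(g.sum()), len(g)), index=g.index))
whole = pd.concat(pieces)
print(whole)
```

The totals per group are preserved exactly, which a plain rounding of 5.5 would not guarantee.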