
Pandas resampling irregular time series

I have a time series that looks something like this:

2018-10-12 00:00:00 1
2018-10-12 01:00:00 0
2018-10-12 02:00:00 0
2018-10-12 06:00:00 7
2018-10-12 07:00:00 22
2018-10-12 08:00:00 8
2018-10-12 09:00:00 18
2018-10-12 10:00:00 24
2018-10-12 11:00:00 8
2018-10-12 11:15:00 5
2018-10-12 11:30:00 4
2018-10-12 11:45:00 25
2018-10-12 12:00:00 29
2018-10-12 12:15:00 19
2018-10-12 12:30:00 24
2018-10-12 12:45:00 16
2018-10-12 13:00:00 49
2018-10-12 14:00:00 36
2018-10-12 15:00:00 27
2018-10-12 16:00:00 20
2018-10-12 17:00:00 8
2018-10-12 17:15:00 7
2018-10-12 17:30:00 8
2018-10-12 17:45:00 9
2018-10-12 18:00:00 10

I would like to resample it so that it has 15-minute intervals.

import pandas as pd

data = pd.read_csv("data.csv", sep=",", index_col=0, parse_dates=True)

data_resampled = data.resample("900s").sum()

That yields this result:

2018-10-12 07:00:00 22
2018-10-12 07:15:00 0
2018-10-12 07:30:00 0
2018-10-12 07:45:00 0
2018-10-12 08:00:00 8
2018-10-12 08:15:00 0
2018-10-12 08:30:00 0
2018-10-12 08:45:00 0

But the result I want is:

2018-10-12 07:00:00 5,5
2018-10-12 07:15:00 5,5
2018-10-12 07:30:00 5,5
2018-10-12 07:45:00 5,5
2018-10-12 08:00:00 2
2018-10-12 08:15:00 2
2018-10-12 08:30:00 2
2018-10-12 08:45:00 2

Or ideally something like this:

2018-10-12 07:00:00 6
2018-10-12 07:15:00 5
2018-10-12 07:30:00 6
2018-10-12 07:45:00 5
2018-10-12 08:00:00 2
2018-10-12 08:15:00 2
2018-10-12 08:30:00 2
2018-10-12 08:45:00 2

But I will settle for something like this:

2018-10-12 07:00:00 5
2018-10-12 07:15:00 5
2018-10-12 07:30:00 5
2018-10-12 07:45:00 5
2018-10-12 08:00:00 2
2018-10-12 08:15:00 2
2018-10-12 08:30:00 2
2018-10-12 08:45:00 2

How do I resample so that an interval that spans several of the new intervals is divided equally, or close to equally, across the new smaller intervals?

What you can do is pass min_count=1 to resample(...).sum(), so that a 15-minute interval that had no value before becomes NaN instead of 0. Then use groupby.transform with groups that start wherever a value exists, built with notna and cumsum (if a value is followed by NaN, they are grouped together), and take the mean in the transform after filling the NaN with 0 using fillna.

s_ = s.resample('15min').sum(min_count=1)
s_ = s_.fillna(0).groupby(s_.notna().cumsum()).transform('mean')

print (s_)
2018-10-12 00:00:00     0.25 #here it is 1 divided by 4
2018-10-12 00:15:00     0.25
2018-10-12 00:30:00     0.25
2018-10-12 00:45:00     0.25
2018-10-12 01:00:00     0.00
...
2018-10-12 07:00:00     5.50 #same here
2018-10-12 07:15:00     5.50
2018-10-12 07:30:00     5.50
2018-10-12 07:45:00     5.50
2018-10-12 08:00:00     2.00
...
2018-10-12 17:00:00     8.00 # here the original values are kept, as they already existed
2018-10-12 17:15:00     7.00 
2018-10-12 17:30:00     8.00
2018-10-12 17:45:00     9.00
2018-10-12 18:00:00    10.00
Freq: 15T, Name: val, dtype: float64

where s would be a Series, e.g. s = data['name_col_to_resample']
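For anyone who wants to try the two lines above without the CSV file, here is a minimal self-contained sketch. The sample values and the Series name val are made up for illustration; only the resample / notna / cumsum / transform chain comes from the answer.

```python
import pandas as pd

# Hypothetical sample: two hourly values, one hour of 15-minute values,
# then a final hourly value.
idx = pd.to_datetime([
    "2018-10-12 07:00", "2018-10-12 08:00",
    "2018-10-12 09:00", "2018-10-12 09:15",
    "2018-10-12 09:30", "2018-10-12 09:45",
    "2018-10-12 10:00",
])
s = pd.Series([22, 8, 4, 5, 6, 7, 10], index=idx, name="val")

# Empty 15-minute bins become NaN rather than 0.
resampled = s.resample("15min").sum(min_count=1)

# Every bin that holds an original value starts a new group;
# the NaN bins that follow it fall into the same group.
groups = resampled.notna().cumsum()

# Spread each original value evenly over the bins of its group.
spread = resampled.fillna(0).groupby(groups).transform("mean")

print(spread)
```

The overall total is unchanged by the spreading, which is an easy sanity check on real data: compare spread.sum() with s.sum().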

I would do resample('H').sum(), then an asfreq('15Min'), followed by a groupby:

s = df.resample('H').sum().asfreq('15Min').fillna(0)
s.groupby(s.index.floor('H')).transform('mean')

Output (head):

                        1
0                        
2018-10-12 00:00:00  0.25
2018-10-12 00:15:00  0.25
2018-10-12 00:30:00  0.25
2018-10-12 00:45:00  0.25
2018-10-12 01:00:00  0.00
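One thing to keep in mind with this hourly approach: everything within an hour is summed first, so hours that already had 15-minute values are averaged as well, and asfreq stops at the last hourly timestamp. A small sketch with made-up data shows both effects. The column name val and the values are assumptions, and "60min" is used instead of 'H' only because the spelling of the hourly alias differs between pandas versions.

```python
import pandas as pd

# Hypothetical sample: one hourly value, one hour of 15-minute values,
# then a final hourly value.
idx = pd.to_datetime([
    "2018-10-12 07:00", "2018-10-12 08:00", "2018-10-12 08:15",
    "2018-10-12 08:30", "2018-10-12 08:45", "2018-10-12 09:00",
])
df = pd.DataFrame({"val": [22, 4, 5, 6, 7, 10]}, index=idx)

# Sum per hour, then expand to a 15-minute grid (new rows are filled with 0).
hourly = df.resample("60min").sum()
quarter = hourly.asfreq("15min").fillna(0)

# Average within each hour.
even = quarter.groupby(quarter.index.floor("60min")).transform("mean")

print(even)
```

In this sample the 08:00 hour comes out as 5.5 in every slot instead of the original 4, 5, 6, 7, and the final 09:00 row keeps its full value of 10 because no later rows exist to share it with.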

Try this:

import numpy as np
import pandas as pd

df = pd.read_csv("data.csv", sep=",", index_col=0, parse_dates=True)
# just changing the column names
df.index.name = 'Datetime'
df.columns = ['values']

# resample to 15-minute bins; bins without data sum to 0
df = df.resample('15min').sum().reset_index()

# This will be used for the groupby: a new group starts at every full hour
# and at every bin with a non-zero value
df['key'] = np.cumsum((df['Datetime'].dt.minute == 0) | (df['values'] > 0))

# average the values within each group
df['new_values'] = df.groupby(['key'])['values'].transform('mean')

df = df.drop(columns=['key'])

Notice that when you have the following case:

2018-10-12 08:00:00 10
2018-10-12 08:15:00 9
2018-10-12 08:30:00 0
2018-10-12 08:45:00 0

it will become:

2018-10-12 08:00:00 10
2018-10-12 08:15:00 3
2018-10-12 08:30:00 3
2018-10-12 08:45:00 3

I don't know if this is what you want.
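None of the answers above covers the "ideally" output from the question, where 22 is split into whole numbers (6, 5, 6, 5). One possible sketch, not taken from any of the answers: spread the values evenly first, then round the running total inside each group (halves round up) and take differences. The sample data and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three hourly values.
idx = pd.to_datetime(["2018-10-12 07:00", "2018-10-12 08:00", "2018-10-12 09:00"])
s = pd.Series([22, 8, 10], index=idx, name="val")

# Even (fractional) split, using the notna/cumsum grouping.
resampled = s.resample("15min").sum(min_count=1)
groups = resampled.notna().cumsum()
mean = resampled.fillna(0).groupby(groups).transform("mean")

# Running total within each group, rounded with halves going up.
running = np.floor(mean.groupby(groups).cumsum() + 0.5)

# Differences of the rounded running total give whole numbers that still
# add up to the original value; the first bin of a group has no
# predecessor, so it takes the rounded running total itself.
whole = running.groupby(groups).diff().fillna(running).astype(int)

print(whole)
```

Because this relies on floating-point running totals, values that land extremely close to a .5 boundary could round differently than expected, so it is worth checking that whole.sum() equals s.sum() on real data.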
