Python Pandas - Moving Average with uneven period lengths

I'm trying to figure out how to handle time series data in pandas whose observations are unevenly spaced. The first thing I want to do is calculate a moving average over the last 15 days. Here is a sample of the data (times are UTC):

index   date_time         data
46701   1/06/2016 19:27   15.00
46702   1/06/2016 19:28   18.25
46703   1/06/2016 19:30   16.50
46704   1/06/2016 19:33   17.20
46705   1/06/2016 19:34   18.18

I'm not sure whether I should just fill in the data so that it is all in even 1-minute increments, or whether there is a smarter way. Any suggestions would be much appreciated.

Thanks - KC

You can do something like this:

  • Resample at the frequency you want (upsampling or downsampling).
    • Pay attention to the resampling strategy here: it has to be consistent with the meaning of your data. I arbitrarily used bfill (back fill, which uses the next valid value), but another strategy such as ffill (forward fill, which propagates the last valid value) could be more appropriate.
  • Compute a moving average.
  • You may also have to deal with the index.

Note: This rolling syntax was introduced in pandas 0.18.0. In earlier versions you can do the same thing with pd.rolling_mean.
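To make the difference between the two fill strategies concrete, here is a minimal sketch that upsamples the question's five sample rows to a 1-minute grid both ways. The Series `s` and the frame `filled` are names chosen for this illustration only; they are not part of the original answer.

```python
import pandas as pd

# Sample data from the question (irregular timestamps)
s = pd.Series(
    [15.00, 18.25, 16.50, 17.20, 18.18],
    index=pd.to_datetime(
        ["2016-01-06 19:27", "2016-01-06 19:28", "2016-01-06 19:30",
         "2016-01-06 19:33", "2016-01-06 19:34"]
    ),
    name="data",
)

# Upsample to a regular 1-minute grid with two different fill strategies
filled = pd.DataFrame({
    "bfill": s.resample("1min").bfill(),  # a gap takes the next observed value
    "ffill": s.resample("1min").ffill(),  # a gap takes the last observed value
})
print(filled)
```

The two columns agree at the observed timestamps and differ only in the inserted minutes (19:29, 19:31, 19:32), which is exactly where the choice of strategy affects the moving average.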

import pandas as pd

# Test data
d = {'data': [15.0, 18.25, 16.5, 17.2, 18.18],
     'date_time': ['1/06/2016 19:27',
                   '1/06/2016 19:28',
                   '1/06/2016 19:30',
                   '1/06/2016 19:33',
                   '1/06/2016 19:34'],
     'index': [46701, 46702, 46703, 46704, 46705]}

df = pd.DataFrame(d)
df['date_time'] = pd.to_datetime(df['date_time'])

# Setting the date as the index
df.set_index('date_time', inplace=True)
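# Resampling onto a regular 1-minute grid, back-filling the gaps
# ('min' is the minute alias; the older 'T' alias is deprecated in recent pandas)
df = df.resample('1min').bfill()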
# Performing moving average
df['moving'] = df['data'].rolling(window=3, center=True).mean()
df.plot(y=['data', 'moving'])
df
                      data  index     moving
date_time                                   
2016-01-06 19:27:00  15.00  46701        NaN
2016-01-06 19:28:00  18.25  46702  16.583333
2016-01-06 19:29:00  16.50  46703  17.083333
2016-01-06 19:30:00  16.50  46703  16.733333
2016-01-06 19:31:00  17.20  46704  16.966667
2016-01-06 19:32:00  17.20  46704  17.200000
2016-01-06 19:33:00  17.20  46704  17.526667
2016-01-06 19:34:00  18.18  46705        NaN

[Plot of data and moving]

Edit

Here is an example with missing data.

import numpy as np
import pandas as pd

# Random data parameters
num_sample = (0, 100)   # range of the random values
nb_sample = 1000
start_date = '2016-06-02'
freq = '2min'

random_state = np.random.RandomState(0)

# Generating random data at randomly chosen timestamps
df = pd.DataFrame(
    {'data': random_state.randint(num_sample[0], num_sample[1], nb_sample)},
    index=random_state.choice(
        pd.date_range(start=pd.to_datetime(start_date),
                      periods=nb_sample * 3, freq=freq),
        nb_sample))
# Removing duplicate index entries (this also sorts the index)
df = df.groupby(df.index).first()
# Removing data for closed periods (22:00 to 07:59)
df.loc[(df.index.hour >= 22) | (df.index.hour <= 7), 'data'] = np.nan
# Resampling onto a regular 1-minute grid
df = df.resample('1min').ffill()
# Moving average over one hour (60 one-minute rows)
df['avg'] = df['data'].rolling(window=60).mean()

ax = df.plot(kind='line', subplots=True)
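As an alternative that is not part of the answer above: pandas 0.19 and later also accept a time offset as the rolling window on a sorted DatetimeIndex, which avoids resampling altogether. The sketch below applies a 3-minute window to the question's sample rows; the column name `avg_3min` is made up for this illustration, and for the 15-day average asked about in the question the window would presumably be `'15D'`.

```python
import pandas as pd

# Sample data from the question, indexed by its irregular timestamps
df = pd.DataFrame(
    {"data": [15.00, 18.25, 16.50, 17.20, 18.18]},
    index=pd.to_datetime(
        ["2016-01-06 19:27", "2016-01-06 19:28", "2016-01-06 19:30",
         "2016-01-06 19:33", "2016-01-06 19:34"]
    ),
)

# The window is defined by elapsed time, not by a number of rows:
# each row averages every observation in the 3 minutes up to and
# including its own timestamp. The index must be sorted in time.
df["avg_3min"] = df["data"].rolling("3min").mean()
print(df)
```

With this approach each average only uses values that were actually observed, so no fill strategy has to be chosen; the trade-off is that windows contain a varying number of points.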
