
How to see trend and residual patterns in daily data with strong patterns

I'm trying to remove patterns from a dataset that has a daily activity shape like the one below. I tried seasonal_decompose on it, which may not be appropriate.

What I would like to do is remove the expected peak-usage pattern and arrive at a trend, the way seasonal_decompose does when applied to monthly data.

Does anyone know how I can see trends and abnormal data in daily data like this?

(image: plot of the hourly sample data)

Edit: here is the code to reproduce the example above.
import pandas as pd

sample = {'EventTime': [pd.Timestamp('2020-09-21 00:00:00'), pd.Timestamp('2020-09-21 01:00:00'), pd.Timestamp('2020-09-21 02:00:00'), pd.Timestamp('2020-09-21 03:00:00'), pd.Timestamp('2020-09-21 04:00:00'), pd.Timestamp('2020-09-21 05:00:00'), pd.Timestamp('2020-09-21 06:00:00'), pd.Timestamp('2020-09-21 07:00:00'), pd.Timestamp('2020-09-21 08:00:00'), pd.Timestamp('2020-09-21 09:00:00'), pd.Timestamp('2020-09-21 10:00:00'), pd.Timestamp('2020-09-21 11:00:00'), pd.Timestamp('2020-09-21 12:00:00'), pd.Timestamp('2020-09-21 13:00:00'), pd.Timestamp('2020-09-21 14:00:00'), pd.Timestamp('2020-09-21 15:00:00'), pd.Timestamp('2020-09-21 16:00:00'), pd.Timestamp('2020-09-21 17:00:00'), pd.Timestamp('2020-09-22 01:00:00'), pd.Timestamp('2020-09-22 02:00:00'), pd.Timestamp('2020-09-22 03:00:00'), pd.Timestamp('2020-09-22 04:00:00'), pd.Timestamp('2020-09-22 05:00:00'), pd.Timestamp('2020-09-22 06:00:00'), pd.Timestamp('2020-09-22 07:00:00'), pd.Timestamp('2020-09-22 08:00:00'), pd.Timestamp('2020-09-22 09:00:00'), pd.Timestamp('2020-09-22 10:00:00'), pd.Timestamp('2020-09-22 11:00:00'), pd.Timestamp('2020-09-22 12:00:00'), pd.Timestamp('2020-09-22 13:00:00'), pd.Timestamp('2020-09-22 14:00:00'), pd.Timestamp('2020-09-22 15:00:00'), pd.Timestamp('2020-09-22 16:00:00'), pd.Timestamp('2020-09-22 17:00:00'), pd.Timestamp('2020-09-23 00:00:00'), pd.Timestamp('2020-09-23 01:00:00'), pd.Timestamp('2020-09-23 02:00:00'), pd.Timestamp('2020-09-23 03:00:00'), pd.Timestamp('2020-09-23 04:00:00'), pd.Timestamp('2020-09-23 05:00:00'), pd.Timestamp('2020-09-23 06:00:00'), pd.Timestamp('2020-09-23 07:00:00'), pd.Timestamp('2020-09-23 08:00:00'), pd.Timestamp('2020-09-23 09:00:00'), pd.Timestamp('2020-09-23 10:00:00'), pd.Timestamp('2020-09-23 11:00:00'), pd.Timestamp('2020-09-23 12:00:00'), pd.Timestamp('2020-09-23 13:00:00'), pd.Timestamp('2020-09-23 14:00:00'), pd.Timestamp('2020-09-23 15:00:00'), pd.Timestamp('2020-09-23 16:00:00'), pd.Timestamp('2020-09-23 17:00:00'), 
pd.Timestamp('2020-09-24 01:00:00'), pd.Timestamp('2020-09-24 02:00:00'), pd.Timestamp('2020-09-24 03:00:00'), pd.Timestamp('2020-09-24 04:00:00'), pd.Timestamp('2020-09-24 05:00:00'), pd.Timestamp('2020-09-24 06:00:00'), pd.Timestamp('2020-09-24 07:00:00'), pd.Timestamp('2020-09-24 08:00:00'), pd.Timestamp('2020-09-24 09:00:00'), pd.Timestamp('2020-09-24 10:00:00'), pd.Timestamp('2020-09-24 11:00:00'), pd.Timestamp('2020-09-24 12:00:00'), pd.Timestamp('2020-09-24 13:00:00'), pd.Timestamp('2020-09-24 14:00:00'), pd.Timestamp('2020-09-24 15:00:00'), pd.Timestamp('2020-09-24 16:00:00'), pd.Timestamp('2020-09-24 17:00:00'), pd.Timestamp('2020-09-25 00:00:00'), pd.Timestamp('2020-09-25 01:00:00'), pd.Timestamp('2020-09-25 02:00:00'), pd.Timestamp('2020-09-25 03:00:00'), pd.Timestamp('2020-09-25 04:00:00'), pd.Timestamp('2020-09-25 05:00:00'), pd.Timestamp('2020-09-25 06:00:00'), pd.Timestamp('2020-09-25 07:00:00'), pd.Timestamp('2020-09-25 08:00:00'), pd.Timestamp('2020-09-25 09:00:00'), pd.Timestamp('2020-09-25 10:00:00'), pd.Timestamp('2020-09-25 11:00:00'), pd.Timestamp('2020-09-25 12:00:00'), pd.Timestamp('2020-09-25 13:00:00'), pd.Timestamp('2020-09-25 14:00:00')],
          'SpeedKbs': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1088.48, 58282.31, 83008.37, 58044.14, 34211.61, 27468.72, 25756.96, 14090.29, 5392.43, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1008.33, 44002.72, 47254.5, 37419.96, 23934.41, 19402.93, 18192.84, 9040.67, 3842.37, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1241.15, 43260.7, 56718.99, 41968.16, 33144.51, 22361.08, 28672.93, 21182.31, 5352.42, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 946.01, 46169.63, 51720.39, 37393.39, 27732.89, 25779.79, 24790.86, 15786.72, 4202.65, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 871.7, 37196.78, 40910.71, 26758.97, 17710.98, 16024.61, 15312.96, 9529.89]}

from statsmodels.tsa.seasonal import seasonal_decompose

seasonal_decompose(pd.DataFrame(sample).set_index("EventTime"), model='additive', period=1).plot();

This is hourly data with a daily pattern, so the period needs to be set to 24. Setting the period to 1 essentially skips the seasonal decomposition altogether.

seasonal_decompose(pd.DataFrame(sample).set_index("EventTime"), model='additive', period=24).plot();

Here's what the output of that looks like:

(image: decomposition plot with observed, trend, seasonal, and residual panels)
