Maximum difference between two time series of different resolutions

Question

I have two time series that give electricity demand at one-hour and five-minute resolution. I am trying to find the maximum difference between them. The one-hour data has 8760 rows (hourly for a year) and the five-minute data has 104,722 rows (every 5 minutes for a year).
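A quick arithmetic check on those row counts, using only the numbers quoted above: 8760 hourly rows is a full non-leap year, so complete 5-minute data would have 8760 × 12 = 105,120 rows. The 104,722 rows reported are 398 short. That suggests the 5-minute series has gaps or does not cover the whole year, in which case comparing the two series by position rather than by timestamp would pair up the wrong intervals.

```python
hours_per_year = 365 * 24                  # non-leap year, matches the 8760 hourly rows
expected_5min_rows = hours_per_year * 12   # 12 five-minute intervals per hour
reported_5min_rows = 104_722               # row count quoted in the question

missing_rows = expected_5min_rows - reported_5min_rows
print(hours_per_year, expected_5min_rows, missing_rows)  # 8760 105120 398
```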

The only method I can think of is to expand the hourly data to 5-minute resolution by repeating each hourly value 12 times, and then take the maximum of the difference between the two datasets.

If this technique is the way to go, is there an easy way to convert my hourly data to 5-minute resolution by repeating each hourly value 12 times?

For reference, I posted a plot of one day of this data.

PS: I am using Python for this task.

Answer 1

NumPy's .repeat() function

You can convert your hourly data into 5-minute data with NumPy's repeat function:

import numpy as np

# Repeat each hourly value 12 times, one copy per 5-minute interval
hourly_as_5min = np.repeat(hourly_data, 12)
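A minimal sketch of the full comparison the question describes, built on that call: repeat each hourly value 12 times, subtract, and take the largest absolute difference. The arrays are small made-up placeholders covering two hours. The positional comparison assumes the 5-minute array starts at the same hour and has exactly 12 readings per hour with no gaps, so the sketch refuses to compare arrays of different lengths.

```python
import numpy as np

# Placeholder data: 2 hours of hourly demand and 24 five-minute readings
hourly_data = np.array([100.0, 120.0])
five_min_data = np.array([98, 101, 103, 99, 100, 102, 104, 97, 100, 101, 99, 100,
                          115, 118, 121, 125, 130, 122, 119, 117, 120, 121, 118, 116],
                         dtype=float)

# Repeat each hourly value 12 times so both arrays share the 5-minute positions
hourly_as_5min = np.repeat(hourly_data, 12)

# A positional comparison is only meaningful if the lengths match exactly
if hourly_as_5min.shape != five_min_data.shape:
    raise ValueError("series lengths differ; align by timestamp instead")

diff = np.abs(five_min_data - hourly_as_5min)
max_diff = diff.max()
max_pos = int(diff.argmax())  # position of the 5-minute interval with the largest gap
print(max_diff, max_pos)      # 10.0 16
```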

Answer 2

I would strongly recommend against converting the hourly data into five-minute data. If the values in both series are the mean load over their intervals, you'll get more accurate results by grouping the five-minute intervals into hourly values. Upsampling the way you describe gives you more granularity, but that granularity isn't backed by real measurements, so it adds no information. If you aggregate the five-minute chunks into hourly chunks and compare the series that way, you can be more confident that your results are trustworthy.
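A minimal sketch of that aggregation using pandas' resample on a DatetimeIndex (a different technique from the apply-based grouping this answer uses). It assumes both series are indexed by timestamps marking the start of each interval and that each hourly value is the mean load over that hour; the numbers are made-up placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder 5-minute data: two hours, 12 readings per hour
idx_5min = pd.date_range("2019-01-01 00:00", periods=24, freq="5min")
five_min = pd.Series(np.r_[np.full(12, 100.0), np.full(12, 130.0)], index=idx_5min)

# Placeholder hourly data covering the same two hours
idx_hourly = pd.date_range("2019-01-01 00:00", periods=2, freq="h")
hourly = pd.Series([90.0, 125.0], index=idx_hourly)

# Average the 5-minute readings within each clock hour (bins labelled by hour start)
five_min_hourly = five_min.resample("h").mean()

# Subtraction aligns on the timestamp index; hours missing from either side become NaN
diff = (five_min_hourly - hourly).abs()
print(diff.max(), diff.idxmax())
```

Because subtraction aligns on timestamps, a gap in the 5-minute data does not shift later hours out of step, although an hour with missing readings is averaged over fewer points. If your timestamps mark the end of each interval instead, passing closed="right", label="right" to resample may be needed so each reading falls in the hour it belongs to.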

To group them, you can define a function like the following and use the apply method:

from datetime import datetime as dt

def to_hour(date):
    # Truncate a timestamp to the start of its hour via a string round-trip
    date = date.strftime("%Y-%m-%d %H:00:00")
    date = dt.strptime(date, "%Y-%m-%d %H:%M:%S")
    return date

df['Aggregated_Datetime'] = df['Original_Datetime'].apply(to_hour)
df.groupby('Aggregated_Datetime').agg('Real-Time Lo
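The to_hour helper truncates each timestamp by formatting and re-parsing it row by row. A vectorized alternative (a swapped-in technique, not this answer's code) is pandas' Series.dt.floor("h"); the sketch checks that both give the same hours and then finishes the grouping. The column name in the agg call above is cut off, so the sketch uses a hypothetical "load" column with placeholder data; substitute your own.

```python
from datetime import datetime as dt

import pandas as pd


def to_hour(date):
    # Same logic as the answer: drop minutes and seconds via a string round-trip
    return dt.strptime(date.strftime("%Y-%m-%d %H:00:00"), "%Y-%m-%d %H:%M:%S")


# Placeholder data; "load" is a hypothetical column name
df = pd.DataFrame({
    "Original_Datetime": pd.date_range("2019-01-01 00:00", periods=24, freq="5min"),
    "load": [100.0] * 12 + [130.0] * 12,
})

# Row-by-row version (as in the answer) versus the vectorized version
via_apply = df["Original_Datetime"].apply(to_hour)
via_floor = df["Original_Datetime"].dt.floor("h")
identical = list(via_apply) == list(via_floor)
print("identical:", identical)

# Mean load per hour, indexed by the start of each hour
df["Aggregated_Datetime"] = via_floor
hourly_mean = df.groupby("Aggregated_Datetime")["load"].mean()
print(hourly_mean)
```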


2 answers

Answer 1
Score 0, accepted, 2020-03-11 22:27:48

NumPy's .repeat() function

Answer 2
Score 0, 2020-03-11 22:36:36
