
Maximum difference between two time series of different resolution

I have two time series of electricity demand, one at one-hour resolution and one at five-minute resolution. I am trying to find the maximum difference between these two time series. The one-hour data has 8760 rows (hourly for a year) and the five-minute data has 104,722 rows (every five minutes for a year).

The only method I can think of is to expand the hourly data to five-minute resolution by repeating each hourly value 12 times, and then take the maximum of the difference between the two data sets.

If this technique is the way to go, is there an easy way to convert my hourly data to five-minute resolution by repeating each hourly value 12 times?

For your reference, I posted a plot of this data for one day.

PS: I am using Python to do this task.

NumPy's .repeat() function

You can convert your hourly data to five-minute data with NumPy's repeat function, which repeats each element of the array the given number of times:

import numpy as np

np.repeat(hourly_data, 12)
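Note that np.repeat only lines up with the five-minute series if both cover exactly the same period with no missing rows. Here 8760 × 12 = 105,120, while the five-minute data has 104,722 rows, so the two arrays would have different lengths. Below is a minimal sketch of one way around that, assuming both series carry timestamps: each five-minute timestamp is floored to its hour and the hourly value is looked up by label. The toy data and the names hourly, five_min and hourly_on_5min are hypothetical, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 hours of hourly demand ...
hourly = pd.Series(
    [100.0, 120.0, 110.0],
    index=pd.date_range("2020-01-01 00:00", periods=3, freq="60min"),
)

# ... and the matching 5-minute demand with one row missing.
five_min_index = pd.date_range("2020-01-01 00:00", periods=36, freq="5min")
five_min = pd.Series(np.linspace(95.0, 115.0, 36), index=five_min_index)
five_min = five_min.drop(five_min_index[7])  # simulate a gap in the data

# For each 5-minute timestamp, look up the hourly value of the hour it
# falls in. Unlike np.repeat, this stays aligned when rows are missing.
hourly_on_5min = hourly.reindex(five_min.index.floor("60min")).to_numpy()

diff = (five_min - hourly_on_5min).abs()
max_diff = diff.max()   # largest absolute difference
when = diff.idxmax()    # timestamp at which it occurs
print(max_diff, when)
```

If the five-minute data really is complete and in order, np.repeat(hourly_data, 12) gives the same values with less work.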

I would strongly recommend against converting the hourly data into five-minute data. If the data in both cases refers to the mean load over those time ranges, you'll be looking at more accurate data if you group the five-minute intervals into hourly values. You'd get more granularity the way you're describing, but that granularity is not based on accurate data, so you're not actually getting more value from it. If you aggregate the five-minute chunks into hourly chunks and compare the series that way, you can be more confident in the trustworthiness of your results.

To group them together and get that result, you can define a function like the following and use the apply method like so:

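from datetime import datetime as dt

def to_hour(date):
    # Truncate a datetime to the start of its hour.
    date = date.strftime("%Y-%m-%d %H:00:00")
    date = dt.strptime(date, "%Y-%m-%d %H:%M:%S")
    return date

# Label every five-minute row with the hour it belongs to.
df['Aggregated_Datetime'] = df['Original_Datetime'].apply(to_hour)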
df.groupby('Aggregated_Datetime').agg('Real-Time Lo
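As an alternative to a hand-written hour-truncation function, pandas can do the hourly grouping itself with resample. The following is a minimal sketch, assuming the five-minute data has a DatetimeIndex and that both series represent mean load. The column name load and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two hours of 5-minute load values.
idx = pd.date_range("2020-01-01 00:00", periods=24, freq="5min")
five_min = pd.DataFrame({"load": np.arange(24, dtype=float)}, index=idx)

# Hypothetical hourly series covering the same two hours.
hourly = pd.Series(
    [5.0, 19.0],
    index=pd.date_range("2020-01-01 00:00", periods=2, freq="60min"),
)

# Average the twelve 5-minute values that fall in each hour.
hourly_mean = five_min["load"].resample("60min").mean()

# Subtraction aligns on the timestamp labels, so hours match up by time.
diff = (hourly_mean - hourly).abs()
max_diff = diff.max()
print(max_diff, diff.idxmax())
```

Hours with missing five-minute rows are still averaged over whatever rows are present, so it may be worth checking the per-hour row count (for example with resample("60min").count()) before trusting the comparison.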
