Augmenting Time Series Data for Deep Learning

If I want to apply deep learning to the dataset from the sensors I currently have, I will need quite a lot of data, or the model may overfit. Unfortunately, the sensors have only been active for a month, so the data requires augmentation. I currently have the data in a dataframe, shown below:

index   timestamp              cas_pre        fl_rat         ...
0       2017-04-06 11:25:00    687.982849     1627.040283    ...
1       2017-04-06 11:30:00    693.427673     1506.217285    ...
2       2017-04-06 11:35:00    692.686310     1537.114807    ...
....
101003  2017-04-06 11:35:00    692.686310     1537.114807    ...

Now I want to augment some particular columns with the tsaug package. The augmentation can take the following form:

my_aug = (    
    RandomMagnify(max_zoom=1.2, min_zoom=0.8) * 2
    + RandomTimeWarp() * 2
    + RandomJitter(strength=0.1) @ 0.5
    + RandomTrend(min_anchor=-0.5, max_anchor=0.5) @ 0.5
)

The docs for the augmentation library then apply the augmentation as follows:

X_aug, Y_aug = my_aug.run(X, Y)

Upon further investigation on this site, it seems that the augmentation operates on numpy arrays. While the docs state that it is a multivariate augmentation, I am not really sure how that works in practice.

I would like to apply this augmentation consistently across the float columns, such as cas_pre and fl_rat, so that the result does not diverge too much from the original data or from the relationships between the columns. I would not like to apply it to columns such as timestamp. I am not sure how to do this within Pandas.

This is my attempt:

#Convert Pandas dataframe to Numpy array and apply tsaug transformations

import numpy as np
import pandas as pd
from tsaug import TimeWarp, Crop, Quantize, Drift, Reverse

df = pd.DataFrame({"timestamp": [1, 2],"cas_pre": [687.982849, 693.427673], "fl_rat": [1627.040283, 1506.217285]})

my_aug = (    
    Drift(max_drift=(0.1, 0.5))
)

aug = my_aug.augment(df[["timestamp","cas_pre","fl_rat"]].to_numpy())

print("Input:")
print(df[["timestamp","cas_pre","fl_rat"]].to_numpy()) #debug
print("Output:")
print(aug)

Console Output:

Input:
[[1.00000000e+00 6.87982849e+02 1.62704028e+03]
 [2.00000000e+00 6.93427673e+02 1.50621728e+03]]
Output:
[[1.00000000e+00 9.13389853e+02 2.03588979e+03]
 [2.00000000e+00 1.01536282e+03 1.43177109e+03]]

You may need to convert your timestamps to something numeric.
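One possible way to make the timestamps numeric is to store them as seconds elapsed since the first sample, which can be converted back afterwards. This is only a sketch using pandas; the sample values are taken from the question, and the variable names (`ts`, `start`, `elapsed_s`, `restored`) are illustrative assumptions.

```python
import pandas as pd

# Three timestamps copied from the question's dataframe.
ts = pd.Series(pd.to_datetime([
    "2017-04-06 11:25:00",
    "2017-04-06 11:30:00",
    "2017-04-06 11:35:00",
]))

# Numeric representation: seconds elapsed since the first sample.
start = ts.iloc[0]
elapsed_s = (ts - start).dt.total_seconds()
print(elapsed_s.tolist())  # [0.0, 300.0, 600.0]

# Convert back to timestamps when needed.
restored = start + pd.to_timedelta(elapsed_s, unit="s")
print(restored)
```

Keeping the original `start` value around is what makes the round trip possible.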

The tsaug functions you use don't seem to exist, so I only applied Drift() as an example. After some experimentation, TimeWarp() doesn't affect the timestamps (column 1) by default, but TimeWarp() * 5 inserts new samples by cloning each timestamp 5 times.
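To keep timestamp out of the augmentation entirely, one option is to pass only the float columns to the augmenter and write the result back. As I read the tsaug documentation, a 2-D input is interpreted as (N, T), that is, N separate series of length T, and a multivariate series needs the shape (N, T, C); the sketch below therefore reshapes the selected columns to (1, T, C). The helper `augment_float_columns` and the `jitter` function are hypothetical stand-ins and are not part of tsaug; with tsaug installed, `my_aug.augment` could presumably be passed in place of `jitter`.

```python
import numpy as np
import pandas as pd


def augment_float_columns(df, augment_fn):
    """Augment only the float columns of df; other columns are left untouched.

    augment_fn is any callable that takes and returns an array of shape (N, T, C).
    """
    float_cols = df.select_dtypes(include=[np.floating]).columns
    # (T, C) -> (1, T, C): one series, T time steps, C channels.
    x = df[float_cols].to_numpy()[np.newaxis, :, :]
    x_aug = np.asarray(augment_fn(x))
    if x_aug.shape != x.shape:
        # Augmenters that add series or change the length cannot be written back row by row.
        raise ValueError("augmenter changed the array shape")
    out = df.copy()
    out[float_cols] = x_aug[0]
    return out


# Hypothetical stand-in augmenter (NOT a tsaug function): small multiplicative noise.
rng = np.random.default_rng(0)


def jitter(x, strength=0.01):
    return x * (1.0 + strength * rng.standard_normal(x.shape))


df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-04-06 11:25:00",
        "2017-04-06 11:30:00",
        "2017-04-06 11:35:00",
    ]),
    "cas_pre": [687.982849, 693.427673, 692.686310],
    "fl_rat": [1627.040283, 1506.217285, 1537.114807],
})

df_aug = augment_float_columns(df, jitter)
print(df_aug)
```

The shape check is a deliberate design choice: an augmentation that repeats the input (such as multiplying an augmenter by 5) or changes the series length no longer lines up with the original timestamp column, so it would need separate handling.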

