Split time series data into Train, Test and Valid sets in Python
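I am working on a project in which I combined two time series datasets (e.g. D1 and D2). D1 is at 5-minute intervals and D2 is at 1-minute intervals, so I converted D1 to 1-minute intervals and combined it with D2. Now I want to split this new dataset, D1D2, into train, test, and valid sets based on these conditions: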
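Note: I have searched a lot and tried to find a solution to my problem, but none of the answers I found fit it, so please do not mark this as a duplicate!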
valid set: the latest values
This is how I am doing the split right now:
def split_train_test(dataset, train_size, test_size):
    train = dataset[:train_size, :]
    test = dataset[test_size:, :]
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape)
    return train, test, train_X, train_y, test_X, test_y
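But now I need to split the data into train, test, and valid sets based on the conditions above.
How can I do this? And is this the right way to split a time series dataset?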
Try this:
valid_set = dataset.iloc[-60:, :]
test_set = dataset.iloc[-120:-60]
train_set = dataset.iloc[:-120]
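To see what these three slices select, here is a minimal sketch on a made-up DataFrame with a 1-minute DatetimeIndex. The column names `feature` and `target`, the start date, and the 300-row size are assumptions for illustration only and are not from the question. With 1-minute data, 60 rows cover one hour.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 300 rows at 1-minute intervals (not the asker's data).
index = pd.date_range("2021-01-01", periods=300, freq="min")
dataset = pd.DataFrame(
    {"feature": np.arange(300), "target": np.arange(300) * 2},
    index=index,
)

valid_set = dataset.iloc[-60:, :]   # the most recent 60 rows
test_set = dataset.iloc[-120:-60]   # the 60 rows just before the valid set
train_set = dataset.iloc[:-120]     # everything earlier

# Row counts of the three parts.
print(len(train_set), len(test_set), len(valid_set))
# Check that the parts are ordered in time and do not overlap.
print(train_set.index.max() < test_set.index.min() < valid_set.index.min())
```

Because the slices are positional, the rows must already be sorted by time; if they might not be, calling `dataset.sort_index()` first is one way to make sure.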
To generalize:
def split_train_test(dataset, validation_size):
    # the last validation_size rows (the most recent ones) form the valid set
    valid = dataset.iloc[-validation_size:, :]
    train_test = dataset.iloc[:-validation_size]
    # the first 63% of the remaining rows go to train, the rest to test
    train_length = int(0.63 * len(train_test))
    # split into input and outputs
    train_X, train_y = train_test.iloc[:train_length, :-1], train_test.iloc[:train_length, -1]
    test_X, test_y = train_test.iloc[train_length:, :-1], train_test.iloc[train_length:, -1]
    valid_X, valid_y = valid.iloc[:, :-1], valid.iloc[:, -1]
    return train_test, valid, train_X, train_y, test_X, test_y, valid_X, valid_y
You can pass the split percentage to the function as a parameter instead of hard-coding it inside the function as I did.
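A minimal sketch of that parameterized version might look like the following. The names `split_train_test_valid`, `train_ratio`, and `to_xy`, the input checks, and the toy DataFrame are all assumptions introduced for illustration; the only things taken from the code above are the chronological slicing and the convention that the last column is the target.

```python
import pandas as pd


def split_train_test_valid(dataset, validation_size, train_ratio=0.63):
    """Chronological split of a time-ordered DataFrame (hypothetical helper)."""
    if not 0 < validation_size < len(dataset):
        raise ValueError("validation_size must be between 1 and len(dataset) - 1")
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")

    # The most recent rows form the valid set.
    valid = dataset.iloc[-validation_size:]
    train_test = dataset.iloc[:-validation_size]

    # The earliest train_ratio share of the remaining rows is the train set.
    train_length = int(train_ratio * len(train_test))
    train = train_test.iloc[:train_length]
    test = train_test.iloc[train_length:]
    return train, test, valid


def to_xy(frame):
    """Split a frame into inputs (all but the last column) and target (last column)."""
    return frame.iloc[:, :-1], frame.iloc[:, -1]


# Usage on made-up data: 1000 rows, two feature columns and one target column.
dataset = pd.DataFrame({"a": range(1000), "b": range(1000), "y": range(1000)})
train, test, valid = split_train_test_valid(dataset, validation_size=60, train_ratio=0.8)
train_X, train_y = to_xy(train)
print(train.shape, test.shape, valid.shape)
```

Returning the three frames and separating inputs from targets in a second step keeps the return value short; whether that fits better than one function returning everything depends on how the rest of the pipeline consumes the data.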