Split time series data into Train Test and Valid sets in Python

I'm working on a project in which I have combined two time series datasets (e.g. D1 and D2). D1 had a 5-minute interval and D2 had a 1-minute interval, so I resampled D1 to a 1-minute interval and combined it with D2. Now I want to split this new dataset, D1D2, into train, test and valid sets based on these conditions:

Note: I have searched a lot for a solution to this problem but couldn't find an answer that fits my question, so please don't mark this as a duplicate!

  1. The valid set should be the last 60 values of the dataset.
  2. Then, the test set should be the most recent values immediately before the valid set.
  3. Then, the train set should be the remaining (oldest) data.

Here's how I'm doing the split now:

def split_train_test(dataset, train_size, test_size):
    train = dataset[:train_size, :]
    test = dataset[test_size:, :]
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape)
    return train, test, train_X, train_y, test_X, test_y

But now I need to split the data into train, test and valid sets based on the conditions above.

How can I do that? Also, is this the right way to split time series datasets?

Try this:

valid_set = dataset.iloc[-60:, :]
test_set = dataset.iloc[-120:-60]
train_set = dataset.iloc[:-120]
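
The `.iloc` calls above assume that `dataset` is a pandas DataFrame, while the function in the question slices a NumPy array. As a minimal sketch of the same three-way split using plain slicing (which works on NumPy arrays), something like the following could be used. The function name `split_tail` and the `test_size` parameter are illustrative assumptions, not part of the original answer:

```python
import numpy as np

def split_tail(dataset, valid_size=60, test_size=60):
    """Chronological split: oldest rows -> train, then test, newest rows -> valid."""
    if valid_size <= 0 or test_size <= 0:
        raise ValueError("valid_size and test_size must be positive")
    if valid_size + test_size >= len(dataset):
        raise ValueError("dataset is too short for the requested split")
    holdout = valid_size + test_size
    valid = dataset[-valid_size:]         # last `valid_size` rows
    test = dataset[-holdout:-valid_size]  # rows just before the valid set
    train = dataset[:-holdout]            # everything older
    return train, test, valid

# Toy data: 500 time steps, 2 columns, already sorted by time.
data = np.arange(1000).reshape(500, 2)
train, test, valid = split_tail(data)
print(train.shape, test.shape, valid.shape)  # (380, 2) (60, 2) (60, 2)
```

No shuffling happens anywhere, so every test row is more recent than every train row and every valid row is more recent than every test row.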

To generalize (assuming dataset is a pandas DataFrame with the target in the last column):

def split_train_test(dataset, validation_size):
    # the last `validation_size` rows form the valid set
    valid = dataset.iloc[-validation_size:, :]
    # everything before the valid set is shared between train and test
    train_test = dataset.iloc[:-validation_size]

    # the first 63% of the remaining rows go to train, the rest to test
    train_length = int(0.63 * len(train_test))

    # split into inputs (all columns but the last) and outputs (last column)
    train_X, train_y = train_test.iloc[:train_length, :-1], train_test.iloc[:train_length, -1]
    test_X, test_y = train_test.iloc[train_length:, :-1], train_test.iloc[train_length:, -1]
    valid_X, valid_y = valid.iloc[:, :-1], valid.iloc[:, -1]

    return train_test, valid, train_X, train_y, test_X, test_y, valid_X, valid_y

You can pass the train/test split ratio into the function as a parameter rather than hardcoding it as I have.
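
As a rough sketch of that suggestion, the ratio can be exposed as an argument. The name `split_with_ratio`, the `train_ratio` parameter, and the column names in the toy DataFrame are assumptions made for illustration; the 0.63 default simply mirrors the hardcoded value in the answer:

```python
import numpy as np
import pandas as pd

def split_with_ratio(dataset, validation_size=60, train_ratio=0.63):
    """Hold out the last rows as valid, then split the rest chronologically."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    if not 0 < validation_size < len(dataset):
        raise ValueError("validation_size must be between 1 and len(dataset) - 1")
    valid = dataset.iloc[-validation_size:]
    train_test = dataset.iloc[:-validation_size]
    train_length = int(train_ratio * len(train_test))
    train = train_test.iloc[:train_length]   # oldest rows
    test = train_test.iloc[train_length:]    # rows between train and valid
    return train, test, valid

# Toy frame: 500 time steps, three feature columns and a target column.
df = pd.DataFrame(np.arange(2000).reshape(500, 4),
                  columns=["f1", "f2", "f3", "target"])
train, test, valid = split_with_ratio(df, validation_size=60, train_ratio=0.75)
print(len(train), len(test), len(valid))
```

The inputs and outputs can then be separated exactly as in the answer, for example `train.iloc[:, :-1]` and `train.iloc[:, -1]`.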
