Split train / test data

Question

I need to divide a data set into training and validation data sets. I tried an 80-20 split, but it doesn't do what I need.

train_dataset, test_dataset = train_test_split(df, test_size = 0.2, random_state = 0)

I have a city variable with 25 cities, and each city has several observations (on different dates). What I want is an 80-20 split within each city, so that every city contributes 80% of its observations to training and 20% to validation. I don't know whether this method has a name, and I don't know how to go about it.

Thank you.

Answer 1

How about something like this?

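import numpy as np
from pandas import DataFrame

def split_data(df: DataFrame, ratio: float):
    # Shuffle the row positions so the split is random
    length  = len(df)
    indices = list(range(length))
    np.random.shuffle(indices)

    # Use a single cut point for both slices; mixing int() and round()
    # can silently drop a row when length * ratio is not a whole number
    cut = int(length * ratio)
    train_indices = indices[:cut]
    test_indices  = indices[cut:]

    # Select rows by position
    train_set = df.iloc[train_indices]
    test_set  = df.iloc[test_indices]

    return (train_set, test_set)


train, test = split_data(df, 0.80)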
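
Note that split_data shuffles the whole frame at once, so it gives 80-20 overall but not necessarily within each city. What the question describes is usually called a stratified split. Below is a minimal sketch of one way to do it with pandas, assuming the column is named "city", the DataFrame index is unique, and pandas is 1.1 or newer (for groupby(...).sample). The helper name split_per_city and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

def split_per_city(df, group_col="city", train_frac=0.8, seed=0):
    # Draw train_frac of the rows inside each group (here: each city)
    train = df.groupby(group_col).sample(frac=train_frac, random_state=seed)
    # Whatever was not drawn becomes the test set.
    # Assumption: df has a unique index, otherwise drop() removes too much.
    test = df.drop(train.index)
    return train, test

# Toy data (hypothetical): 3 cities with 10 dated observations each
toy = pd.DataFrame({
    "city": np.repeat(["A", "B", "C"], 10),
    "date": list(pd.date_range("2020-01-01", periods=10)) * 3,
    "value": np.arange(30),
})

train, test = split_per_city(toy)
print(train["city"].value_counts())
print(test["city"].value_counts())
```

If you would rather stay with scikit-learn, train_test_split also takes a stratify argument, e.g. train_test_split(df, test_size=0.2, stratify=df["city"], random_state=0), which keeps the city proportions roughly equal in both parts.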
