
Split train / test data

I need to split a data set into training and validation sets. I tried an 80-20 split, but the result isn't what I expected.

from sklearn.model_selection import train_test_split

train_dataset, test_dataset = train_test_split(df, test_size=0.2, random_state=0)

I have a city variable with 25 cities and several observations (on different dates) for each of them. What I want is an 80-20 split within each city, so that every city's observations are divided 80-20. I don't know whether this method has a name, or how to go about it.

Thank you.

How about something like this?

import numpy as np
from pandas import DataFrame

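def split_data(df: DataFrame, ratio: float):
    # Shuffle the row positions so the split is random.
    length = len(df)
    indices = list(range(length))
    np.random.shuffle(indices)

    # Use one split point for both slices so no row is dropped
    # (mixing int() and round() could skip a row).
    split_at = int(length * ratio)
    train_indices = indices[:split_at]
    test_indices = indices[split_at:]

    train_set = df.iloc[train_indices]
    test_set = df.iloc[test_indices]

    return train_set, test_set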


train, test = split_data(df, 0.80)
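
Note that the call above splits the whole frame at once, so individual cities may not end up at exactly 80-20. If you want the ratio to hold inside each city, one option is to sample within each group. Here is a minimal sketch of that idea. The column name "city", the helper name split_per_group, and the toy data are assumptions for illustration, and it assumes the DataFrame index is unique and pandas is version 1.1 or later (for groupby(...).sample).

```python
import numpy as np
import pandas as pd


def split_per_group(df: pd.DataFrame, group_col: str, ratio: float, seed: int = 0):
    """Split every group of `group_col` into train/test parts at `ratio`."""
    # Draw `ratio` of the rows inside each group for the training set.
    train = df.groupby(group_col, group_keys=False).sample(
        frac=ratio, random_state=seed
    )
    # Rows that were not sampled form the test set (needs a unique index).
    test = df.drop(train.index)
    return train, test


# Toy data: 3 hypothetical cities with 10 observations each.
df = pd.DataFrame({
    "city": np.repeat(["A", "B", "C"], 10),
    "value": np.arange(30),
})

train, test = split_per_group(df, "city", 0.80)
print(train["city"].value_counts().sort_index())
print(test["city"].value_counts().sort_index())
```

This kind of split is usually called a stratified split. If you would rather stay with scikit-learn, train_test_split also accepts a stratify argument (for example stratify=df["city"]), which keeps the city proportions roughly the same in both sets.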

