I need to split a dataset into training and validation sets. I tried an 80-20 split, but it doesn't do what I want:
train_dataset, test_dataset = train_test_split(df, test_size = 0.2, random_state = 0)
I have a city variable with 25 cities and several observations (at different dates) for each city. What I want is an 80-20 split within each city, so that 80% of every city's observations go to training and 20% to validation. I don't know whether this method has a name, and I don't know how to go about it.
Thank you.
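A split that keeps each city's 80-20 proportion is usually called a stratified split, and scikit-learn's `train_test_split` supports it through its `stratify` argument. The sketch below is a minimal example under stated assumptions: the column names `city`, `date` and `value` and the synthetic data (25 cities with 10 rows each) are hypothetical stand-ins for the real frame. Stratification keeps each city's share approximately proportional; with 10 rows per city it comes out to exactly 8 training and 2 validation rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical example data: 25 cities, 10 dated observations each.
# The column names "city", "date" and "value" are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": np.repeat([f"city_{i}" for i in range(25)], 10),
    "date": np.tile(pd.date_range("2021-01-01", periods=10).to_numpy(), 25),
    "value": rng.normal(size=250),
})

# stratify= makes the split preserve each city's share of rows,
# so every city sends roughly 80% of its rows to training
train_dataset, test_dataset = train_test_split(
    df, test_size=0.2, random_state=0, stratify=df["city"]
)

print(train_dataset["city"].value_counts().unique().tolist())  # [8]
print(test_dataset["city"].value_counts().unique().tolist())   # [2]
```

Note that `stratify` requires every city to have at least 2 rows; otherwise scikit-learn raises an error.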
How about something like this?
import numpy as np
from pandas import DataFrame
def split_data(df: DataFrame, ratio: float):
    length = len(df)
    indices = list(range(length))
    np.random.shuffle(indices)  # shuffle the row positions in place
    # Use one cut point for both sets; mixing int() and round() can drop a row
    split = int(length * ratio)
    train_indices = indices[:split]
    test_indices = indices[split:]
    train_set = df.iloc[train_indices]
    test_set = df.iloc[test_indices]
    return (train_set, test_set)
train, test = split_data(df, 0.80)
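On its own, `split_data` shuffles the whole frame, so like the original `train_test_split` call it does not guarantee 80-20 within each city. One way to get the per-city behaviour is to run it separately on each city's rows and concatenate the pieces. The sketch below assumes the column is named `city`; the helper `split_per_city` and the example data are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame


def split_data(df: DataFrame, ratio: float):
    # Shuffle row positions, then cut them at a single split point
    length = len(df)
    indices = list(range(length))
    np.random.shuffle(indices)
    split = int(length * ratio)
    return df.iloc[indices[:split]], df.iloc[indices[split:]]


def split_per_city(df: DataFrame, ratio: float, city_col: str = "city"):
    # Hypothetical helper: split each city's rows on their own,
    # then stack the per-city pieces back into two frames
    train_parts, test_parts = [], []
    for _, group in df.groupby(city_col):
        train_part, test_part = split_data(group, ratio)
        train_parts.append(train_part)
        test_parts.append(test_part)
    return pd.concat(train_parts), pd.concat(test_parts)


# Hypothetical example data: 25 cities with 10 rows each
np.random.seed(0)
df = pd.DataFrame({
    "city": np.repeat([f"city_{i}" for i in range(25)], 10),
    "value": np.arange(250),
})

train, test = split_per_city(df, 0.80)
print(train["city"].value_counts().unique().tolist())  # [8]
print(test["city"].value_counts().unique().tolist())   # [2]
```

Because the cut uses `int()`, the training share is rounded down per city: a city with 7 rows gets 5 training and 2 test rows, and a city with a single row ends up entirely in the test set.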