How do I train a model on a specific number of rows in Python?

I am building a model and have 1876 rows. I would like to use 1000 of them for training and 876 for testing. I am unsure how to do this; currently I have the following:

df_train, df_test = train_test_split(df, train_size=0.80, test_size=0.20, shuffle=False)

This splits by percentage, which is not what I want, but it is all I know how to do. Does anyone have suggestions for how I could change this code to get exactly 1000 training rows and 876 testing rows? I understand that this is likely a suboptimal split. Thank you in advance!

You can slice the DataFrame by position with iloc:

df_train = df.iloc[:1000]
df_test = df.iloc[1000:]
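
To see the slicing in action, here is a minimal sketch on a stand-in DataFrame. The column name `x` and the generated values are assumptions for illustration only; the real `df` would be whatever you already loaded.

```python
import pandas as pd

# Stand-in for the asker's data: 1876 rows, one hypothetical column "x".
df = pd.DataFrame({"x": range(1876)})

# Positional slicing keeps the original row order:
# the first 1000 rows go to training, the remaining 876 to testing.
df_train = df.iloc[:1000]
df_test = df.iloc[1000:]

print(len(df_train), len(df_test))
```

Like shuffle=False in the question, this takes the rows in their existing order, so it only makes sense if that order is what you want to train on.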

As @blorgon wrote, you can pass integers instead of percentages. Pandas also has the sample function, which draws n rows at random (so you can choose n=1000). Here is the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
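
Both options might look roughly like this. It is a sketch on a stand-in DataFrame (the column `x` is hypothetical), and the random variant assumes the index values are unique so that the sampled rows can be dropped by label.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the asker's data: 1876 rows, one hypothetical column "x".
df = pd.DataFrame({"x": range(1876)})

# Option 1: integer sizes are treated as absolute row counts, not fractions.
df_train, df_test = train_test_split(
    df, train_size=1000, test_size=876, shuffle=False
)

# Option 2: draw 1000 rows at random, then use every remaining row for testing.
# random_state is fixed only to make the draw reproducible.
df_train_rand = df.sample(n=1000, random_state=0)
df_test_rand = df.drop(df_train_rand.index)

print(len(df_train), len(df_test))
print(len(df_train_rand), len(df_test_rand))
```

The second option shuffles the rows, so it is not a drop-in replacement if the order matters (for example with time series).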

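test_size=0.20 means that 20% of the 1876 rows end up in the test set. To get close to the split you want, use test_size=0.46 (and drop train_size=0.80, since the two fractions must not sum to more than 1):

 1876 * 0.46 = 862.96, which rounds up to 863 test rows, leaving 1876 - 863 = 1013 training rows

A fraction only approximates the target; for exactly 876 test rows, pass the integer test_size=876.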
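
The row counts a given test_size produces can be checked directly. This is a small sketch that uses a plain list of 1876 items as a stand-in for the DataFrame:

```python
from sklearn.model_selection import train_test_split

# Stand-in data: only the number of rows matters here.
rows = list(range(1876))

# Fractional test_size: the split only approximates the 1000/876 target.
train_frac, test_frac = train_test_split(rows, test_size=0.46, shuffle=False)
print(len(train_frac), len(test_frac))  # 1013 863

# Integer test_size: the split hits the target exactly.
train_int, test_int = train_test_split(rows, test_size=876, shuffle=False)
print(len(train_int), len(test_int))  # 1000 876
```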
