
Training and Testing Splitting Tagging

Here is my dataset:

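import pandas as pd

# Load the dataset; the first row of the CSV holds the column names
df = pd.read_csv("trainingsample_100k_apps.csv", sep=",", header=0)
# Show every column when the frame is displayed
pd.set_option("display.max_columns", None)
df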

UserID  Total Usage
001       20.3
002       40.5
003       10.1

How can I tell which rows were selected for training and which for testing after I run this split:

from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.20)

I want the output to look like this:

UserID  Total Usage   SplitingCategory
001       20.3        Training
002       40.5        Testing
003       10.1        Training

OK, I got the answer:

test

Evaluating test on its own displays the rows that were placed in the test set; train shows the rest.
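To get the tagged table asked for in the question, one possible approach is to use the fact that train_test_split keeps the original row labels when it is given a DataFrame, so test.index identifies the rows of df that went into the test set. The following is a minimal sketch under stated assumptions, not the only way to do it: the small DataFrame stands in for the CSV and reuses the three sample rows from the question, and random_state=0 is added here only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the CSV in the question (the three sample rows shown there).
df = pd.DataFrame({
    "UserID": ["001", "002", "003"],
    "Total Usage": [20.3, 40.5, 10.1],
})

# random_state is an assumption added for reproducibility; it is not in the question.
train, test = train_test_split(df, test_size=0.20, random_state=0)

# train and test keep the row labels of df, so the index of test
# tells us which rows to tag as "Testing"; everything else is "Training".
df["SplitingCategory"] = "Training"
df.loc[test.index, "SplitingCategory"] = "Testing"

print(df)
```

Which rows end up tagged as Testing depends on the random shuffle, so it will generally differ from the example table in the question. This relies on df having a unique index, which is the case for the default index produced by read_csv.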
