Pandas and scikit-learn - train_test_split dimensions of X, y

I have a pandas DataFrame with the following info:

RangeIndex: 920 entries, 0 to 919
Data columns (total 41 columns)

X = df[df.columns[:-1]]
Y = df['my_Target']   
train_X,train_y,test_X, test_y =train_test_split(X,Y,test_size=0.33,shuffle = True, random_state=45)

The last column is the target, and the rest is the data. The shape is the following:

print(train_X.shape,train_y.shape,test_X.shape, test_y.shape)

(616, 40) (304, 40) (616,) (304,)

However, when I train a model:

model=svm.SVC(kernel='linear',C=0.1,gamma=0.1)
model.fit(train_X,train_Y)
prediction2=model.predict(test_X)
print('Accuracy for linear SVM is',metrics.accuracy_score(prediction2,test_Y))

it gives the following error:

model.fit(train_X,train_Y)

ValueError: Found input variables with inconsistent numbers of samples: [616, 2]

Does anyone have a hint about what is going on?

Your variables are in the wrong order. The call should look like this:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

Per the docs, the return order is X_train, then X_test, then y_train, and then y_test.

You have: 你有:

train_X,train_y,test_X, test_y
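
That also explains the shapes you printed: with your ordering, train_y actually holds the test features, which is why it shows (304, 40), and test_X holds the training targets, which is why it shows (616,). Here is a minimal sketch of the corrected flow. The DataFrame is synthetic stand-in data (920 rows, 40 hypothetical feature columns named f0 to f39, and a made-up my_Target rule), since the real data is not shown. I also left out gamma, on the understanding that it has no effect with kernel='linear'.

```python
import numpy as np
import pandas as pd
from sklearn import metrics, svm
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real frame (assumption): 40 features plus a target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(920, 40)),
                  columns=[f"f{i}" for i in range(40)])
df["my_Target"] = (df["f0"] > 0).astype(int)  # made-up labelling rule

X = df[df.columns[:-1]]  # every column except the last
Y = df["my_Target"]      # the last column is the target

# train_test_split returns both splits of X first, then both splits of Y.
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.33, shuffle=True, random_state=45)

print(train_X.shape, test_X.shape, train_Y.shape, test_Y.shape)

model = svm.SVC(kernel="linear", C=0.1)
model.fit(train_X, train_Y)
prediction2 = model.predict(test_X)

# accuracy_score takes (y_true, y_pred); accuracy itself is symmetric.
accuracy = metrics.accuracy_score(test_Y, prediction2)
print("Accuracy for linear SVM is", accuracy)
```

Printing the four shapes right after the split is a cheap sanity check: both X pieces should be 2-D with 40 columns, and both Y pieces should be 1-D with matching row counts.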

