Pandas and scikit-learn - train_test_split dimensions of X, y
I have a pandas dataframe with the following info:
RangeIndex: 920 entries, 0 to 919
Data columns (total 41 columns)
X = df[df.columns[:-1]]
Y = df['my_Target']
train_X,train_y,test_X, test_y =train_test_split(X,Y,test_size=0.33,shuffle = True, random_state=45)
The last column is the target, and the rest is the data. The shape is the following:
print(train_X.shape,train_y.shape,test_X.shape, test_y.shape)
(616, 40) (304, 40) (616,) (304,)
However, when I train a model:
model=svm.SVC(kernel='linear',C=0.1,gamma=0.1)
model.fit(train_X,train_Y)
prediction2=model.predict(test_X)
print('Accuracy for linear SVM is',metrics.accuracy_score(prediction2,test_Y))
It gives the following error:
model.fit(train_X,train_Y)
ValueError: Found input variables with inconsistent numbers of samples: [616, 2]
Anyone got a hint about what is going on?
Your variables are in the wrong order:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
Per the docs, train_test_split returns X_train, then X_test, then y_train, and then y_test.
You have:
train_X, train_y, test_X, test_y
so your train_y actually holds the test features (that is why its shape is (304, 40)), and your test_X holds the training labels.
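For reference, here is a minimal sketch of the corrected split. The dataframe is a synthetic stand-in (random numbers, with made-up feature names f0 to f39) that only mimics the 920 x 41 shape from the question; the target column name my_Target and the split parameters are taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption): 920 rows,
# 40 feature columns plus a final target column.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(920, 40)),
    columns=[f"f{i}" for i in range(40)],  # hypothetical feature names
)
df["my_Target"] = rng.integers(0, 2, size=920)

X = df[df.columns[:-1]]  # everything except the last column
y = df["my_Target"]      # the last column is the target

# train_test_split returns the splits of each input in turn:
# X_train, X_test, y_train, y_test
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.33, shuffle=True, random_state=45
)

print(train_X.shape, test_X.shape, train_y.shape, test_y.shape)
```

With the names unpacked in this order, train_X and train_y have the same number of rows, so model.fit(train_X, train_y) no longer sees inconsistent sample counts.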