Scikit-learn (SVC estimator) always gives the same value for predictions

Question

I am working on a supervised learning task. I have two sets of data: training and test.

My training data set has about 2000 records. My test data has 10 records.
When I run the following code, the predict function returns the same value for every record.

I am not sure what I am doing wrong. I tried changing the values of gamma and C, but still no luck.

I am wondering whether this:

has to do with the data (training set size), or
I am just using the wrong estimator, or
my code is messed up?

Here is the complete Python code:

    import pandas as pd
    from sklearn import svm

    # Read the training data set
    training_data = pd.read_csv("Train_wo_Header.csv")
    # Pick all columns except index 2, which is the output (.iloc selects by position)
    data = training_data.iloc[:, [c for c in range(32) if c != 2]]
    target = training_data.iloc[:, [2]]

    # Code from the URL above
    clf = svm.SVC(gamma=0.001, C=100.)
    clf.fit(data, target)

    # Read the test data
    test_data = pd.read_csv("test_wo_Header.csv")

    # Predict the last 10 values
    clf.predict(test_data[-10:])

Here is the output:

    array([7734, 7734, 7734, 7734, 7734, 7734, 7734, 7734, 7734, 7734], dtype=int64)

I even tried using LinearSVC. Still no luck. The only difference is that the predicted output is a different value (9240), but it is still the same for every record.

Answer 1

Always getting the same output can have two causes:

your model is overfitting (unbalanced dataset?)
you're not giving the correct data to your model
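
A quick way to look into the first cause is to count how often each label occurs in the training target. The snippet below is only a sketch: the `target` Series is a made-up stand-in for column 2 of the training file, and the values in it are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the target column (column 2 of the training CSV).
target = pd.Series([7734] * 8 + [9240] * 2, name="label")

# How often does each label occur? value_counts() sorts by frequency, descending.
counts = target.value_counts()
print(counts)

# Number of distinct labels; SVC treats every distinct value as its own class.
n_classes = target.nunique()
print("distinct labels:", n_classes)

# Share of the most frequent label; a value close to 1.0 suggests
# a heavily unbalanced data set.
majority_share = counts.iloc[0] / counts.sum()
print(f"majority share: {majority_share:.2f}")
```

If one label dominates, or if there are almost as many distinct labels as records, a constant prediction is less surprising.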

You don't seem to have converted your pandas DataFrame to a NumPy array. Try:

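    from sklearn import svm

    clf = svm.SVC()
    X = data.values            # features as a NumPy array
    Y = target.values.ravel()  # labels as a 1-D array
    assert len(X) == len(Y)

    clf.fit(X, Y)
    print(clf.score(X, Y))     # accuracy on the training data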

Do the same for your test data, and print at least the shape of your data and one element of each array.
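
The checks suggested above could look roughly like this. It is a sketch under assumptions: the two DataFrames are random stand-ins for the CSV files, and it assumes the test file has the same 32-column layout as the training file, with the label in column 2, which the question does not confirm.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two CSV files: 32 columns, label in column 2.
train = pd.DataFrame(rng.normal(size=(20, 32)))
test = pd.DataFrame(rng.normal(size=(10, 32)))

# Every column position except 2 is a feature.
feature_cols = [c for c in range(train.shape[1]) if c != 2]

X_train = train.iloc[:, feature_cols].values
y_train = train.iloc[:, 2].values
X_test = test.iloc[:, feature_cols].values

# Print the shape and one element of each array.
print("X_train:", X_train.shape, "first row starts with:", X_train[0][:5])
print("y_train:", y_train.shape, "first label:", y_train[0])
print("X_test: ", X_test.shape, "first row starts with:", X_test[0][:5])

# The test features need the same number of columns as the training features,
# and every training row needs exactly one label.
assert X_train.shape[1] == X_test.shape[1]
assert len(X_train) == len(y_train)
```

If the column counts differ, or the first row looks like header text rather than numbers, the model is probably not receiving the data you expect.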

Answer 1: 2 2016-04-11 21:18:48