Scikit-learn (SVC estimator) always gives the same value for predictions

I am working on a supervised learning task. I have two sets of data: training and test.

My training data set has about 2000 records, and my test data has 10 records.
When I run the following code, the predict function returns the same value for every record.

I am not sure what I am doing wrong. I tried changing the values of gamma and C, but still no luck.

I am wondering whether:

  1. this has to do with the data (training set size),
  2. I am just using the wrong estimator, or
  3. my code is messed up.

Here is the complete Python code:

    import pandas as pd
    from sklearn import svm  # code from the URL above

    training_data = pd.read_csv("Train_wo_Header.csv")  # read my training data set
    # pick all columns except index 2, which is my output
    data = training_data.iloc[:, [0, 1] + list(range(3, 32))]
    target = training_data.iloc[:, 2]

    clf = svm.SVC(gamma=0.001, C=100.)
    clf.fit(data, target)

    test_data = pd.read_csv("test_wo_Header.csv")  # this is my test data

    clf.predict(test_data[-10:])  # predicting the last 10 values

Here is the output:

array([7734, 7734, 7734, 7734, 7734, 7734, 7734, 7734, 7734, 7734], dtype=int64)

I even tried using LinearSVC. Still no luck. The only difference is that the predicted output is a different value (9240), but it is the same throughout.

Always giving the same output can have two causes:

  • your model is overfitting (is the dataset unbalanced?)
  • you are not giving the correct data to your model
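One rough way to tell these two causes apart is sketched below. It runs on synthetic stand-in data (an assumption, since the real CSV files are not available here), and the variable names are illustrative only. It prints the class distribution, compares accuracy on the training rows with accuracy on held-out rows, and counts how many distinct values the model predicts.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 200 rows, 5 features, 3 classes.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = rng.randint(0, 3, size=200)

# Cause 1: inspect the class distribution, then compare train vs. held-out accuracy.
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = svm.SVC(gamma=0.001, C=100.0).fit(X_train, y_train)
train_score = clf.score(X_train, y_train)
val_score = clf.score(X_val, y_val)
print(train_score, val_score)

# Cause 2: check the shapes fed to the model and how varied the predictions are.
print(X_train.shape, X_val.shape)
n_distinct = len(np.unique(clf.predict(X_val)))
print(n_distinct)
```

A training score far above the held-out score hints at overfitting, while a single distinct prediction on varied inputs suggests looking more closely at the data being passed in.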

You don't seem to have converted your pandas DataFrame to a NumPy array. Try:

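    from sklearn import svm

    clf = svm.SVC()
    X = data.values
    Y = target.values.ravel()  # flatten to 1-D, which is what fit() expects for labels
    assert len(X) == len(Y)

    clf.fit(X, Y)
    print(clf.score(X, Y))  # accuracy on the training data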

Do the same for your test data, and print at least the shape of your data and one element of each array.
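As a minimal sketch of that check, the snippet below applies the same column selection to training and test data and prints the shapes and one row. The DataFrames are random stand-ins for the two CSV files, and it assumes (this is not confirmed by the question) that the test file has the same 32-column layout, with the output in column 2.

```python
import numpy as np
import pandas as pd
from sklearn import svm

rng = np.random.RandomState(0)
# Hypothetical stand-ins for the CSV files: 32 columns, column 2 is the output.
train_df = pd.DataFrame(rng.randint(0, 10, size=(50, 32)))
test_df = pd.DataFrame(rng.randint(0, 10, size=(10, 32)))

# Use the same feature columns for both sets (assumption: identical layouts).
feature_cols = [c for c in range(32) if c != 2]
X_train = train_df.iloc[:, feature_cols].values
y_train = train_df.iloc[:, 2].values
X_test = test_df.iloc[:, feature_cols].values

# Print the shapes and one element, as suggested above.
print(X_train.shape, y_train.shape, X_test.shape)
print(X_test[0])
assert X_train.shape[1] == X_test.shape[1]  # feature counts must match

clf = svm.SVC().fit(X_train, y_train)
predictions = clf.predict(X_test)
print(predictions)
```

If the shapes differ, or the printed row does not look like a training row, the model is probably receiving different columns at prediction time than it saw during fitting.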
