
Sci-kit Learn SGD Classifier Partial_Fit Error

I'm using scikit-learn and the SGD classifier to train an SVM in mini-batches. Here's a little code snippet:

for row in reader:
    if row[0] in model.docvecs:
        TRAINING_X.append(model.docvecs[row[0]])
        TRAINING_Y.append(row[2])
    if count % 10000 == 0:
        np_x = np.asarray(TRAINING_X)
        np_y = np.asarray(TRAINING_Y)
        clf.partial_fit(np_x, np_y, np.unique(np.asarray))
        TRAINING_X = []
        TRAINING_Y = []
    count += 1

I'm using the partial_fit function to read in every 1000 data points and using np.unique() to generate class labels as per the documentation.

However, when I run this, I get the following error:

    raise ValueError("The number of class labels must be "
ValueError: The number of class labels must be greater than one.

I'm a little confused. Am I generating class labels incorrectly?

The documentation for partial_fit describes the classes argument as follows: "Classes across all calls to partial_fit. Can be obtained by via np.unique(y_all), where y_all is the target vector of the entire dataset."

You seem to be passing np.unique(np.asarray), which looks incorrect: np.asarray there is the NumPy function itself, not your label array.

Going by the error thrown by the program, I think there is only one unique class in your target variable. Please use np.unique(np_y) to check how many unique classes you are feeding into the model, and make sure there is more than one.
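As a quick way to run that check, here is a minimal sketch that counts the distinct labels in a batch. The helper name count_classes and the sample labels are hypothetical, made up for illustration; only np.unique and np.asarray come from the code above.

```python
import numpy as np


def count_classes(labels):
    """Return the distinct labels in a batch and how many there are."""
    unique_labels = np.unique(np.asarray(labels))
    return unique_labels, unique_labels.size


# Hypothetical label batches, for illustration only.
mixed_labels, mixed_count = count_classes(["spam", "ham", "spam"])
single_labels, single_count = count_classes(["spam", "spam", "spam"])

print(mixed_labels, mixed_count)
print(single_labels, single_count)
```

If the count for your real labels comes back as 1, the target column itself only holds one class and the classifier has nothing to separate.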

Also, the value you pass for the classes argument seems to be incorrect: it should be np.unique(np_y) instead of np.unique(np.asarray).
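For reference, a corrected mini-batch loop might look roughly like the sketch below. It is not your exact setup: the toy rows, the function name train_in_batches and the batch size are assumptions made up for illustration. It also assumes the full set of labels can be collected up front, since the documentation asks for the classes across all calls; np.unique(np_y) on a single batch should only be equivalent when that batch happens to contain every class.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def train_in_batches(rows, all_classes, batch_size=3):
    """Feed (features, label) rows to partial_fit in mini-batches."""
    # Hinge loss makes SGDClassifier behave as a linear SVM.
    clf = SGDClassifier(loss="hinge", random_state=0)
    batch_x, batch_y = [], []
    for features, label in rows:
        batch_x.append(features)
        batch_y.append(label)
        if len(batch_x) == batch_size:
            clf.partial_fit(np.asarray(batch_x), np.asarray(batch_y),
                            classes=all_classes)
            batch_x, batch_y = [], []
    if batch_x:
        # Flush the final, smaller batch so no rows are dropped.
        clf.partial_fit(np.asarray(batch_x), np.asarray(batch_y),
                        classes=all_classes)
    return clf


# Hypothetical toy data, for illustration only.
rows = [([0.0, 0.1], "neg"), ([1.0, 0.9], "pos"), ([0.1, 0.0], "neg"),
        ([0.9, 1.0], "pos"), ([0.2, 0.1], "neg")]
all_classes = np.unique([label for _, label in rows])

clf = train_in_batches(rows, all_classes)
print(clf.classes_)
```

Passing the same classes array on every call keeps the loop simple; the argument is only required on the first call to partial_fit.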

Hope this helps!


 