How to fix "The least populated class in y has only 1 member" in scikit-learn

Question

I am creating a program that uses past datasets to predict an employee's salary for any job. I receive the error "Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=5."

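import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
p_train, p_test, t_train, t_test = train_test_split(predictors, target, test_size=0.25, random_state=1)
model = KNeighborsClassifier()
param_grid = {'n_neighbors': np.arange(1, 25)}
modelGSCV = GridSearchCV(model, param_grid, cv=5)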

Here is where I tried splitting and received the error. I am pretty new to machine learning, so I would appreciate it if anyone could guide me on how to fix this.

Answer 1

From the GridSearchCV documentation:

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

You must have a multiclass classification problem. Since StratifiedKFold is used with cv=5, every class in your data needs at least 5 examples. If any class has fewer than 5 examples, this warning is raised.
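
To see which classes fall short, you could count the members of each class in the training target before running the grid search. This is only a minimal sketch: the helper name `classes_below` and the sample labels are made up for illustration and are not from the question.

```python
import numpy as np

def classes_below(y, n_splits=5):
    """Return {label: count} for every class with fewer than n_splits members."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    return {
        label.item(): int(count)
        for label, count in zip(labels, counts)
        if count < n_splits
    }

# Made-up labels for illustration: class "c" has a single member.
t_train = ["a"] * 6 + ["b"] * 5 + ["c"]
rare = classes_below(t_train, n_splits=5)
print(rare)
```

An empty result would mean every class has enough members for the chosen number of folds.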

A simple solution would be to drop the classes with fewer than 5 examples or to reduce the number of folds.
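
Dropping the rare classes could look roughly like the sketch below. It assumes the predictors and target can be converted to NumPy arrays. The helper `drop_rare_classes` and the synthetic data are hypothetical stand-ins, not the asker's dataset, and the `n_neighbors` range is shortened only because the toy data is small.

```python
import warnings

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def drop_rare_classes(X, y, min_count):
    """Keep only the rows whose class has at least min_count members."""
    X, y = np.asarray(X), np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    keep = np.isin(y, labels[counts >= min_count])
    return X[keep], y[keep]

# Synthetic stand-in data (an assumption for illustration): class 2 has one member.
rng = np.random.default_rng(1)
p_train = rng.normal(size=(21, 3))
t_train = np.array([0] * 10 + [1] * 10 + [2])

n_splits = 5
p_kept, t_kept = drop_rare_classes(p_train, t_train, min_count=n_splits)

# 20 samples remain; with 5 folds each training fold has 16 samples,
# so n_neighbors has to stay at or below 16.
param_grid = {"n_neighbors": np.arange(1, 11)}
modelGSCV = GridSearchCV(KNeighborsClassifier(), param_grid, cv=n_splits)

# Record warnings so we can check that the stratification warning is gone.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    modelGSCV.fit(p_kept, t_kept)

stratify_warnings = [w for w in caught if "least populated class" in str(w.message)]
print(len(stratify_warnings), modelGSCV.best_params_)
```

The other option, reducing the number of folds, amounts to passing a smaller `cv` that is no larger than the smallest class count. Since `cv` must be at least 2, that alone cannot help when a class has a single member, as in the warning above.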

Answer 1
0 2019-07-11 18:25:21
