How to fix "The least populated class in y has only one member" in scikit-learn
I am creating a program that uses past datasets to predict an employee's salary for any job. I receive the error "Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=5."
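import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
p_train, p_test, t_train, t_test = train_test_split(predictors, target, test_size=0.25, random_state=1)
model = KNeighborsClassifier()
param_grid = {'n_neighbors': np.arange(1, 25)}
modelGSCV = GridSearchCV(model, param_grid, cv=5)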
Here is where I tried splitting and received the error. I am pretty new to machine learning, so I would appreciate it if anyone could guide me on how to fix this.
From the GridSearchCV documentation:
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
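To see the rule quoted above in action, here is a minimal sketch using `check_cv` from `sklearn.model_selection`, which resolves an integer `cv` into a splitter. The two target arrays are made-up illustrations and do not come from the question's data.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Hypothetical targets, for illustration only.
y_classes = np.array([0, 0, 1, 1, 2, 2])                  # discrete labels (multiclass)
y_continuous = np.array([0.5, 1.7, 2.2, 3.9, 4.1, 5.3])  # real values (continuous)

# An integer cv with a classifier and a multiclass y resolves to StratifiedKFold.
cv_for_classifier = check_cv(3, y_classes, classifier=True)

# In all other cases (here: not a classifier, continuous y) it resolves to KFold.
cv_for_regressor = check_cv(3, y_continuous, classifier=False)

print(type(cv_for_classifier).__name__)
print(type(cv_for_regressor).__name__)
```

The stratified splitter is the one that needs every class to have enough members, which is where the message about the least populated class comes from.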
You must have a multiclass classification problem. Since StratifiedKFold is used, you need at least 5 examples of each class in your data. If at least one class has fewer than 5 examples, this error will be thrown.
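One way to check which classes fall below the threshold is to count the members of each label. This is only a sketch: `rare_classes` is a hypothetical helper name and the label list is invented for illustration.

```python
from collections import Counter


def rare_classes(y, n_splits=5):
    """Return {label: count} for every label with fewer than n_splits members.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_splits}


# Made-up labels: "a" has 6 members, "b" has 5, "c" has only 1.
y = ["a"] * 6 + ["b"] * 5 + ["c"] * 1

print(rare_classes(y, n_splits=5))  # {'c': 1}
```

Running this on your own target column should show exactly which labels are too rare for `cv=5`.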
A simple solution would be to drop classes with fewer than 5 examples or to reduce the number of folds.
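Both options could look roughly like the sketch below. The data is randomly generated for illustration, and `max_folds` is a hypothetical helper, not a scikit-learn function. Note that reducing the number of folds only helps if the smallest class has at least 2 members, because cross-validation needs at least 2 folds.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: three classes with 10, 8 and 1 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(19, 2))
y = np.array([0] * 10 + [1] * 8 + [2] * 1)

n_splits = 5

# Option 1: drop every class that has fewer than n_splits members.
labels, counts = np.unique(y, return_counts=True)
keep = np.isin(y, labels[counts >= n_splits])
X_kept, y_kept = X[keep], y[keep]


# Option 2: lower the number of folds to the size of the smallest class.
def max_folds(y, requested=5):
    """Hypothetical helper: largest usable fold count, capped at `requested`."""
    smallest = int(np.unique(y, return_counts=True)[1].min())
    if smallest < 2:
        raise ValueError("Smallest class has fewer than 2 members; drop it instead.")
    return min(requested, smallest)


param_grid = {"n_neighbors": np.arange(1, 5)}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=max_folds(y_kept, n_splits))
search.fit(X_kept, y_kept)
print(search.best_params_)
```

Dropping classes discards data, while fewer folds gives a noisier score estimate, so which option fits better depends on how many rare classes you have.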