
The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2

I get the following error when I try to split my data into train and test sets. I think the error occurs because the stratify parameter should receive categorical data only, not numeric data, but here OFFENSE_CODE is effectively a category whose values happen to be represented by numbers. So how can I do stratified sampling by OFFENSE_CODE?

from sklearn import model_selection

x = df.loc[:, ['YEAR', 'MONTH', 'DAY_OF_WEEK']]
X_train, x_test, Y_train, y_test = model_selection.train_test_split(x, df['OFFENSE_CODE'], stratify=df['OFFENSE_CODE'], random_state=2, test_size=0.3)

This is a sample of the dataset:

INCIDENT_NUMBER  OFFENSE_CODE               OFFENSE_CODE_GROUP  \
  I192067438           613                          Larceny   
  I192067437          3831  Motor Vehicle Accident Response   
  I192067435          3115               Investigate Person   
  I192067434          3301                  Verbal Disputes   
  I192067433          3301                  Verbal Disputes   

                 OFFENSE_DESCRIPTION DISTRICT REPORTING_AREA SHOOTING  \
                LARCENY SHOPLIFTING       A1            112      NaN   
                    PROPERTY DAMAGE       A1                     NaN   
                 INVESTIGATE PERSON      C11            336      NaN   
                     VERBAL DISPUTE      E18            492      NaN   
                     VERBAL DISPUTE      D14            769      NaN   

  OCCURRED_ON_DATE  YEAR  MONTH DAY_OF_WEEK  HOUR    UCR_PART  \
2019-08-25 19:55:02  2019      8      Sunday    19    Part One   
2019-08-25 18:20:00  2019      8      Sunday    18  Part Three   
2019-08-25 20:45:00  2019      8      Sunday    20  Part Three   
2019-08-25 20:32:00  2019      8      Sunday    20  Part Three   
2019-08-25 20:30:00  2019      8      Sunday    20  Part Three   

      STREET        Lat       Long                     Location      CODES  
WASHINGTON ST  42.355123 -71.060880  (42.35512339, -71.06087980)   tyer613a  
        NaN  42.352389 -71.062603  (42.35238871, -71.06260312)  tyer3831a  
  NORTON ST  42.306265 -71.068646  (42.30626521, -71.06864556)  tyer3115a  
   DERRY RD  42.265933 -71.113774  (42.26593347, -71.11377415)  tyer3301a  
 PARSONS ST        NaN        NaN     (0.00000000, 0.00000000)  tyer3301a

I also tried:

y = df.loc[:, 'OFFENSE_CODE'].apply(str)

X_train, x_test, Y_train, y_test = model_selection.train_test_split(x,y,stratify=y,random_state=2,test_size=0.3)

It gives the same error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Convert the column to string and then do the sampling:

df['OFFENSE_CODE'].apply(str)

Don't forget to assign the result back:

df['OFFENSE_CODE'] = df['OFFENSE_CODE'].apply(str)
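If the error persists after the conversion, note that the message is about how many rows each class has, not about the column's dtype: a stratified split needs at least 2 members per class. A minimal sketch of one possible workaround follows, assuming it is acceptable to drop the offense codes that occur only once. The toy DataFrame and the helper name drop_rare_classes are hypothetical and only stand in for the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real dataset.
# Code 613 appears only once, which is what triggers the ValueError.
df = pd.DataFrame({
    "YEAR": [2019] * 9,
    "MONTH": [8, 8, 8, 8, 7, 7, 7, 6, 6],
    "OFFENSE_CODE": [3301, 3301, 3301, 3301, 3115, 3115, 3115, 3115, 613],
})


def drop_rare_classes(frame, column, min_count=2):
    """Keep only rows whose value in `column` occurs at least `min_count` times."""
    counts = frame[column].value_counts()
    keep = counts[counts >= min_count].index
    return frame[frame[column].isin(keep)]


# Diagnose: list the codes that have fewer than 2 rows.
counts = df["OFFENSE_CODE"].value_counts()
rare_codes = sorted(counts[counts < 2].index.tolist())

# Work around: remove those rows (an assumption; merging them into an
# "other" bucket would be an alternative), then stratify as before.
filtered = drop_rare_classes(df, "OFFENSE_CODE")
x = filtered[["YEAR", "MONTH"]]
y = filtered["OFFENSE_CODE"]

X_train, X_test, y_train, y_test = train_test_split(
    x, y, stratify=y, random_state=2, test_size=0.3
)
```

Whether dropping the rare codes is appropriate depends on the task. If those codes matter, grouping them into a single catch-all class before splitting keeps the rows while still satisfying the 2-member minimum.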


