The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
I am getting the following error when I try to split the data into train and test sets. I know the error occurs because the stratify parameter should only be given categorical data, not numeric data, but here OFFENSE_CODE is effectively a category; its categories are just represented by numbers. So how can I do stratified sampling by OFFENSE_CODE?
from sklearn import model_selection
x = df.loc[:, ['YEAR', 'MONTH', 'DAY_OF_WEEK']]
X_train, x_test, Y_train, y_test = model_selection.train_test_split(
    x, df['OFFENSE_CODE'], stratify=df['OFFENSE_CODE'], random_state=2, test_size=0.3)
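A minimal sketch on hypothetical toy data (not the real dataset) suggests that the integer type of the labels is not what triggers this error: `train_test_split` stratifies on integer labels without complaint as long as every class has at least two rows, and raises the `ValueError` once a single code occurs only once. The helper `try_split` and the code lists are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def try_split(codes):
    """Attempt a stratified 70/30 split on hypothetical toy data."""
    X = pd.DataFrame({"YEAR": [2019] * len(codes), "MONTH": [8] * len(codes)})
    y = pd.Series(codes, name="OFFENSE_CODE")
    try:
        train_test_split(X, y, stratify=y, random_state=2, test_size=0.3)
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


# Integer labels where every code appears at least twice: the split succeeds
ok_codes = [613, 613, 613, 3301, 3301, 3301, 3115, 3115, 3115, 3115]
# Same labels, but the last row becomes 3831, which appears only once
bad_codes = ok_codes[:-1] + [3831]

print(try_split(ok_codes))   # ok
print(try_split(bad_codes))  # the "least populated class" ValueError
```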
This is a sample of the dataset:
INCIDENT_NUMBER OFFENSE_CODE OFFENSE_CODE_GROUP \
I192067438 613 Larceny
I192067437 3831 Motor Vehicle Accident Response
I192067435 3115 Investigate Person
I192067434 3301 Verbal Disputes
I192067433 3301 Verbal Disputes
OFFENSE_DESCRIPTION DISTRICT REPORTING_AREA SHOOTING \
LARCENY SHOPLIFTING A1 112 NaN
PROPERTY DAMAGE A1 NaN
INVESTIGATE PERSON C11 336 NaN
VERBAL DISPUTE E18 492 NaN
VERBAL DISPUTE D14 769 NaN
OCCURRED_ON_DATE YEAR MONTH DAY_OF_WEEK HOUR UCR_PART \
2019-08-25 19:55:02 2019 8 Sunday 19 Part One
2019-08-25 18:20:00 2019 8 Sunday 18 Part Three
2019-08-25 20:45:00 2019 8 Sunday 20 Part Three
2019-08-25 20:32:00 2019 8 Sunday 20 Part Three
2019-08-25 20:30:00 2019 8 Sunday 20 Part Three
STREET Lat Long Location CODES
WASHINGTON ST 42.355123 -71.060880 (42.35512339, -71.06087980) tyer613a
NaN 42.352389 -71.062603 (42.35238871, -71.06260312) tyer3831a
NORTON ST 42.306265 -71.068646 (42.30626521, -71.06864556) tyer3115a
DERRY RD 42.265933 -71.113774 (42.26593347, -71.11377415) tyer3301a
PARSONS ST NaN NaN (0.00000000, 0.00000000) tyer3301a
I also tried:
y = df['OFFENSE_CODE'].apply(str)
X_train, x_test, Y_train, y_test = model_selection.train_test_split(x, y, stratify=y, random_state=2, test_size=0.3)
It gives the same error:
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
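As the message says, the check is about class sizes: at least one OFFENSE_CODE value occurs only once in the column. A sketch like the following, using a hypothetical toy Series in place of `df['OFFENSE_CODE']`, lists the codes with fewer than two rows and shows that converting the codes to strings leaves those counts unchanged.

```python
import pandas as pd

# Hypothetical stand-in for df['OFFENSE_CODE']; 3115 and 3831 occur only once
codes = pd.Series([613, 613, 3301, 3301, 3301, 3115, 3831], name="OFFENSE_CODE")

# Rows per class; stratified splitting needs every count to be at least 2
counts = codes.value_counts()
too_rare = counts[counts < 2]
print(sorted(too_rare.index.tolist()))  # [3115, 3831]

# Converting to strings relabels the classes but keeps the same per-class
# counts, so the same codes are still too rare to stratify on
str_counts = codes.astype(str).value_counts()
print(sorted(str_counts[str_counts < 2].index.tolist()))  # ['3115', '3831']
```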
Convert the column to string and then do the sampling:
df['OFFENSE_CODE'].apply(str)
Don't forget to assign the result back:
df['OFFENSE_CODE'] = df['OFFENSE_CODE'].apply(str)
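Because converting to strings only relabels the classes, codes that occur once still have to be handled before stratifying. One option, sketched below under the assumption that dropping rows whose OFFENSE_CODE appears only once is acceptable for the analysis, is to filter those rows out and then split as in the question. The toy `df` is hypothetical and only has the columns the split uses.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the question's df (only the columns used here)
df = pd.DataFrame({
    "YEAR": [2019] * 10,
    "MONTH": [8] * 10,
    "DAY_OF_WEEK": ["Sunday"] * 10,
    "OFFENSE_CODE": [613, 613, 613, 3301, 3301, 3301, 3115, 3115, 3115, 3831],
})

# Per-row count of that row's OFFENSE_CODE; keep rows whose code occurs >= 2 times
# (assumption: discarding singleton codes is acceptable)
row_counts = df["OFFENSE_CODE"].map(df["OFFENSE_CODE"].value_counts())
df_kept = df[row_counts >= 2]

x = df_kept.loc[:, ["YEAR", "MONTH", "DAY_OF_WEEK"]]
y = df_kept["OFFENSE_CODE"].astype(str)  # string labels, as the answer suggests
X_train, x_test, Y_train, y_test = train_test_split(
    x, y, stratify=y, random_state=2, test_size=0.3
)
print(len(X_train), len(x_test))  # 6 3
```

Dropping rows changes the data the model sees; if the rare codes matter, they need some other treatment before a stratified split is possible.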