Machine Learning - Stratified K-Fold CV

I have unbalanced binary classification data and want to use stratified K-fold cross-validation. I'm getting the error below:

data = DataFrame(df,columns=names)
train,test = cross_validation.train_test_split(df,test_size=0.20)
train_data,test_data = pd.DataFrame(train,columns=names),pd.DataFrame(test,columns=names)
y = test_data['Classifier'].values
k_fold = StratifiedKFold(y, n_folds=3, shuffle=False, random_state=None)
scores = []

for train_indices, test_indices in k_fold:
    train_text = train.iloc[train_indices]
    train_y = train.iloc[train_indices]
    test_text  = test.iloc[test_indices]
    test_y = test.iloc[test_indices]
    pipeline.fit(train_text, train_y)

Here, pipeline is:

pipeline = Pipeline([
  ('count_vectorizer',   CountVectorizer(ngram_range=(1, 2))),
  ('tfidf_transformer',  TfidfTransformer()),
  ('classifier',         MultinomialNB()) ])

The error occurs in the pipeline.fit call. Below is the error:

C:\SMS\Anaconda32bit\lib\site-packages\sklearn\utils\validation.pyc in column_or_1d(y, warn)
    549         return np.ravel(y)
   --> 551     raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (54, 3)

You are not passing valid labels. In fact, in your code the labels and the data are the same thing:

train_text = train.iloc[train_indices]
train_y = train.iloc[train_indices]

whereas you probably wanted something along the lines of

train_y = y[train_indices]

and the same for the test labels.
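To make the fix concrete, here is a minimal sketch of the corrected loop. It assumes a recent scikit-learn, where StratifiedKFold lives in sklearn.model_selection and takes n_splits plus a .split(X, y) call, rather than the older StratifiedKFold(y, n_folds=3) form used in the question. The toy texts and labels are hypothetical stand-ins for the asker's data. Note that the folds are built from the same arrays that the fold indices are later applied to, and that the labels passed to fit are a 1-D array, not a slice of the full frame.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: a 1-D array of texts and a 1-D array of binary labels
# (3 positives, 6 negatives, so the classes are unbalanced).
texts = np.array([
    "win cash now", "cheap pills offer", "free prize claim",
    "see you at lunch", "meeting moved to noon", "call me later",
    "thanks for the notes", "dinner on friday", "project update attached",
])
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0])

pipeline = Pipeline([
    ("count_vectorizer", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf_transformer", TfidfTransformer()),
    ("classifier", MultinomialNB()),
])

k_fold = StratifiedKFold(n_splits=3, shuffle=False)
scores = []
fold_positives = []  # number of positive samples in each test fold

for train_indices, test_indices in k_fold.split(texts, labels):
    # Features and labels are indexed separately with the same fold indices.
    train_text, train_y = texts[train_indices], labels[train_indices]
    test_text, test_y = texts[test_indices], labels[test_indices]

    pipeline.fit(train_text, train_y)  # train_y is 1-D, so no shape error
    scores.append(pipeline.score(test_text, test_y))
    fold_positives.append(int(test_y.sum()))
```

With a DataFrame like the one in the question, the same idea would apply by taking the text column as the features and the 'Classifier' column as the labels before splitting.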
