How to use the imbalanced library with sklearn pipeline?
I am trying to solve a text classification problem and want to create a baseline model using MultinomialNB. My data is highly imbalanced for a few categories, so I decided to use the imbalanced-learn library with an sklearn pipeline, following the tutorial.
The model fails with an error after I introduce the two resampling stages into the pipeline, as suggested in the docs.
from imblearn.pipeline import make_pipeline as make_pipeline_imb
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from imblearn.under_sampling import (EditedNearestNeighbours,
                                     RepeatedEditedNearestNeighbours)
# Create the samplers
enn = EditedNearestNeighbours()
renn = RepeatedEditedNearestNeighbours()
# tokenize_and_stem is my own tokenizer function (definition not shown)
pipe = make_pipeline_imb([('vect', CountVectorizer(max_features=100000,
                                                   ngram_range=(1, 2), tokenizer=tokenize_and_stem)),
                          ('tfidf', TfidfTransformer(use_idf=True)),
                          ('enn', EditedNearestNeighbours()),
                          ('renn', RepeatedEditedNearestNeighbours()),
                          ('clf-gnb', MultinomialNB())])
Error:
TypeError: Last step of Pipeline should implement fit. '[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
Can someone please help here? I am also open to a different approach (boosting or SMOTE).
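It seems that make_pipeline from imblearn does not accept named steps the way the sklearn Pipeline constructor does. It takes the estimators as separate positional arguments and names them automatically, so passing a single list of (name, estimator) tuples makes that list the "last step", which has no fit method. From the imblearn documentation: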
*steps : list of estimators.
You should modify your code to:
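from imblearn.pipeline import make_pipeline as make_pipeline_imb

# Pass the estimators as positional arguments, not as a list of (name, estimator) tuples
pipe = make_pipeline_imb(CountVectorizer(max_features=100000,
                                         ngram_range=(1, 2), tokenizer=tokenize_and_stem),
                         TfidfTransformer(use_idf=True),
                         EditedNearestNeighbours(),
                         RepeatedEditedNearestNeighbours(),
                         MultinomialNB())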