
How does sklearn.pipeline work if done manually?

I am currently working with sklearn.pipeline, which is wonderful. Here is an example:

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train.data, train.target)
labels = model.predict(test.data)

(*The data comes from train = fetch_20newsgroups(subset='train', categories=categories), with categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics'].)

However, my understanding is still very vague. I would like to ask what it would look like if we did it step by step, without the pipeline. Here is what I tried, but it failed.

from sklearn.datasets import fetch_20newsgroups
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)

from sklearn.feature_extraction.text import TfidfVectorizer
model1=TfidfVectorizer()
X=model1.fit_transform(train.data)

from sklearn.naive_bayes import MultinomialNB
model2=MultinomialNB
model2.fit(....)

At this point, I don't know what to do next, because the shape of X does not seem suitable for model2.

For further information, see the book at this link, page 406 of 548.

*** Please pardon my silly question. I know I can do this with a pipeline, but I just want to try it by hand.

You are almost there! You need to use MultinomialNB() instead of MultinomialNB: the parentheses create an estimator instance, whereas the bare name refers only to the class.
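The difference between the class and an instance can be seen in a small sketch. The toy documents and labels below are made up for illustration (they are not the 20 newsgroups data), and the exact exception raised when calling fit on the class may differ between scikit-learn versions, so the sketch catches a generic Exception.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus, used only to keep the example small.
docs = ["the rocket reached orbit", "the church held a service",
        "orbit of the space station", "a sermon at the church"]
labels = [0, 1, 0, 1]

X = TfidfVectorizer().fit_transform(docs)

not_a_model = MultinomialNB            # the class itself, no parentheses
print(isinstance(not_a_model, type))   # the bare name is a class object

try:
    # Without an instance, X is passed in the position of `self`,
    # so the call cannot work as intended.
    not_a_model.fit(X, labels)
    fit_on_class_failed = False
except Exception as exc:
    fit_on_class_failed = True
    print("fit on the class failed with:", type(exc).__name__)

model2 = MultinomialNB()               # an instance, which can be fitted
model2.fit(X, labels)
print(model2.predict(X))
```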

Try the following procedure.

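from sklearn.datasets import fetch_20newsgroups
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
# The test split is needed for the prediction step below
test = fetch_20newsgroups(subset='test', categories=categories)

# Step 1: learn the vocabulary and turn the training text into TF-IDF features
from sklearn.feature_extraction.text import TfidfVectorizer
model1 = TfidfVectorizer()
X = model1.fit_transform(train.data)

# Step 2: fit the classifier on the features and the training labels
from sklearn.naive_bayes import MultinomialNB
model2 = MultinomialNB()
model2.fit(X, train.target)

# Step 3: transform the test text with the already fitted vectorizer, then predict
model2.predict(model1.transform(test.data))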

# array([2, 1, 1, ..., 2, 1, 1])
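On the concern about the shape of X: fit_transform returns a sparse matrix with one row per document and one column per vocabulary term, and that is the input MultinomialNB.fit expects, together with one label per row. A minimal sketch with a made-up toy corpus (an assumption, used here so that nothing has to be downloaded) shows the shapes involved:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for train.data / train.target.
docs = ["the rocket reached orbit", "the church held a service",
        "orbit of the space station", "a sermon at the church"]
targets = [0, 1, 0, 1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

n_docs, n_terms = X.shape
print(X.shape)                                # (number of documents, vocabulary size)
print(n_terms == len(vectorizer.vocabulary_))

# One label per row of X is all the classifier needs.
clf = MultinomialNB().fit(X, targets)

# New text goes through transform (not fit_transform), so that its
# columns line up with the vocabulary learned from the training text.
X_new = vectorizer.transform(["a new space rocket"])
print(X_new.shape)
print(clf.predict(X_new))
```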

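To check that the manual steps do the same thing as the pipeline from the question, both can be run side by side. This is only a sketch: the small train and test lists are invented for illustration, and the variable names are arbitrary.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the 20 newsgroups splits.
train_docs = ["the rocket reached orbit", "the church held a service",
              "orbit of the space station", "a sermon at the church"]
train_targets = [0, 1, 0, 1]
test_docs = ["a rocket in space", "the service at church"]

# Manual version: fit_transform on the training text, fit the classifier,
# then transform the test text and predict.
vectorizer = TfidfVectorizer()
clf = MultinomialNB()
X_train = vectorizer.fit_transform(train_docs)
clf.fit(X_train, train_targets)
manual_labels = clf.predict(vectorizer.transform(test_docs))

# Pipeline version: the same two steps chained together.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_targets)
pipeline_labels = model.predict(test_docs)

print(np.array_equal(manual_labels, pipeline_labels))
```

Both versions use the same estimators with default parameters on the same data, so the two label arrays should agree.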