How does sklearn.pipeline work manually?

Question

I am currently working with sklearn.pipeline, which is wonderful. Here is an example:

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train.data, train.target)
labels = model.predict(test.data)

(The data comes from train = fetch_20newsgroups(subset='train', categories=categories), with categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics'].)

However, my understanding is still very vague. I would like to know how to do the same thing step by step, without the pipeline. Here is what I tried, but it failed.

from sklearn.datasets import fetch_20newsgroups
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)

from sklearn.feature_extraction.text import TfidfVectorizer
model1=TfidfVectorizer()
X=model1.fit_transform(train.data)

from sklearn.naive_bayes import MultinomialNB
model2=MultinomialNB
model2.fit(....)

At this point I don't know what to do next, because the shape of X does not seem suitable for model2.

For more information, see the book at this link, page (406/548).

Please pardon my silly question. I know I can do this with a pipeline, but I just want to try it by hand.

Answer 1

You are almost there! You need to use MultinomialNB() instead of MultinomialNB. Without the parentheses, model2 is the class itself rather than an instance, so calling fit on it does not work.
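To make the class-versus-instance point concrete, here is a minimal sketch. The tiny count matrix and labels are made up for illustration and are not the newsgroups data. The exact exception raised by the first call depends on the scikit-learn version, so the sketch catches it generically.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count features and labels (illustrative only, not the newsgroups data)
X = np.array([[2, 0, 1], [0, 3, 1], [1, 0, 2], [0, 2, 0]])
y = np.array([0, 1, 0, 1])

not_a_model = MultinomialNB          # the class itself, never instantiated
try:
    # X is treated as `self` here, so the call cannot succeed
    not_a_model.fit(X, y)
    failed = False
except Exception as err:             # exact type varies by scikit-learn version
    failed = True
    print(type(err).__name__)

model = MultinomialNB()              # an actual estimator instance
model.fit(X, y)
pred = model.predict(X)
print(pred)
```

Once the estimator is instantiated, fit accepts the feature matrix and the labels without any reshaping.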

Try the following procedure.

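from sklearn.datasets import fetch_20newsgroups
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
# test is needed for the predict call below; assumed to be loaded the same way as train
test = fetch_20newsgroups(subset='test', categories=categories)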


from sklearn.feature_extraction.text import TfidfVectorizer
model1=TfidfVectorizer()
X=model1.fit_transform(train.data)

from sklearn.naive_bayes import MultinomialNB
model2=MultinomialNB()
model2.fit(X, train.target)
model2.predict(model1.transform(test.data))

# array([2, 1, 1, ..., 2, 1, 1])
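As a sanity check, the manual steps can be compared against the pipeline from the question. The sketch below uses a small made-up corpus (the documents, labels, and variable names are assumptions for illustration) so that it runs without downloading the newsgroups data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for train.data / train.target / test.data
train_docs = [
    "the rocket reached orbit around the moon",
    "nasa launched a new space telescope",
    "the church held a prayer service",
    "faith and scripture guide the congregation",
]
train_labels = [0, 0, 1, 1]
test_docs = ["a telescope in orbit", "a sermon about faith"]

# Pipeline version: fit and predict run every step in order
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipe.fit(train_docs, train_labels)
pipe_pred = pipe.predict(test_docs)

# Manual version: fit_transform on the training text, transform only on the test text
vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)   # sparse matrix, shape (n_docs, n_terms)
clf = MultinomialNB()
clf.fit(X_train, train_labels)
manual_pred = clf.predict(vec.transform(test_docs))

print(np.array_equal(pipe_pred, manual_pred))
```

The detail to keep in mind is that the vectorizer is fitted only on the training text. The test text goes through transform, not fit_transform, so both sets share the same vocabulary and feature columns.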

Score 2, accepted, 2019-06-28 12:06:20