Multiple Output Machine Learning Model - Python

Question

Hello everyone. I've tried searching for this topic and haven't been able to find a good answer, so I was hoping someone could help me out. Let's say I am trying to create an ML model using scikit-learn and Python. I have a data set like this:

| Features | Topic   | Sub-Topic        |
|----------|---------|------------------|
| ...      | Science | Space            |
| ...      | Science | Engineering      |
| ...      | History | American History |
| ...      | History | European History |

My feature list consists of just text, such as a short paragraph from an essay. I want to use ML to predict the topic and sub-topic of that text.

I know I would need some sort of NLP library, such as spaCy, to analyze the text. The part that confuses me is having two output variables: topic and sub-topic. I've read that scikit-learn has something called MultiOutputClassifier, but there is also something called multiclass classification, so I'm a little confused about which route to take.

Could someone please point me in the right direction as to which classifier to use or how to achieve this?

Answer 1

Multiclass just means there are multiple (more than two) classes in a single target variable. Multioutput means there is more than one target variable. Here we have a multiclass-multioutput problem.
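The distinction above can be checked on small arrays. This is a minimal sketch that uses scikit-learn's `type_of_target` helper on made-up integer labels; the arrays are illustrative assumptions, not the asker's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# One target column with more than two classes: multiclass.
y_single = np.array([0, 1, 2, 1])

# Two target columns (say, a topic id and a sub-topic id):
# multiclass-multioutput.
y_multi = np.array([[0, 0], [0, 1], [1, 2], [1, 3]])

single_kind = type_of_target(y_single)
multi_kind = type_of_target(y_multi)
print(single_kind)
print(multi_kind)
```

The second array has one row per sample and one column per target, which is the shape of y that the topic/sub-topic problem calls for.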

scikit-learn supports multiclass-multioutput classification natively for the following classifiers:

sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.neighbors.KNeighborsClassifier
sklearn.neighbors.RadiusNeighborsClassifier
sklearn.ensemble.RandomForestClassifier
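For an estimator that is not on this list, the MultiOutputClassifier wrapper mentioned in the question fits one independent copy of the estimator per target column. The sketch below assumes LogisticRegression as the base estimator and uses random made-up data; both are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.RandomState(0)
X = rng.randn(40, 3)

# Two made-up target columns, each binned into up to three classes.
y = np.column_stack([
    np.digitize(X[:, 0], [-0.5, 0.5]),
    np.digitize(X[:, 1], [-0.5, 0.5]),
])

# The wrapper trains one LogisticRegression per column of y.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
pred = clf.predict(X[:5])
print(pred.shape)            # one column per target
print(len(clf.estimators_))  # one fitted estimator per target
```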

I'd suggest starting with RandomForestClassifier, as it often gives good results out of the box.

Here is a dummy example to demonstrate the RandomForestClassifier API for multiple targets.

### Dummy example only to test functionality
import numpy as np
from sklearn.ensemble import RandomForestClassifier

np.random.seed(0)
X = np.random.randn(10,2)
y1 = (X[:,[0]]>.5).astype(int) # make dummy y1
y2 = (X[:,[1]]<.5).astype(int) # make dummy y2
y = np.hstack([y1,y2]) # y has 2 columns
print("X = ",X,sep="\n",end="\n\n")
print("y = ",y,sep="\n",end="\n\n")
rfc = RandomForestClassifier().fit(X, y) # use the same api for multi column y!
out = rfc.predict(X)
print("Output = ",out,sep="\n")

Output

X = 
[[ 1.76405235  0.40015721]
 [ 0.97873798  2.2408932 ]
 [ 1.86755799 -0.97727788]
 [ 0.95008842 -0.15135721]
 [-0.10321885  0.4105985 ]
 [ 0.14404357  1.45427351]
 [ 0.76103773  0.12167502]
 [ 0.44386323  0.33367433]
 [ 1.49407907 -0.20515826]
 [ 0.3130677  -0.85409574]]

y = 
[[1 1]
 [1 0]
 [1 1]
 [1 1]
 [0 1]
 [0 0]
 [1 1]
 [0 1]
 [1 1]
 [0 1]]

Output = 
[[1 1]
 [1 0]
 [1 1]
 [1 1]
 [0 1]
 [0 0]
 [1 1]
 [0 1]
 [1 1]
 [0 1]]
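Applied to the text data from the question, the same API might look like the sketch below. It assumes TF-IDF features via TfidfVectorizer (the question mentions spaCy, which could be used instead), and the example sentences are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up toy texts; real data would be the essay paragraphs.
texts = [
    "The rocket reached orbit around the planet",
    "Astronauts repaired the telescope in orbit",
    "The bridge design uses steel trusses and cables",
    "Engineers tested the engine turbine design",
    "The colonies declared independence in 1776",
    "The civil war divided the northern and southern states",
    "Napoleon led the French army across Europe",
    "The Roman empire ruled much of Europe",
]

# One row per text: [topic, sub-topic], as in the question's table.
labels = np.array([
    ["Science", "Space"],
    ["Science", "Space"],
    ["Science", "Engineering"],
    ["Science", "Engineering"],
    ["History", "American History"],
    ["History", "American History"],
    ["History", "European History"],
    ["History", "European History"],
])

# Vectorize the text, then fit a single forest on both target columns.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(texts, labels)

pred = model.predict(["The telescope observed a distant planet"])
print(pred)  # shape (1, 2): one [topic, sub-topic] pair
```

Note that nothing here ties a predicted sub-topic to its predicted topic, so an inconsistent pair is possible; with only eight training texts the predictions are not meaningful beyond showing the API.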

On a side note, since you are building an NLP model, I'd suggest using Keras's multi-output NN API to train a neural network, which may give better results.

Solution 1
1 Accepted 2019-11-05 01:34:50