
multi-class multi-output regression using sci-kit learn

I am attempting to use sci-kit learn to develop a Machine Learning program which predicts 9 outputs from 5 inputs but am having trouble.

I have acquired 20,000 instances of the 5 inputs with corresponding 9 outputs for training purposes. The inputs represent the performance measurements of an amplifier. The outputs represent the component sizes which give those performance measurements.

So one row of input variables X may be: [ 8430, 6895, 12735, 208929613, 249]

With the corresponding output variables y: [1000, 400, 1000, 2000, 2500, 1000, 80, 1000, 2000]

After importing all the relevant libraries and assigning the inputs to X and outputs to y, I attempt to create the model as follows:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train = X_train.values
X_test = X_test.values
y_train = y_train.values
y_test = y_test.values

model = DecisionTreeRegressor()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

After running this code however, I get the following error:

ValueError: multiclass-multioutput is not supported

But the SKlearn website says that Decision Trees are inherently multiclass, so how should I proceed to fix this error? Or is SKlearn not suited to this kind of problem? Should I investigate using a neural network instead?

That decision trees are multiclass means they can handle data where the different samples belong to one of several classes, not that each sample carries several target values. You could still implement your own decision tree and adapt it to this problem by choosing split features, while building the tree, according to the average information gain over the different outputs, and that would solve the problem. More simply, you could use a basic but separate decision tree for each label, passing each tree only one target value per sample; a sketch of this approach follows below. You can also adapt decision trees to your problem in many other ways, or use a neural network, which would be more natural but probably less effective if the data is well structured.
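
A minimal sketch of the "separate tree per label" idea, not the original poster's code: the random X and y arrays below are hypothetical stand-ins for the real 20,000-row dataset, and mean squared error is used in place of accuracy_score because the 9 targets are continuous:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical stand-ins for the question's data: 5 inputs, 9 outputs.
X = np.random.rand(20000, 5)
y = np.random.rand(20000, 9)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# One plain DecisionTreeRegressor per output column, each fitted on a
# single target vector, as suggested above.
models = [DecisionTreeRegressor().fit(X_train, y_train[:, i])
          for i in range(y_train.shape[1])]

# Stack the per-column predictions back into an (n_samples, 9) array.
predictions = np.column_stack([m.predict(X_test) for m in models])

# accuracy_score is a classification metric; a regression metric such as
# mean squared error is used here instead for the continuous outputs.
print(mean_squared_error(y_test, predictions))

The same one-estimator-per-target pattern is also available ready-made as sklearn.multioutput.MultiOutputRegressor(DecisionTreeRegressor()).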
