
Multi-target regression using scikit-learn

I am solving a classic regression problem using Python and the scikit-learn library. It's simple:

        from sklearn.ensemble import GradientBoostingRegressor

        ml_model = GradientBoostingRegressor()
        ml_params = {}
        ml_model.fit(X_train, y_train)

where y_train is a one-dimensional array-like object.

Now I would like to extend the task so that the model predicts not a single target value but a set of them. The training samples X_train will remain the same. An intuitive solution is to train several models, where X_train is the same for all of them but each model gets its own y_train. This definitely works but, it seems to me, is inefficient.

While searching for alternatives, I came across the concept of Multi-Target Regression. As I understand it, such functionality is not implemented in scikit-learn. How can I solve a Multi-Target Regression problem in Python in an efficient way? Thanks!

It depends on the problem you are solving, the training data you have, and the algorithm you choose. It is hard to suggest anything without knowing all the details. You could try a random forest as a starting point. It is a powerful and robust algorithm that is fairly resistant to overfitting when you do not have much data, and it can also be used for multi-target regression. Here is a working example:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor


X, y = make_regression(n_targets=2)
print('Feature vector:', X.shape)
print('Target vector:', y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

print('Build and fit a regressor model...')

model = RandomForestRegressor()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

print('Done. Score', score)

Output:

Feature vector: (100, 100)
Target vector: (100, 2)
Build and fit a regressor model...
Done. Score 0.4405974071273537

This algorithm natively supports multi-target regression. For estimators that don't, you can use MultiOutputRegressor, which simply fits one regressor per target.
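As a minimal sketch of that wrapper, the snippet below applies MultiOutputRegressor to the GradientBoostingRegressor from the question, which only accepts a one-dimensional y on its own. The dataset sizes and the random_state values are assumptions chosen for illustration, not part of the original answer.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data with two targets; sizes and seeds are illustrative assumptions.
X, y = make_regression(n_samples=200, n_features=10, n_targets=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# MultiOutputRegressor clones the wrapped estimator and fits one copy
# per column of y_train, so each target gets its own independent model.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print('Predictions shape:', predictions.shape)
print('Fitted estimators:', len(model.estimators_))
print('Score:', model.score(X_test, y_test))
```

This is equivalent to the "one model per target" idea from the question, just packaged behind a single fit/predict interface, so it does not share any information between targets.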

Another alternative to the random forest approach is an adapted version of Support Vector Regression that handles multi-target regression problems. Its advantage over wrapping SVR in MultiOutputRegressor is that it takes the underlying correlations between the targets into account and hence should perform better. A working implementation with a paper reference can be found here
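The multi-output SVR implementation mentioned above is not reproduced here. As a different technique that is available in scikit-learn itself, RegressorChain lets later targets see the predictions for earlier ones, which is one simple way to let a model use dependencies between targets. The sketch below assumes synthetic data, a linear-kernel SVR, and the chain order [0, 1]; all of these are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain
from sklearn.svm import SVR

# Synthetic data with two targets; sizes and seeds are illustrative assumptions.
X, y = make_regression(n_samples=200, n_features=10, n_targets=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# The first SVR predicts target 0 from X; the second SVR predicts target 1
# from X plus target 0 (true values during fit, chained predictions at predict time).
chain = RegressorChain(SVR(kernel='linear'), order=[0, 1])
chain.fit(X_train, y_train)

predictions = chain.predict(X_test)
print('Predictions shape:', predictions.shape)
print('Score:', chain.score(X_test, y_test))
```

Whether chaining helps depends on how strongly the targets are related and on the chain order, so it is worth comparing against the independent per-target baseline on your own data.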
