Inputs work with some sklearn models but not with others in sklearn.linear_model and sklearn.ensemble

Question

_train_weather.values :  [[ 0.61818182  0.81645199  0.6679803  ...,  0.          0.          1.        ]
 [ 0.61664841  0.80064403  0.65073892 ...,  0.          0.          0.        ]
 [ 0.58291347  0.80679157  0.62783251 ...,  0.          0.          0.        ]
 ..., 
 [ 0.65914567  0.52019906  0.59975369 ...,  1.          0.          0.        ]
 [ 0.56232202  0.37558548  0.47980296 ...,  0.          1.          0.        ]
 [ 0.51829135  0.35626464  0.42832512 ...,  0.          0.          1.        ]]


_train_traffic['walkin_in'].values :  [[ 0.  0.  0. ...,  0.  0.  0.]
[ 0.  0.  0. ...,  0.  0.  0.]
[ 0.  0.  0. ...,  0.  0.  0.]
..., 
[ 0.  0.  0. ...,  0.  0.  0.]
[ 0.  0.  0. ...,  0.  0.  0.]
[ 0.  0.  0. ...,  0.  0.  0.]]


_test_weather.values :  [[ 0.3388828   0.50497658  0.341133   ...,  0.          0.          0.        ]
[ 0.27426068  0.4809719   0.30591133 ...,  0.          0.          0.        ]
[ 0.28368018  0.42681499  0.26600985 ...,  0.          0.          0.        ]
..., 
[ 0.732092    0.71516393  0.69482759 ...,  1.          0.          0.        ]
[ 0.74348302  0.70257611  0.6817734  ...,  0.          1.          0.        ]
[ 0.75465498  0.69642857  0.70862069 ...,  0.          0.          1.        ]]

My arrays look like the above. I am training with _train_weather.values (X) and _train_traffic['walkin_in'].values (Y), and predicting on _test_weather.values.

With these inputs I can fit and predict using certain sklearn models, such as MLP, RANSAC, Lasso, Ridge, LassoLars and RandomForestRegressor, but some others do not work.

This is the list of those that do not work:

SGDRegressor, AdaBoostRegressor, BaggingRegressor, Lars, GradientBoostingRegressor, ARDRegression, BayesianRidge, HuberRegressor

Also, ElasticNet works but ElasticNetCV does not, and the same goes for Lasso: LassoCV does not work.

They provide the following error:

Traceback (most recent call last):
  File "run_seq_predictor.py", line 519, in <module>
    run(args.conf, train, test_model, test_MLP_reg, offset, verbose, weeks, daily, write_to_isio, filter_abnormal, threshold)
  File "run_seq_predictor.py", line 420, in run
    clf.fit(_train_weather.values, _train_traffic['walkin_in'].values)
  File "/usr/local/lib/python2.7/site-packages/sklearn/ensemble/bagging.py", line 248, in fit
    return self._fit(X, y, self.max_samples, sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/site-packages/sklearn/ensemble/bagging.py", line 284, in _fit
    X, y = check_X_y(X, y, ['csr', 'csc'])
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 526, in check_X_y
    y = column_or_1d(y, warn=True)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 562, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (253, 56)

Could someone please explain why only certain models provide the above error whereas the other ones work completely fine?

Answer 1

Your dependent variable is multivariate: the y you pass has shape (253, 56), and not all models can fit multi-output targets. If you read the docs for RANSAC, Lasso, Ridge, LassoLars, RandomForestRegressor etc., you will see something like this under the fit method:

y : array-like, shape = [n_samples] or [n_samples, n_targets]

Whereas for the others you listed, such as GradientBoostingRegressor, the docs say:

y : array-like, shape = [n_samples]

That's why you get the error. I'm happy to edit the answer if you provide more details about your dependent variable. Your data looks like it could be one-hot encoded...
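To illustrate the difference, here is a minimal sketch on synthetic data. Only the target shape (253, 56) comes from the traceback; the feature count, the test size and the variable names are assumptions made up for the example. It shows a model that accepts a 2-D y directly, one that rejects it, and one possible workaround: wrapping the single-output model in sklearn.multioutput.MultiOutputRegressor (available from scikit-learn 0.18), which fits one independent model per target column. Whether independent per-column models suit your data is a separate question.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-ins for the arrays in the question. Only y's shape
# (253, 56) is taken from the traceback; everything else is assumed.
rng = np.random.RandomState(0)
X_train = rng.rand(253, 10)
y_train = rng.rand(253, 56)
X_test = rng.rand(20, 10)

# RandomForestRegressor documents y as [n_samples] or [n_samples, n_targets],
# so a 2-D target is accepted as-is.
rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf_pred = rf.fit(X_train, y_train).predict(X_test)

# GradientBoostingRegressor documents y as [n_samples] only, so the same
# call is rejected with a ValueError during input validation.
try:
    GradientBoostingRegressor(n_estimators=5).fit(X_train, y_train)
    gbr_failed = False
except ValueError:
    gbr_failed = True

# Possible workaround: fit one single-output model per target column.
multi = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=5, random_state=0)
)
multi_pred = multi.fit(X_train, y_train).predict(X_test)

print(rf_pred.shape, gbr_failed, multi_pred.shape)
```

If the 56 columns really are a one-hot encoding of a single categorical variable, a classifier trained on the decoded labels may be a better fit than 56 separate regressors.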

solution1
1 2017-05-02 14:53:17
