ValueError: x and y must have same first dimension, but have shapes (4200,) and (16800, 1)

Question

I have created an SVR model using scikit-learn. I am trying to plot my data, but for some reason I am receiving the error:

ValueError: x and y must have same first dimension, but have shapes (4200,) and (16800, 1)

I have split my data into training and testing sets, trained the model and made a prediction. My code is:

import numpy
import matplotlib.pyplot as plt
from sklearn import preprocessing as pre
from sklearn import svm
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

X_feature = wind_speed
X_feature = X_feature.reshape(-1, 1)  ## Reshape the 1-D array into a 2-D column of shape (n_samples, 1)

y_label = Power
y_label = y_label.reshape(-1, 1)

timeseries_split = TimeSeriesSplit(n_splits=3)  ## Split the data into 3 successive train/test folds
for train_index, test_index in timeseries_split.split(X_feature):  ## Print the indices of each fold; only the last fold is kept after the loop
    print("Training data:", train_index, "Testing data:", test_index)
    X_train, X_test = X_feature[train_index], X_feature[test_index]
    y_train, y_test = y_label[train_index], y_label[test_index]

scaler = pre.MinMaxScaler(feature_range=(0, 1)).fit(X_train)  ## Fit a min-max scaler (range 0 to 1) on the training wind speeds

scaled_wind_speed_train = scaler.transform(X_train)  ## Scale the wind speed training data
scaled_wind_speed_test = scaler.transform(X_test)  ## Scale the wind speed test data with the same scaler

SVR_model = svm.SVR(kernel='rbf', C=100, gamma=.001).fit(scaled_wind_speed_train, y_train)

y_prediction = SVR_model.predict(scaled_wind_speed_test)

SVR_model.score(scaled_wind_speed_test, y_test)

rmse = numpy.sqrt(mean_squared_error(y_label, y_prediction))
print("RMSE:", rmse)

fig, bx = plt.subplots(figsize=(19, 8))
bx.plot(y_prediction, X_feature, 'bs')
fig.suptitle('Wind Power Prediction v Wind Speed', fontsize=20)
plt.xlabel('Wind Power Data')
plt.ylabel('Predicted Power')
plt.xticks(rotation=30)
plt.show()

fig, bx = plt.subplots(figsize=(19, 8))
bx.plot(y_prediction, y_label)
fig.suptitle('Wind Power Prediction v Measured Wind Power ', fontsize=20)
plt.xlabel('Wind Power Data')
plt.ylabel('Predicted Power')

fig, bx = plt.subplots(figsize=(19, 8))
bx.plot(y_prediction)
fig.suptitle('Wind Power Prediction v Measured Wind Power ', fontsize=20)
plt.xlabel('Wind Power Data')
plt.ylabel('Predicted Power')

I believe this error is being generated when I try to compute the RMSE in the line:

rmse=numpy.sqrt(mean_squared_error(y_label,y_prediction))

The error also occurs when I comment this line out and try to plot my data.

The full traceback is:

ValueError                                Traceback (most recent call last)
<ipython-input-57-ed11a9ca7fd8> in <module>()
     79 
     80     fig, bx = plt.subplots(figsize=(19,8))
---> 81     bx.plot( y_prediction, y_label)
     82     fig.suptitle('Wind Power Prediction v Measured Wind Power ', fontsize=20)
     83     plt.xlabel('Wind Power Data')

~/anaconda3_501/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1715                     warnings.warn(msg % (label_namer, func.__name__),
   1716                                   RuntimeWarning, stacklevel=2)
-> 1717             return func(ax, *args, **kwargs)
   1718         pre_doc = inner.__doc__
   1719         if pre_doc is None:

~/anaconda3_501/lib/python3.6/site-packages/matplotlib/axes/_axes.py in plot(self, *args, **kwargs)
   1370         kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
   1371 
-> 1372         for line in self._get_lines(*args, **kwargs):
   1373             self.add_line(line)
   1374             lines.append(line)

~/anaconda3_501/lib/python3.6/site-packages/matplotlib/axes/_base.py in _grab_next_args(self, *args, **kwargs)
    402                 this += args[0],
    403                 args = args[1:]
--> 404             for seg in self._plot_args(this, kwargs):
    405                 yield seg
    406 

~/anaconda3_501/lib/python3.6/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
    382             x, y = index_of(tup[-1])
    383 
--> 384         x, y = self._xy_from_xy(x, y)
    385 
    386         if self.command == 'plot':

~/anaconda3_501/lib/python3.6/site-packages/matplotlib/axes/_base.py in _xy_from_xy(self, x, y)
    241         if x.shape[0] != y.shape[0]:
    242             raise ValueError("x and y must have same first dimension, but "
--> 243                              "have shapes {} and {}".format(x.shape, y.shape))
    244         if x.ndim > 2 or y.ndim > 2:
    245             raise ValueError("x and y can be no greater than 2-D, but have "

ValueError: x and y must have same first dimension, but have shapes (4200,) and (16800, 1)
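To see where the two numbers in the error come from, here is a minimal sketch that uses a synthetic array of 16800 points as a hypothetical stand-in for the real wind_speed and Power data. With n_splits=3 and no explicit test size, TimeSeriesSplit gives every test fold n_samples // (n_splits + 1) = 16800 // 4 = 4200 samples, so a prediction on X_test has 4200 entries while y_label and X_feature still have all 16800 rows.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the real data: 16800 samples, one feature column.
X_feature = np.arange(16800, dtype=float).reshape(-1, 1)
y_label = 2.0 * X_feature  # shape (16800, 1), like Power after reshape(-1, 1)

timeseries_split = TimeSeriesSplit(n_splits=3)
fold_shapes = []
for train_index, test_index in timeseries_split.split(X_feature):
    fold_shapes.append((len(train_index), len(test_index)))
    X_train, X_test = X_feature[train_index], X_feature[test_index]
    y_train, y_test = y_label[train_index], y_label[test_index]

for n_train, n_test in fold_shapes:
    print("train:", n_train, "test:", n_test)

# After the loop only the last fold is kept: a prediction made on X_test
# has as many rows as X_test, while y_label still covers the whole series.
print(X_test.shape, y_test.shape, y_label.shape)
```

The training window grows with each fold while the test window stays the same size, which is why the mismatch is always 4200 against 16800 regardless of which fold is used.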

Answer 1

I think you have mixed up the arguments to mean_squared_error. y_prediction only covers the test fold, so it should be compared with y_test:

rmse=numpy.sqrt(mean_squared_error(y_test,y_prediction))
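A minimal end-to-end sketch of this fix, assuming synthetic data: the wind_speed and Power arrays below are hypothetical (1600 random samples and a made-up cubic power curve), chosen small so the SVR fits quickly. The module aliases follow the question's code. One deliberate deviation: Power is kept 1-D, because SVR expects y with shape (n_samples,) and warns about a column vector.

```python
import numpy as np
from sklearn import preprocessing as pre
from sklearn import svm
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data standing in for wind_speed and Power (1600 samples).
rng = np.random.RandomState(0)
wind_speed = rng.uniform(0, 25, size=1600)
Power = np.clip(wind_speed, 0, 12) ** 3 + rng.normal(0, 10, size=1600)

X_feature = wind_speed.reshape(-1, 1)  # 2-D column: (n_samples, 1)
y_label = Power  # kept 1-D: shape (n_samples,)

for train_index, test_index in TimeSeriesSplit(n_splits=3).split(X_feature):
    X_train, X_test = X_feature[train_index], X_feature[test_index]
    y_train, y_test = y_label[train_index], y_label[test_index]

# Fit the scaler on the training fold only, then apply it to both folds.
scaler = pre.MinMaxScaler(feature_range=(0, 1)).fit(X_train)
SVR_model = svm.SVR(kernel='rbf', C=100, gamma=.001)
SVR_model.fit(scaler.transform(X_train), y_train)
y_prediction = SVR_model.predict(scaler.transform(X_test))

# Compare predictions with the test targets, which have the same length.
rmse = np.sqrt(mean_squared_error(y_test, y_prediction))
print(y_test.shape, y_prediction.shape, "RMSE:", rmse)
```

The key point is that every array passed to the metric comes from the same test_index, so the lengths agree by construction.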

Update: given the latest error, try this:

fig, bx = plt.subplots(figsize=(19,8))
bx.plot(y_prediction, scaled_wind_speed_test,'bs')
fig.suptitle('Wind Power Prediction v Wind Speed', fontsize=20)
plt.xlabel('Wind Power Data')
plt.ylabel('Predicted Power')
plt.xticks(rotation=30)
plt.show()

Update 2: in case you get the error on the other plot, try this:

fig, bx = plt.subplots(figsize=(19,8))
bx.plot( y_prediction, y_test)
fig.suptitle('Wind Power Prediction v Measured Wind Power ', fontsize=20)
plt.xlabel('Wind Power Data')
plt.ylabel('Predicted Power')
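As the last frame of the traceback shows, matplotlib raises this ValueError whenever x.shape[0] != y.shape[0]. A small hypothetical helper (check_plot_shapes is not part of matplotlib) can run the same comparison before plotting, which makes it easier to spot which pair of arrays is mismatched. The arrays here are zero-filled placeholders with the shapes from the question.

```python
import numpy as np


def check_plot_shapes(x, y):
    """Hypothetical helper mirroring the first-dimension check from the traceback.

    Returns None if x and y can be plotted against each other,
    otherwise a message naming both shapes.
    """
    x, y = np.asarray(x), np.asarray(y)
    if x.shape[0] != y.shape[0]:
        return "x has shape {} but y has shape {}".format(x.shape, y.shape)
    return None


y_prediction = np.zeros(4200)   # one prediction per test sample
y_label = np.zeros((16800, 1))  # every measured value
y_test = y_label[-4200:]        # last TimeSeriesSplit fold: the final 4200 rows

print(check_plot_shapes(y_prediction, y_label))  # reports the mismatch
print(check_plot_shapes(y_prediction, y_test))   # None, so safe to plot
```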

Answer 2

The mean_squared_error function from scikit-learn (sklearn.metrics, not NumPy) expects two arrays of the same size. The error you are getting implies that these two do not have the same size.

You can check your array shapes with:

print(array_1.shape)
print(array_2.shape)

If the output you get is:

output:
> (4200,)
> (4200, 1)

you can fix it by doing:

new_array_2 = array_2.transpose()[0]

and then

mean_squared_error(array_1, new_array_2)
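A quick check of the reshaping step, using hypothetical arrays with the shapes above: transpose()[0] turns a (4200, 1) column into a flat (4200,) array, and for a single column it gives the same result as NumPy's ravel().

```python
import numpy as np
from sklearn.metrics import mean_squared_error

array_1 = np.linspace(0.0, 1.0, 4200)       # shape (4200,)
array_2 = array_1.reshape(-1, 1) + 0.5      # shape (4200, 1), offset by 0.5

new_array_2 = array_2.transpose()[0]  # the approach from this answer
flat_array_2 = array_2.ravel()        # equivalent for a single-column array

print(new_array_2.shape, flat_array_2.shape)
print(mean_squared_error(array_1, new_array_2))
```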

If the two input arguments, whatever they are, give you the following shapes:

print(array_1.shape)
print(array_2.shape)

output:
> (4200,)
> (16800, 1)

try

new_array_1 = scaler.transform(array_1)

or

new_array_2 = scaler.transform(array_2)

until you get arrays with the same number of samples, whether that is 16800 or 4200. Once the two have the same length but one or both still has an extra dimension,

then again do:

new_new_array_1 = scaler.transform(new_array_1)[:, 0]

and feed these to mean_squared_error, e.g.:

mean_squared_error(new_new_array_1, new_array_2)

ValueError: x and y must have same first dimension, but have shapes (4200,) and (16800, 1)

Question

2 answers

solution1
2 ACCPTED 2018-06-26 08:09:42

solution2
0 2018-06-26 08:13:04
