How do I fix reshaping my dataset for cross-validation?

Question

x_train:(153347,53)
x_test:(29039,52)
y:(153347,)

I am working with sklearn. To cross-validate and reshape my dataset I did:

x_train, x_test, y_train, y_test = cross_validation.train_test_split(
    x, y, test_size=0.3)

x_train = np.pad(x, [(0,0)], mode='constant')
x_test = np.pad(x, [(0,0)], mode='constant')
y = np.pad(y, [(0,0)], mode='constant')
x_train = np.arange(8127391).reshape((-1,1))
c = x.T
np.all(x_train == c)
x_test = np.arange(1510028).reshape((-1,1))
c2 = x.T
np.all(x_test == c2)
y = np.arange(153347).reshape((-1,1))
c3 = x.T
np.all(y == c3)

My error message is: ValueError: Found arrays with inconsistent numbers of samples: [ 2 153347]

I am not sure I need to pad my dataset in this case, and the reshape is not working. Any ideas on how I can fix this?

Answer 1

From the little shown here, I believe the call to cross_validation.train_test_split fails because the lengths of the two arrays do not match. For every X (the data tuple you observe) you need a Y (the data point observed as a result).

At least, that mismatch is what produces the error shown above.
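As a rough illustration of that point, here is a minimal sketch using made-up stand-in data (the arrays `x` and `y` below are hypothetical, not the asker's dataset, and use fewer rows so it runs quickly). It assumes a newer scikit-learn, where `train_test_split` lives in `sklearn.model_selection` rather than `cross_validation`. No padding or reshaping is involved; the only requirement shown is that both arrays have the same number of rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 53 feature columns as in the question's x_train,
# but only 1000 rows so the example is quick to run.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 53))      # one row per sample
y = rng.integers(0, 2, size=1000)    # one label per row of x

# The split requires one y entry for every row of x.
assert x.shape[0] == y.shape[0], (x.shape, y.shape)

x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.3, random_state=0
)
print(x_train.shape, y_train.shape)
print(x_val.shape, y_val.shape)

# Passing arrays with different numbers of samples raises a ValueError,
# which is the kind of failure reported in the question.
mismatch_raised = False
try:
    train_test_split(x[:2], y, test_size=0.3)
except ValueError as err:
    mismatch_raised = True
    print("mismatched lengths:", err)
```

If the assertion on the shapes fails for your own data, printing `x.shape` and `y.shape` just before the split is usually the quickest way to see which array has the unexpected number of rows.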

You should definitely improve on the formulation of the problem. Very much so.

regards, fricke

1 answer

solution1
1 ACCEPTED 2016-10-01 09:45:22
