
How do I fix reshaping my dataset for cross-validation?

x_train: (153347, 53)
x_test: (29039, 52)
y: (153347,)
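As a quick sanity check on the shapes listed above, a sketch like the following compares the number of rows in X with the length of y, and the number of feature columns in the training and test matrices. The helper name `check_shapes` is hypothetical, and small zero-filled arrays stand in for the real data. With the shapes in the question, the row counts match (153347) but the column counts do not (53 vs 52).

```python
import numpy as np


def check_shapes(x_train, x_test, y):
    """Hypothetical helper: list shape problems that sklearn would reject."""
    problems = []
    # Every row of X needs exactly one label in y.
    if x_train.shape[0] != y.shape[0]:
        problems.append(
            f"x_train has {x_train.shape[0]} rows but y has {y.shape[0]} labels"
        )
    # A model fitted on N feature columns expects N columns at predict time.
    if x_train.shape[1] != x_test.shape[1]:
        problems.append(
            f"x_train has {x_train.shape[1]} columns but x_test has {x_test.shape[1]}"
        )
    return problems


# Small stand-ins with the same structure as the question's shapes:
# matching row counts for x_train and y, but 53 vs 52 columns.
x_train = np.zeros((10, 53))
x_test = np.zeros((4, 52))
y = np.zeros(10)

for problem in check_shapes(x_train, x_test, y):
    print(problem)
```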

I am working with sklearn. To cross-validate and reshape my dataset, I did:

import numpy as np
from sklearn import cross_validation

x_train, x_test, y_train, y_test = cross_validation.train_test_split(
    x, y, test_size=0.3)

x_train = np.pad(x, [(0,0)], mode='constant')
x_test = np.pad(x, [(0,0)], mode='constant')
y = np.pad(y, [(0,0)], mode='constant')
x_train = np.arange(8127391).reshape((-1,1))
c = x.T
np.all(x_train == c)
x_test = np.arange(1510028).reshape((-1,1))
c2 = x.T
np.all(x_test == c2)
y = np.arange(153347).reshape((-1,1))
c3 = x.T
np.all(y == c3)

My error message is: ValueError: Found arrays with inconsistent numbers of samples: [     2 153347]
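This error comes from sklearn's input validation: `train_test_split` checks that every array passed to it has the same length along the first axis (the number of samples). A minimal reproduction, assuming a recent scikit-learn where the function lives in `sklearn.model_selection` (the `cross_validation` module used in the question was deprecated in 0.18 and removed in 0.20). The exact wording of the message differs between versions, and the array sizes here simply mirror the `[2 153347]` in the message.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.zeros((2, 53))   # only 2 samples (rows)
y = np.zeros(153347)    # 153347 labels

try:
    train_test_split(x, y, test_size=0.3)
except ValueError as err:
    # Recent versions report both lengths, e.g. [2, 153347]
    print(err)
```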

I am not sure I need to pad my dataset in this case, and the reshape is not working. Any ideas on how I can fix this?
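On the padding question: a pad width of `(0, 0)` adds nothing, so those `np.pad` calls return an array with the same shape as the input. The reshape, on the other hand, discards the table structure: 8127391 is exactly 153347 × 53 and 1510028 is exactly 29039 × 52, so `np.arange(8127391).reshape((-1, 1))` is a single long column of consecutive integers rather than the original rows. A small demonstration on a toy array (the 4 × 3 size is arbitrary):

```python
import numpy as np

x = np.arange(12).reshape(4, 3)  # 4 samples, 3 features

# A pad width of (0, 0), broadcast to every axis, leaves the array unchanged.
padded = np.pad(x, [(0, 0)], mode="constant")
print(padded.shape)               # (4, 3)
print(np.array_equal(padded, x))  # True

# reshape((-1, 1)) flattens everything into one column:
# 4 * 3 = 12 rows, so the sample/feature layout is lost.
column = x.reshape((-1, 1))
print(column.shape)               # (12, 1)

# The transpose swaps rows and columns instead.
print(x.T.shape)                  # (3, 4)
```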

With the little we can see here, I believe the call to cross_validation.train_test_split fails because the lengths of the two arrays do not match: the error lists the sample counts it found, one array with 2 samples and the other with 153347. For every X (the data tuple you observe) you need a Y (the data point that is observed as a result).

At least, this mismatch produces the error shown above.
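Put differently, neither padding nor reshaping should be needed: X and y only have to describe the same samples, row for row, before they are split. A sketch of the split, assuming `x` is a 2-D feature array with one row per sample and `y` holds one label per row; random placeholder data stands in for the real dataset, and the import uses the current `sklearn.model_selection` module.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1000 samples, 53 features, one label per sample.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 53))
y = rng.integers(0, 2, size=1000)

# Both arrays have 1000 rows, so the split works without any reshaping.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

print(x_train.shape, x_test.shape)  # (700, 53) (300, 53)
print(y_train.shape, y_test.shape)  # (700,) (300,)
```

Because the split is applied to x and y together, the feature count stays the same in both halves, which avoids the 53 vs 52 column mismatch between the training and test sets.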

You should definitely improve the formulation of the problem. Very much so.

regards, fricke
