
Running a proportions_ztest after train_test_split

I have a dataset that I split into training and test sets with train_test_split, and now I am trying to run a proportions_ztest on y_train and y_test:

from statsmodels.stats.proportion import proportions_ztest

(test_stat, p_value) = proportions_ztest(y_train, y_test, alternative='two-sided')

but Python keeps throwing ValueError: operands could not be broadcast together with shapes (4254,) (1123,).
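
A minimal sketch that reproduces this exact message with plain NumPy. It assumes, as the statsmodels source appears to do, that proportions_ztest(count, nobs, ...) forms the per-sample proportions as count / nobs elementwise; the random labels below are hypothetical stand-ins with the same lengths as in the question.

```python
import numpy as np

# Hypothetical 0/1 label arrays with the same lengths as y_train and y_test.
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=4254)
y_test = rng.integers(0, 2, size=1123)

# Treating the raw labels as "count" and "nobs" means dividing one array by the
# other elementwise, which NumPy refuses because the shapes cannot be broadcast.
try:
    prop = y_train * 1.0 / y_test
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

So the problem is not the data itself but that the function receives raw observations where it expects per-sample totals.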

My target variable y is binary (classes 0 and 1).

Is there a way to pass y_train and y_test directly to proportions_ztest, as in the code above, or do I first have to count the 1s and the total number of observations in each set (y_train and y_test) and pass them as NumPy arrays, like this:

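import numpy as np
from statsmodels.stats.proportion import proportions_ztest

success = np.array([123, 359])      # number of 1s in each sample
total_obs = np.array([2500, 2500])  # number of observations in each sample
(test_stat, p_value) = proportions_ztest(success, total_obs, alternative='two-sided')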
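
For reference, the counts-based call above is the classic pooled two-proportion z-test: z = (p1 - p2) / sqrt(p(1 - p)(1/n1 + 1/n2)), where p = (x1 + x2) / (n1 + n2). The stdlib-only sketch below applies it to the example counts; the function name pooled_two_proportion_ztest is hypothetical, and it is intended to mirror what statsmodels computes with its defaults (value=None, prop_var=False) for two samples.

```python
import math

def pooled_two_proportion_ztest(count1, nobs1, count2, nobs2):
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion (hypothetical helper)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    # Under H0 both samples share one proportion, estimated from the combined data.
    p_pooled = (count1 + count2) / (nobs1 + nobs2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = pooled_two_proportion_ztest(123, 2500, 359, 2500)
print(f"z = {z:.3f}, p = {p:.3g}")
```

Only the four totals enter the formula, which is why proportions_ztest asks for counts and sample sizes rather than the label arrays.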

I am trying to find a solution other than adding the stratify parameter to the train_test_split call.

Any help would be appreciated.

TY!

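proportions_ztest does not take the raw observations: its first two arguments are the number of successes and the number of observations for each sample. Passing y_train and y_test makes it combine the two arrays elementwise, which fails because their lengths differ (4254 vs 1123). If the variable is already coded as binary 0/1, its sum is the number of successes and its length is the number of observations.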
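
A tiny illustration of why the sum works, using a hypothetical label array: each 1 adds one to the total and each 0 adds nothing, so the sum counts the successes and the mean is the sample proportion.

```python
import numpy as np

y = np.array([0, 1, 1, 0, 1, 0, 0, 0])  # hypothetical 0/1 labels

successes = int(y.sum())        # each 1 contributes 1, each 0 contributes 0
nobs = len(y)                   # number of observations
proportion = successes / nobs   # same value as y.mean()

print(successes, nobs, proportion)
```

The same holds for a pandas Series of 0/1 values, since its .sum() and len() behave the same way.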

So the following should work for the two-sample test of equal proportions:

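from statsmodels.stats.proportion import proportions_ztest

(test_stat, p_value) = proportions_ztest([y_train.sum(), y_test.sum()],  # successes (1s) per sample
                                         [len(y_train), len(y_test)],    # observations per sample
                                         alternative='two-sided')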
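
An end-to-end sketch of the same approach, assuming NumPy, scikit-learn and statsmodels are installed. The data are synthetic and hypothetical (5377 rows, about 10% ones), and the helper name label_proportion_ztest is made up for illustration; test_size is given as an integer so the split sizes match the shapes in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical synthetic data: 5377 rows, binary 0/1 target with roughly 10% ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(5377, 3))
y = (rng.random(5377) < 0.10).astype(int)

# An integer test_size is an absolute count: 1123 test rows, 4254 training rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1123, random_state=42)

def label_proportion_ztest(y_a, y_b, alternative="two-sided"):
    """Z-test of equal proportions of 1s in two 0/1 label arrays (hypothetical helper)."""
    y_a = np.asarray(y_a)
    y_b = np.asarray(y_b)
    counts = np.array([y_a.sum(), y_b.sum()])  # successes (1s) per sample
    nobs = np.array([y_a.size, y_b.size])      # observations per sample
    return proportions_ztest(counts, nobs, alternative=alternative)

test_stat, p_value = label_proportion_ztest(y_train, y_test)
print(f"train share of 1s: {y_train.mean():.4f}, test share of 1s: {y_test.mean():.4f}")
print(f"z = {test_stat:.3f}, p = {p_value:.3f}")
```

Wrapping the counting in a helper keeps the call as short as the one attempted in the question, while still giving proportions_ztest the totals it expects.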
