Running a proportions_ztest after train_test_split

Question

I have a dataset, and after splitting it into train and test sets with train_test_split I am trying to run proportions_ztest on y_train and y_test:

(test_stat, p_value) = proportions_ztest(y_train, y_test, alternative='two-sided')

but Python keeps raising ValueError: operands could not be broadcast together with shapes (4254,) (1123,).

My Y target variable is binary (classes 0 and 1).

Is there a way to pass y_train and y_test directly to proportions_ztest, as in the code above, or do I first have to count the class-1 observations and the total number of observations in each dataset (y_train and y_test) and pass them as arrays, like this:

success = [123, 359]
TotalObs = [2500, 2500]
(test_stat, p_value) = proportions_ztest(success, TotalObs, alternative='two-sided')

I am trying to find a solution other than adding the stratify parameter to the train_test_split call.

Any help would be appreciated.

TY!

Answer 1

If the target variable is already coded as binary 0/1, then its sum is the number of successes.

So the following should work for the two-sample test of equal proportions:

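from statsmodels.stats.proportion import proportions_ztest

# First argument: number of successes per sample; second: sample sizes
(test_stat, p_value) = proportions_ztest([y_train.sum(), y_test.sum()],
                                         [len(y_train), len(y_test)],
                                         alternative='two-sided')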
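
To see what the two arguments (success counts and sample sizes) stand for, here is a minimal standard-library sketch of a pooled, two-sided two-sample z-test. The helper name two_proportion_ztest and the label lists are made up for illustration. The pooled variance is my understanding of what proportions_ztest uses by default, so treat the match with statsmodels as an assumption to check rather than a guarantee.

```python
import math

def two_proportion_ztest(successes, totals):
    """Pooled, two-sided two-sample z-test for equal proportions (illustrative helper)."""
    (x1, x2), (n1, n2) = successes, totals
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis p1 == p2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical 0/1 labels standing in for y_train and y_test
y_train = [1] * 300 + [0] * 700
y_test = [1] * 80 + [0] * 170

# With 0/1 coding, sum() counts the successes and len() gives the sample size
z, p = two_proportion_ztest([sum(y_train), sum(y_test)],
                            [len(y_train), len(y_test)])
print(z, p)
```

The point is that the test only needs two counts and two sample sizes, which is why passing the raw label arrays of different lengths fails with a broadcasting error.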
