
Statsmodels z-test not working as intended (statsmodels.stats.weightstats.CompareMeans.ztest_ind)

I formatted the call the way the Statsmodels website shows it, but Spyder returns this:

TypeError: ztest_ind() got multiple values for argument 'alternative'

The relevant call is this (the data frames themselves work fine):

ztest = statsmodels.stats.weightstats.CompareMeans.ztest_ind(df1['TOTAL'], df2['TOTAL'], alternative = 'two-sided', usevar = 'unequal', value = 0)

I am following the formatting on this website: https://www.statsmodels.org/devel/generated/statsmodels.stats.weightstats.CompareMeans.ztest_ind.html

The API documentation does not make it clear how to use this method. Below is the method's syntax as shown in the documentation (link provided at the end).

CompareMeans.ztest_ind(alternative='two-sided', usevar='pooled', value=0)
z-test for the null hypothesis of identical means

Parameters
x1 : array_like, 1-D or 2-D
    first of the two independent samples, see notes for 2-D case

x2 : array_like, 1-D or 2-D
    second of the two independent samples, see notes for 2-D case

At first glance, there is no way to pass in the data on which to run the z-test. Although two parameters, x1 and x2, are documented, the method signature has no place for them. It took some digging through the source code to figure out how to use it.

In the source code (link provided at the end), the docstring of ztest_ind() also lists the parameters x1 and x2, even though the signature itself does not accept them.

def ztest_ind(self, alternative="two-sided", usevar="pooled", value=0):
        """z-test for the null hypothesis of identical means

        Parameters
        ----------
        x1 : array_like, 1-D or 2-D
            first of the two independent samples, see notes for 2-D case
        x2 : array_like, 1-D or 2-D
            second of the two independent samples, see notes for 2-D case

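The biggest hint here is the 'self' argument: ztest_ind() is an instance method, so it has to be called on an object that already holds the two array-like samples, i.e. the two columns of data we want to test. This also explains the error in the question: when the method is called on the class itself, df1['TOTAL'] is bound to self and df2['TOTAL'] to the first positional parameter, alternative, which then collides with the keyword argument alternative='two-sided'.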
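The effect of calling an instance method on the class rather than on an object can be reproduced without statsmodels. The sketch below uses a hypothetical stand-in class, ToyCompareMeans, which only copies the ztest_ind() signature quoted above; it is not the statsmodels implementation and performs no test.

```python
class ToyCompareMeans:
    """Hypothetical stand-in that mimics only the ztest_ind() signature."""

    def __init__(self, d1, d2):
        self.d1 = d1
        self.d2 = d2

    def ztest_ind(self, alternative="two-sided", usevar="pooled", value=0):
        # Return the arguments so we can see how they were bound.
        return alternative, usevar, value


sample1 = [1.0, 2.0, 3.0]
sample2 = [4.0, 5.0, 6.0]

# Called on the class: sample1 becomes `self`, sample2 becomes `alternative`,
# and the keyword argument then supplies `alternative` a second time.
try:
    ToyCompareMeans.ztest_ind(sample1, sample2, alternative="two-sided")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Called on an instance: the samples live on the object, so only the
# options are passed to the method.
bound_args = ToyCompareMeans(sample1, sample2).ztest_ind(alternative="two-sided")
print(bound_args)
```

The first call fails with the same kind of TypeError as in the question, while the second call binds the arguments as intended.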

If we look at the path leading up to ztest_ind(), we see that it has to be called on an instance of the CompareMeans class:

statsmodels.stats.weightstats.CompareMeans.ztest_ind

So we need to instantiate a CompareMeans object.

The CompareMeans class, in turn, expects two parameters, each of which is an instance of the DescrStatsW class:

class CompareMeans(object):
    """class for two sample comparison

    The tests and the confidence interval work for multi-endpoint comparison:
    If d1 and d2 have the same number of rows, then each column of the data
    in d1 is compared with the corresponding column in d2.

    Parameters
    ----------
    d1, d2 : instances of DescrStatsW

Looking at the DescrStatsW class definition, we see that it expects a 1-D or 2-D array-like dataset.

Finally, putting this all together, the z-test runs successfully on a sample dataset, as shown below:

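    import statsmodels.stats.weightstats as ws

    # Wrap each column in a DescrStatsW object
    col1 = ws.DescrStatsW(df1['amount'])
    col2 = ws.DescrStatsW(df2['amount'])

    # CompareMeans holds both samples, so ztest_ind() only takes options
    cm_obj = ws.CompareMeans(col1, col2)

    zstat, z_pval = cm_obj.ztest_ind(usevar='unequal')

    print(zstat.round(3), z_pval.round(3))  # --> 2.381 0.017 on the sample dataset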
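As a cross-check on what usevar='unequal' asks for, the textbook two-sample z-test with separate variances can be computed by hand using only the standard library. This is a minimal sketch of that formula, not the statsmodels code; the helper name ztest_unequal and the two small samples are made up for illustration, and the result is assumed, rather than verified here, to agree with ztest_ind(usevar='unequal').

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def ztest_unequal(x1, x2, value=0.0):
    """Two-sided two-sample z-test with separate (unequal) variances."""
    n1, n2 = len(x1), len(x2)
    # Standard error of the difference in means; statistics.variance()
    # is the sample variance (divides by n - 1).
    std_diff = sqrt(variance(x1) / n1 + variance(x2) / n2)
    zstat = (mean(x1) - mean(x2) - value) / std_diff
    # Two-sided p-value from the standard normal distribution.
    pvalue = 2.0 * (1.0 - NormalDist().cdf(abs(zstat)))
    return zstat, pvalue


# Made-up samples, only to exercise the function.
sample_a = [10.0, 12.0, 14.0, 16.0, 18.0]
sample_b = [11.0, 13.0, 15.0, 17.0, 19.0]

zstat, pvalue = ztest_unequal(sample_a, sample_b)
print(round(zstat, 3), round(pvalue, 3))
```

With real data, the two lists would be replaced by the columns being compared, for example df1['amount'].tolist() and df2['amount'].tolist().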

documentation

source code
