
How to perform a two-sample, one-tailed t-test in Python

I want to perform a two-sample, one-tailed t-test to compare two means. For the specific problem I am looking at, I want the comparison to go in one direction only. I would like the null hypothesis to be that mu_2 > mu_1 and the alternative hypothesis to be mu_1 <= mu_2. Or should the null hypothesis still be that mu_1 - mu_2 = 0, even for the one-tailed case?

I am working with a large dataset, but if I were to extract and round the parameters, for data_1 they are mu_1 = 4.3, s_1 = 4.8, and n_1 = 40000, and for data_2 they are mu_2 = 4.9, s_2 = 4.4, and n_2 = 30000. I am using scipy to perform a two-sample t-test:

stats.ttest_ind(data1,
                data2,
                equal_var = False)

Given that scipy only takes into account a two-tail test, I am not sure how to interpret the values: Ttest_indResult(statistic=-19.51646312898464, pvalue=1.3452106729078845e-84). The alpha value is 0.05, and the p-value is much, much smaller than that, which would mean the null hypothesis is rejected. However, my intuition tells me that the null hypothesis should not be rejected, because mu_2 is clearly larger than mu_1 (at the very minimum I would expect the p-value to be larger). Therefore, I feel like I'm either interpreting the results incorrectly or need to do additional calculations to get the correct answer.

I would appreciate any additional help and guidance. Thanks!

Here is another solution for calculating the t-test p-value.

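import numpy as np
from scipy.stats import ttest_ind

def t_test(x, y, alternative='both-sided'):
    # Welch's two-sided p-value; a one-sided p-value is half of it when the
    # observed difference points in the tested direction, else its complement.
    _, double_p = ttest_ind(x, y, equal_var=False)
    if alternative == 'both-sided':
        pval = double_p
    elif alternative == 'greater':
        if np.mean(x) > np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    elif alternative == 'less':
        if np.mean(x) < np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    else:
        raise ValueError("alternative must be 'both-sided', 'greater' or 'less'")
    return pval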
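
As a sanity check on the halving rule above, the sketch below compares it with the one-sided p-values that ttest_ind itself returns when given the alternative argument (available from SciPy 1.6). The sample sizes, the seed, and the variable names are illustrative assumptions, not values from the question's dataset.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)        # arbitrary seed, for illustration only
x = rng.normal(4.3, 4.8, size=400)    # hypothetical small samples
y = rng.normal(4.9, 4.4, size=300)

# Two-sided Welch test, as in the question.
t_stat, p_two = ttest_ind(x, y, equal_var=False)

# Halving rule: the tail matching the sign of the statistic gets p/2,
# the opposite tail gets 1 - p/2.
p_less_manual = p_two / 2 if t_stat < 0 else 1 - p_two / 2
p_greater_manual = p_two / 2 if t_stat > 0 else 1 - p_two / 2

# Built-in one-sided tests (the `alternative` argument needs SciPy >= 1.6).
p_less = ttest_ind(x, y, equal_var=False, alternative="less").pvalue
p_greater = ttest_ind(x, y, equal_var=False, alternative="greater").pvalue

print(p_less_manual, p_less)
print(p_greater_manual, p_greater)
```

The two one-sided p-values should sum to 1, which is why one of them is always at least 0.5.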

You are correct: for the one-sided test you describe, the p-value should be large. ttest_ind performs a two-sided test, which gives the probability of observing a statistic more extreme than the absolute value of your t-statistic.

To do a one-sided t-test, you can use the cdf, which is the cumulative probability up to your t-statistic.

Modifying this code slightly:

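import numpy as np
from scipy.stats import t

def welch_ttest(x1, x2, alternative):
    # x1 and x2 are expected to be NumPy arrays
    n1 = x1.size
    n2 = x2.size
    m1 = np.mean(x1)
    m2 = np.mean(x2)
    v1 = np.var(x1, ddof=1)  # sample variances
    v2 = np.var(x2, ddof=1)
    tstat = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 / n1 + v2 / n2)**2 / (v1**2 / (n1**2 * (n1 - 1)) + v2**2 / (n2**2 * (n2 - 1)))
    if alternative == "equal":      # two-sided
        p = 2 * t.cdf(-abs(tstat), df)
    elif alternative == "lesser":   # H1: mean(x1) < mean(x2)
        p = t.cdf(tstat, df)
    elif alternative == "greater":  # H1: mean(x1) > mean(x2)
        p = 1 - t.cdf(tstat, df)
    else:
        raise ValueError("alternative must be 'equal', 'lesser' or 'greater'")
    return tstat, df, p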

I simulate some data:

import numpy as np
from scipy.stats import ttest_ind
from scipy.stats import t

np.random.seed(seed=123)
data1 = np.random.normal(4.3,4.8,size=40000)
np.random.seed(seed=123)
data2 = np.random.normal(4.9,4.4,size=30000)
ndf = len(data1) +len(data2) - 2
ttest_ind(data1,data2,equal_var = False)

Ttest_indResult(statistic=-16.945279258324227, pvalue=2.8364816571790452e-64)

You get something like your result. We can test the code above with alternative == "equal", which is a two-sided test:

welch_ttest(data1,data2,"equal")

    (-16.945279258324227,
     67287.08544468222,
     2.8364816571790452e-64)

You get the same p-value as scipy's two-sided t-test. Now we do the one-sided test you need:

welch_ttest(data1,data2,"greater")
(-16.945279258324227, 67287.08544468222, 1.0)
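
One numerical caveat about the "greater" branch, which computes 1 - t.cdf(tstat, df): when the statistic is large and positive, the cdf rounds to 1.0 in floating point and the subtraction loses the tail entirely. The sketch below illustrates this using the statistic and degrees of freedom printed above, with the sign of the statistic flipped to mimic swapping the two samples. Using the survival function t.sf here is a suggested alternative, not part of the original answer.

```python
from scipy.stats import t

# Statistic and degrees of freedom from the output above; the sign is flipped
# to mimic calling the test with the two samples swapped.
tstat = 16.945279258324227
df = 67287.08544468222

p_naive = 1 - t.cdf(tstat, df)  # cdf rounds to 1.0, so the tail is expected to vanish
p_sf = t.sf(tstat, df)          # survival function evaluates the upper tail directly

print(p_naive)
print(p_sf)
```

For the direction tested in the question the p-value is close to 1, so either form works there. The difference only matters when the one-sided p-value is extremely small.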

SciPy >= 1.6

You can now do a two-sample, one-tailed test by using the "alternative" parameter, per the documentation. In the example below I am using "less", but the available options are alternative : {'two-sided', 'less', 'greater'}.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

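from scipy.stats import ttest_ind

# alternative="less" tests H1: mean(data1) < mean(data2)
tstat, pval = ttest_ind(data1, data2, alternative="less")

# For 1-D inputs both results are scalars, so no indexing is needed
print("t-statistic", '{0:.10f}'.format(tstat))
print("p-value", '{0:.10f}'.format(pval))

if pval < 0.05:
    print("we reject the null hypothesis")
else:
    print("we fail to reject the null hypothesis")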
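
If only summary statistics are at hand, as with the rounded values quoted in the question, a similar one-sided test can be sketched with scipy.stats.ttest_ind_from_stats, which also accepts alternative from SciPy 1.6. The call below assumes the quoted s_1 and s_2 are sample standard deviations, and passes equal_var=False to match the Welch test used in the question.

```python
from scipy.stats import ttest_ind_from_stats

# Rounded summary statistics quoted in the question.
mean1, std1, n1 = 4.3, 4.8, 40000
mean2, std2, n2 = 4.9, 4.4, 30000

# equal_var=False selects Welch's test, as in the question's own call.
res_less = ttest_ind_from_stats(mean1, std1, n1, mean2, std2, n2,
                                equal_var=False, alternative="less")
res_greater = ttest_ind_from_stats(mean1, std1, n1, mean2, std2, n2,
                                   equal_var=False, alternative="greater")

print(res_less.statistic, res_less.pvalue)
print(res_greater.statistic, res_greater.pvalue)
```

With mean1 below mean2 the statistic is negative, so the "less" alternative yields a very small p-value and the "greater" alternative a p-value close to 1.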
