How to perform two-sample one-tailed t-test with numpy/scipy

In R, it is possible to perform a two-sample one-tailed t-test simply by using

> A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846)
> B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880)
> t.test(A, B, alternative="greater")

    Welch Two Sample t-test

data:  A and B 
t = -0.4189, df = 6.409, p-value = 0.6555
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -1.029916       Inf 
sample estimates:
mean of x mean of y 
0.9954942 1.1798523 

In the Python world, scipy provides a similar function, ttest_ind, but it can only do two-tailed t-tests. The closest information on the topic I found is this link, but it seems to be more a discussion of the policy of implementing one-tailed vs two-tailed tests in scipy.

Therefore, my question is: does anyone know of any examples or instructions on how to perform the one-tailed version of the test using numpy/scipy?

From your mailing list link:

because the one-sided tests can be backed out from the two-sided tests. (With symmetric distributions one-sided p-value is just half of the two-sided pvalue)

It goes on to say that scipy always gives the test statistic as signed. This means that given p and t values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and t > 0, and of a less-than test when p/2 < alpha and t < 0.
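As a minimal sketch of that decision rule, the snippet below applies it to the A and B samples from the question. The helper name reject_one_sided is hypothetical, and equal_var=False is an assumption made here only so that the test matches the Welch test that R's t.test runs by default.

```python
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])


def reject_one_sided(x, y, alpha=0.05, alternative="greater"):
    """Hypothetical helper: decide a one-sided test from the two-sided output."""
    # ttest_ind returns a signed statistic and a two-sided p-value
    t, p = stats.ttest_ind(x, y, equal_var=False)
    if alternative == "greater":   # H1: mean(x) > mean(y)
        return bool(p / 2 < alpha and t > 0)
    if alternative == "less":      # H1: mean(x) < mean(y)
        return bool(p / 2 < alpha and t < 0)
    raise ValueError("alternative must be 'greater' or 'less'")


print(reject_one_sided(A, B, alternative="greater"))  # False
print(reject_one_sided(A, B, alternative="less"))     # False
```

Neither one-sided null hypothesis is rejected for these samples at alpha = 0.05, which agrees with the R output in the question.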

After trying to add some insights as comments to the accepted answer but not being able to properly write them down due to general restrictions upon comments, I decided to put my two cents in as a full answer.

First let's formulate our investigative question properly. The data we are investigating is

import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

with the sample means

A.mean() = 0.99549419
B.mean() = 1.1798523

I assume that since the mean of B is obviously greater than the mean of A, you would like to check if this result is statistically significant.

So we have the Null Hypothesis

H0: A >= B

that we would like to reject in favor of the Alternative Hypothesis

H1: B > A

Now when you call scipy.stats.ttest_ind(x, y), this performs a hypothesis test on the value of x.mean()-y.mean(), which means that in order to get positive values throughout the calculation (which simplifies all considerations) we have to call

stats.ttest_ind(B,A)

instead of stats.ttest_ind(A,B). We get as an answer

  • t-value = 0.42210654140239207
  • p-value = 0.68406235191764142

and since according to the documentation this is the output for a two-tailed t-test, we must divide p by 2 for our one-tailed test. So depending on the significance level alpha you have chosen, you need

p/2 < alpha

in order to reject the Null Hypothesis H0. For alpha=0.05 this is clearly not the case (p/2 is about 0.342), so you cannot reject H0.
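The halving step can be checked numerically with a short sketch like the one below. It assumes the default equal_var=True (pooled variance), which is the setting that reproduces the t-value and p-value quoted above; the variable names are only illustrative.

```python
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

alpha = 0.05

# B first, so that the statistic has the sign of B.mean() - A.mean()
t, p_two_sided = stats.ttest_ind(B, A)  # equal_var=True by default

# One-tailed p-value for H1: mean(B) > mean(A)
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
reject = p_one_sided < alpha

print(t, p_one_sided, reject)
```

Here t is positive, so the one-tailed p-value is simply half of the two-tailed one, and it stays well above alpha.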

An alternative way to decide whether you reject H0, without having to do any algebra on t or p, is to look at the t-value and compare it with the critical t-value t_crit at the desired level of confidence (e.g. 95%) for the number of degrees of freedom df that applies to your problem. Since we have

df = sample_size_1 + sample_size_2 - 2 = 8

we get from a statistical table like this one that

t_crit(df=8, confidence_level=95%) = 1.860

We clearly have

t < t_crit

so we obtain again the same result, namely that we cannot reject H0.
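Instead of reading a printed table, the critical value can also be obtained from the t distribution's quantile function. The following is only a sketch: it assumes the pooled-variance degrees of freedom df = 8 used above and reuses the t-value reported earlier.

```python
from scipy import stats

df = 6 + 4 - 2                   # sample_size_1 + sample_size_2 - 2
confidence_level = 0.95          # one-tailed

# Quantile (inverse CDF) of Student's t distribution
t_crit = stats.t.ppf(confidence_level, df)

t_value = 0.42210654140239207    # from stats.ttest_ind(B, A) above

print(round(t_crit, 3))          # 1.86
print(t_value > t_crit)          # False, so H0 is not rejected
```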

Suppose the null hypothesis is Ho: P1>=P2 and the alternative hypothesis is Ha: P1<P2. In order to test it in Python, you write ttest_ind(P2,P1). (Notice that P2 comes first.)

import numpy as np
from scipy import stats

first = np.random.normal(3,2,400)
second = np.random.normal(6,2,400)
stats.ttest_ind(first, second, axis=0, equal_var=True)

You will get a result like the one below (the exact numbers vary from run to run because the samples are random):

Ttest_indResult(statistic=-20.442436213923845,pvalue=5.0999336686332285e-75)

In Python, when statistic < 0, your real p-value is actually real_pvalue = 1-output_pvalue/2 = 1-5.0999336686332285e-75/2, which is essentially 1. As your p-value is larger than 0.05, you cannot reject the null hypothesis that 6>=3. When statistic > 0, the real test statistic is actually equal to -statistic and the real p-value is equal to pvalue/2.

Ivc's answer should be: when (1-p/2) < alpha and t < 0, you can reject the less-than hypothesis.

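    import numpy as np
    from scipy.stats import ttest_ind

    def t_test(x, y, alternative='both-sided'):
        # Welch's t-test; ttest_ind reports a two-sided p-value
        _, double_p = ttest_ind(x, y, equal_var=False)
        if alternative == 'both-sided':
            pval = double_p
        elif alternative == 'greater':
            # H1: mean(x) > mean(y)
            if np.mean(x) > np.mean(y):
                pval = double_p / 2.
            else:
                pval = 1.0 - double_p / 2.
        elif alternative == 'less':
            # H1: mean(x) < mean(y)
            if np.mean(x) < np.mean(y):
                pval = double_p / 2.
            else:
                pval = 1.0 - double_p / 2.
        else:
            raise ValueError("alternative must be 'both-sided', 'greater' or 'less'")
        return pval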

    A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
    B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]

    print(t_test(A,B,alternative='greater'))
    0.6555098817758839

Based on this function from R: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test

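from scipy.stats import ttest_ind

def ttest(a, b, axis=0, equal_var=True, nan_policy='propagate',
          alternative='two.sided'):
    # ttest_ind returns a signed statistic and a two-sided p-value
    tval, pval = ttest_ind(a=a, b=b, axis=axis, equal_var=equal_var,
                           nan_policy=nan_policy)
    if alternative == 'greater':
        # H1: mean(a) > mean(b)
        if tval < 0:
            pval = 1 - pval / 2
        else:
            pval = pval / 2
    elif alternative == 'less':
        # H1: mean(a) < mean(b)
        if tval < 0:
            pval /= 2
        else:
            pval = 1 - pval / 2
    elif alternative != 'two.sided':
        raise ValueError("alternative must be 'two.sided', 'greater' or 'less'")
    return tval, pval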
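For comparison, newer SciPy releases accept an alternative argument on ttest_ind directly (to my knowledge it was added in SciPy 1.6.0, so the sketch below assumes at least that version is installed). With equal_var=False it should reproduce the Welch result that R reports for the data in the question.

```python
from scipy import stats

A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]

# Assumes SciPy >= 1.6.0, where ttest_ind gained the `alternative` keyword.
# alternative="greater" tests H1: mean(A) > mean(B), like R's t.test(A, B, alternative="greater")
res = stats.ttest_ind(A, B, equal_var=False, alternative="greater")

print(res.statistic)  # about -0.4189
print(res.pvalue)     # about 0.6555
```

On older SciPy versions the keyword is not available, and a wrapper like the ones above remains the way to go.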

Did you look at this: How to calculate the statistics "t-test" with numpy

I think that is exactly what this question is looking at.

Basically:

import scipy.stats
x = [1,2,3,4]
scipy.stats.ttest_1samp(x, 0)

Ttest_1sampResult(statistic=3.872983346207417, pvalue=0.030466291662170977)

is the same result as this example in R. https://stats.stackexchange.com/questions/51242/statistical-difference-from-zero
