
How to calculate the statistics "t-test" with numpy

I'm looking to generate some statistics about a model I created in Python. I'd like to generate the t-test on it, but was wondering if there is an easy way to do this with numpy/scipy. Are there any good explanations around?

For example, I have three related datasets that look like this:

[55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]

Now, I would like to run Student's t-test on them.

There are a few ttest_... functions in the scipy.stats package. See an example here:

>>> print('t-statistic = %6.3f pvalue = %6.4f' % stats.ttest_1samp(x, m))
t-statistic =  0.391 pvalue = 0.6955
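As a quick runnable sketch of how this applies to the list from the question (the null-hypothesis mean of 50.0 is invented purely for illustration):

```python
import numpy as np
from scipy import stats

# The data set from the question.
x = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])

# One-sample t-test: does the sample mean differ from a hypothesized
# population mean? (50.0 is an arbitrary illustrative null value.)
t_stat, p_value = stats.ttest_1samp(x, 50.0)
print('t-statistic = %6.3f pvalue = %6.4f' % (t_stat, p_value))
```

`ttest_1samp` returns the t-statistic and the two-sided p-value; here the p-value is above 0.05, so at that level you could not reject the hypothesis that the mean is 50.0.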

The answer from van, using scipy, is exactly right, and the scipy.stats.ttest_* functions are very convenient.

But I came to this page looking for a solution in pure numpy, as stated in the heading, to avoid the scipy dependency. To this end, let me point to the example given here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.standard_t.html

The main problem is that numpy does not have a cumulative distribution function, hence my conclusion is that you should really use scipy. Anyway, using only numpy is possible:

From the original question I am guessing that you want to compare your datasets and judge with a t-test whether there is a significant deviation? Further, that the samples are paired? (See https://en.wikipedia.org/wiki/Student%27s_t-test#Unpaired_and_paired_two-sample_t-tests ) In that case, you can calculate the t- and p-value like so:

import numpy as np
sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
# paired samples -> under the null hypothesis the difference has mean 0
difference = sample1 - sample2
# the t-value is easily computed with numpy
t = np.mean(difference) / (difference.std(ddof=1) / np.sqrt(len(difference)))
# unfortunately, numpy does not have a built-in CDF;
# here is a crude work-around approximating it by sampling
# (a paired test with n samples has n - 1 degrees of freedom)
s = np.random.standard_t(len(difference) - 1, size=100000)
p = np.sum(s < t) / len(s)
# using a two-sided test
print("There is a {} % probability that the paired samples stem from "
      "distributions with the same means.".format(2 * min(p, 1 - p) * 100))

This prints something like There is a 73.028 % probability that the paired samples stem from distributions with the same means. (the exact number varies from run to run, since the CDF is approximated by random sampling). Since this is far above any sane significance level (say 5%), you should not draw any conclusion for the concrete case.

Once you get your t-value, you may wonder how to interpret it as a probability; I did. Here is a function I wrote to help with that.

It's based on info I gleaned from http://www.vassarstats.net/rsig.html and http://en.wikipedia.org/wiki/Student%27s_t_distribution .

from scipy import stats

# Given (possibly random) variables, X and Y, and a correlation direction,
# returns:
#  (r, p),
# where r is the Pearson correlation coefficient, and p is the probability
# of getting the observed values if there is actually no correlation in the given
# direction.
#
# direction:
#  if positive, p is the probability of getting the observed result when there is no
#     positive correlation in the normally distributed full populations sampled by X
#     and Y
#  if negative, p is the probability of getting the observed result, when there is no
#     negative correlation
#  if 0, p is the probability of getting your result, if your hypothesis is true that
#    there is no correlation in either direction
def probabilityOfResult(X, Y, direction=0):
    x = len(X)
    if x != len(Y):
        raise ValueError("variables not same len: " + str(x) + ", and " + \
                         str(len(Y)))
    if x < 6:
        raise ValueError("must have at least 6 samples, but have " + str(x))
    (corr, prb_2_tail) = stats.pearsonr(X, Y)

    if not direction:
        return (corr, prb_2_tail)

    prb_1_tail = prb_2_tail / 2
    if corr * direction > 0:
        return (corr, prb_1_tail)

    return (corr, 1 - prb_1_tail)
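A brief usage sketch, with the function condensed here so the snippet is self-contained; the data vectors are invented for illustration (Y is roughly X plus a little noise, so the correlation is strongly positive):

```python
from scipy import stats

def probabilityOfResult(X, Y, direction=0):
    # Condensed restatement of the function above.
    if len(X) != len(Y):
        raise ValueError("variables not same len")
    if len(X) < 6:
        raise ValueError("must have at least 6 samples")
    corr, prb_2_tail = stats.pearsonr(X, Y)
    if not direction:
        return (corr, prb_2_tail)
    prb_1_tail = prb_2_tail / 2
    if corr * direction > 0:
        return (corr, prb_1_tail)
    return (corr, 1 - prb_1_tail)

# Invented illustrative data.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]

# direction=1: test against "no positive correlation".
r, p = probabilityOfResult(X, Y, direction=1)
print('r = %.3f, one-tailed p = %.4g' % (r, p))
```

With direction=1 the function halves the two-tailed p-value from pearsonr when the observed correlation is in the hypothesized direction, which is the standard two-tailed-to-one-tailed conversion.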
