
stdtr in python giving nan for p-value while doing t-test

I am performing a t-test with the following code:

import logging

import numpy as np
from scipy.special import stdtr

logger = logging.getLogger(__name__)

def t_stat(na,abar,avar,nb,bbar,bvar):
    logger.info("T-test to be performed")
    logger.info("Set A count = %f mean = %f variance = %f" % (na,abar,avar))
    logger.info("Set B count = %f mean = %f variance = %f" % (nb,bbar,bvar))
    adof = na - 1
    bdof = nb - 1
    logger.info("Degrees of Freedom of a=%f" % adof)
    logger.info("Degrees of Freedom of b=%f" % bdof)
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom
    tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
    dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
    logger.info("tf = %f, dof=%f"%(tf,dof))
    # Two-sided p-value from the Student t CDF
    pf = 2*stdtr(dof, -np.abs(tf))

My output looks like:

     Set A count = 3547465.000000 mean = 0.001123 variance = 0.000369
     Set B count = 83759692.000000 mean = 0.001242 variance = 0.000424
     Degrees of Freedom of a=3547464.000000
     Degrees of Freedom of b=83759691.000000
     tf = -11.374250, dof=-2176568.362223
     formula:   t = -11.3743  p = nan

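When I pass the same data as arrays and use the ttest_ind function, I get t = -11.374250 p = 0.000000.

Why does my function give p as nan? AFAIK, I can't treat nan as 0. How can I understand the exact difference between t_stat and ttest_ind? Any help would be appreciated.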

The degrees of freedom you are passing to the formula are negative.

In [6]:

import numpy as np
from scipy.special import stdtr

dof = -2176568
tf = -11.374250
2*stdtr(dof, -np.abs(tf))
Out[6]:
nan

If it is positive:

In [7]:

import numpy as np
from scipy.special import stdtr

dof = 2176568
tf = -11.374250
2*stdtr(dof, -np.abs(tf))
Out[7]:
5.6293517178917971e-30

I wondered how this could happen in your case, so I ran your code to try to infer the input parameters:

In [13]:

def t_stat(na,abar,avar,nb,bbar,bvar):
     print("T-test to be performed")
     print("Set A count = %f mean = %f variance = %f" % (na,abar,avar))
     print("Set B count = %f mean = %f variance = %f" % (nb,bbar,bvar))
     adof = na - 1
     bdof = nb - 1
     print("Degrees of Freedom of a=%f" % adof)
     print("Degrees of Freedom of b=%f" % bdof)
     tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
     dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
     print("tf = %f, dof=%f"%(tf,dof))
     print(stdtr(dof, -np.abs(tf)))
In [14]:

t_stat(3547465, 0.001123, 0.000369, 83759692, 0.001242, 0.000424)
T-test to be performed
Set A count = 3547465.000000 mean = 0.001123 variance = 0.000369
Set B count = 83759692.000000 mean = 0.001242 variance = 0.000424
Degrees of Freedom of a=3547464.000000
Degrees of Freedom of b=83759691.000000
tf = -11.393950, dof=3900753.641275
2.2434573594e-30
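
With plain Python ints the degrees of freedom come out positive, so the inputs in the failing run probably had a different type. One plausible cause, which the question does not confirm, is that the counts were NumPy 64-bit integers: nb**2*bdof is about 5.9e23, which is larger than the int64 maximum of about 9.2e18, so the product wraps around and can turn the denominator negative. The sketch below only shows that the wraparound happens and that casting the counts to float avoids it. The helper name welch_dof_float is made up for illustration. The small difference in tf between the two runs (-11.374250 versus -11.393950) is most likely because the printed means and variances are rounded to six decimals.

```python
import numpy as np
from scipy.special import stdtr


def welch_dof_float(na, avar, nb, bvar):
    """Welch-Satterthwaite dof with the counts cast to float first.

    Hypothetical helper: the cast is an assumed fix, not something
    the original poster confirmed.
    """
    na, nb = float(na), float(nb)
    num = (avar / na + bvar / nb) ** 2
    den = avar**2 / (na**2 * (na - 1)) + bvar**2 / (nb**2 * (nb - 1))
    return num / den


# Exact product with Python ints versus the same product in int64.
nb = 83759692
exact = nb**2 * (nb - 1)                    # arbitrary precision, about 5.9e23
nb64 = np.array([nb], dtype=np.int64)       # assumption: count came from NumPy
wrapped = int((nb64**2 * (nb64 - 1))[0])    # array arithmetic wraps silently

print(exact > np.iinfo(np.int64).max)       # the true value does not fit in int64
print(wrapped == exact)                     # so the int64 result cannot be right

# Same inputs as the question, passed as NumPy integers.
dof = welch_dof_float(np.int64(3547465), 0.000369, np.int64(83759692), 0.000424)
p = 2 * stdtr(dof, -abs(-11.374250))
print(dof, p)
```

If the counts really are NumPy integers, converting them with float(na) and float(nb) at the top of t_stat should be enough to keep dof positive.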

Hope this helps you find the problem.
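
As a cross-check for the hand-written formula, scipy.stats.ttest_ind_from_stats runs the same test directly from summary statistics. The sketch below assumes Welch's unequal-variance test (equal_var=False), since that is what the formula in t_stat computes, and it uses the rounded values printed in the question, so the numbers will not match the array-based run exactly. Note that the function takes standard deviations rather than variances. A p-value of this size prints as 0.000000 with %f formatting, which would explain the p = 0.000000 reported for ttest_ind.

```python
import numpy as np
from scipy.special import stdtr
from scipy.stats import ttest_ind_from_stats

# Summary statistics as printed in the question (rounded to six decimals).
na, abar, avar = 3547465, 0.001123, 0.000369
nb, bbar, bvar = 83759692, 0.001242, 0.000424

# ttest_ind_from_stats expects standard deviations, not variances.
res = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                           bbar, np.sqrt(bvar), nb,
                           equal_var=False)  # Welch's test

# Manual Welch computation with float counts, for comparison.
fa, fb = float(na), float(nb)
tf = (abar - bbar) / np.sqrt(avar / fa + bvar / fb)
dof = (avar / fa + bvar / fb) ** 2 / (
    (avar / fa) ** 2 / (fa - 1) + (bvar / fb) ** 2 / (fb - 1)
)
pf = 2 * stdtr(dof, -np.abs(tf))

print("scipy : t = %f  p = %e" % (res.statistic, res.pvalue))
print("manual: t = %f  p = %e" % (tf, pf))
```

Printing the p-value with %e instead of %f shows that it is tiny but not exactly zero.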
