I am trying to calculate the p-value using statsmodels
Here is the code I use to get the required numbers:
import statsmodels.api as sm
from statsmodels.stats.proportion import proportions_ztest
convert_old = len(df2[df2['group'] == 'control']['converted'] == 1)
convert_new = len(df2[df2['group'] == 'treatment']['converted'] == 1)
n_old = len(df2[df2['group'] == 'control'])
n_new = len(df2[df2['group'] == 'treatment'])
The actual test is:
stat, pval = proportions_ztest([convert_new ,convert_old], [n_new, n_old])
I get this result:
p-value is: nan
I also get these warnings:
/opt/conda/lib/python3.6/site-packages/statsmodels/stats/weightstats.py:670:
RuntimeWarning: invalid value encountered in double_scalars
zstat = value / std_diff
/opt/conda/lib/python3.6/site-packages/statsmodels/stats/weightstats.py:672:
RuntimeWarning: invalid value encountered in absolute
pvalue = stats.norm.sf(np.abs(zstat))*2
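I think the problem is in how you compute convert_old and convert_new. Comparing with ['converted'] == 1 does not filter anything: it returns a Series of True/False values, one per row, so len() still returns the full size of the group. As a result convert_old equals n_old and convert_new equals n_new, both proportions are 1, the pooled standard error is 0, and the z statistic becomes 0/0, which is nan. To count only the converted rows, put both conditions in the filter: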
convert_old = len(df2[(df2['group'] == 'control') & (df2['converted'] == 1)])
convert_new = len(df2[(df2['group'] == 'treatment') & (df2['converted'] == 1)])
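To see the difference, here is a minimal sketch on a made-up DataFrame. The toy data and the names buggy_old, is_control, is_treatment and is_converted are assumptions for illustration only; your df2 will have its own values. It shows that the original expression returns the group size, while the combined mask returns the number of conversions:

```python
import math

import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical toy data: 10 control rows (2 converted), 10 treatment rows (5 converted).
df2 = pd.DataFrame({
    'group': ['control'] * 10 + ['treatment'] * 10,
    'converted': [1, 1] + [0] * 8 + [1] * 5 + [0] * 5,
})

is_control = df2['group'] == 'control'
is_treatment = df2['group'] == 'treatment'
is_converted = df2['converted'] == 1

n_old = len(df2[is_control])
n_new = len(df2[is_treatment])

# Original approach: len() of a boolean Series is just the number of rows in the group.
buggy_old = len(df2[is_control]['converted'] == 1)

# Fixed approach: filter on both conditions, then count the remaining rows.
convert_old = len(df2[is_control & is_converted])
convert_new = len(df2[is_treatment & is_converted])

print('buggy_old:', buggy_old, 'n_old:', n_old)
print('convert_old:', convert_old, 'convert_new:', convert_new)

# With real counts the proportions differ from 1, so the test no longer returns nan.
stat, pval = proportions_ztest([convert_new, convert_old], [n_new, n_old])
print('z statistic:', stat, 'p-value:', pval)
```

Since converted only holds 0 and 1 here, df2.loc[is_control, 'converted'].sum() would give the same count.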