Confidence Interval for Sample Mean in Python (Different from Manual)

I'm trying to create some material on introductory statistics for a seminar. The code below computes a 95% confidence interval for estimating the mean, but my manual result is not the same as the one from the Python library. Is there something wrong with my math / code? Thanks.

EDIT:

Data was sampled from here

import pandas as pd
import numpy as np
x = np.random.normal(60000,15000,200)
income = pd.DataFrame()
income['Data Scientist'] = x

# Manual Implementation
sample_mean = income['Data Scientist'].mean()
sample_std = income['Data Scientist'].std()
standard_error = sample_std / (np.sqrt(income.shape[0]))
print('Mean',sample_mean)
print('Std',sample_std)
print('Standard Error',standard_error)
print('(',sample_mean-2*standard_error,',',sample_mean+2*standard_error,')')


# Python Library
import scipy.stats as st
se = st.sem(income['Data Scientist'])
a = st.t.interval(0.95, len(income['Data Scientist'])-1, loc=sample_mean, scale=se)
print(a)
print('Standard Error from this code block',se)

You've got 2 errors.

First, you are using 2 as the multiplier for the CI. The more accurate value for a 95% normal interval is 1.96; "2" is just a convenient approximation. That makes your manually generated CI too wide.

Second, you are comparing a normal-based interval to one based on the t-distribution. This causes only a small difference, because the t-distribution here has 199 degrees of freedom and is very close to the normal: its 95% critical value is about 1.97 rather than 1.96.
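To make the two multipliers concrete, here is a minimal sketch that looks up both critical values with `scipy.stats`. The variable names (`confidence`, `df`, `z_crit`, `t_crit`) are my own and not from the question; `df = 199` assumes the 200 samples used there.

```python
import scipy.stats as st

confidence = 0.95
df = 199  # n - 1 for the 200 samples in the question (assumed)

# Two-sided interval: half of the remaining 5% goes in each tail
upper_quantile = 1 - (1 - confidence) / 2  # 0.975

z_crit = st.norm.ppf(upper_quantile)   # normal multiplier, about 1.96
t_crit = st.t.ppf(upper_quantile, df)  # t multiplier, about 1.97

print('z critical value:', z_crit)
print('t critical value:', t_crit)
print('relative difference:', (t_crit - z_crit) / z_crit)
```

Both values sit below the rule-of-thumb multiplier of 2, which is why the manual interval in the question comes out wider than either library interval.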

Below is a check that 1.96 is the z-score for the 97.5th percentile, followed by an apples-to-apples comparison: the manual CI against the one from the normal distribution rather than the t.

In [45]: st.norm.cdf(1.96)                                                                   
Out[45]: 0.9750021048517795

In [46]: print('(',sample_mean-1.96*standard_error,',',sample_mean+1.96*standard_error,')')  
( 57558.007862202685 , 61510.37559873406 )

In [47]: st.norm.interval(0.95, loc=sample_mean, scale=se)                                   
Out[47]: (57558.044175045005, 61510.33928589174)
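The same comparison can also be done against the t-distribution, which is what the question's library call uses. The following is a rough sketch rather than the original session: it draws fresh data with an arbitrary seed (my assumption, so the numbers will not match the output above) and builds each manual interval from its own critical value.

```python
import numpy as np
import scipy.stats as st

# Assumed seed for reproducibility; the question's data was unseeded
rng = np.random.default_rng(0)
x = rng.normal(60000, 15000, 200)

n = len(x)
sample_mean = x.mean()
se = st.sem(x)  # equals x.std(ddof=1) / np.sqrt(n)

# t-based interval: manual multiplier vs. library call
t_crit = st.t.ppf(0.975, n - 1)
manual_t = (sample_mean - t_crit * se, sample_mean + t_crit * se)
scipy_t = st.t.interval(0.95, n - 1, loc=sample_mean, scale=se)

# Normal-based interval: manual multiplier vs. library call
z_crit = st.norm.ppf(0.975)
manual_z = (sample_mean - z_crit * se, sample_mean + z_crit * se)
scipy_z = st.norm.interval(0.95, loc=sample_mean, scale=se)

# Rule-of-thumb interval from the question
rough = (sample_mean - 2 * se, sample_mean + 2 * se)

print('manual t :', manual_t)
print('scipy  t :', scipy_t)
print('manual z :', manual_z)
print('scipy  z :', scipy_z)
print('2 * SE   :', rough)
```

Once the manual multiplier comes from the same distribution as the library call, the two intervals agree to floating-point precision; the "2 * SE" interval is the widest of the three.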
