Compute standard deviation with a manually set mean in R
I know how to compute the sd using summarise:
ans <- temp %>% group_by(permno) %>% summarise(std = sd(ret))
But how do I compute the standard deviation given that I know the mean is 0?
In other words, I know the true mean and want to use it instead of the sample mean when computing the sd.
One way would be to code the sd function manually, but I need it to work for each group, so I'm stuck.
It is always best to provide reproducible data. Here is an example with the iris data set:
library(dplyr)
data(iris)
GM <- mean(iris$Sepal.Length) # "Population mean"
# sd around the known mean GM: square root of the mean squared deviation (divides by n)
ans <- iris %>% group_by(Species) %>% summarise(std = sqrt(sum((Sepal.Length - GM)^2) / length(Sepal.Length)))
ans
# A tibble: 3 × 2
# Species std
# <fct> <dbl>
# 1 setosa 0.907
# 2 versicolor 0.519
# 3 virginica 0.975
Compare this with the sd computed around each group's own mean:
ans <- iris %>% group_by(Species) %>% summarise(std = sd(Sepal.Length))
ans
# A tibble: 3 × 2
# Species std
# <fct> <dbl>
# 1 setosa 0.352
# 2 versicolor 0.516
# 3 virginica 0.636
Note that sd uses n - 1 in the denominator, but since you indicated that your mean is a population mean, we use n.
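For reference, these are the two estimators being contrasted, written with μ for the known (population) mean and x̄ for the sample mean of the n observations in a group:

```latex
\hat{\sigma}_{\mu} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

sd() computes s. When μ is known, no degree of freedom is spent estimating the mean, so the squared deviations are averaged over n rather than n - 1.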
I came up with this solution:
# Standard deviation around a known population mean (divides by n, not n - 1)
sd_fn <- function(x, mean_pop) {
  sd_f <- sqrt(sum((x - mean_pop)^2) / length(x))
  sd_f
}
x <- c(1,2,3,-1,-1.5,-2.8)
mean_pop <- 0
sd_fn(x, mean_pop)
I simply created a function whose arguments are a numeric vector and the population mean you already know. Pass in the data vector and the population mean, and the function returns the desired standard deviation.
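Because a function like sd_fn works on a plain numeric vector, it can be called inside summarise() so it runs once per group, which covers the grouped case from the question. A minimal sketch, assuming dplyr is installed; the toy data frame below is hypothetical and only mimics the question's temp (columns permno and ret), with the known mean set to 0:

```r
library(dplyr)

# sd around a known population mean, dividing by n (same logic as sd_fn)
sd_fn <- function(x, mean_pop) {
  sqrt(sum((x - mean_pop)^2) / length(x))
}

# Hypothetical toy data shaped like the question's `temp` (columns permno, ret)
temp <- data.frame(
  permno = c(1, 1, 1, 2, 2, 2),
  ret    = c(0.01, -0.02, 0.03, 0.05, -0.05, 0.00)
)

# summarise() calls sd_fn once for each permno group
ans <- temp %>%
  group_by(permno) %>%
  summarise(std = sd_fn(ret, mean_pop = 0))
ans
```

With a known mean of 0, the expression inside each group reduces to sqrt(mean(ret^2)).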
If you want to calculate the sd from a true mean, you can apply the mean function to the squared differences between the sample vector and the true mean to get the variance, then take sqrt of that to get the standard deviation. Keep in mind that base R's var and sd functions apply Bessel's correction automatically; you can read more at https://www.r-bloggers.com/2018/11/how-to-de-bias-standard-deviation-estimates/
# Sample size
n = 1000
# Random sample from a normal distribution with mean 0 and sd 3
set.seed(1) # for reproducibility
universe = rnorm(n, 0, 3)
# sample mean
p = mean(universe)
p
# true mean
p0 = 0
# calculate "manually" using sample mean
variance <- mean((universe - p)^2)
variance
standard_deviation <- sqrt(variance)
standard_deviation
# calculate "manually" using true mean
variance_true <- mean((universe - p0)^2)
variance_true
standard_deviation_true <- sqrt(variance_true)
standard_deviation_true
# calculate using built in R functions
var_r<-var(universe)
var_r
r_sd<-sd(universe)
r_sd
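# They apply Bessel's correction automatically (multiply by n/(n-1)).
# Compare with all.equal(), since == on floating-point results can return FALSE
isTRUE(all.equal(variance * n/(n-1), var_r))
isTRUE(all.equal(r_sd, sqrt(variance * n/(n-1))))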
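The same mean-of-squared-differences idea extends to groups in base R without dplyr. A minimal sketch using tapply() on the iris data; taking the grand mean of Sepal.Length as the "known" mean is an illustrative assumption, mirroring the iris example earlier in the thread:

```r
# Assumed "known" mean: the grand mean of Sepal.Length (illustrative choice)
mu <- mean(iris$Sepal.Length)

# Per species: variance about mu is mean((x - mu)^2); the sd is its square root
sd_by_species <- tapply(iris$Sepal.Length, iris$Species,
                        function(x) sqrt(mean((x - mu)^2)))
sd_by_species
```

tapply() returns a named array with one entry per level of the grouping factor.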