
Compute standard deviation with a manually set mean in R

I know how to compute the sd using summarise:

ans <- temp %>% group_by(permno) %>% summarise(std = sd(ret))

But how do I compute the standard deviation given that I know the mean is 0?

In other words, I know the true mean and want to use it, instead of the sample mean, when computing the sd.

One way would be to code the sd function manually, but I need it to work for each group, so I'm stuck.

It is always best to provide reproducible data. Here is an example with the iris data set:

library(dplyr)

data(iris)
GM <- mean(iris$Sepal.Length)  # "Population mean"
# Square root of the mean squared deviation from GM within each species (denominator n)
ans <- iris %>% group_by(Species) %>% summarise(std = sqrt(sum((Sepal.Length - GM)^2) / length(Sepal.Length)))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.907
# 2 versicolor 0.519
# 3 virginica  0.975

Compare this with the sd computed around each group's own mean:

ans <- iris %>% group_by(Species) %>% summarise(std = sd(Sepal.Length))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.352
# 2 versicolor 0.516
# 3 virginica  0.636

Note that sd uses n - 1 in the denominator, but since you indicated that your mean is a population mean, we use n.
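For reference, the two quantities being compared can be written out as follows. The symbols are notation introduced here rather than taken from the answer: \mu is the known mean (GM in the code above), \bar{x} is the sample mean, and n is the number of observations in a group.

```latex
\hat{\sigma}_{\mu} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}
\qquad\text{versus}\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Dividing by n in the first expression is reasonable because no degree of freedom is spent estimating the mean. The n - 1 in the second expression, which is what sd computes, compensates for \bar{x} being estimated from the same data.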

I came up with this solution:

# Standard deviation of x around a known population mean (denominator n)
sd_fn <- function(x, mean_pop) {
  sqrt(sum((x - mean_pop)^2) / length(x))
}

x <- c(1, 2, 3, -1, -1.5, -2.8)
mean_pop <- 0

sd_fn(x, mean_pop)

I simply created a function whose arguments are a numeric vector and the population mean you already know. Pass in the data vector and the population mean, and the function returns the desired standard deviation.
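Since the question asks for a per-group result, here is a minimal sketch of how such a helper might be called inside summarise. The toy data frame is hypothetical; only the column names permno and ret come from the question, and the helper is repeated so the snippet runs on its own.

```r
library(dplyr)

# Same helper as in the answer, repeated so this snippet is self-contained
sd_fn <- function(x, mean_pop) {
  sqrt(sum((x - mean_pop)^2) / length(x))
}

# Hypothetical toy data; column names follow the question
temp <- data.frame(
  permno = c(1, 1, 1, 1, 2, 2, 2, 2),
  ret    = c(1, -1, 7, -7, 2, 4, 4, 6)
)

ans <- temp %>%
  group_by(permno) %>%
  summarise(
    std_known_mean = sd_fn(ret, mean_pop = 0),  # deviations from 0, denominator n
    std_sample     = sd(ret)                    # deviations from the group mean, denominator n - 1
  )
ans
```

Because summarise evaluates each expression once per group, any function that takes a vector and returns a single number can be used this way.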

Hi, if you want to calculate the sd around a true mean, I think you can do it by applying the mean function to the squared differences between the sample vector and the true mean to get the variance, then using sqrt to get the standard deviation. Keep in mind that base R's var and sd functions apply Bessel's correction automatically; you can read about it at https://www.r-bloggers.com/2018/11/how-to-de-bias-standard-deviation-estimates/

# Sample size
n <- 1000
# Random sample from a normal distribution with true mean 0 and sd 3
universe <- rnorm(n, 0, 3)

# Sample mean
p <- mean(universe)
p
# True mean
p0 <- 0

# Calculate "manually" using the sample mean
variance <- mean((universe - p)^2)
variance

standard_deviation <- sqrt(variance)
standard_deviation

# Calculate "manually" using the true mean
variance_true <- mean((universe - p0)^2)
variance_true

standard_deviation_true <- sqrt(variance_true)
standard_deviation_true

# Calculate using built-in R functions
var_r <- var(universe)
var_r
r_sd <- sd(universe)
r_sd

# They apply Bessel's correction automatically (factor n / (n - 1)).
# Compare with all.equal() rather than ==, since exact floating-point
# equality can fail because of rounding error.
all.equal(variance * n / (n - 1), var_r)

all.equal(r_sd, sqrt(variance * n / (n - 1)))
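The same mean-of-squared-deviations idea can be applied per group without dplyr. The following is a sketch in base R, assuming the built-in iris data and, purely for illustration, treating the overall mean of Sepal.Length as the known mean; the variable names mu and std_by_group are made up for this example.

```r
# Stand-in for a known ("true") mean; chosen here only for illustration
mu <- mean(iris$Sepal.Length)

# tapply() splits Sepal.Length by Species and applies the function to each piece
std_by_group <- tapply(
  iris$Sepal.Length,
  iris$Species,
  function(x) sqrt(mean((x - mu)^2))  # denominator n, deviations from mu
)
std_by_group
```

The result is a named vector with one entry per species, which can be compared directly with tapply(iris$Sepal.Length, iris$Species, sd).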



