
Compute standard deviation with a manually set mean in R

I know how to compute the sd using summarize:

ans <- temp %>% group_by(permno) %>% summarise(std = sd(ret))

But how do I compute the standard deviation given that I know the mean is 0?

In other words, I know the true mean and want to use that instead of using the sample mean while computing the sd.

One way would be to manually code the sd function, but I need it to work for each group, so I'm stuck.

It is always best to provide reproducible data. Here is an example with the iris data set:

library(dplyr)
data(iris)
GM <- mean(iris$Sepal.Length)  # "Population mean"
# Square root of the mean squared deviation from GM (divisor n, not n - 1)
ans <- iris %>% group_by(Species) %>% summarise(std = sqrt(sum((Sepal.Length - GM)^2) / length(Sepal.Length)))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.907
# 2 versicolor 0.519
# 3 virginica  0.975

As compared with computing the sd with each group mean:

ans <- iris %>% group_by(Species) %>% summarise(std = sd(Sepal.Length))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.352
# 2 versicolor 0.516
# 3 virginica  0.636

Note that sd uses n - 1 in the denominator, but since you indicated that your mean is a population mean, we divide by n.
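The gap between the two tables can be checked directly: the mean squared deviation about a fixed value equals the n-divisor variance about the group mean plus the squared distance between the group mean and that fixed value. A minimal base R sketch of that check, where known_mean_var is a hypothetical helper name and not part of any package:

```r
# Hypothetical helper: variance about a fixed, known mean (divisor n)
known_mean_var <- function(x, mu) mean((x - mu)^2)

x  <- iris$Sepal.Length[iris$Species == "setosa"]
mu <- mean(iris$Sepal.Length)   # the value treated as the "population mean"
n  <- length(x)

lhs <- known_mean_var(x, mu)
# Decomposition: n-divisor variance about the group mean + squared offset
rhs <- var(x) * (n - 1) / n + (mean(x) - mu)^2

all.equal(lhs, rhs)   # compares with a floating-point tolerance
sqrt(lhs)             # standard deviation about the known mean
```

The further a group's own mean sits from the known mean, the larger the second term, which is why setosa and virginica change the most.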

I came up with this solution:

sd_fn <- function(x, mean_pop) {
  sd_f <- sqrt((sum((x-mean_pop)^2))/(length(x)))
  sd_f
}

x <- c(1,2,3,-1,-1.5,-2.8)
mean_pop <- 0

sd_fn(x, mean_pop)

I simply created a function whose arguments are a numeric vector and the population mean that you already know. Pass in the data vector and the population mean, and the function will give you the desired standard deviation.
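To get one value per group, a function like sd_fn can be called inside summarise(). The sketch below assumes dplyr is installed and uses a small made-up data frame named temp with the permno and ret columns from the question; the values are illustrative only.

```r
library(dplyr)

# Same definition as above, repeated so the snippet runs on its own
sd_fn <- function(x, mean_pop) sqrt(sum((x - mean_pop)^2) / length(x))

# Made-up data standing in for the asker's `temp`
temp <- data.frame(
  permno = c(1, 1, 1, 2, 2, 2),
  ret    = c(1, 2, 3, -1, -1.5, -2.8)
)

# summarise() calls sd_fn once per permno group
ans <- temp %>%
  group_by(permno) %>%
  summarise(std = sd_fn(ret, mean_pop = 0))
ans
```

If ret contains missing values, they would need to be dropped first, since sum() returns NA when any element is NA.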

Hi, if you want to calculate the sd from a true mean, I think you could do it by applying the mean function to the squared differences between the sample vector and the true mean to get the variance, then using sqrt to get the standard deviation. Keep in mind that base R's var and sd functions apply Bessel's correction automatically; you can read more at https://www.r-bloggers.com/2018/11/how-to-de-bias-standard-deviation-estimates/

# sample size
n <- 1000
# random sample vector
universe <- rnorm(n, 0, 3)

# sample mean 
p = mean(universe)
p
# true mean
p0 = 0

# calculate "manually" using sample mean
variance <- mean((universe - p)^2)
variance

standard_deviation <- sqrt(variance)
standard_deviation

# calculate "manually" using true mean

variance_true <- mean((universe - p0)^2)
variance_true

standard_deviation_true <- sqrt(variance_true)
standard_deviation_true
# calculate using built in R functions 
var_r<-var(universe)
var_r
r_sd<-sd(universe)
r_sd

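# They apply Bessel's correction automatically.
# all.equal() is used instead of == because exact comparison of floating-point values can return FALSE
all.equal(variance * n/(n-1), var_r) # Bessel's correction multiplies by n/(n-1)

all.equal(r_sd, sqrt(variance * n/(n-1)))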
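The same mean-of-squared-differences idea extends to groups without extra packages. A base R sketch, using simulated data and a hypothetical grouping vector g (neither comes from the question):

```r
set.seed(1)                              # reproducible simulated data
x  <- rnorm(300, mean = 0, sd = 3)       # simulated values with true mean 0
g  <- rep(c("a", "b", "c"), each = 100)  # hypothetical group labels
p0 <- 0                                  # known true mean

# Per-group mean of squared deviations from p0, then the square root
sd_true_by_group <- sqrt(tapply((x - p0)^2, g, mean))
sd_true_by_group
```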


