The mean, standard deviation and 95% confidence interval for the mean of the following variables in R

I need to create a summary table that shows the mean, standard deviation and 95% confidence interval for the mean of the following variables: Selling Price, Number of bedrooms, Size of house, Distance from city centre.

I have a file with data.

ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1

library(data.table)

mydata <- read.csv("Real_Estate.csv")
dfo <- data.frame(mydata)
dto <- data.table(dfo)
result_1 <- dto[, sapply(.SD, function(x) list(mean = mean(x)))]
result_2 <- dto[, sapply(.SD, function(x) list(sd = sd(x)))]

But I have no idea how to calculate the 95% CI and create the summary table.

Here's a reproducible tidyverse example that lets you create a summary table:


library(tidyverse)

df <- tibble(
  ID = 1:100,
  price = round(rnorm(100, mean = 500, sd = 50)),
  bedrooms = sample(1:4, 100, replace = TRUE)
)

df %>%
  pivot_longer(cols = c(price, bedrooms),
               names_to = "variable",
               values_to = "value") %>%
  group_by(variable) %>%
  summarize(mean = mean(value),
            sd = sd(value),
            se = sd / sqrt(n()),
            CI_lower = mean - (1.96 * se),
            CI_upper = mean + (1.96 * se))
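
A note on the 1.96 multiplier: it is the normal approximation, which is reasonable for the 100 simulated rows here. For a small sample, such as the six rows shown in the question, the t quantile is noticeably larger. A minimal base R sketch comparing the multipliers (the variable names are only illustrative):

```r
# Multipliers for a two-sided 95% interval
z_mult     <- qnorm(0.975)               # normal approximation
t_mult_100 <- qt(0.975, df = 100 - 1)    # n = 100, as in the simulated tibble
t_mult_6   <- qt(0.975, df = 6 - 1)      # n = 6, as in the question's sample rows

round(c(z = z_mult, t_n100 = t_mult_100, t_n6 = t_mult_6), 3)
#      z t_n100   t_n6
#  1.960  1.984  2.571
```

If you want the t-based interval in the pipeline above, replacing 1.96 with qt(0.975, df = n() - 1) inside summarize() should be enough.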

There are two approaches. You can follow the link below to see how to do it by hand: calculate the SD and SE, supply the degrees of freedom, and finally compute the CI. https://bookdown.org/logan_kelly/r_practice/p09.html

Or you can use a package that does it directly, such as the CI function from the Rmisc package.

library(Rmisc)
CI(mydata$Price, ci = 0.95)
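
For comparison, the by-hand route (SD, then SE, then degrees of freedom, then the CI) can be sketched in base R. This is only a sketch: the price vector is typed in from the six sample rows in the question, and the variable names are my own. The last lines cross-check the result against the interval reported by t.test().

```r
# Six Price values copied from the question's sample rows
price <- c(300, 340, 280, 340, 310, 320)

n      <- length(price)
xbar   <- mean(price)
s      <- sd(price)
se     <- s / sqrt(n)                 # standard error of the mean
t_crit <- qt(0.975, df = n - 1)       # t quantile with n - 1 degrees of freedom

ci_manual <- c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
ci_manual

# Cross-check against the one-sample t-test interval from base R
ci_ttest <- t.test(price, conf.level = 0.95)$conf.int
all.equal(unname(ci_manual), as.numeric(ci_ttest))
```

With only six observations the t-based interval is wider than one built with 1.96, so it matters which multiplier you pick.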

Finally, as a tip, you can use the describe function from the psych package to get this kind of summary.

library(psych)
describe(mydata)

It provides: number of valid cases, mean, standard deviation, trimmed mean (with trim defaulting to 0.1), median, mad (median absolute deviation from the median), minimum, maximum, skew, kurtosis, and standard error.

A data.table solution is the following.


library(data.table)

# 95% confidence interval for the mean, using normal quantiles
ci <- function(x, conf = 0.95, na.rm = FALSE){
  xbar <- mean(x, na.rm = na.rm)
  n <- if (na.rm) sum(!is.na(x)) else length(x)
  se <- sd(x, na.rm = na.rm) / sqrt(n)   # standard error of the mean
  p <- c((1 - conf)/2, 1 - (1 - conf)/2)
  qq <- qnorm(p, mean = xbar, sd = se)
  setNames(qq, c("lower", "upper"))
}

# Mean, SD and CI bounds for one column
stats <- function(x, na.rm = FALSE){
  CI <- ci(x, na.rm = na.rm)
  c(
    Mean = mean(x, na.rm = na.rm),
    SD = sd(x, na.rm = na.rm),
    Lower = CI[1],
    Upper = CI[2]
  )
}

df1 <- as.data.table(df1)

# Rows of the result are, in order: Mean, SD, Lower, Upper
df1[, lapply(.SD, stats), .SDcols = c("Price", "Size", "Distance")]
#       Price      Size  Distance
#1: 315.00000 139.83333 10.166667
#2:  23.45208  10.45785  4.024757
#3: 296.23477 131.46546  6.946250
#4: 333.76523 148.20120 13.387084


Data used above:

df1 <- read.table(text = "
ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1
", header = TRUE)

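The data.table result above has the statistics as unlabelled rows. If you want one labelled row per variable, which is closer to the summary table asked for, a base R sketch could look like the following. Assumptions: summarise_var is a hypothetical helper name, the houses data frame just retypes the four requested columns from the sample rows so the block runs on its own, and the interval uses the t quantile rather than the normal one.

```r
# The four requested columns, copied from the sample rows
houses <- data.frame(
  Price    = c(300, 340, 280, 340, 310, 320),
  Bedrooms = c(2, 2, 2, 2, 2, 2),
  Size     = c(124, 142, 145, 139, 155, 134),
  Distance = c(8.6, 10.3, 17.5, 7.9, 10.9, 5.8)
)

# Hypothetical helper: mean, SD and t-based CI for the mean of one column
summarise_var <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                      # drop missing values
  n <- length(x)
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(Mean = mean(x), SD = sd(x), Lower = mean(x) - half, Upper = mean(x) + half)
}

# sapply() returns statistics in rows; t() puts one variable per row
summary_table <- as.data.frame(t(sapply(houses, summarise_var)))
summary_table
```

In these six rows Bedrooms is constant, so its SD is 0 and the interval collapses to a single point. On the full file that should not happen.
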
You can also use skimr by creating functions for the upper and lower CI bounds and then dropping any statistics you don't want by setting them to NULL.

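library(skimr)

# Lower and upper bounds of the CI for the mean, taken from Rmisc::CI
lower <- function(x) Rmisc::CI(x)["lower"]
upper <- function(x) Rmisc::CI(x)["upper"]

# Custom skimmer: keep only mean, sd and the CI bounds for numeric columns
myskim <- skim_with(numeric = sfl(mean = mean, sd = sd,
                                  lower = lower, upper = upper),
                    base = NULL, append = FALSE)

myskim(mydata)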

