
The mean, standard deviation and 95% confidence interval for the mean of the following variables in R

I need to create a summary table that shows the mean, standard deviation and 95% confidence interval for the mean of the following variables: Selling Price, Number of bedrooms, Size of house, Distance from city centre.

I have a file with data.

  ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1
library(data.table)  # needed for data.table() and .SD

mydata <- read.csv("Real_Estate.csv")
head(mydata)
dfo <- data.frame(mydata)
dto <- data.table(dfo)
result_1 <- dto[, sapply(.SD, function(x) list(mean = mean(x)))]
result_2 <- dto[, sapply(.SD, function(x) list(sd = sd(x)))]

But I have no idea how to calculate the 95% CI and create the summary table.

Here's a reproducible tidyverse example that lets you create a summary table

library(tidyverse)

df <- tibble(
  ID = 1:100,
  price = round(rnorm(100, mean = 500, sd = 50)),
  bedrooms = sample(1:4, 100, replace = TRUE)
)

df %>%
  pivot_longer(cols = c(price, bedrooms),
               names_to = "variable",
               values_to = "value") %>%
  group_by(variable) %>%
  summarize(mean = mean(value),
            sd = sd(value),
            se = sd / sqrt(n()),
            CI_lower = mean - (1.96 * se),
            CI_upper = mean + (1.96 * se))
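
The 1.96 multiplier used above is the large-sample normal quantile. For a small sample, such as the six rows shown in the question, a t quantile with n - 1 degrees of freedom gives a wider interval. A minimal sketch comparing the two multipliers (the sample sizes are picked only for illustration):

```r
# Compare the normal multiplier with t multipliers for a 95% interval.
n <- c(6, 30, 100)                 # illustrative sample sizes (assumption)
z_mult <- qnorm(0.975)             # normal quantile, does not depend on n
t_mult <- qt(0.975, df = n - 1)    # t quantile, shrinks towards z as n grows
data.frame(n = n, z = z_mult, t = t_mult)
```

If the t-based interval is preferred, replacing 1.96 with qt(0.975, df = n() - 1) inside summarize() should be enough.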

There are two approaches. The first is to do the calculation by hand: compute the SD and the SE, choose the degrees of freedom, and then calculate the CI. This link walks through the steps: https://bookdown.org/logan_kelly/r_practice/p09.html
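
The by-hand route can be sketched in base R roughly as follows. The helper name mean_ci and the houses data frame (the six sample rows from the question) are assumptions for illustration, and the interval uses the t distribution with n - 1 degrees of freedom.

```r
# Six sample rows from the question (numeric columns of interest only).
houses <- data.frame(
  Price    = c(300, 340, 280, 340, 310, 320),
  Bedrooms = c(2, 2, 2, 2, 2, 2),
  Size     = c(124, 142, 145, 139, 155, 134),
  Distance = c(8.6, 10.3, 17.5, 7.9, 10.9, 5.8)
)

# Hypothetical helper: mean, SD and t-based CI for one numeric vector.
mean_ci <- function(x, conf = 0.95) {
  x  <- x[!is.na(x)]                         # drop missing values
  n  <- length(x)
  m  <- mean(x)
  s  <- sd(x)
  se <- s / sqrt(n)                          # standard error of the mean
  tq <- qt(1 - (1 - conf) / 2, df = n - 1)   # t quantile, n - 1 df
  c(Mean = m, SD = s, Lower = m - tq * se, Upper = m + tq * se)
}

# One row per variable, one column per statistic.
summary_table <- t(sapply(houses, mean_ci))
round(summary_table, 2)
```

For a single column, t.test(houses$Price)$conf.int should return the same interval, which is a convenient cross-check.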

Alternatively, you can use a package that does it directly, such as the Rmisc package with its CI() function:

install.packages("Rmisc")
library(Rmisc)
mydata<-iris
CI(mydata$Sepal.Length, ci=0.95)

As a final tip, you can use the psych package to get this kind of summary.

install.packages("psych")
library('psych')
describe(mydata)

It provides:

number of valid cases, mean, standard deviation, trimmed mean (with trim defaulting to 0.1), median, mad (median absolute deviation from the median), minimum, maximum, skew, kurtosis and standard error.

A data.table solution is the following.

library(data.table)

ci <- function(x, conf = 0.95, na.rm = FALSE){
  if (na.rm) x <- x[!is.na(x)]
  xbar <- mean(x)
  # standard error of the mean, not the SD of the observations
  se <- sd(x) / sqrt(length(x))
  p <- c((1 - conf)/2, 1 - (1 - conf)/2)
  # normal-approximation interval for the mean
  qq <- qnorm(p, mean = xbar, sd = se)
  setNames(qq, c("lower", "upper"))
}
stats <- function(x, na.rm = FALSE){
  CI <- ci(x, na.rm = na.rm)
  c(
    Mean = mean(x, na.rm = na.rm),
    SD = sd(x, na.rm = na.rm),
    Lower = CI[["lower"]],
    Upper = CI[["upper"]]
  )
}


df1 <- as.data.table(df1)

# rows of the result, in order: Mean, SD, Lower, Upper
df1[, lapply(.SD, stats), .SDcols = c("Price", "Size", "Distance")]
#       Price      Size  Distance
#1: 315.00000 139.83333 10.166667
#2:  23.45208  10.45785  4.024757
#3: 296.23477 131.46546  6.946250
#4: 333.76523 148.20120 13.387084

Data

df1 <- read.table(text = "
ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1
", header = TRUE)

You can also use skimr by creating functions for the lower and upper CI bounds and then dropping any statistics you don't want by setting them to NULL.

library(skimr)
lower <- function(x) Rmisc::CI(x)["lower"]
upper <- function(x) Rmisc::CI(x)["upper"]
myskim <- skim_with(
  numeric = sfl(mean = mean, sd = sd, lower = lower, upper = upper),
  base = NULL,
  append = FALSE
)
myskim(mtcars)


 