
The mean, standard deviation and 95% confidence interval for the mean of the following variables in R

I need to create a summary table that shows the mean, standard deviation and 95% confidence interval for the mean of the following variables: Selling Price, Number of bedrooms, Size of house, Distance from city centre.

I have a file with the data.

ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1
library(data.table)

mydata <- read.csv("Real_Estate.csv")
head(mydata)
# read.csv() already returns a data.frame, so convert it directly
dto <- as.data.table(mydata)
# one-row tables holding the mean and the SD of every column
result_1 <- dto[, lapply(.SD, mean)]
result_2 <- dto[, lapply(.SD, sd)]

But I have no idea how to calculate the 95% CI and create the summary table.

Here's a reproducible tidyverse example that lets you create a summary table.

library(tidyverse)

set.seed(1)  # fix the random draws so the example is reproducible

df <- tibble(
  ID = 1:100,
  price = round(rnorm(100, mean = 500, sd = 50)),
  bedrooms = sample(1:4, 100, replace = TRUE)
)

df %>%
  pivot_longer(cols = c(price, bedrooms),
               names_to = "variable",
               values_to = "value") %>%
  group_by(variable) %>%
  summarize(mean = mean(value),
            sd = sd(value),
            se = sd / sqrt(n()),
            CI_lower = mean - (1.96 * se),
            CI_upper = mean + (1.96 * se))
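
The 1.96 multiplier used above is the normal-approximation critical value, which is reasonable for n = 100. For small samples, a t critical value with n - 1 degrees of freedom is the usual choice. A minimal base R sketch comparing the two (the sample sizes are only illustrative, not taken from the question):

```r
# Critical value for a two-sided 95% interval under the normal approximation.
z_crit <- qnorm(0.975)

# t critical values depend on the sample size through df = n - 1.
n <- c(6, 30, 100)               # illustrative sample sizes (assumption)
t_crit <- qt(0.975, df = n - 1)

# The t multiplier is wider than z and approaches it as n grows.
data.frame(n = n, z = z_crit, t = t_crit)
```

In the pipeline above, this would amount to replacing 1.96 with `qt(0.975, df = n() - 1)` inside `summarize()`.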

You can take two approaches. The first is to do it by hand: calculate the SD and the SE, specify the degrees of freedom, and finally calculate the CI. The link below explains how: https://bookdown.org/logan_kelly/r_practice/p09.html

Or you can directly use a package that does it for you, such as the Rmisc package and its CI function.
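
The by-hand route can be sketched in base R as follows. This is only an illustration using the six Price values shown in the question; the variable names are made up for the example, and the result is cross-checked against the interval that `t.test()` reports.

```r
# Price column from the question's sample data (first six rows only).
price <- c(300, 340, 280, 340, 310, 320)

n     <- length(price)
xbar  <- mean(price)
s     <- sd(price)
se    <- s / sqrt(n)               # standard error of the mean
dof   <- n - 1                     # degrees of freedom
tcrit <- qt(0.975, df = dof)       # two-sided 95% t critical value

ci_manual <- c(lower = xbar - tcrit * se, upper = xbar + tcrit * se)

# Cross-check: t.test() computes the same t-based interval.
ci_ttest <- as.numeric(t.test(price, conf.level = 0.95)$conf.int)

ci_manual
ci_ttest
```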

install.packages("Rmisc")
library(Rmisc)
mydata<-iris
CI(mydata$Sepal.Length, ci=0.95)

Finally, as a tip, you can use the psych package to get this kind of summary.

install.packages("psych")
library('psych')
describe(mydata)

It provides: the number of valid cases, mean, standard deviation, trimmed mean (with trim defaulting to 0.1), median, mad (median absolute deviation from the median), minimum, maximum, skew, kurtosis, and standard error.

A data.table solution is the following.

library(data.table)

# Normal-approximation confidence interval for the mean of x.
ci <- function(x, conf = 0.95, na.rm = FALSE){
  n <- if (na.rm) sum(!is.na(x)) else length(x)
  xbar <- mean(x, na.rm = na.rm)
  # use the standard error of the mean, not the SD of the observations
  se <- sd(x, na.rm = na.rm) / sqrt(n)
  p <- c((1 - conf)/2, 1 - (1 - conf)/2)
  qq <- qnorm(p, mean = xbar, sd = se)
  setNames(qq, c("lower", "upper"))
}
# Mean, SD and CI limits, returned in that order.
stats <- function(x, na.rm = FALSE){
  CI <- ci(x, na.rm = na.rm)
  c(
    Mean = mean(x, na.rm = na.rm),
    SD = sd(x, na.rm = na.rm),
    Lower = CI[["lower"]],
    Upper = CI[["upper"]]
  )
}

df1 <- as.data.table(df1)

# rows of the result: Mean, SD, Lower, Upper
df1[, lapply(.SD, stats), .SDcols = c("Price", "Size", "Distance")]
#       Price      Size  Distance
#1: 315.00000 139.83333 10.166667
#2:  23.45208  10.45785  4.024757
#3: 296.23477 131.46546  6.946250
#4: 333.76523 148.20120 13.387084

Data

df1 <- read.table(text = "
ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1
", header = TRUE)

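With the sample data above, the summary table the question asks for, including Bedrooms (which the data.table call leaves out), can also be sketched in base R. This is a sketch under stated assumptions: `summarise_var` is a hypothetical helper name, the interval is t-based rather than normal-based, and only the six sample rows are used.

```r
# Sample rows from the question (first six observations only).
df1 <- data.frame(
  Price    = c(300, 340, 280, 340, 310, 320),
  Bedrooms = c(2, 2, 2, 2, 2, 2),
  Size     = c(124, 142, 145, 139, 155, 134),
  Distance = c(8.6, 10.3, 17.5, 7.9, 10.9, 5.8)
)

# Hypothetical helper: mean, SD and t-based CI limits for one numeric vector.
summarise_var <- function(x, conf = 0.95) {
  x     <- x[!is.na(x)]
  n     <- length(x)
  se    <- sd(x) / sqrt(n)
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)
  c(Mean     = mean(x),
    SD       = sd(x),
    CI_lower = mean(x) - tcrit * se,
    CI_upper = mean(x) + tcrit * se)
}

# One row per variable, one column per statistic.
summary_table <- as.data.frame(t(sapply(df1, summarise_var)))
summary_table
```

Bedrooms is constant in these six rows, so its SD is 0 and both CI limits collapse onto the mean.
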
You can also use skimr: create functions for the lower and upper CI limits, then drop any statistics you don't want by setting them to NULL.

library(skimr)
lower <- function(x) Rmisc::CI(x)["lower"]
upper <- function(x) Rmisc::CI(x)["upper"]
myskim <- skim_with(
  numeric = sfl(mean = mean, sd = sd, lower = lower, upper = upper),
  base = NULL,
  append = FALSE
)
myskim(mtcars)

