The mean, standard deviation and 95% confidence interval for the mean of the following variables in R
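I need to create a summary table showing the mean, standard deviation and 95% confidence interval for the following variables: sale price, number of bedrooms, house size, and distance to the city centre.
I have a file containing the data.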
  ID Price Bedrooms Size Pool Distance Suburbs Garage
1  1   300        2  124    0      8.6       1      0
2  2   340        2  142    0     10.3       1      0
3  3   280        2  145    0     17.5       4      1
4  4   340        2  139    0      7.9       1      0
5  5   310        2  155    0     10.9       4      1
6  6   320        2  134    0      5.8       3      1
library(data.table)

mydata <- read.csv("Real_Estate.csv")
head(mydata)

# read.csv() already returns a data.frame, so convert it straight to a data.table
dto <- as.data.table(mydata)

# Column-wise mean and standard deviation
result_1 <- dto[, lapply(.SD, mean)]
result_2 <- dto[, lapply(.SD, sd)]
But I don't know how to calculate the 95% CI and build the summary table.
Here is a reproducible tidyverse example that builds the summary table.
library(tidyverse)

set.seed(123)  # fix the random draws so the example is reproducible

df <- tibble(
  ID = 1:100,
  price = round(rnorm(100, mean = 500, sd = 50)),
  bedrooms = sample(1:4, 100, replace = TRUE)
)

df %>%
  pivot_longer(cols = c(price, bedrooms),
               names_to = "variable",
               values_to = "value") %>%
  group_by(variable) %>%
  summarize(mean = mean(value),
            sd = sd(value),
            se = sd / sqrt(n()),            # standard error of the mean
            CI_lower = mean - (1.96 * se),  # 1.96 is the normal quantile for 95%
            CI_upper = mean + (1.96 * se))
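The 1.96 multiplier used above is the 0.975 quantile of the standard normal distribution, which is a large-sample approximation. As a small base R sketch (the variable names are illustrative and not part of the original answer), it can be compared with the Student t quantile for a sample of 100 and for a sample of 6:

```r
# Multiplier for a two-sided 95% interval
conf <- 0.95
p <- 1 - (1 - conf) / 2        # upper tail probability, 0.975

z <- qnorm(p)                  # normal quantile
t_100 <- qt(p, df = 100 - 1)   # t quantile for n = 100
t_6 <- qt(p, df = 6 - 1)       # t quantile for n = 6

round(c(z = z, t_100 = t_100, t_6 = t_6), 3)
#     z t_100   t_6
# 1.960 1.984 2.571
```

With 100 observations the two multipliers are close, so 1.96 is a reasonable shortcut there; for a handful of rows the t quantile gives a noticeably wider interval.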
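There are two ways to do this. The first is to work through it by hand: compute the SD and the SE, supply the degrees of freedom, and finally compute the CI, as explained at https://bookdown.org/logan_kelly/r_practice/p09.html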
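The by-hand steps can be sketched in base R roughly as follows. This is only an illustration under the assumption of a t-based interval; it uses the built-in iris data, and the names (x, dof, t_crit, ci_manual) are made up for the example. The last line cross-checks the result against base R's t.test().

```r
x <- iris$Sepal.Length

n <- length(x)                 # sample size
xbar <- mean(x)                # sample mean
s <- sd(x)                     # sample standard deviation
se <- s / sqrt(n)              # standard error of the mean
dof <- n - 1                   # degrees of freedom
t_crit <- qt(0.975, df = dof)  # two-sided 95% critical value

ci_manual <- c(lower = xbar - t_crit * se,
               mean  = xbar,
               upper = xbar + t_crit * se)
ci_manual

# Cross-check: the one-sample t-test reports the same interval
ci_ttest <- t.test(x, conf.level = 0.95)$conf.int
```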
The second is to use an existing package directly, for example the Rmisc package, which has a function for confidence intervals.
install.packages("Rmisc")  # only needed once
library(Rmisc)
mydata <- iris
CI(mydata$Sepal.Length, ci = 0.95)  # returns upper, mean, lower
Finally, as a tip, you can use the psych package for this kind of summary.
install.packages("psych")
library('psych')
describe(mydata)
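It reports the number of valid cases, mean, standard deviation, trimmed mean (trim defaults to .1), median, mad (median absolute deviation from the median), minimum, maximum, skew, kurtosis, and standard error.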
A data.table solution follows.
library(data.table)

# Sample data from the question
df1 <- read.table(text = "
ID Price Bedrooms Size Pool Distance Suburbs Garage
1 1 300 2 124 0 8.6 1 0
2 2 340 2 142 0 10.3 1 0
3 3 280 2 145 0 17.5 4 1
4 4 340 2 139 0 7.9 1 0
5 5 310 2 155 0 10.9 4 1
6 6 320 2 134 0 5.8 3 1
", header = TRUE)

# Normal-approximation confidence interval for the mean.
# The spread of the mean is the standard error, sd / sqrt(n), not the sd itself.
ci <- function(x, conf = 0.95, na.rm = FALSE){
  n <- if (na.rm) sum(!is.na(x)) else length(x)
  xbar <- mean(x, na.rm = na.rm)
  se <- sd(x, na.rm = na.rm) / sqrt(n)
  p <- c((1 - conf)/2, 1 - (1 - conf)/2)
  qq <- qnorm(p, mean = xbar, sd = se)
  setNames(qq, c("lower", "upper"))
}

stats <- function(x, na.rm = FALSE){
  CI <- ci(x, na.rm = na.rm)
  c(
    Mean = mean(x, na.rm = na.rm),
    SD = sd(x, na.rm = na.rm),
    Lower = unname(CI["lower"]),
    Upper = unname(CI["upper"])
  )
}

df1 <- as.data.table(df1)
# Rows of the result are, in order: Mean, SD, Lower, Upper
df1[, lapply(.SD, stats), .SDcols = c("Price", "Size", "Distance")]
#        Price      Size  Distance
#1: 315.00000 139.83333 10.166667
#2:  23.45208  10.45785  4.024757
#3: 296.23477 131.46546  6.946250
#4: 333.76523 148.20120 13.387084
You can also use skimr: write functions for the lower and upper CI bounds, then drop any statistics you don't want by setting them to NULL.
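library(skimr)

# Wrappers that pull the lower and upper 95% CI bounds out of Rmisc::CI()
lower <- function(x) Rmisc::CI(x)["lower"]
upper <- function(x) Rmisc::CI(x)["upper"]

# Custom skimmer: keep only mean, sd and the CI bounds for numeric columns
myskim <- skim_with(
  numeric = sfl(mean = mean, sd = sd, lower = lower, upper = upper),
  base = NULL,
  append = FALSE
)
myskim(mtcars)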
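Rmisc::CI(), used in the wrappers above, builds its interval from the Student t quantile. The same kind of table can be sketched in base R without extra packages. The code below is only an illustration: it assumes the six rows shown in the question are the whole data set, and the names houses, mean_ci and summary_table are hypothetical.

```r
# Sample rows from the question (assumed here to be the complete data)
houses <- data.frame(
  Price    = c(300, 340, 280, 340, 310, 320),
  Bedrooms = c(2, 2, 2, 2, 2, 2),
  Size     = c(124, 142, 145, 139, 155, 134),
  Distance = c(8.6, 10.3, 17.5, 7.9, 10.9, 5.8)
)

# Hypothetical helper: mean, SD and t-based CI for one numeric vector
mean_ci <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                              # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)   # two-sided critical value
  c(Mean = mean(x), SD = sd(x),
    Lower = mean(x) - t_crit * se,
    Upper = mean(x) + t_crit * se)
}

# sapply() puts the statistics in rows; t() flips to one row per variable
summary_table <- t(sapply(houses, mean_ci))
round(summary_table, 2)
```

Because every house in the sample has two bedrooms, the Bedrooms row has an SD of 0 and an interval that collapses to the mean.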