Understanding Data Characteristics for Linear Regression in R - Plotting Data Distribution Over the Regression Line
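I am trying to understand certain properties of the data when solving a regression problem. Specifically, I would like to characterize the distribution of the data (y) at a given value of the regressor (x) as a normal distribution, and then plot that normal distribution (rotated 90 degrees) together with the data and the regression line.
Here is the problem I am working on (this code works fine):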
library(BAS)      # for data
library(ggplot2)  # for plotting
x <- bodyfat$Abdomen
y <- bodyfat$Bodyfat
dat <- data.frame(x, y)
# Linear model
fat.mod <- lm(y ~ x, data = dat)
# Plot of linear model and data
g <- ggplot(bodyfat, aes(x = Abdomen, y = Bodyfat)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
g
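What I would like to see is an image like this: [image], but for x values that I can specify (perhaps with some distribution in x?). With the plot, I would like to see the characteristics of the superimposed distribution (mean and standard deviation or variance). It is fine to assume that the data are normally distributed around the regression line.
Where I really get stuck is when I specify a point that does not appear explicitly in the data (for example, the mean).
Any thoughts on this?
Thanks very much!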
This is really non-trivial. Here is a solution (apparently taken from Dr. Kurz's book Doing Bayesian Data Analysis in brms and the tidyverse):
library(tidyverse)
# Draws per panel
n_draw <- 500
d <-
  data.frame(panel = rep(letters[1:2],
                         each = n_draw),
             x = c(runif(n = n_draw, min = -10, max = 10),
                   rnorm(n = n_draw / 2, mean = -7, sd = 2),
                   rnorm(n = n_draw / 2, mean = 3, sd = 2))) %>%
  mutate(y = 10 + 2 * x + rnorm(n = n(),
                                mean = 0, sd = 2))
Create a separate data frame for the rotated Gaussians:
# Define the x values from which the normal curves come
curves <- data.frame(x = seq(from = -7.5, to = 7.5,
                             length.out = 4)) %>%
  # Use a linear relation (10 + 2x here) to compute an expected y for x
  mutate(y_mean = 10 + (2 * x)) %>%
  # Based on a normal distribution with mean `y_mean` and a standard
  # deviation of 2, compute the 95% intervals
  mutate(ll = qnorm(0.025, mean = y_mean, sd = 2),
         ul = qnorm(0.975, mean = y_mean, sd = 2)) %>%
  # Use the interval to make a series of y values
  mutate(y = map2(ll, ul, seq, length.out = 100)) %>%
  # This must be `unnest()`ed
  unnest(y) %>%
  # Calculate density values
  mutate(density = map2_dbl(y, y_mean, dnorm,
                            sd = 2)) %>%
  # Rescale densities wider; redefine the x column
  mutate(x = x - density * 2 / max(density))
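For reference, the construction above can be written out as a formula. This is only a restatement of what the code computes, with symbols that are my own notation rather than the book's: x_0 is the x value where a curve is anchored, \mu(x_0) its expected y, \sigma = 2 the standard deviation, and w = 2 the horizontal width given to the tallest point of the curve.

```latex
\begin{aligned}
\mu(x_0) &= 10 + 2x_0, \qquad \sigma = 2, \\
y_j &\in \bigl[\mu(x_0) - z_{0.975}\,\sigma,\; \mu(x_0) + z_{0.975}\,\sigma\bigr],
      \qquad z_{0.975} \approx 1.96,\quad j = 1, \dots, 100, \\
d_j &= \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{\bigl(y_j - \mu(x_0)\bigr)^2}{2\sigma^2}\right), \\
x_j &= x_0 - w\,\frac{d_j}{\max_k d_k}, \qquad w = 2 .
\end{aligned}
```

The points (x_j, y_j) are what `geom_path()` connects: the density is drawn along the x axis, extending to the left of x_0, so the bell appears rotated by 90 degrees. Because every curve shares the same \sigma, the maximum taken over all rows in the code is the same as each curve's own maximum.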
Then plot:
g <- ggplot(d, aes(x = x, y = y))
g <- g + geom_point(size = 1/3, alpha = 1/3)
g <- g + stat_smooth(method = "lm", se = FALSE, fullrange = TRUE,
                     color = "red", linetype = 2)
g <- g + geom_path(data = curves, aes(group = y_mean),
                   size = 1, color = "blue")
g <- g + coord_cartesian(xlim = c(-10, 10),
                         ylim = c(-10, 30))
g <- g + theme(strip.background = element_blank(),
               strip.text = element_blank())
g
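The example above uses simulated data with a known line and standard deviation. To get back to the bodyfat data from the question, one possible adaptation is sketched below. It is not part of the book's solution and rests on some assumptions: the conditional mean at each chosen x comes from `predict()` on the fitted model, the common standard deviation is the residual standard error from `sigma()`, and the uncertainty in the estimated coefficients is ignored. The values in `x0` and the `width` of 8 are arbitrary choices for illustration. Because `predict()` works on any `newdata`, the chosen x values (such as the mean) do not have to occur in the data.

```r
library(BAS)      # provides the bodyfat data used in the question
library(ggplot2)

data(bodyfat)
fat.mod <- lm(Bodyfat ~ Abdomen, data = bodyfat)

# x values at which to draw a curve (illustrative choices);
# the mean does not need to be an observed value
x0 <- c(80, mean(bodyfat$Abdomen), 110)

# Conditional means from the fitted line, and one common sd (assumption:
# plug-in estimates, coefficient uncertainty ignored)
mu    <- predict(fat.mod, newdata = data.frame(Abdomen = x0))
s     <- sigma(fat.mod)
width <- 8  # assumed horizontal extent of each curve, in x units

# One rotated normal curve per x0, covering the central 95% of the density
curves <- do.call(rbind, lapply(seq_along(x0), function(i) {
  y    <- seq(qnorm(0.025, mu[i], s), qnorm(0.975, mu[i], s), length.out = 100)
  dens <- dnorm(y, mean = mu[i], sd = s)
  data.frame(x0 = x0[i],
             Abdomen = x0[i] - width * dens / max(dens),
             Bodyfat = y)
}))

# Mean and sd of each superimposed distribution
summary_tab <- data.frame(x0 = x0, mean = unname(mu), sd = s)
print(summary_tab)

g <- ggplot(bodyfat, aes(x = Abdomen, y = Bodyfat)) +
  geom_point(alpha = 1/3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_path(data = curves, aes(group = x0), color = "blue")
g
```

`summary_tab` holds the mean and standard deviation of each curve, so they can be read off directly or added to the plot as labels.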