
Understanding Data Characteristics for Linear Regression in R - Plotting Data Distribution Over the Regression Line

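I am trying to understand certain properties of the data when solving a regression problem. Specifically, I want to characterize the distribution of the data (y) at a given value of the regressor (x) as a normal distribution, and then plot that normal distribution (rotated 90 degrees) together with the data and the regression line.

Here is what I am working with (this code works fine):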

library(BAS)      # for the bodyfat data
library(ggplot2)  # for the plot below
x <- bodyfat$Abdomen
y <- bodyfat$Bodyfat
dat <- data.frame(x, y)

# Linear model
fat.mod <- lm(y ~ x, data = dat)

# Plot of linear model and data
g <- ggplot(bodyfat, aes(x = Abdomen, y = Bodyfat)) + geom_point() + 
  geom_smooth(method = "lm", se = FALSE)
g

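What I would like to see is an image like this example, but for x values that I can specify (perhaps drawn from some distribution over x?). With the plot, I would like to see the characteristics (mean and standard deviation, or variance) of the overlaid distributions. It is fine to assume that the data are normally distributed around the regression line.

Where I really get stuck is when I specify a point that does not appear explicitly in the data (for example, the mean of x).

Any ideas on this?

Many thanks!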

This is really non-trivial. Here is a solution (apparently taken from Dr. Kurz's book Doing Bayesian Data Analysis in brms and the tidyverse):

library(tidyverse)

# Draws per panel
n_draw <- 500

d <-
  data.frame(
    panel = rep(letters[1:2], each = n_draw),
    x = c(runif(n = n_draw, min = -10, max = 10),
          rnorm(n = n_draw / 2, mean = -7, sd = 2),
          rnorm(n = n_draw / 2, mean = 3, sd = 2))
  ) %>%
  mutate(y = 10 + 2 * x + rnorm(n = n(), mean = 0, sd = 2))
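As a sanity check on the simulation above, the following base-R sketch regenerates data the same way (y = 10 + 2x plus normal noise with sd 2) and fits `lm()` to it. The seed and the object names `sim` and `fit` are assumptions for illustration and are not part of the original answer. The estimates should land near 10, 2, and 2, but the exact values depend on the seed.

```r
# Assumed seed, only so the sketch is reproducible
set.seed(1)

n_draw <- 500

# Same x mixture as in the answer: one uniform block, two normal blocks
x <- c(runif(n_draw, min = -10, max = 10),
       rnorm(n_draw / 2, mean = -7, sd = 2),
       rnorm(n_draw / 2, mean = 3, sd = 2))

# Same data-generating relation: y = 10 + 2x + Normal(0, 2) noise
sim <- data.frame(x = x,
                  y = 10 + 2 * x + rnorm(length(x), mean = 0, sd = 2))

# Fit the linear model and inspect the estimates
fit <- lm(y ~ x, data = sim)
coef(fit)   # intercept and slope, expected near 10 and 2
sigma(fit)  # residual standard deviation, expected near 2
```

These three numbers (intercept, slope, residual sd) are exactly what the hard-coded `10 + 2 * x` and `sd = 2` stand in for when the curves are built.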

Create a separate data frame for the rotated Gaussians:

# Define the x values from which the normal curves come
curves <- data.frame(x = seq(from = -7.5, to = 7.5, length.out = 4)) %>%
  # Use a linear relation (10 + 2x here) to compute an expected y for x
  mutate(y_mean = 10 + (2 * x)) %>%
  # Based on a normal distribution with mean `y_mean` and a standard deviation of 2, compute the 95% intervals
  mutate(ll = qnorm(0.025, mean = y_mean, sd = 2),
         ul = qnorm(0.975, mean = y_mean, sd = 2)) %>%
  # Use the interval to make a series of y values
  mutate(y = map2(ll, ul, seq, length.out = 100)) %>%
  # This must be `unnest()`ed
  unnest(y) %>%
  # Calculate density values
  mutate(density = map2_dbl(y, y_mean, dnorm, sd = 2)) %>%
  # Rescale densities wider; redefine the x column
  mutate(x = x - density * 2 / max(density))
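To make the rotation step concrete, here is a minimal base-R sketch of a single curve, assuming the same relation (10 + 2x) and sd (2) as above. The names `x0`, `y_grid`, `dens`, `width`, and `x_curve` are hypothetical and chosen only for this illustration. The idea is that y stays on the vertical axis, while the density is rescaled to a chosen maximum width and subtracted from the anchor x, so the bell opens to the left of `x0`.

```r
x0    <- 2.5   # one of the anchor x values used in the answer
s     <- 2     # assumed residual standard deviation
width <- 2     # maximum horizontal extent of the curve, in x units

# Expected y at x0 under the assumed linear relation
y_mean <- 10 + 2 * x0

# Grid of y values spanning the central 95% of Normal(y_mean, s)
y_grid <- seq(qnorm(0.025, mean = y_mean, sd = s),
              qnorm(0.975, mean = y_mean, sd = s),
              length.out = 100)

# Density at each grid point
dens <- dnorm(y_grid, mean = y_mean, sd = s)

# Rotate: the density becomes a horizontal offset to the left of x0
x_curve <- x0 - dens * width / max(dens)

# The curve spans from x0 - width (at the peak) to just left of x0 (tails)
range(x_curve)
```

Plotting `x_curve` against `y_grid` as a path gives one rotated bell. The pipeline above does the same thing for all four anchor values at once.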

Then plot:

g <- ggplot(d, aes(x = x, y = y)) +
  geom_point(size = 1/3, alpha = 1/3) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE,
              color = "red", linetype = 2) +
  geom_path(data = curves, aes(group = y_mean),
            size = 1, color = "blue") +
  coord_cartesian(xlim = c(-10, 10), ylim = c(-10, 30)) +
  theme(strip.background = element_blank(),
        strip.text = element_blank())
g

The resulting figure: [image: Gaussians characterizing the linear fit to the data]
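The code above hard-codes the line (10 + 2x) and the sd (2). For a fitted model, and for x values that do not occur in the data (such as the mean of x), one possible adaptation is to take the curve centers from `predict()` and the spread from `sigma()`. The sketch below is an assumption-laden illustration, not part of the original answer: `rotated_normals()` is a hypothetical helper, it assumes a single-predictor `lm()` with normally distributed residuals of constant variance, and the simulated `dat` is only a stand-in so the block runs without the BAS package.

```r
# Hypothetical helper (not from the original answer): build rotated normal
# curves from a fitted single-predictor lm() at arbitrary x values.
rotated_normals <- function(model, x_at, width = 2, level = 0.95, n = 100) {
  x_name <- attr(terms(model), "term.labels")[1]  # assumes one predictor
  new_x  <- setNames(data.frame(x_at), x_name)

  y_mean <- as.numeric(predict(model, newdata = new_x))  # curve centers
  s      <- sigma(model)                                 # residual sd
  p      <- (1 - level) / 2

  pieces <- lapply(seq_along(x_at), function(i) {
    y    <- seq(qnorm(p, y_mean[i], s), qnorm(1 - p, y_mean[i], s),
                length.out = n)
    dens <- dnorm(y, mean = y_mean[i], sd = s)
    data.frame(x0 = x_at[i], y_mean = y_mean[i], sd = s,
               x = x_at[i] - dens * width / max(dens),  # rotated density
               y = y)
  })
  do.call(rbind, pieces)
}

# Stand-in data, simulated like the answer's example (assumed seed)
set.seed(1)
dat   <- data.frame(x = runif(200, min = -10, max = 10))
dat$y <- 10 + 2 * dat$x + rnorm(200, mean = 0, sd = 2)
fit   <- lm(y ~ x, data = dat)

# Curves at the mean of x (not an observed point) and at x = 5
curves <- rotated_normals(fit, x_at = c(mean(dat$x), 5))

# Mean and sd of each overlaid distribution
unique(curves[c("x0", "y_mean", "sd")])
```

The returned data frame can be passed to `geom_path(data = curves, aes(group = x0))` in the plot above. With the question's `fat.mod`, the call would presumably be `rotated_normals(fat.mod, x_at = mean(dat$x))`, with `width` chosen in the units of the x axis.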


