How to smooth the curve of a density plot in ggplot?

I'm trying to overlay density plots for an outcome variable that is measured on an integer scale (1-7). Right now I'm using:

ggplot(dface, aes(Current.Mood, fill = NewCode))+ geom_density(alpha = 0.1)

That gets me:

[image: the resulting density plot]

For some reason I don't understand, ggplot is putting valleys in between the integer values (pictured above). They make the plot very hard to interpret and don't really reflect what's happening in my data.

Does anyone know how I can get the plot to smooth these over?

geom_density(bw=..) is useful here. From the help page:

      bw: The smoothing bandwidth to be used. If numeric, the standard
          deviation of the smoothing kernel. If character, a rule to
          choose the bandwidth, as listed in 'stats::bw.nrd()'.

ggplot(mtcars, aes(cyl)) + geom_density(bw = 0.1) + labs(title = "bw = 0.1")
ggplot(mtcars, aes(cyl)) + geom_density() + labs(title = "bw default")
ggplot(mtcars, aes(cyl)) + geom_density(bw = 2) + labs(title = "bw = 2")

[images: density plots of cyl with bw = 0.1, the default bandwidth, and bw = 2]
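To connect this back to the 1-7 scale in the question, here is a rough sketch on simulated data. The real `dface` data frame isn't available, so the `sim` data frame and its group labels are made up for illustration. It computes the bandwidth that the default rule (`bw = "nrd0"`) would pick for all responses pooled together. That bandwidth comes out well below the spacing of 1 between integer values, which is the likely reason for the valleys. Note that ggplot computes the bandwidth separately for each group, so the per-group values will differ somewhat.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (an assumption, not the real dface):
# integer responses on a 1-7 scale, split into two arbitrary groups.
set.seed(1)
sim <- data.frame(
  Current.Mood = sample(1:7, 2000, replace = TRUE),
  NewCode = sample(c("A", "B"), 2000, replace = TRUE)
)

# Bandwidth chosen by the default "nrd0" rule for the pooled responses.
# It is much smaller than the gap of 1 between neighbouring integers.
bw_default <- stats::bw.nrd0(sim$Current.Mood)
bw_default

# Setting the bandwidth near the integer spacing smooths over the valleys.
p_smooth <- ggplot(sim, aes(Current.Mood, fill = NewCode)) +
  geom_density(alpha = 0.1, bw = 1)
p_smooth
```

The value `bw = 1` is only a starting point; something between roughly 0.7 and 1.5 may look better depending on the data.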

Or, as MrFlick suggested, you can use adjust=. From the help page:

  adjust: A multiplicative bandwidth adjustment. This makes it possible
          to adjust the bandwidth while still using a bandwidth
          estimator. For example, 'adjust = 1/2' means use half of the
          default bandwidth.

ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.5) + labs(title = "adjust = 0.5")
ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.9) + labs(title = "adjust = 0.9")

[images: density plots of cyl with adjust = 0.5 and adjust = 0.9]
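Since the valleys come from too little smoothing, values of adjust above 1 are probably what you want for this problem. As a small check of what adjust does, the sketch below uses base R's `stats::density()` (which has the same `adjust` argument) to show that the bandwidth actually used is the default-rule bandwidth times `adjust`. The variable names are just for illustration.

```r
library(ggplot2)

x <- mtcars$cyl

bw0 <- stats::bw.nrd0(x)             # bandwidth picked by the default rule
d   <- stats::density(x, adjust = 2) # same rule, bandwidth multiplied by 2

# The bandwidth actually used is adjust * default bandwidth
c(default = bw0, adjusted = d$bw)

# The same multiplier applied in ggplot: a smoother curve than the default
ggplot(mtcars, aes(cyl)) + geom_density(adjust = 2) + labs(title = "adjust = 2")
```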

Your choice of data visualization is not ideal. You want to compare the outcome variable across the 1-7 scale for different questions/groups. Because the scale is discrete, you probably want to map the frequency of each response to a geom_line or geom_area, or both.

Using survey data from Kaggle:

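library(tidyverse)

# Path to the downloaded Kaggle CSV; adjust to wherever you saved it
my_data <- read_csv("~/Downloads/archive/test.csv")

plot_data <- my_data %>%
  select(id, `Inflight wifi service`:`Food and drink`) %>%
  # One row per (respondent, question) pair
  pivot_longer(`Inflight wifi service`:`Food and drink`, names_to = "question", values_to = "response") %>%
  # Count each response value within each question
  count(question, response) %>%
  group_by(question) %>%
  # Convert counts to within-question relative frequencies
  mutate(freq = n / sum(n))

# Note: geom_area() stacks the groups by default;
# add position = "identity" to overlay them instead
ggplot(plot_data) +
  geom_area(aes(x = response, fill = question, y = freq), alpha = 0.5)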

[image: area chart of response frequencies by question]
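If you don't have the Kaggle file at hand, the same idea can be tried on simulated data. This is only a sketch: the `responses` data frame and the question names `Q1`/`Q2` are invented for illustration. It overlays the areas with `position = "identity"` rather than stacking them, and adds a `geom_line` on top so each group's outline stays readable.

```r
library(dplyr)
library(ggplot2)

# Made-up 1-7 responses for two hypothetical questions
set.seed(42)
responses <- data.frame(
  question = rep(c("Q1", "Q2"), each = 300),
  response = c(sample(1:7, 300, replace = TRUE, prob = 7:1),
               sample(1:7, 300, replace = TRUE, prob = 1:7))
)

# Relative frequency of each response value within each question
freq_data <- responses %>%
  count(question, response) %>%
  group_by(question) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Overlaid (not stacked) areas plus a line per question
p_freq <- ggplot(freq_data, aes(x = response, y = freq)) +
  geom_area(aes(fill = question), alpha = 0.3, position = "identity") +
  geom_line(aes(colour = question)) +
  scale_x_continuous(breaks = 1:7)
p_freq
```

With overlaid areas the heights can be compared directly between questions, whereas the default stacking shows cumulative totals across questions.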
