How to smooth the curves of a density plot in ggplot?

Question

I'm trying to overlay density plots for an outcome variable that is measured on an integer scale (1-7). Right now I'm using:

ggplot(dface, aes(Current.Mood, fill = NewCode)) +
  geom_density(alpha = 0.1)

That gets me a plot with an unexpected shape.

For some reason I don't understand, ggplot is putting valleys in between the integer values. Does anyone know how I can smooth these out? They make the plot very hard to interpret and don't really reflect what's happening in my data.

Answer 1

geom_density(bw = ...) is useful here. From the documentation:

      bw: The smoothing bandwidth to be used. If numeric, the standard
          deviation of the smoothing kernel. If character, a rule to
          choose the bandwidth, as listed in 'stats::bw.nrd()'.

ggplot(mtcars, aes(cyl)) + geom_density(bw = 0.1) + labs(title = "bw = 0.1")
ggplot(mtcars, aes(cyl)) + geom_density() + labs(title = "bw default")
ggplot(mtcars, aes(cyl)) + geom_density(bw = 2) + labs(title = "bw = 2")

Or, as MrFlick suggested, you can use adjust = ..., which scales the default bandwidth. Values above 1 smooth more and values below 1 smooth less:

  adjust: A multiplicate bandwidth adjustment. This makes it possible
          to adjust the bandwidth while still using the a bandwidth
          estimator. For example, 'adjust = 1/2' means use half of the
          default bandwidth.

ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.5) + labs(title = "adjust = 0.5")
ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.9) + labs(title = "adjust = 0.9")
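
To see why the valleys appear in the first place, it can help to compare the default bandwidth with the spacing of the data. The sketch below is only an illustration: the data frame sim and its columns mood and group are simulated stand-ins, not the asker's dface data. With responses one unit apart, a bandwidth well below 1 leaves a dip between neighbouring integers, and widening it with adjust (or a numeric bw) should largely fill those dips in.

```r
library(ggplot2)

set.seed(42)

# Hypothetical stand-in for the asker's data: integer responses on a 1-7 scale
sim <- data.frame(
  mood  = sample(1:7, 2000, replace = TRUE),
  group = rep(c("A", "B"), each = 1000)
)

# bw.nrd0() is the rule geom_density() uses by default (bw = "nrd0").
# ggplot2 applies it per group; here it is computed on the pooled data
# just to get a feel for its size relative to the integer spacing of 1.
default_bw <- bw.nrd0(sim$mood)
print(default_bw)

# Default bandwidth: expect dips between the integer values
p_default <- ggplot(sim, aes(mood, fill = group)) +
  geom_density(alpha = 0.1)

# Doubling the default bandwidth: expect a much smoother curve
p_smooth <- ggplot(sim, aes(mood, fill = group)) +
  geom_density(alpha = 0.1, adjust = 2)
```

Printing p_default and p_smooth side by side shows the effect. The value adjust = 2 is an arbitrary starting point; the right amount of smoothing depends on the sample size and is worth tuning by eye.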

Answer 2

Your choice of data visualization is not ideal. You want to compare the outcome variable across the 1-7 scale for different questions/groups. You probably want to map the frequency of each response to a geom_line or geom_area, or both.

Using survey data from Kaggle:

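library(tidyverse)

# Survey responses downloaded from Kaggle (local path)
my_data <- read_csv("~/Downloads/archive/test.csv")

# Reshape to one row per (id, question), then compute the share of each
# response value within each question
plot_data <- my_data %>%
  select(id, `Inflight wifi service`:`Food and drink`) %>%
  pivot_longer(`Inflight wifi service`:`Food and drink`, names_to = "question", values_to = "response") %>%
  count(question, response) %>%
  group_by(question) %>%
  mutate(freq = n / sum(n))

# Area chart of response frequencies (geom_area stacks the questions by default)
ggplot(plot_data) +
  geom_area(aes(x = response, fill = question, y = freq), alpha = 0.5)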
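
Since the Kaggle file may not be at hand, here is a minimal self-contained sketch of the same idea on simulated data. The data frame sim, its columns group and response, and the sampling weights are all made up for illustration. It uses geom_line with geom_point rather than geom_area, so the groups are overlaid instead of stacked.

```r
library(dplyr)
library(ggplot2)

set.seed(1)

# Hypothetical 1-7 responses for two groups with opposite skews
sim <- data.frame(
  group    = rep(c("A", "B"), each = 500),
  response = c(sample(1:7, 500, replace = TRUE, prob = 7:1),
               sample(1:7, 500, replace = TRUE, prob = 1:7))
)

# Share of each response value within each group
freq_data <- sim %>%
  count(group, response) %>%
  group_by(group) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# One line per group, with a tick at every integer response value
p <- ggplot(freq_data, aes(x = response, y = freq, colour = group)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:7)
```

If filled areas are preferred, geom_area(position = "identity", alpha = 0.5) overlays the groups rather than stacking them.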
