How to smooth the curve of a density plot in ggplot?
I'm trying to overlay density plots for an outcome variable that is expressed as an integer scale (1-7). Right now I'm using:
ggplot(dface, aes(Current.Mood, fill = NewCode)) + geom_density(alpha = 0.1)
That gets me a plot in which, for some reason I don't understand, ggplot is putting valleys in between the integer values.
Does anyone know how I can smooth these out? They are making the plot very hard to interpret and don't really reflect what's happening in my data.
geom_density(bw=..) is useful here. From the documentation:
bw: The smoothing bandwidth to be used. If numeric, the standard
deviation of the smoothing kernel. If character, a rule to
choose the bandwidth, as listed in 'stats::bw.nrd()'.
ggplot(mtcars, aes(cyl)) + geom_density(bw = 0.1) + labs(title = "bw = 0.1")
ggplot(mtcars, aes(cyl)) + geom_density() + labs(title = "bw default")
ggplot(mtcars, aes(cyl)) + geom_density(bw = 2) + labs(title = "bw = 2")
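To choose bw relative to the default rather than by guessing, it can help to look at the bandwidth the default rule picks. A minimal sketch, assuming that geom_density's default bw = "nrd0" corresponds to stats::bw.nrd0(); the variable names default_bw and p_default are illustrative only:

```r
library(ggplot2)

# Bandwidth chosen by the rule-of-thumb selector for the example data
default_bw <- stats::bw.nrd0(mtcars$cyl)

# Passing that value explicitly should give the same curve as the default;
# larger values smooth more, smaller values expose the gaps between integers.
p_default <- ggplot(mtcars, aes(cyl)) +
  geom_density(bw = default_bw) +
  labs(title = paste("bw =", round(default_bw, 2)))
```

When the bandwidth is small relative to the spacing of 1 between integer values, each integer tends to get its own bump, so a bw near or above that spacing is a reasonable starting point for a smoother curve.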
Or, as MrFlick suggested, you can use adjust=:
adjust: A multiplicate bandwidth adjustment. This makes it possible
to adjust the bandwidth while still using the a bandwidth
estimator. For example, 'adjust = 1/2' means use half of the
default bandwidth.
ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.5) + labs(title = "adjust = 0.5")
ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.9) + labs(title = "adjust = 0.9")
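Since adjust multiplies the default bandwidth, values below 1 (as in the two examples above) make the curve less smooth than the default, and values above 1 make it smoother. A small sketch of the smoothing direction, with adjust = 2 chosen arbitrarily for illustration:

```r
library(ggplot2)

# adjust > 1 widens the kernel, which fills in the valleys between integers
p_smooth <- ggplot(mtcars, aes(cyl)) +
  geom_density(adjust = 2) +
  labs(title = "adjust = 2")

# The same scaling seen in base R's density(): the bandwidth doubles
bw_default <- density(mtcars$cyl)$bw
bw_doubled <- density(mtcars$cyl, adjust = 2)$bw
```

For the 1-7 scale in the question, something like geom_density(alpha = 0.1, adjust = 2) may be a starting point; the right value depends on the data.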
Your choice of data visualization is not ideal. You want to compare the outcome variables across the 1-7 scale of different questions/groups. You probably want to map the frequency of the outcome variable to a geom_line or geom_area or both.
Using survey data from Kaggle:
library(tidyverse)

my_data <- read_csv("~/Downloads/archive/test.csv")

# Reshape to one row per (respondent, question), then compute the share of
# each response value within each question
plot_data <- my_data %>%
  select(id, `Inflight wifi service`:`Food and drink`) %>%
  pivot_longer(`Inflight wifi service`:`Food and drink`, names_to = "question", values_to = "response") %>%
  count(question, response) %>%
  group_by(question) %>%
  mutate(freq = n / sum(n))

ggplot(plot_data) +
  geom_area(aes(x = response, fill = question, y = freq), alpha = 0.5)
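The same idea can be tried without downloading anything. The sketch below uses mtcars as a stand-in for the survey data, with cyl playing the role of the integer response and gear the role of the group; both choices are assumptions made purely for illustration:

```r
library(dplyr)
library(ggplot2)

# Share of each cyl value within each gear group
# (cyl = stand-in response, gear = stand-in group)
freq_data <- mtcars %>%
  count(gear, cyl) %>%
  group_by(gear) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# One line per group; points mark the observed integer values
p_freq <- ggplot(freq_data, aes(x = cyl, y = freq, colour = factor(gear))) +
  geom_line() +
  geom_point() +
  labs(x = "cyl", y = "share within gear", colour = "gear")
```

Note that geom_area stacks groups by default; if overlapping, semi-transparent areas are wanted instead, position = "identity" can be passed to geom_area.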