I created the plot below using:
ggplot(data_all, aes(x = Speed, fill = Season)) +
theme_bw() +
geom_histogram(position = "identity", alpha = 0.2, binwidth=0.1)
As you can see, the difference in the amount of data available is very large. Is there a way to look only at the distribution and not at the total data amount?
There is probably a more elegant way to do this, but one approach would be to use the density() function and render the results with geom_line().
library(tidyverse)
density1 <- iris %>%
  filter(Species == "virginica") %>%
  pull(Sepal.Length) %>%
  density()
density2 <- iris %>%
  filter(Species == "versicolor") %>%
  pull(Sepal.Length) %>%
  density()
data_all <- rbind(
  data.frame(x = density1$x, y = density1$y, species = "virginica"),
  data.frame(x = density2$x, y = density2$y, species = "versicolor")
)
ggplot(data_all) +
  aes(x, y, color = species) +
  geom_line() +
  theme_minimal()
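As a possible shortcut for the approach above, ggplot2's geom_density() computes a kernel density estimate per group internally, so the manual density() and rbind() steps may not be needed. The following is only a sketch on the same iris subset; the name iris_sub is introduced here for illustration, and the curves can differ slightly from the density() version because the two evaluate the estimate over different x ranges by default.

```r
library(ggplot2)

# Keep the same two species as in the example above (iris_sub is an illustrative name).
iris_sub <- droplevels(subset(iris, Species %in% c("virginica", "versicolor")))

# geom_density() estimates one density curve per colour group.
p <- ggplot(iris_sub, aes(x = Sepal.Length, color = Species)) +
  geom_density() +
  theme_minimal()
p
```

Because each curve is a density, both groups are drawn on the same scale regardless of how many rows each one has.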
You can reference some of the other values calculated by stat functions using a notation that you may have seen before: ..value... The ggplot2 documentation calls these "computed variables" (you may also see "special variables" or "calculated aesthetics"), and the help page for each stat lists them in a "Computed variables" section; see ?stat_bin for the ones available to histograms.
In this case, the default computed variable on the y axis for geom_histogram() is ..count.. (the number of observations in each bin). When comparing distributions with very different sample sizes, it's more useful to use ..density.., which rescales each group's histogram so that the areas of its bars sum to 1. You can access ..density.. by mapping it to the y aesthetic directly in the geom_histogram() call.
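To make the difference between count and density concrete, here is a small sketch using base R's hist(), which reports both quantities. The toy vector x and the breaks are made up for illustration; the point is only that density equals count divided by (total number of observations times bin width).

```r
# Toy data: one 1, two 2s, three 3s (illustrative values only).
x <- c(1, 2, 2, 3, 3, 3)
breaks <- c(0.5, 1.5, 2.5, 3.5)

# plot = FALSE returns the binned values without drawing anything.
h <- hist(x, breaks = breaks, plot = FALSE)

# Density by hand: count / (total N * bin width).
manual_density <- h$counts / (sum(h$counts) * diff(breaks))

all.equal(h$density, manual_density)

# The bar areas (density * width) add up to 1, up to floating point error.
sum(h$density * diff(breaks))
```

Since the total N is divided out, two groups with very different sizes end up on a comparable vertical scale.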
First, here's an example of two histograms with vastly different sizes (similar to OP's question):
library(ggplot2)
set.seed(8675309)
df <- data.frame(
  x = c(rnorm(1000, -1, 0.5), rnorm(100000, 3, 1)),
  group = c(rep("A", 1000), rep("B", 100000))
)
ggplot(df, aes(x, fill=group)) + theme_classic() +
  geom_histogram(
    alpha=0.2, color='gray80',
    position="identity", bins=80)
And here's the same plot using ..density..:
ggplot(df, aes(x, fill=group)) + theme_classic() +
  geom_histogram(
    aes(y=..density..), alpha=0.2, color='gray80',
    position="identity", bins=80)
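One caveat for newer installations: from ggplot2 3.4.0 the ..density.. notation is deprecated in favour of after_stat(density) (available since ggplot2 3.3.0), so the old form still works but emits a warning. A sketch of the same plot with the newer notation, assuming ggplot2 3.3.0 or later, could look like this; the object name p is introduced here only so the plot can be inspected.

```r
library(ggplot2)

set.seed(8675309)
df <- data.frame(
  x = c(rnorm(1000, -1, 0.5), rnorm(100000, 3, 1)),
  group = c(rep("A", 1000), rep("B", 100000))
)

# after_stat(density) refers to the density variable computed by stat_bin().
p <- ggplot(df, aes(x, fill = group)) + theme_classic() +
  geom_histogram(
    aes(y = after_stat(density)), alpha = 0.2, color = "gray80",
    position = "identity", bins = 80)
p
```

Each group is normalised separately, so the bar areas of A and of B each sum to 1 even though B has 100 times as many observations.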