I'm sure this can be done by separately collecting all the data and then just using ggplot for the plotting, but I'd really prefer a simpler solution using ggplot, particularly stat_ecdf(), because it gives easier access to grouping variables, facets, etc.
My dataframe contains, amongst others, two columns of corresponding data x and y. I'd like to plot the ecdf of y on an axis of the corresponding x values. In other words, I'd like to plot what cumulative portion of the y variable is reached at its corresponding x value. While x and y are correlated (both descending), they are not analytically connected, so I cannot simply scale values of y to x. My attempts to do this with separate calculations of the ecdf functions of each subset have gotten extremely messy and complicated, while the stat_ecdf function seems to be very close to getting me what I need.
If I map x in the top-level ggplot aes() and then map y inside stat_ecdf(), I get the ecdf of y with an axis labelled x; however, the actual values on that axis are those of y, not the corresponding x values. This is done with something like:
ggplot(df, aes(x, color=group_var)) + stat_ecdf(aes(y))
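As far as I can tell, this happens because the first unnamed argument of aes() is x, so stat_ecdf(aes(y)) is the same as stat_ecdf(aes(x = y)): the layer replaces the x mapping with y, while the axis title is still taken from the plot-level mapping. A minimal check, assuming ggplot2 is installed and using a small made-up data frame `df` (the values are hypothetical), inspects what the layer actually draws with layer_data():

```r
library(ggplot2)

# Hypothetical toy data: y roughly decreasing with x
df <- data.frame(x = 1:5, y = c(50, 40, 42, 30, 10))

# aes(y) inside the layer is positional, i.e. aes(x = y)
p <- ggplot(df, aes(x)) + stat_ecdf(aes(y))

# Data computed for the ecdf layer
ld <- layer_data(p)

# The finite x positions are the y values, not the x values
# (pad = TRUE, the default, adds -Inf and Inf at the ends)
print(sort(ld$x[is.finite(ld$x)]))
```

So stat_ecdf() only ever sees one variable; there is no slot in it for "evaluate the eCDF of y but place it at x".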
EDIT: To visualize this: this sample plot shows the ecdf of x for multiple groups. Each x value has a corresponding y value in a sorted dataframe (approximate relationship; ignore the decreasing regions at the end). I would like to have a similar plot where the horizontal axis shows the corresponding y values. Basically, I need to map the horizontal axis of the first ecdf plot from x to y as simply as possible. I could do this manually by adding the ecdf values as a column in the dataframe, but I am looking to do it within ggplot for simplicity, if possible.
Instead of trying to bend stat_ecdf to do something it was not designed for, it's better to be explicit about your intention in the code.
It's quite straightforward. The strangest-looking piece of code is ecdf(y)(y): ecdf(y) returns a step function ('calculate the empirical CDF for y'), and the second (y) evaluates that function at the actual values of y in my data. The cummax deals with the regions where y decreases, so that the eCDF never decreases along x.
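To see what ecdf(y)(y) and cummax() produce, here is a tiny base-R sketch with made-up values (x and y are hypothetical, with y rising and then falling along x). For each observation, ecdf(y)(y) is the share of y values less than or equal to it, which is the same as rank(y, ties.method = "max") / length(y):

```r
# Hypothetical toy data, already sorted by x; y rises, then falls
x <- 1:6
y <- c(1, 4, 5, 9, 7, 2)

# ecdf(y) returns a step function; calling it on y gives P(Y <= y_i)
frac <- ecdf(y)(y)

# Same numbers via ranks: count of values <= y_i, divided by n
frac_rank <- rank(y, ties.method = "max") / length(y)

# Running maximum along x: stays flat once y starts to decrease
monotone <- cummax(frac)

print(round(rbind(x, frac, monotone), 3))
```

Over the last two points, where y decreases, the raw eCDF drops while the cummax() version stays at 1.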
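library(tidyverse)

# d_sample is the simulated data set created further down
d_sample %>%
  group_by(group) %>%
  arrange(group, x) %>%
  mutate(
    # eCDF of y within each group, evaluated at each observed y
    fraction = ecdf(y)(y),
    # running maximum along x, so the curve never decreases
    maxf = cummax(fraction)) %>%
  ggplot(aes(x, maxf)) +
  geom_point() +
  facet_wrap(~group)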
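If you would rather not depend on dplyr for the computation, the same per-group steps can be sketched in base R with ave(). This is an alternative way to compute the same columns, not what the pipeline above does internally; the helper name add_monotone_ecdf() and the toy data are hypothetical, and the code assumes columns named group, x and y as in d_sample:

```r
# Hypothetical helper: per-group eCDF of y, ordered by x, made non-decreasing
add_monotone_ecdf <- function(d) {
  # Sort so that, within each group, rows run along increasing x
  d <- d[order(d$group, d$x), ]
  # eCDF of y within each group, evaluated at each observed y
  d$fraction <- ave(d$y, d$group, FUN = function(v) ecdf(v)(v))
  # Running maximum within each group, so the curve never decreases along x
  d$maxf <- ave(d$fraction, d$group, FUN = cummax)
  d
}

# Hypothetical toy data with two groups, deliberately unsorted in x
toy <- data.frame(
  group = rep(c("a", "b"), each = 3),
  x = c(3, 1, 2, 1, 2, 3),
  y = c(2, 1, 5, 4, 6, 5)
)

res <- add_monotone_ecdf(toy)
print(res)
```

The result can go straight into ggplot(aes(x, maxf)) with facet_wrap(~group), or with colour = group to overlay the groups, so grouping and faceting still work as the question wanted.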
I'm still not entirely sure this is what you need.
To be honest, it took me most of the time to 'fake' your dataset:
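library(tidyverse)

# Simulate four groups of y as quadratic curves of x over x = 101..400
tibble(x = seq_len(300) + 100) %>%
  mutate(
    one   = - 1e-3 * (x * x) + 50 + 0.7 * x,
    two   = - 1e-3 * (x * x) + 55 + 0.68 * x,
    three = - 1e-3 * (x * x) + 110 + 0.5 * x,
    four  = - 1e-3 * (x * x) + 10 + 0.8 * x) %>%
  # Long format: one row per (x, group), with the value in y
  pivot_longer(-x, names_to = "group", values_to = "y") %>%
  # Keep only part of the x range for groups three and four
  filter(
    group == "one"
    | group == "two"
    | (group == "three" & x < 200)
    | (group == "four" & x > 250)) ->
  d_sample

# Quick look at the simulated y against x, coloured by group
d_sample %>%
  ggplot(aes(x, y, colour = group)) +
  geom_point()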