
Reproducing a ggplot2 geom_linerange() example

I am trying to produce a plot that will end up looking like this:

[Image: geom_linerange() example]

However, I want the endpoints of each line to represent the 25th percentile (at the bottom) and the 75th percentile (at the top) of each group of numbers, with the dot in the middle at the median. I can make box plots from these data with geom_boxplot(), but I think this would look a lot nicer. I can't get it to work, though. Right now I am getting this warning:

Warning message:
In data.frame(x = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  row names were found from a short variable and have been discarded

My data look like this:

> str(outbtu)
'data.frame':   86400 obs. of  2 variables:
 $ bias: num  -0.248 -0.759 -0.471 -0.304 -0.358 ...
 $ cnd : int  1 1 1 1 1 1 1 1 1 1 ...
> outbtu[1:10,]
          bias cnd
1  -0.24756150   1
2  -0.75906264   1
3  -0.47142178   1
4  -0.30395184   1
5  -0.35756559   1
6   0.04072695   1
7  -0.45026249   1
8  -0.20509166   1
9  -0.24816174   1
10 -0.01581920   1

Eventually cnd reaches 27: there are 3200 observations for each of the 27 cnd values, so that isn't visible in the first ten rows. I want 27 line segments on this graph, each showing the 25th, 50th and 75th percentiles of bias for one cnd value.
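To try the answers without the original file, data with the same layout as outbtu can be simulated. This is only a sketch: it assumes normally distributed values are an acceptable stand-in, and the name outbtu_sim is hypothetical, so the numbers will not match the real data.

```r
# Hypothetical stand-in for outbtu: 27 conditions x 3200 observations each
set.seed(1)
n_cnd <- 27
n_per <- 3200
outbtu_sim <- data.frame(
  bias = rnorm(n_cnd * n_per),              # simulated values, not the real data
  cnd  = rep(seq_len(n_cnd), each = n_per)  # integer 1,1,...,1,2,...,27 as in str(outbtu)
)
str(outbtu_sim)
```

Any of the answer code can then be run with outbtu_sim in place of outbtu.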

Here is my code:

p <- ggplot(outbtu,aes(factor(cnd),bias,
                   ymin=quantile(bias,.25),
                   ymax=quantile(bias,.75)))
p <- p + geom_linerange()
p + geom_pointrange()

I honestly have no idea whether I'm even close; that's just what I could figure out from the ggplot2 help pages. Thanks in advance!

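One approach is to let stat_summary() compute the quartiles for each cnd level with a custom summary function, shown here on simulated data:

library(ggplot2)

# Simulated data: cnd = 1:27 is recycled over 2700 rows, giving 100 values per level
set.seed(42)
DF <- data.frame(bias = rnorm(2700), cnd = 1:27)
DF$cnd <- factor(DF$cnd)

# stat_summary() calls fun.data once per x group; the returned ymin, y and ymax
# are drawn with its default geom, geom_pointrange()
ggplot(DF, aes(x = cnd, y = bias, colour = cnd)) +
  stat_summary(fun.data = function(x) {
    res <- quantile(x, probs = c(0.25, 0.5, 0.75))
    names(res) <- c("ymin", "y", "ymax")
    res
  })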
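The anonymous summary function can be checked on its own, outside ggplot(). stat_summary() passes it the y values of one x group at a time; here it is given the hypothetical name quartiles() and applied to 1:9, whose 25th, 50th and 75th percentiles under quantile()'s default type 7 are 3, 5 and 7.

```r
# Hypothetical name for the anonymous function used in stat_summary() above
quartiles <- function(x) {
  res <- quantile(x, probs = c(0.25, 0.5, 0.75))
  names(res) <- c("ymin", "y", "ymax")  # the names stat_summary() looks for
  res
}

quartiles(1:9)  # ymin = 3, y = 5, ymax = 7
```

Once named, it can be passed as stat_summary(fun.data = quartiles).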

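Or, more concisely, with ggplot2's median_hilow() wrapper. It calls Hmisc::smedian.hilow(), so the Hmisc package must be installed, and conf.int = 0.5 makes the range run from the 25th to the 75th percentile. In ggplot2 2.0.0 and later, extra arguments for the summary function go through fun.args rather than being passed to stat_summary() directly:

ggplot(DF, aes(x = cnd, y = bias, colour = cnd)) +
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5))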

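You need to calculate the statistics for each group separately and then plot the resulting medians and quartiles. Otherwise ymin=quantile(bias,.25) is evaluated over the whole bias column: it returns a single named value (the 25th percentile of all the data) rather than one value per cnd group, and that length-1 named vector is the "short variable" the warning refers to.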
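The data.frame() call shown in the question's warning suggests this is what happened inside the ggplot2 version used there; the sketch below only reproduces the base R behaviour with made-up numbers, so the values are illustrative.

```r
# Made-up values for two groups, just to show the shapes involved
bias <- c(-0.25, -0.76, -0.47, -0.30, -0.36, 0.04)
cnd  <- c(1, 1, 1, 2, 2, 2)

q <- quantile(bias, 0.25)
length(q)  # 1: one value for all of bias, not one per cnd group
names(q)   # "25%"

# Recycling this short, named vector in data.frame() triggers
# "row names were found from a short variable and have been discarded"
plot_df <- data.frame(x = cnd, ymin = q)
plot_df$ymin  # the same value in every row, ignoring cnd
```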

Here is an example that computes the statistics first:

library(ggplot2)

# Generate sample data: 100 values of a, each assigned at random to a group b in 1..5
df <- data.frame(a = rnorm(100), b = sample(1:5, 100, replace = TRUE))
# For each group of b, keep elements 2, 3 and 5 of summary(): 1st Qu., Median, 3rd Qu.
df2 <- t(sapply(unique(df$b), function(x) {
  s <- summary(df[df$b == x, "a"])[c(2, 3, 5)]
  c(x, s)
}))
# Convert the output matrix to a data.frame, since ggplot() works with data.frames
df2 <- as.data.frame(df2)
# Rename columns for clarity
colnames(df2) <- c("b", "Q1", "Median", "Q3")
# Draw the precomputed medians and quartiles
ggplot(df2, aes(x = b, y = Median, ymin = Q1, ymax = Q3)) + geom_pointrange()
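The same precompute-then-plot idea can also be written with base R's aggregate() and drawn with geom_linerange() plus geom_point(), which is closer to the look the question asked for: a segment from the 25th to the 75th percentile with a dot at the median. This is a sketch on simulated data shaped like outbtu; the names sim, Q1, Median and Q3 are illustrative, not from the original.

```r
library(ggplot2)

# Simulated data with the question's layout: bias values grouped by cnd = 1..27
set.seed(42)
sim <- data.frame(bias = rnorm(27 * 100), cnd = rep(1:27, each = 100))

# One row per cnd; quantile() returns three values per group, which aggregate()
# stores as a matrix column, so do.call(data.frame, ...) flattens it
q <- aggregate(bias ~ cnd, data = sim,
               FUN = quantile, probs = c(0.25, 0.5, 0.75))
q <- do.call(data.frame, q)
names(q) <- c("cnd", "Q1", "Median", "Q3")

ggplot(q, aes(x = factor(cnd), y = Median, ymin = Q1, ymax = Q3)) +
  geom_linerange() +  # segment from the 25th to the 75th percentile
  geom_point() +      # dot at the median
  labs(x = "cnd", y = "bias")
```

Keeping the summary table separate from the plot also makes it easy to inspect or export the quartiles themselves.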


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM