Plotting values in ggplot over time by group, with condition where one group is one line of the mean, and the other group as individual lines in R

I have a dataset with alcohol treatment rates for each state for each year from 2010 to 2015. Five of these states received an intervention and the rest did not. I would like to plot the treatment rates for each intervention state as a separate line and the non-intervention states (grouped as one line using the mean) on the same graph.

I would like to do this using ggplot in R. The code below graphs the treatment rates for each state independently. However, I am having trouble setting up the grouping variable so that it combines the intervention variable with the state variable and meets the condition I described above. Any help would be appreciated. Thank you in advance!

I'm fairly new to R, so I hope I am explaining this correctly. The dataset is saved as a list, and below is some dummy data showing a snippet of the structure.

year    state   Intervention    rate
2010    Alabama 0   0.006575294
2011    Alabama 0   0.002244153
2012    Alabama 0   0.002519527
2013    Alabama 0   0.00333051
2014    Alabama 0   0.002385317
2015    Alabama 0   0.003080964
2010    Alaska  1   0.00338454
2011    Alaska  1   0.003457992
2012    Alaska  1   0.002784511
2013    Alaska  1   0.00356925
2014    Alaska  1   0.004599099
2015    Alaska  1   0.004204394
2010    Arizona 0   0.002336875
2011    Arizona 0   0.002808161
2012    Arizona 0   0.00299025
2013    Arizona 0   0.0022956

ggplot(data = data, aes(x = year, y = treatment_rate, group = state)) +
  geom_line()

Probably the easiest way is to separate the data based on the status of Intervention. I've generated a somewhat larger dummy dataset that should have a similar shape to the data you provided.

library(ggplot2)

set.seed(1234)

states <- rownames(USArrests)
intervened <- sample(states, 10)

df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))
head(df)
#>   year   state Intervention      rate
#> 1 2010 Alabama            0 -0.574740
#> 2 2011 Alabama            0 -1.121372
#> 3 2012 Alabama            0 -1.685824
#> 4 2013 Alabama            0 -2.575862
#> 5 2014 Alabama            0 -3.053054
#> 6 2015 Alabama            0 -4.051441

It's easier to split the data if the two groups need to be handled separately while plotting. You can subset the data in the data argument of a layer. As I understand it, you want to plot the states with Intervention == 1 individually, so we do that with a regular geom_line(). Then we want to summarise all states with Intervention == 0, and for that we use the stat_summary() function. The summarised layer needs a common group of its own, because we want to summarise across states rather than within each state, so it overrides the group = state mapping inherited from ggplot().

ggplot(df, aes(x = year, y = rate, group = state)) +
  # One coloured line per intervention state
  geom_line(
    data = ~ subset(., Intervention == 1),
    aes(colour = state)
  ) +
  # A single thick line: the per-year mean of all non-intervention states
  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),
    fun.data = mean_se,
    geom = "line", size = 2
  )

Created on 2021-02-24 by the reprex package (v1.0.0)
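
If you would rather see the numbers behind the averaged line, a rough base R equivalent of what the stat_summary() layer above computes is sketched here. This is only an illustration on the same dummy data, not a replacement for the approach above; the names control_mean, treated and plot_df, and the label "Non-intervention (mean)", are made up for this example.

```r
# Rebuild the dummy data from above (base R only)
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))

# Mean rate per year across all non-intervention states
control_mean <- aggregate(
  rate ~ year,
  data = subset(df, Intervention == 0),
  FUN = mean
)
control_mean$state <- "Non-intervention (mean)"  # hypothetical label

# Individual intervention states, with state as plain text
treated <- subset(df, Intervention == 1, select = c(year, state, rate))
treated$state <- as.character(treated$state)

# One long data frame: 10 individual lines plus 1 averaged line
plot_df <- rbind(treated, control_mean[, c("year", "state", "rate")])
```

plot_df could then be plotted with a plain ggplot(plot_df, aes(x = year, y = rate, colour = state)) + geom_line(), at the cost of no longer letting ggplot2 do the summarising for you.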

Follow up:

You'd need to repeat the stat_summary() layer for every geom you want to draw from the summarised data. For example, to add a ribbon showing mean +/- sd:

  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),
    fun.data = function(x) {
      mx <- mean(x)
      s <- sd(x)  # named s so the sd() function is not shadowed
      data.frame(
        ymin = mx - s,
        ymax = mx + s
      )
    },
    geom = "ribbon", alpha = 0.5
  )

You can replace "ribbon" with "errorbar" if you prefer that.
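
For completeness, here is one way the pieces might be put together into a single plot. Treat it as a sketch on the same dummy data: the helper name mean_sd is made up for this example, and the ribbon is added first only so that the lines are drawn on top of it. linewidth assumes ggplot2 3.4.0 or later; on older versions use size = 2 as in the code above.

```r
library(ggplot2)

# Same dummy data as above
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))

# Hypothetical helper: mean +/- one standard deviation, for fun.data
mean_sd <- function(x) {
  mx <- mean(x)
  s <- sd(x)
  data.frame(ymin = mx - s, ymax = mx + s)
}

p <- ggplot(df, aes(x = year, y = rate, group = state)) +
  # Band of mean +/- sd over the non-intervention states
  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),
    fun.data = mean_sd,
    geom = "ribbon", alpha = 0.5
  ) +
  # One coloured line per intervention state
  geom_line(
    data = ~ subset(., Intervention == 1),
    aes(colour = state)
  ) +
  # Mean line of the non-intervention states
  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),
    fun = mean,
    geom = "line", linewidth = 2
  )

p
```

Using fun = mean for the line and fun.data = mean_sd for the band keeps each layer returning only the values its geom needs.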
