
ggplot2 - geom_line of cumulative counts of factor levels

I want to plot the cumulative counts of level OK of factor X (*) over time (column Date). I am not sure what the best strategy is: whether I should create a new data frame with a summary column, or whether there is a ggplot2 way of doing this.

Sample data

DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01", "2018-04-01")),
  X = factor(rep("OK", 6), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B"))
)
# Append one "NOK" observation for group A
DF <- rbind(DF, list(as.Date("2018-02-01"), factor("NOK"), "A"))

Based on similar questions, I tried this:

library(ggplot2)

ggplot(DF, aes(Date, col = Group)) + geom_line(stat='bin')


Using stat='count' (as in the answer to this question) is even worse:

ggplot(DF, aes(Date, col = Group)) + geom_line(stat='count')


which shows the counts for factor levels (*), but not the accumulation over time.

Desperate measure - count with table

I also tried creating a new data frame of counts with table(), like this:

cum <- as.data.frame(table(DF$Date, DF$Group))
ggplot(cum, aes(Var1, cumsum(Freq), col = Var2, group = Var2)) +
  geom_line()
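For reference, the table() attempt goes wrong for two reasons: cumsum(Freq) runs down the entire Freq column, so group B's line starts from group A's running total, and Var1 is a factor rather than a Date. A minimal base-R sketch (an editorial illustration, not the asker's code) keeps the table() idea, restricts the data to the "OK" rows as in the (*) note, and accumulates within each group with ave(); the names ok and CumFreq are illustrative:

```r
library(ggplot2)

# Same data as the question's sample, with the extra "NOK" row written inline
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                   "2018-03-01", "2018-04-01", "2018-02-01")),
  X = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

# Keep only the "OK" rows, then count them per date and group;
# table() also returns zero counts for date/group pairs with no rows
ok <- DF[DF$X == "OK", ]
cum <- as.data.frame(table(Date = ok$Date, Group = ok$Group))

# table() stores the dates as factor levels; convert them back to Date
cum$Date <- as.Date(as.character(cum$Date))

# Sort by group, then date, and accumulate the counts within each group
cum <- cum[order(cum$Group, cum$Date), ]
cum$CumFreq <- ave(cum$Freq, cum$Group, FUN = cumsum)

ggplot(cum, aes(Date, CumFreq, col = Group)) +
  geom_line()
```

Because table() fills in zeros, both lines span all four dates: group B stays at 0 until 2018-03-01, and group A stays flat at 4 on 2018-04-01.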


Is there a way to do this with ggplot2? Do I need to create a new column with cumsum()? If so, how should I compute the cumulative count of the factor levels by date?

(*) Note: I could just filter the data frame to keep only the intended level with DF[DF$X == "OK", ], but I am sure someone can find a smarter solution.

One option, using dplyr and ggplot2:

library(dplyr)
library(ggplot2)

DF %>%
  group_by(Group) %>%
  arrange(Date) %>%                      # order rows chronologically
  mutate(Value = cumsum(X == "OK")) %>%  # running count of "OK" rows within each group
  ggplot(aes(Date, y = Value, group = Group, col = Group)) +
  geom_line()
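Group A has two rows on 2018-01-01, so in the plot produced by the answer's code its line has a vertical segment at that date (from 1 to 2). If you prefer exactly one point per group and date, a possible variation (an editorial sketch, not part of the original answer) drops the "NOK" rows, counts the "OK" rows per Group and Date with dplyr's count(), and then accumulates those counts; the name daily is illustrative:

```r
library(dplyr)
library(ggplot2)

# Same data as the question's sample, with the extra "NOK" row written inline
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                   "2018-03-01", "2018-04-01", "2018-02-01")),
  X = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

daily <- DF %>%
  filter(X == "OK") %>%               # drop the "NOK" rows
  count(Group, Date) %>%              # one row per Group/Date; n = number of OK rows
  group_by(Group) %>%
  arrange(Date, .by_group = TRUE) %>% # chronological order within each group
  mutate(Value = cumsum(n)) %>%       # running total of OK rows within each group
  ungroup()

ggplot(daily, aes(Date, Value, col = Group)) +
  geom_line()
```

Unlike the answer's version, a date on which a group has only "NOK" rows produces no point at all, rather than a point that repeats the previous value.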


The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM