
Plotting grouped probabilities in R

I'm new to R and I'm trying to graph the probability of flight delays by hour of day. The probability of a delay would be calculated from a "Delays" column of 1's and 0's.

Here's what I have. I was trying to pass a custom function to fun.y, but it doesn't seem to be allowed.

library(ggplot2)    
ggplot(data = flights, aes(flights$HourOfDay, flights$ArrDelay)) + 
           stat_summary(fun.y = (sum(flights$Delay)/no_na_flights), geom = "bar") + 
           scale_x_discrete(limits=c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)) +
           ylim(0,500)

What's the best way to do this? Thanks in advance.

I am not sure if this is what you wanted, but I did it in the following way:

library(ggplot2)    
library(dplyr)
library(nycflights13)

probs <- flights %>%
  # Testing whether a delay occurred for departure or arrival
  mutate(Delay = dep_delay > 0 | arr_delay > 0) %>%
  # Grouping the data by hour
  group_by(hour) %>%
  # Calculating the proportion of delays for each hour
  summarize(Prob_Delay = sum(Delay, na.rm = TRUE) / n()) %>%
  ungroup()

theme_set(theme_bw())
ggplot(probs) +
  aes(x = hour,
      y = Prob_Delay) +
  geom_bar(stat = "identity") +
  scale_x_continuous(breaks = 0:24)

Which gives the following plot: [figure: resulting plot]

I think it is always better to do the data manipulation outside ggplot, for instance with dplyr.
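For completeness, here is a minimal sketch of how the summary could be kept inside ggplot, in case that is preferred. The attempt in the question passes an already computed number to fun.y, whereas stat_summary expects a function that it applies to the y values of each x group. For a column of 1's and 0's, the mean of each group is the proportion of delays. The toy data frame below is hypothetical and only stands in for the asker's data, with column names borrowed from the question. The sketch also assumes ggplot2 3.3.0 or later, where the argument is called fun (fun.y is deprecated there).

```r
library(ggplot2)

# Hypothetical toy data standing in for the asker's data frame:
# one row per flight, Delay is 1 if the flight was delayed, else 0
toy <- data.frame(
  HourOfDay = c(6, 6, 6, 6, 7, 7, 8, 8, 8, 8),
  Delay     = c(0, 1, 0, 0, 1, 1, 0, 1, 1, 1)
)

# stat_summary applies `fun` to the y values within each x group;
# the mean of a 0/1 column is the share of delayed flights
p <- ggplot(toy, aes(x = HourOfDay, y = Delay)) +
  stat_summary(fun = mean, geom = "bar") +
  scale_x_continuous(breaks = 6:8) +
  ylim(0, 1)

# The same per-hour proportions computed in base R, as a cross-check
hourly <- aggregate(Delay ~ HourOfDay, data = toy, FUN = mean)
```

Printing p draws one bar per hour, and hourly holds the values those bars should show. Note that aes() takes bare column names rather than flights$HourOfDay, since the data frame is already supplied through the data argument.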
