
Increment a Counter when TRUE value in Logical Column in R

I am working through R4DS and am currently on 5.6.7 Exercises ( https://r4ds.had.co.nz/transform.html#exercises-11 ).

Number 1 here asks to consider some scenarios about typical delay characteristics of flights. The first sub-bullet is "A flight is 15 minutes early 50% of the time, and 15 minutes late 50% of the time".

I want to find the flights in the "nycflights13" dataset that arrived exactly 15 minutes late as often as they arrived exactly 15 minutes early.

Up to this point, I created a new dataframe that only has year, month, day, tail number, and arrival delay. I also used dplyr's mutate to add "15_minute_delay" and "15_minute_early" logical columns.

Next, I filtered using dplyr to create a new dataframe that only contains flights that were either 15 minutes early or 15 minutes late.

From here, I want to group_by the tailnums. I found I have about 2.7k unique tailnums but 9,266 observations, so I know some tailnums are repeated.

Once I created odd_delays_new, I got a little lost about where to go next. I tried writing a for loop with an ifelse inside it to go over all 9,266 observations and add 1 to a delay counter or an early counter, but that gave me an error.

odd_delays <- flights %>%
  select(year, month, day, tailnum, arr_delay) %>%
  mutate("15_minute_delay" = arr_delay == 15, "15_minute_early" = arr_delay == -15)
length(odd_delays$"15_minute_delay"[odd_delays$"15_minute_delay" == TRUE])
length(odd_delays$"15_minute_early"[odd_delays$"15_minute_early" == TRUE])
odd_delays_new <- odd_delays %>%
  filter(odd_delays$`15_minute_delay` == TRUE | odd_delays$`15_minute_early` == TRUE)
  ifelse(odd_delays_new$`15_minute_delay` == TRUE, delay = delay + 1, early = early + 1)

I expect my result to be a 3 column data frame. The first column will have the tail number, the second column the number of times the plane had a 15 minute arrival delay, and the third column the number of times the plane arrived 15 minutes early.

I am going to answer this in two parts.

  • how I would approach this
  • alternatives for your for loop.

solving the problem

To answer your question, you can stay within the dplyr pipeline. I believe the book means at least 15 minutes early or late, so I used >= and <=. You also want to relate the delays and early arrivals to the total number of flights, so you first need a denominator: n() gives the number of observations in each group. I then call sum() on the logical results, because R treats TRUE and FALSE as 1 and 0 when you sum() them or do other arithmetic.

off_schedule <-
  flights %>%
  group_by(tailnum) %>% 
  summarise(
    n = n(),
    delay_15min = sum(arr_delay >= 15, na.rm = TRUE),
    early_15min = sum(arr_delay <= -15, na.rm = TRUE)
  ) %>% 
  ungroup() %>% 
  mutate(
    delay_pct = delay_15min/n*100,
    early_pct = early_15min/n*100,
    off_pct = delay_pct + early_pct
  )

This gives us the following table:

# tailnum       n delay_15min early_15min delay_pct early_pct off_pct
#   <chr>   <int>       <int>       <int>     <dbl>     <dbl>   <dbl>
#  D942DN      4           2           0     50          0      50  
#  N0EGMQ    371          95          73     25.6       19.7    45.3
#  N10156    153          51          34     33.3       22.2    55.6
#  N102UW     48           7          13     14.6       27.1    41.7
#  N103US     46           2          13      4.35      28.3    32.6
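
To make the counting trick concrete, here is a minimal base R sketch on made-up delays (the values are illustrative, not taken from nycflights13). It also shows one thing worth keeping in mind: n() in the summary above counts rows whose arr_delay is NA, so those flights stay in the denominator.

```r
# Made-up arrival delays in minutes; NA stands for a flight with no recorded arrival.
arr_delay <- c(20, -15, 15, NA, -30, 3)

late  <- arr_delay >= 15    # TRUE FALSE TRUE NA FALSE FALSE
early <- arr_delay <= -15   # FALSE TRUE FALSE NA TRUE FALSE

# sum() coerces TRUE/FALSE to 1/0; na.rm = TRUE skips the NA entry.
n_late  <- sum(late, na.rm = TRUE)
n_early <- sum(early, na.rm = TRUE)

# Share of all rows (NA included), which is what dividing by n() does.
pct_late_all <- n_late / length(arr_delay) * 100

# Share of flights with a recorded arrival only.
pct_late_known <- mean(late, na.rm = TRUE) * 100
```

Which denominator is appropriate depends on whether you want cancelled or otherwise unrecorded flights to count against a plane.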

the for loop

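For your loop to work, you would have to index into the rows, create the counter columns first, and handle the first row separately, because there is no row 0 to read the previous count from.

# Start both running counters at zero.
odd_delays_new$delay <- 0
odd_delays_new$early <- 0

for (i in seq_len(nrow(odd_delays_new))) {
  # Counter values from the previous row (zero before the first row).
  prev_delay <- if (i == 1) 0 else odd_delays_new$delay[i - 1]
  prev_early <- if (i == 1) 0 else odd_delays_new$early[i - 1]

  # Adding a logical adds 1 when it is TRUE and 0 when it is FALSE.
  odd_delays_new$delay[i] <- prev_delay + odd_delays_new$`15_minute_delay`[i]
  odd_delays_new$early[i] <- prev_early + odd_delays_new$`15_minute_early`[i]
}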

That is fun neither to write nor to read 3 years later. Fortunately, the cumsum() function can keep the running tally for you, and grouping by tailnum restarts it for each plane:

df <-
  odd_delays_new %>% 
  group_by(tailnum) %>% 
  mutate(
    delay = cumsum(`15_minute_delay`),
    early = cumsum(`15_minute_early`)
  ) %>% 
  ungroup()

This solution, however, doesn't tell you how often N0EGMQ is off schedule overall. It only tells you, among the N0EGMQ flights that were off schedule, how many were delays and how many were early arrivals.
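
If what you are after is the 3 column data frame described in the question (one row per tail number), swapping mutate() and cumsum() for summarise() and sum() is one way to get there. The sketch below uses a small made-up stand-in for odd_delays_new; the tail numbers "N1" and "N2" and the name odd_delays_toy are hypothetical.

```r
library(dplyr)

# Toy stand-in for odd_delays_new (hypothetical tail numbers and values).
odd_delays_toy <- tibble(
  tailnum           = c("N1", "N1", "N1", "N2", "N2"),
  `15_minute_delay` = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  `15_minute_early` = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# One row per plane: how many 15 minute delays and 15 minute early arrivals.
per_plane <- odd_delays_toy %>%
  group_by(tailnum) %>%
  summarise(
    delay = sum(`15_minute_delay`),
    early = sum(`15_minute_early`)
  )

# Planes that were late exactly as often as they were early.
balanced <- per_plane %>%
  filter(delay == early)
```

On the real data you would replace odd_delays_toy with odd_delays_new. The final filter() keeps only planes whose two counts match, which is the "equal number" condition from the question.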

I hope this helps.
