Build interval then calculate the maximum number of Disjoint Intervals

I want to know the maximum number of flights on the ground at the same time, by station.

I have the times when each flight arrives at the station and departs from it.

The problem is that my data frame is in this format:

  REG DEP ARV  STD                    STA
  XYZ ZRH GVA  2021-08-01 07:20:00    2021-08-01 08:35:00
  XYZ GVA ZRH  2021-08-01 09:20:00    2021-08-01 10:35:00
  KLN MUC GVA  2021-08-01 06:00:00    2021-08-01 07:10:00
  KLN GVA CGD  2021-08-01 08:45:00    2021-08-01 10:10:00

So in this example, flight XYZ arrives in GVA at 08:35 (STA, line 1) and then departs from GVA at 09:20 (STD, line 2), and flight KLN arrives in GVA at 07:10 and departs at 08:45. So from 08:35 to 08:45 there are 2 flights in GVA.

The output should be 2 if, for this day, these are the only two flights that meet in GVA.

If other flights meet at another time of the day, say 5 flights in GVA in the afternoon, then the output should be the maximum, i.e. 5.

So I was thinking of building an interval per flight, [STA, STD] or [STD, STA], and then finding the maximal disjoint intervals...

I tried this code to build the intervals, but it is not working:

interval_sta_std <- function(i, j) {
  for (i in 1:length(df)) {
    key <- df$DEP[i]
    min_key <- min(df$STD[i])
    max_key <- max(df$STD[i])

    for (j in 1:length(df)) {
      value <- df$ARV[j]
      min_value <- min(df$STA[j])
      max_value <- max(df$STA[j])

      if (value == key) {
        test_inter <- interval(min(min_value, min_key),
                               max(max_key, max_value))
      }
    }
  }
  return(test_inter)
}

Perhaps one way is to look at every minute in your data and count how many flights are on deck during that minute. This doesn't always scale well, depending on the time span of your data, but if you limit the minutes to a reasonable scope it should be fine.

Sample data

quux <- structure(list(
  REG = c("XYZ", "XYZ", "KLN", "KLN"),
  DEP = c("ZRH", "GVA", "MUC", "GVA"),
  ARV = c("GVA", "ZRH", "GVA", "CGD"),
  STD = structure(c(1627816800, 1627824000, 1627812000, 1627821900), class = c("POSIXct", "POSIXt"), tzone = ""),
  STA = structure(c(1627821300, 1627828500, 1627816200, 1627827000), class = c("POSIXct", "POSIXt"), tzone = "")
), row.names = c(NA, -4L), class = "data.frame")
quux[,c("STD","STA")] <- lapply(quux[,c("STD","STA")], as.POSIXct)

(Converting your STD and STA to POSIXt objects.)

base R with fuzzyjoin

minutes <- seq(min(quux$STD), max(quux$STA), by = "mins")
head(minutes)
# [1] "2021-08-01 06:00:00 EDT" "2021-08-01 06:01:00 EDT" "2021-08-01 06:02:00 EDT" "2021-08-01 06:03:00 EDT"
# [5] "2021-08-01 06:04:00 EDT" "2021-08-01 06:05:00 EDT"
length(minutes)
# [1] 276
range(minutes)
# [1] "2021-08-01 06:00:00 EDT" "2021-08-01 10:35:00 EDT"

Now the join and aggregation.

joined <- fuzzyjoin::fuzzy_left_join(
  data.frame(M = minutes), quux,
  by = c("M" = "STD", "M" = "STA"),
  match_fun = list(`>=`, `<=`)
)
head(joined)
#                     M REG DEP ARV                 STD                 STA
# 1 2021-08-01 06:00:00 KLN MUC GVA 2021-08-01 06:00:00 2021-08-01 07:10:00
# 2 2021-08-01 06:01:00 KLN MUC GVA 2021-08-01 06:00:00 2021-08-01 07:10:00
# 3 2021-08-01 06:02:00 KLN MUC GVA 2021-08-01 06:00:00 2021-08-01 07:10:00
# 4 2021-08-01 06:03:00 KLN MUC GVA 2021-08-01 06:00:00 2021-08-01 07:10:00
# 5 2021-08-01 06:04:00 KLN MUC GVA 2021-08-01 06:00:00 2021-08-01 07:10:00
# 6 2021-08-01 06:05:00 KLN MUC GVA 2021-08-01 06:00:00 2021-08-01 07:10:00
nrow(joined)
# [1] 327

Recall that minutes had 276 entries. Now we have 327 rows; the 51 extra rows are minutes that matched more than one flight.

joined2 <- aggregate(REG ~ M, data = joined[complete.cases(joined),], FUN = length)
nrow(joined2)
# [1] 258
head(joined2)
#                     M REG
# 1 2021-08-01 06:00:00   1
# 2 2021-08-01 06:01:00   1
# 3 2021-08-01 06:02:00   1
# 4 2021-08-01 06:03:00   1
# 5 2021-08-01 06:04:00   1
# 6 2021-08-01 06:05:00   1

The row count has dropped: 258 of the minutes in the data had at least one plane on deck. The rows where REG > 1 are the minutes with two or more.
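To see which minutes have REG > 1, the same per-minute count can be reproduced in base R without the join. This is only a sketch: the times are retyped from the question's table, and the "UTC" time zone is an assumption made so that the wall-clock times print as written (the question does not state a zone).

```r
# Times retyped from the question's table; tz = "UTC" is an assumption.
tz  <- "UTC"
std <- as.POSIXct(c("2021-08-01 07:20:00", "2021-08-01 09:20:00",
                    "2021-08-01 06:00:00", "2021-08-01 08:45:00"), tz = tz)
sta <- as.POSIXct(c("2021-08-01 08:35:00", "2021-08-01 10:35:00",
                    "2021-08-01 07:10:00", "2021-08-01 10:10:00"), tz = tz)

minutes <- seq(min(std), max(sta), by = "mins")

# For each minute, count the rows whose [STD, STA] contains it
# (inclusive on both ends, like the `>=` / `<=` match functions).
n <- vapply(as.numeric(minutes),
            function(m) sum(as.numeric(std) <= m & m <= as.numeric(sta)),
            integer(1))

per_minute <- data.frame(M = minutes, REG = n)
busy <- per_minute[per_minute$REG > 1, ]

nrow(busy)                               # 51
format(range(busy$M), "%H:%M", tz = tz)  # "09:20" "10:10"
```

The 51 rows agree with the difference between 327 and 276 seen after the join.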

The final piece:

joined2$Date <- as.Date(joined2$M)
aggregate(REG ~ Date, data = joined2, FUN = max)
#         Date REG
# 1 2021-08-01   2
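Note that the join above matches each minute against a single row's [STD, STA], that is, the span between one leg's scheduled departure and its scheduled arrival. If the quantity wanted is aircraft parked at a station, along the lines of the question's [STA, STD] idea, one possible sketch is to pair each arrival with the same aircraft's next departure from that station and then sweep over the resulting intervals. The helper names ground_intervals and max_on_ground are hypothetical, the "UTC" zone is an assumption, and treating a departure at the exact time of another arrival as non-overlapping is a design choice, not something the question specifies.

```r
# Legs retyped from the question's table; tz = "UTC" is an assumption.
tz <- "UTC"
legs <- data.frame(
  REG = c("XYZ", "XYZ", "KLN", "KLN"),
  DEP = c("ZRH", "GVA", "MUC", "GVA"),
  ARV = c("GVA", "ZRH", "GVA", "CGD"),
  STD = as.POSIXct(c("2021-08-01 07:20:00", "2021-08-01 09:20:00",
                     "2021-08-01 06:00:00", "2021-08-01 08:45:00"), tz = tz),
  STA = as.POSIXct(c("2021-08-01 08:35:00", "2021-08-01 10:35:00",
                     "2021-08-01 07:10:00", "2021-08-01 10:10:00"), tz = tz),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one ground interval per consecutive pair of legs of
# the same aircraft where the arrival station equals the next departure station.
ground_intervals <- function(legs) {
  legs <- legs[order(legs$REG, legs$STD), ]
  n   <- nrow(legs)
  cur <- legs[-n, ]   # leg k
  nxt <- legs[-1, ]   # leg k + 1
  ok  <- cur$REG == nxt$REG & cur$ARV == nxt$DEP
  data.frame(REG = cur$REG[ok], STATION = cur$ARV[ok],
             START = cur$STA[ok], END = nxt$STD[ok],
             stringsAsFactors = FALSE)
}

# Hypothetical helper: sweep over +1 (arrival) / -1 (departure) events per
# station and keep the largest running total.
max_on_ground <- function(iv) {
  events <- data.frame(
    STATION = c(iv$STATION, iv$STATION),
    TIME    = c(as.numeric(iv$START), as.numeric(iv$END)),
    DELTA   = rep(c(1L, -1L), each = nrow(iv))
  )
  # At equal times, departures (-1) sort before arrivals (+1).
  events <- events[order(events$STATION, events$TIME, events$DELTA), ]
  events$N <- ave(events$DELTA, events$STATION, FUN = cumsum)
  aggregate(N ~ STATION, data = events, FUN = max)
}

iv  <- ground_intervals(legs)  # KLN at GVA 07:10-08:45, XYZ at GVA 08:35-09:20
res <- max_on_ground(iv)
res                            # GVA -> 2
```

Because it only sorts the interval endpoints, this avoids generating one row per minute. The first arrival and last departure of each aircraft have no matching leg in the data and are ignored here.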

Note: the per-day aggregation might be subject to time zone issues, so make sure the time zones in your data are all correct.
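One place where this can matter is as.Date() on a POSIXct value: it accepts a tz argument, and the resulting calendar date depends on the zone used. A minimal illustration, assuming Europe/Zurich as the local zone (an assumption; the question does not state one):

```r
# 00:30 local time in Zurich is still the previous day in UTC.
x <- as.POSIXct("2021-08-01 00:30:00", tz = "Europe/Zurich")

d_utc   <- as.Date(x, tz = "UTC")            # 2021-07-31
d_local <- as.Date(x, tz = "Europe/Zurich")  # 2021-08-01
```

Passing tz explicitly when deriving the Date column keeps flights close to midnight in the intended day.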
