I am working on an air traffic dataset that contains origins, destinations and some other air-traffic-related information. For my analysis, I would like to combine the records for flights between the same two cities, regardless of direction.
For example, the data for flights from Seattle to Portland needs to be combined with the data for flights from Portland to Seattle.
Here is a sample of the dataset:
airtravel <- structure(list(CARRIER = structure(c(6L, 13L, 6L, 1L, 1L, 13L,
17L, 17L, 13L, 13L, 13L, 13L, 2L, 1L, 13L), .Label = c("9E",
"AA", "AS", "B6", "DL", "EV", "F9", "G4", "HA", "MQ", "NK", "OH",
"OO", "UA", "WN", "YV", "YX"), class = "factor"), OD = c("DCA - ORD",
"PDX - SEA", "ORD - DCA", "CHA - ATL", "ATL - CHA", "ELM - DTW",
"LGA - RIC", "RIC - LGA", "DTW - ELM", "BZN - SEA", "SEA - BZN",
"SEA - PDX", "DCA - LGA", "AVL - ATL", "SFO - SNA"), diff = c(164, 158, 146,
142, 141, 138, 138, 138, 136, 130, 130, 130, 127, 124, 124
)), row.names = c(2983L, 7423L, 3217L, 115L, 17L, 6737L,
11042L, 11315L, 6669L, 6370L, 7624L, 7636L, 685L, 66L, 7693L), class = "data.frame")
I would like to sum up the diff of rows that involve the same two cities. Could someone shed some light on how to solve this?
Thanks in advance!
You can split the OD column into source and destination on the '-' separator between them, sort each pair row-wise using pmin and pmax, and then take the sum of diff.
library(dplyr)
airtravel %>%
tidyr::separate(OD, c('source', 'destination'), sep = '\\s*-\\s*') %>%
group_by(grp = pmin(source, destination), grp2 = pmax(source, destination)) %>%
summarise(diff = sum(diff))
# grp grp2 diff
# <chr> <chr> <dbl>
#1 ATL AVL 124
#2 ATL CHA 283
#3 BZN SEA 260
#4 DCA LGA 127
#5 DCA ORD 310
#6 DTW ELM 274
#7 LGA RIC 276
#8 PDX SEA 288
#9 SFO SNA 124
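To see why pmin and pmax make the direction irrelevant, here is a minimal sketch on toy vectors (the names src, dst and key are made up for illustration and are not part of the answer above). For character vectors, pmin returns the alphabetically smaller element of each pair and pmax the larger one, so both directions of a route end up with the same key.

```r
# Hypothetical toy vectors, only to illustrate the row-wise sort
src <- c("PDX", "SEA", "DCA")
dst <- c("SEA", "PDX", "ORD")

# pmin/pmax compare element by element, so each pair is put in a fixed order
key <- paste(pmin(src, dst), pmax(src, dst), sep = " - ")
key
# The first two elements are both "PDX - SEA"; the third is "DCA - ORD"
```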
If you want to keep more columns, you can add them in group_by.
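As a rough sketch of that, assuming you want CARRIER kept as an extra grouping column (toy is a hypothetical four-row subset of the question's data with the carrier codes written out as strings, and .groups = "drop" assumes dplyr 1.0 or later):

```r
library(dplyr)

# Hypothetical subset of the question's data, carrier codes as plain strings
toy <- data.frame(
  CARRIER = c("OO", "OO", "EV", "EV"),
  OD      = c("PDX - SEA", "SEA - PDX", "DCA - ORD", "ORD - DCA"),
  diff    = c(158, 130, 164, 146)
)

res <- toy %>%
  tidyr::separate(OD, c("source", "destination"), sep = "\\s*-\\s*") %>%
  # CARRIER is listed first, so sums are computed per carrier and city pair
  group_by(CARRIER,
           grp  = pmin(source, destination),
           grp2 = pmax(source, destination)) %>%
  summarise(diff = sum(diff), .groups = "drop")

res
```

Note that rows are then combined only when the carrier matches as well, so a route flown by two carriers gives two rows.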
We can also do this in base R by splitting the 'OD' column, sorting the two parts of each element, and pasting them back together to use as the grouping variable in aggregate.
aggregate(airtravel$diff, list(OD = sapply(strsplit(airtravel$OD, "\\s*-\\s*"),
function(x) paste(sort(x), collapse=" - "))), FUN = sum)
# OD x
#1 ATL - AVL 124
#2 ATL - CHA 283
#3 BZN - SEA 260
#4 DCA - LGA 127
#5 DCA - ORD 310
#6 DTW - ELM 274
#7 LGA - RIC 276
#8 PDX - SEA 288
#9 SFO - SNA 124
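For reference, the key-building step on its own looks roughly like this (od, parts and key are illustrative names; vapply is used here instead of sapply only to make the return type explicit):

```r
# Two directions of the same route, as they appear in the OD column
od <- c("SEA - PDX", "PDX - SEA")

# Split on the dash, allowing optional whitespace around it
parts <- strsplit(od, "\\s*-\\s*")

# Sort each pair and glue it back together so both directions match
key <- vapply(parts, function(x) paste(sort(x), collapse = " - "), character(1))
key
# Both elements are "PDX - SEA"
```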