
Extracting unique column combination and finding sum and count in R

I have a flight data set with four columns, as shown below.

Original:

I want one output row for each unique combination of three columns (Origin, Destination, Airline). For each combination, the output should give the sum of the number of passengers and the count of rows. The result would look something like this.

Output:

I can do one part of it, the row count, using the group_by function:

df %>% group_by(Origin, destination, carrier) %>% summarise(count = n())

How do I also include the sum of passengers?

We can use dplyr and compute both the count and the sum in a single summarise() call:

library(dplyr)
df1 %>%
   group_by(Origin, Destination, Airline) %>%
   dplyr::summarise(count = n(), TotalPassengers = sum(Passengers))
# Groups:   Origin, Destination [2]
#  Origin Destination Airline count TotalPassengers
#  <chr>  <chr>       <chr>   <int>           <dbl>
#1 ABE    ATL         9A          2               3
#2 ABE    ATL         DL          1               5
#3 NYC    SFA         AA          3              21
#4 NYC    SFA         DL          1               5
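
The `# Groups:   Origin, Destination [2]` line in the output above means the result is still grouped by the first two columns. A minimal sketch, assuming dplyr 1.0.0 or later (where `summarise()` accepts a `.groups` argument), that returns an ungrouped tibble instead; `df1` is the sample data from this answer, repeated so the block runs on its own:

```r
library(dplyr)

# Sample data from this answer, repeated so the block is self-contained
df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
                  Destination = rep(c("ATL", "SFA"), c(3, 4)),
                  Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
                  Passengers = c(2, 1, 5, 4, 10, 7, 5))

# Same summary as above; .groups = "drop" removes all grouping from the result
result <- df1 %>%
  group_by(Origin, Destination, Airline) %>%
  summarise(count = n(),
            TotalPassengers = sum(Passengers),
            .groups = "drop")
```

Calling `ungroup()` on the result should have the same effect on older dplyr versions.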

Data:

df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
      Destination = rep(c("ATL", "SFA"), c(3, 4)),
      Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
      Passengers = c(2, 1, 5, 4, 10, 7, 5))
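
If dplyr is not available, the same summary can probably be built with base R. This is only a sketch of one alternative, not part of the original answer: it runs `aggregate()` twice (once with `sum`, once with `length`) and merges the two results. The names `totals`, `counts`, and `result` are chosen here for illustration.

```r
# Sample data from this answer, repeated so the block is self-contained
df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
                  Destination = rep(c("ATL", "SFA"), c(3, 4)),
                  Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
                  Passengers = c(2, 1, 5, 4, 10, 7, 5))

keys <- c("Origin", "Destination", "Airline")

# Sum of passengers for each Origin/Destination/Airline combination
totals <- aggregate(Passengers ~ Origin + Destination + Airline,
                    data = df1, FUN = sum)
names(totals)[names(totals) == "Passengers"] <- "TotalPassengers"

# Number of rows for each combination
counts <- aggregate(Passengers ~ Origin + Destination + Airline,
                    data = df1, FUN = length)
names(counts)[names(counts) == "Passengers"] <- "count"

# Join the two summaries on the three key columns
result <- merge(counts, totals, by = keys)
```

The formula interface of `aggregate()` drops rows with missing values by default, so the counts may differ from `n()` if `Passengers` contains `NA`.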
