How would I group, summarize and filter a DF in pandas in dplyr-fashion?

Question

I'm currently studying pandas and I come from an R/dplyr/tidyverse background.

Pandas has a not-so-intuitive API and how would I elegantly rewrite such operation from dplyr using pandas syntax?

library("nycflights13")
library("tidyverse")

delays <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")

Answer 1

pd.DataFrame.agg method doesn't allow much flexibility for changing columns' names in the method itself

That's not exactly true. You could actually rename the columns inside agg similar to in R. Although it is a better idea to not use count as a column name as it is also an attribute:

    delays = (
    flights
    .groupby('dest', as_index=False)
    .agg(
        count=('year', 'count'),
        dist=('distance', 'mean'),
        delay=('arr_delay', 'mean'))
    .query('count > 20 & dest != "HNL"')
    .reset_index(drop=True)
)

Answer 2

We can write a pandas concatenation of functions and methods that results in the same dataframe/tibble:

delays = (
    flights
    .groupby('dest', as_index=False)
    .agg({
        'year': 'count',
        'distance': 'mean',
        'arr_delay': 'mean',
    })
    .rename(columns={
        'year': 'count',
        'distance': 'dist',
        'arr_delay': 'delay',
    })
    .query('count > 20 & dest != "HNL"')
    .reset_index(drop=True)
)

It's more lengthy: Pandas' pd.DataFrame.agg method doesn't allow much flexibility for changing columns' names in the method itself.

But it's as elegant, clean and clear as pandas allows us to go.

How would I group, summarize and filter a DF in pandas in dplyr-fashion?

Question

2 answers

solution1
3 ACCPTED 2022-09-09 22:41:34

solution2
2 2022-09-09 22:32:56

How would I group, summarize and filter a DF in pandas in dplyr-fashion?

Question

2 answers

solution1 3 ACCPTED 2022-09-09 22:41:34

solution2 2 2022-09-09 22:32:56

solution1
3 ACCPTED 2022-09-09 22:41:34

solution2
2 2022-09-09 22:32:56