I'm currently studying pandas and I come from an R/dplyr/tidyverse background.
Pandas has a not-so-intuitive API. How would I elegantly rewrite the following dplyr operation using pandas syntax?
library("nycflights13")
library("tidyverse")
delays <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")
"pd.DataFrame.agg method doesn't allow much flexibility for changing columns' names in the method itself"
That's not exactly true. You can rename the columns inside agg itself using named aggregation, much as you would in R. It is better, though, to avoid count as a column name, because DataFrame.count is also a method, so attribute access (df.count) returns the method rather than the column:
delays = (
    flights
    .groupby('dest', as_index=False)
    .agg(
        count=('year', 'count'),
        dist=('distance', 'mean'),
        delay=('arr_delay', 'mean'))
    .query('count > 20 & dest != "HNL"')
    .reset_index(drop=True)
)
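For anyone who wants to try this without the nycflights13 data, here is a minimal sketch on a made-up toy frame. The data, the column name n_flights and the threshold of 1 are assumptions for illustration only. It uses 'size' to count rows per group (closer to dplyr's n() than 'count', which skips missing values), and shows why count is an awkward column name.

```python
import pandas as pd

# Hypothetical toy data standing in for the nycflights13 `flights` table.
flights = pd.DataFrame({
    "dest": ["ATL", "ATL", "ATL", "HNL", "HNL", "ORD"],
    "distance": [760.0, 760.0, 760.0, 4983.0, 4983.0, 730.0],
    "arr_delay": [10.0, None, 20.0, -5.0, 5.0, 3.0],
})

delays = (
    flights
    .groupby("dest")
    .agg(
        n_flights=("distance", "size"),  # rows per group, like dplyr's n()
        dist=("distance", "mean"),
        delay=("arr_delay", "mean"),     # NaN is skipped by default, like na.rm = TRUE
    )
    .reset_index()                       # turn the group key back into a column
    .query('n_flights > 1 and dest != "HNL"')
    .reset_index(drop=True)
)
print(delays)

# Why `count` is an awkward column name: attribute access finds the
# DataFrame.count method first, so only bracket access reaches the column.
renamed = delays.rename(columns={"n_flights": "count"})
print(callable(renamed.count))    # the method, not the column
print(renamed["count"].tolist())  # the column values
```

Bracket access and query strings still work with a column called count, so the original chain runs fine; the clash only bites with df.count style access.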
We can write a chain of pandas methods that produces the same data frame/tibble:
delays = (
    flights
    .groupby('dest', as_index=False)
    .agg({
        'year': 'count',
        'distance': 'mean',
        'arr_delay': 'mean',
    })
    .rename(columns={
        'year': 'count',
        'distance': 'dist',
        'arr_delay': 'delay',
    })
    .query('count > 20 & dest != "HNL"')
    .reset_index(drop=True)
)
It's lengthier: in this dict form, pd.DataFrame.agg doesn't allow much flexibility for changing column names in the method itself, so a separate rename is needed.
But it's about as elegant, clean and clear as pandas allows.
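As a quick sanity check, the sketch below compares the dict-plus-rename chain with keyword-style (named) aggregation on a small made-up frame. The toy data is an assumption for illustration, and named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical toy data; only the columns used by the chain are included.
flights = pd.DataFrame({
    "year": [2013, 2013, 2013, 2013],
    "dest": ["ATL", "ATL", "ORD", "ORD"],
    "distance": [760.0, 760.0, 730.0, 734.0],
    "arr_delay": [10.0, None, 3.0, 5.0],
})

# Dict form of agg followed by an explicit rename.
via_rename = (
    flights
    .groupby("dest", as_index=False)
    .agg({"year": "count", "distance": "mean", "arr_delay": "mean"})
    .rename(columns={"year": "count", "distance": "dist", "arr_delay": "delay"})
)

# Named aggregation: output names are given as keywords inside agg.
via_named = (
    flights
    .groupby("dest", as_index=False)
    .agg(
        count=("year", "count"),
        dist=("distance", "mean"),
        delay=("arr_delay", "mean"),
    )
)

# Raises AssertionError if the two results differ in values, dtypes or labels.
pd.testing.assert_frame_equal(via_rename, via_named)
print(via_named)
```

Both forms give the same frame here, so the choice between them is mostly a matter of taste and of the pandas version available.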