
Passing a custom function into pandas .agg()

I have the following aggregation in pandas:

summary_df = df.groupby(['provider', 'id']).agg(
    title           =('title', 'first'),
    file_size       = *custom*
).reset_index()

For the file_size I would like to use the following calculation:

sum([item['file_size'] for item in df if item['is_main_video'] is True])

How would I do the above within .agg()?

Each named aggregation in agg targets a single source column. In your case, you can create another column before the groupby:

import numpy as np

# Keep file_size only for main videos; other rows contribute 0 to the sum
df['New'] = np.where(df['is_main_video'], df['file_size'], 0)
summary_df = df.groupby(['provider', 'id']).agg(
    title=('title', 'first'),
    file_size=('New', 'sum')
).reset_index()

Update: the same approach as a single chained expression, using assign so that df itself is not modified:

summary_df = (
    df.assign(New=np.where(df['is_main_video'], df['file_size'], 0))
      .groupby(['provider', 'id'])
      .agg(title=('title', 'first'), file_size=('New', 'sum'))
      .reset_index()
)
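
To make the helper-column approach above concrete, here is a minimal runnable sketch. The sample data is hypothetical (invented for illustration, including the id values); only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "id": [1, 1, 2, 3, 3],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

summary_df = (
    # Helper column: file_size for main videos, 0 for everything else
    df.assign(New=np.where(df["is_main_video"], df["file_size"], 0))
      .groupby(["provider", "id"])
      .agg(title=("title", "first"), file_size=("New", "sum"))
      .reset_index()
)

print(summary_df)
```

Because the non-main rows are replaced with 0 rather than NaN, file_size should stay an integer column in the result.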

You can use Series.where to temporarily "ignore" the file_size values where is_main_video is False (they become NaN, which sum skips), then perform the groupby to sum what's left:

import pandas as pd

df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10]
})

print(df)
  provider    title  is_main_video  file_size
0        A    hello           True         10
1        A    world          False         12
2        A   pandas           True         20
3        B  example           True         19
4        B     here          False         10

aggregated_df = (df.assign(file_size=df["file_size"].where(df["is_main_video"]))
                 .groupby("provider", as_index=False)
                 .agg(
                     title=("title", "first"),
                     file_size=("file_size", "sum"))
                )

print(aggregated_df)
  provider    title  file_size
0        A    hello       30.0
1        B  example       19.0
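
If you would rather pass a custom function to .agg() directly, as the question asks, one possible sketch is below. The function receives a single group's file_size values as a Series, so it looks up the matching is_main_video flags in the original frame through the group's index. The name main_video_size is hypothetical, and the approach assumes df has a unique index.

```python
import pandas as pd

df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

def main_video_size(sizes):
    # `sizes` holds one group's file_size values; its index still refers
    # to rows of df, so the matching flags can be looked up by label.
    # Assumes df has a unique index.
    mask = df.loc[sizes.index, "is_main_video"]
    return sizes[mask].sum()

summary_df = (
    df.groupby("provider", as_index=False)
      .agg(title=("title", "first"), file_size=("file_size", main_video_size))
)

print(summary_df)
```

This avoids the helper column and introduces no NaN values, so the sums stay integers, but a Python-level function called once per group is generally slower than the built-in "sum" on large data.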
