I have the following aggregation in pandas:
summary_df = df.groupby(['provider', 'id']).agg(
title =('title', 'first'),
file_size = *custom*
).reset_index()
For the file_size I would like to use the following calculation:
sum([item['file_size'] for item in df if item['is_main_video'] is True])
How would I do the above within the .agg()?
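The list comprehension above is only pseudocode: iterating over a DataFrame yields its column labels, not its rows. Here is a minimal sketch of the intended (ungrouped) calculation using boolean indexing with .loc. The sample data is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "provider": ["A", "A", "B"],
    "id": [1, 1, 2],
    "title": ["x", "y", "z"],
    "is_main_video": [True, False, True],
    "file_size": [10, 12, 19],
})

# Ungrouped version of the desired calculation:
# sum file_size over the rows where is_main_video is True
total_main = df.loc[df["is_main_video"], "file_size"].sum()
print(total_main)  # 29
```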
With named aggregation, each output column in agg reads from a single source column. In your case you can therefore create another column before the groupby:
import numpy as np

# Helper column: file_size for main-video rows, 0 for all others
df['New'] = np.where(df['is_main_video'], df['file_size'], 0)
summary_df = df.groupby(['provider', 'id']).agg(
    title=('title', 'first'),
    file_size=('New', 'sum')
).reset_index()
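Here is a self-contained run of the helper-column approach on invented sample data. The id values are hypothetical. Because np.where fills non-main rows with 0 rather than NaN, the summed column stays integer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "id": [1, 1, 2, 3, 3],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

# Helper column: file_size for main videos, 0 otherwise
df["New"] = np.where(df["is_main_video"], df["file_size"], 0)

summary_df = df.groupby(["provider", "id"]).agg(
    title=("title", "first"),
    file_size=("New", "sum"),
).reset_index()
print(summary_df)
```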
Update: the same result in a single chain. This uses assign, so the original df is not modified:
summary_df = (df.assign(New=np.where(df['is_main_video'], df['file_size'], 0))
                .groupby(['provider', 'id'])
                .agg(title=('title', 'first'),
                     file_size=('New', 'sum'))
                .reset_index())
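You can also keep everything inside .agg() with no helper column. One option (a sketch, not the only way) is to pass a function in the named aggregation that uses each group's index labels to look up is_main_video in the original frame. This assumes df has a unique index. It is also slower than the vectorized versions, because the function runs in Python once per group. The helper name main_video_size and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "id": [1, 1, 2, 3, 3],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

def main_video_size(sizes):
    # Hypothetical helper: sizes is one group's file_size Series.
    # Its index labels come from df, so they select the matching
    # is_main_video flags (assumes df has a unique index).
    mask = df.loc[sizes.index, "is_main_video"]
    return sizes[mask].sum()

summary_df = df.groupby(["provider", "id"]).agg(
    title=("title", "first"),
    file_size=("file_size", main_video_size),
).reset_index()
print(summary_df)
```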
You can use Series.where to temporarily "ignore" your file_size values where "is_main_video" is False (they become NaN). Then perform your groupby operation to sum what's left over:
import pandas as pd
df = pd.DataFrame({
"provider": ["A", "A", "A", "B", "B"],
"title": ["hello", "world", "pandas", "example", "here"],
"is_main_video": [True, False, True, True, False],
"file_size": [10, 12, 20, 19, 10]
})
print(df)
   provider    title  is_main_video  file_size
0         A    hello           True         10
1         A    world          False         12
2         A   pandas           True         20
3         B  example           True         19
4         B     here          False         10
aggregated_df = (df.assign(file_size=df["file_size"].where(df["is_main_video"]))
.groupby("provider", as_index=False)
.agg(
title=("title", "first"),
file_size=("file_size", "sum"))
)
print(aggregated_df)
   provider    title  file_size
0         A    hello       30.0
1         B  example       19.0
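The summed column comes out as float (30.0, 19.0) because where replaces the non-main sizes with NaN, which sum then skips. If you would rather keep integers, you can pass 0 as the other argument of Series.where. The excluded rows then become 0 instead of NaN. Here is a sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

# other=0 fills non-main rows with 0 instead of NaN,
# so file_size keeps its integer dtype through the sum
aggregated_df = (df.assign(file_size=df["file_size"].where(df["is_main_video"], 0))
                   .groupby("provider", as_index=False)
                   .agg(title=("title", "first"),
                        file_size=("file_size", "sum")))
print(aggregated_df)
```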