I'm trying to replicate the following SQL with Pandas, but it's more complex than I expected:
SELECT
id
, count(*)
, count(case when some_condition = True then 1 end)
, count(case when some_other_condition = False then 1 end)
FROM table
GROUP BY id
The only thing I can think of is something like this:
grouped = df.groupby('id')
df_total = grouped.count()
df_some_condition = grouped.filter(...).count()
df_some_other_condition = grouped.filter(...).count()
df_total.join(df_some_condition, on='id').join(df_some_other_condition, on='id')
I'm just surprised that I can't make filtered columns with groupby().count(), and that I have to create 3 separate dataframes and then join them. Is there a simpler way to do this that I'm overlooking?
Note: the syntax may not be exactly correct here; I just wrote something up quickly to illustrate my issue.
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3, 4],
                   'val1': [0.0, 48.0, 4.0, 20.0, 24.0, 25.0, 0.0],
                   'val2': [0.0, 0.0, 1.0, 40.0, 22.0, 7.0, 13.0]})
df
   id  val1  val2
0   1   0.0   0.0
1   1  48.0   0.0
2   2   4.0   1.0
3   2  20.0  40.0
4   3  24.0  22.0
5   3  25.0   7.0
6   4   0.0  13.0
Here is how you could recreate the SELECT, using val1 < 25 and val2 > 4 as stand-ins for the two conditions:
# Flag each row as 1/0 per condition, then sum the flags within each id.
df.assign(result1=np.where(df['val1'] < 25, 1, 0),
          result2=np.where(df['val2'] > 4, 1, 0)).groupby('id').agg(
    count=('id', 'size'),
    res1_sum=('result1', 'sum'),
    res2_sum=('result2', 'sum'))
Output:
    count  res1_sum  res2_sum
id
1       2         1         0
2       2         2         1
3       2         1         2
4       1         1         1
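As a possible variation, the np.where step can be skipped: a boolean column sums to the number of True values, so the comparisons can be aggregated directly. The following is only a minimal sketch under that assumption. The column names cond1 and cond2 are made up for illustration, the thresholds are the same stand-in conditions as above, and the row count is taken from val1 purely for convenience.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3, 4],
                   'val1': [0.0, 48.0, 4.0, 20.0, 24.0, 25.0, 0.0],
                   'val2': [0.0, 0.0, 1.0, 40.0, 22.0, 7.0, 13.0]})

# cond1 / cond2 are hypothetical helper columns holding boolean masks;
# they play the role of the SQL "CASE WHEN ... THEN 1 END" expressions.
result = (
    df.assign(cond1=df['val1'] < 25, cond2=df['val2'] > 4)
      .groupby('id')
      .agg(count=('val1', 'size'),      # rows per id, like count(*)
           res1_sum=('cond1', 'sum'),   # True counts as 1 when summed
           res2_sum=('cond2', 'sum'))
)
print(result)
```

This should give the same counts as the table above. Note that 'size' counts every row in the group, which matches count(*), whereas 'count' would skip missing values.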