How to create a Pandas Groupby object where each column has a filter on the original data?

Question

I'm trying to replicate the following SQL with Pandas, but it's more complex than I expected:

SELECT
    id
    , count(*)
    , count(case when some_condition = True then 1 end)
    , count(case when some_other_condition = False then 1 end)
FROM table
GROUP BY id

The only thing I can think of is something like this:

grouped = df.groupby('id')
df_total = grouped.count()
df_some_condition = grouped.filter(...).count()
df_some_other_condition = grouped.filter(...).count()
df_total.join(df_some_condition, on='id').join(df_some_other_condition, on='id')

I'm just surprised that I can't make filtered columns with groupby().count(), and that I have to create 3 separate dataframes and then join them. Is there a simpler way to do this that I'm overlooking?

Note: the syntax here may not be exactly correct; I just wrote something up quickly to illustrate my issue.

Answer 1

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3, 4],
                   'val1': [0.0, 48.0, 4.0, 20.0, 24.0, 25.0, 0.0],
                   'val2': [0.0, 0.0, 1.0, 40.0, 22.0, 7.0, 13.0]})

df

   id  val1  val2
0   1   0.0   0.0
1   1  48.0   0.0
2   2   4.0   1.0
3   2  20.0  40.0
4   3  24.0  22.0
5   3  25.0   7.0
6   4   0.0  13.0

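One way to recreate the SELECT is to flag each condition as a 0/1 column with np.where, then group by id and sum the flags. Here val1 < 25 and val2 > 4 stand in for the two conditions in the question: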

# Flag rows that meet each condition (1) or not (0), then count and sum per id.
(df.assign(result1=np.where(df['val1'] < 25, 1, 0),
           result2=np.where(df['val2'] > 4, 1, 0))
   .groupby('id')
   .agg(count=('id', 'size'),
        res1_sum=('result1', 'sum'),
        res2_sum=('result2', 'sum')))

Output

    count   res1_sum    res2_sum
id          
1       2          1           0
2       2          2           1
3       2          1           2
4       1          1           1
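
A possible variation, offered as a sketch and not as the only way to do it: because True sums as 1 and False as 0, the boolean masks can be aggregated directly without np.where. The column names cond1 and cond2 are made up for illustration, and the thresholds are the same stand-in conditions used above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3, 4],
    'val1': [0.0, 48.0, 4.0, 20.0, 24.0, 25.0, 0.0],
    'val2': [0.0, 0.0, 1.0, 40.0, 22.0, 7.0, 13.0],
})

# cond1/cond2 are hypothetical names; each boolean mask plays the role
# of one SQL "CASE WHEN ... THEN 1 END" expression.
result = (
    df.assign(cond1=df['val1'] < 25, cond2=df['val2'] > 4)
      .groupby('id')
      .agg(count=('cond1', 'size'),      # like count(*): rows per id
           res1_sum=('cond1', 'sum'),    # rows where the first condition holds
           res2_sum=('cond2', 'sum'))    # rows where the second condition holds
)
print(result)
```

This keeps everything in a single groupby, so there are no separate frames to join afterwards.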

solution1
0 2020-09-12 01:12:17
