简体   繁体   中英

Python - Group by with multiple conditions on columns

I have the following example dataframe:

data = {'ref':['1', '2', '3', '4', '5'],
        'checked':[True, True, True, False, True],
        'rag':['r', 'r', 'g', 'a', 'r'],
        'group':['high', 'low', 'high', 'medium', 'high']}

dataframe = pd.DataFrame(data)

在此处输入图像描述

I want to group on group and do some conditional counts where certain conditions are met so I get the following:

在此处输入图像描述

I can group by group and do n by the following:

df = dataframe.groupby(['group']).agg(
         n=('ref', 'count')
        ).reset_index()

But I am struggling to also count the number of times for each group that:

  • checked = True
  • rag = g
  • rag = a
  • rag = r

Any help would be much appreciated!


edit: changed True/False strings to Boolean

You have a few challenges.

For instance, your True/False are strings, so you should either initialize them as booleans and use sum or convert to boolean during aggregation.

To count the rag, it's easier to use pandas.crosstab and join it while you still have the groups as index.

df = (dataframe
      .groupby(['group'])
      .agg(**{'n': ('ref', 'count'),
              'checked=True': ('checked', lambda s: s.eq('True').sum()),
           })
      .join(pd.crosstab(dataframe['group'], dataframe['rag'])
              .add_prefix('rag=')
           )
      .reset_index()
     )

output:

    group  n  checked=True  rag=a  rag=g  rag=r
0    high  3             3      0      1      2
1     low  1             1      0      0      1
2  medium  1             0      1      0      0

You can try pivot_table separately on your checked and rag columns

n = df.groupby(['group']).agg(n=('ref', 'count'))

dfs = []
for column in ['checked', 'rag']:
    df_ = (df.pivot_table(index='group', columns=[column], values='ref',
                          aggfunc='count', fill_value=0)
          .rename(columns=lambda col: f'{column}={col}')
          .rename_axis(None, axis=1))
    dfs.append(df_)
df = pd.concat(dfs, axis=1).drop('checked=False', axis=1)
print(n.join(df))

        n  checked=True  rag=a  rag=g  rag=r
group
high    3             3      0      1      2
low     1             1      0      0      1
medium  1             0      1      0      0

There is an issue with your data example:

data = {'ref':['1', '2', '3', '4', '5'],
        'checked':[True, True, True, False, True],
        'rag':['r', 'r', 'g', 'a', 'r'],
        'group':['high', 'low', 'high', 'medium', 'high']}

df = pd.DataFrame(data)

for checked column, you should enter value as True/False without in quotation. Otherwise, python will interpret True'/'Fasle' as string .

The idea here is two steps: (1) you use iteration with groupby . (2) then you merge / concat them into 1 table:

# Create empty table
table = pd.DataFrame()

# List of column you want to iterate:
col_iter = ['checked', 'rag']

# Iterate:
for col in col_iter:
    # Obtain unique values in each col used
    uni = df[col].unique()
    
    # Iterate for each unique value in col.
    # Set tem var
    # Concat tem to table
    for val in uni:
        tem = df.groupby('group').apply(lambda g: (g[col]==val).sum())
        table = pd.concat([table, tem], axis=1).rename(columns={0:f'{col}={val}'})
        

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM