Pandas groupby multiple columns with value_counts function

I want to apply value_counts() to multiple columns and keep working with the resulting dataframe to add more columns. Here is an example dataframe.

    id  shop    type    status
0   1   mac      A      open
1   1   mac      B      close
2   1   ikea     B      open
3   1   ikea     A      open
4   1   meta     A      open
5   1   meta     B      close
6   2   meta     B      open
7   2   ikea     B      open
8   2   ikea     B      close
9   3   ikea     A      close
10  3   apple    B      close
11  3   apple    B      open
12  3   apple    A      open
13  4   denim    A      close
14  4   denim    A      close

For each (id, shop) group, I want the count of every type and status category, as shown below.

    id  shop    A    B     close   open
0   1   ikea    1    1      0       2
1   1   mac     1    1      1       1
2   1   meta    1    1      1       1
3   2   ikea    0    2      1       1
4   2   meta    0    1      0       1
5   3   apple   1    2      1       2
6   3   ikea    1    0      1       0
7   4   denim   2    0      2       0

So far I have tried the following. It works correctly, but I don't think it is efficient, especially with more data or if I want to add two more aggregation functions to the same groupby. Also, the merge may not always work in some rare cases.

import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

df = df.groupby(['id', 'shop'])
df_type = df['type'].value_counts().unstack().reset_index()
df_status = df['status'].value_counts().unstack().reset_index()

df = reduce(lambda df1, df2: pd.merge(df1, df2, how='left', on=['id', 'shop']), [df_type, df_status])

You can do this with groupby() and value_counts:

groups = df.groupby(['id','shop'])
pd.concat([groups['type'].value_counts().unstack(fill_value=0),
           groups['status'].value_counts().unstack(fill_value=0)], 
          axis=1).reset_index()

Or a bit more dynamic:

groups = df.groupby(['id','shop'])
count_cols = ['type','status']
out = pd.concat([groups[c].value_counts().unstack(fill_value=0) 
                for c in count_cols], axis=1).reset_index()

Or with crosstab:

count_cols = ['type','status']
out = pd.concat([pd.crosstab([df['id'],df['shop']], df[c])
                for c in count_cols], axis=1).reset_index()

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
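
The concat works without any merge because every piece is indexed by (id, shop), so axis=1 lines the rows up by index label. The same idea extends to the extra aggregations the question mentions. Here is a minimal sketch, assuming group size and the number of distinct types as the two extra aggregations; the names n_rows and n_types are illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

groups = df.groupby(['id', 'shop'])

# One count table per column, each indexed by (id, shop).
pieces = [groups[c].value_counts().unstack(fill_value=0)
          for c in ['type', 'status']]

# Extra aggregations on the same groupby object (illustrative choices).
pieces.append(groups.size().rename('n_rows'))               # rows per group
pieces.append(groups['type'].nunique().rename('n_types'))   # distinct types

# concat(axis=1) aligns all pieces on the shared (id, shop) index.
out = pd.concat(pieces, axis=1).reset_index()
print(out)
```

Since the groupby object is built once and reused, adding another aggregation is a single extra line rather than another merge.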

Using crosstab:

out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
                 for c in ['type', 'status']],
                axis=1).reset_index()

Or melt + crosstab:

df2 = df.melt(['id', 'shop'])

out = (pd.crosstab([df2['id'], df2['shop']], df2['value'])
         .reset_index()
       )

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
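
One caveat with the melt variant: it stacks the type and status values into a single value column, so it relies on the two columns having no values in common, which holds for the question's data. The sketch below uses made-up data in which 'open' appears in both columns to show the effect, and one possible way around it by also passing the variable column to crosstab. The flattened column names such as type_A are an illustrative choice.

```python
import pandas as pd

# Hypothetical data: 'open' occurs in both 'type' and 'status',
# unlike the data in the question.
df = pd.DataFrame({
    'id': [1, 1, 2],
    'shop': ['mac', 'mac', 'ikea'],
    'type': ['A', 'open', 'A'],
    'status': ['open', 'close', 'open'],
})

df2 = df.melt(['id', 'shop'])

# Counting by value alone folds type == 'open' and status == 'open'
# into the same 'open' column.
merged = pd.crosstab([df2['id'], df2['shop']], df2['value'])

# Passing variable and value together keeps one column per
# (source column, value) pair.
separate = pd.crosstab([df2['id'], df2['shop']],
                       [df2['variable'], df2['value']])

# Flatten the two-level column index into plain names.
separate.columns = [f'{var}_{val}' for var, val in separate.columns]
separate = separate.reset_index()

print(merged)
print(separate)
```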

Here is one way to do it using pd.get_dummies:

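(pd.concat(
    [df[['id', 'shop']],  # grouping keys only; summing the string columns type/status would concatenate them in recent pandas versions
     pd.get_dummies(df[['type', 'status']], prefix='', prefix_sep='')  # one indicator column per value of type and status
    ], axis=1)
 .groupby(['id', 'shop'])  # group the data
 .sum()  # summing the indicators gives the counts
 .reset_index())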

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
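
To confirm that the get_dummies route gives the same counts as the value_counts route, a quick comparison can be sketched as follows. The variable names are illustrative. The astype(int) call is an addition that makes the indicator columns integers regardless of whether get_dummies returns bool (pandas 2.0 and later) or uint8 (earlier versions).

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

keys = ['id', 'shop']
count_cols = ['type', 'status']

# Route 1: value_counts per column, unstacked and concatenated.
via_value_counts = pd.concat(
    [df.groupby(keys)[c].value_counts().unstack(fill_value=0)
     for c in count_cols],
    axis=1)

# Route 2: indicator columns summed per group.
dummies = pd.get_dummies(df[count_cols], prefix='', prefix_sep='').astype(int)
via_dummies = pd.concat([df[keys], dummies], axis=1).groupby(keys).sum()

# Compare labels and values; the column-index names may differ,
# so the comparison avoids DataFrame.equals.
same_labels = (list(via_value_counts.columns) == list(via_dummies.columns)
               and list(via_value_counts.index) == list(via_dummies.index))
same_values = (via_value_counts.to_numpy() == via_dummies.to_numpy()).all()
print(same_labels and same_values)
```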

# Module import
import pandas as pd
import numpy as np

# Data import
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

# Data Pre-process
df_unique = df[['id','shop']].groupby(['id','shop']).count().reset_index()
df_AB = df.groupby(['id','shop','type']).count().reset_index()
df_A = df_AB.loc[df_AB['type'] =='A'].rename(columns={'status':'A'})
df_B = df_AB.loc[df_AB['type'] =='B'].rename(columns={'status':'B'})
df_OC = df.groupby(['id','shop','status']).count().reset_index()
df_O = df_OC.loc[df_OC['status'] =='open'].rename(columns={'type':'open'})
df_C = df_OC.loc[df_OC['status'] =='close'].rename(columns={'type':'close'})

# Merging for your final output (left merges keep every id/shop pair)
df_final = pd.merge(df_unique, df_A[['id', 'shop', 'A']], how='left', on=['id', 'shop'])
df_final = pd.merge(df_final, df_B[['id', 'shop', 'B']], how='left', on=['id', 'shop'])
df_final = pd.merge(df_final, df_C[['id', 'shop', 'close']], how='left', on=['id', 'shop'])
df_final = pd.merge(df_final, df_O[['id', 'shop', 'open']], how='left', on=['id', 'shop'])

# Data Cleaning: groups with no matching rows come out of the left merges as NaN
count_cols = ['A', 'B', 'close', 'open']
df_final[count_cols] = df_final[count_cols].fillna(0).astype(int)  # back to integer counts

# Output Display
df_final

Hi,

Here is my whole process; you can run it on your own platform.

[Image: screenshot of the output]
