
Count the number of unique combinations in a pandas data frame

I have trouble creating a summary statistic for my data. My dataframe looks like this:

id   status
a    approved
a    approved
b    draft
b    redraft
c    redraft
c    draft
d    approved
d    draft

Desired outcome:

status_combo       id_count
approved,approved  1
draft,redraft      2
approved,draft     1

The code I'm using:

df1=df.groupby('id')['status'].apply(tuple).rename('status_combo')
df2=df1.groupby(df1).size().reset_index(name='id_count')
print(df2)

This creates every combination of statuses with the order of the statuses taken into account. For me, however, the desired outcome should treat draft,redraft and redraft,draft as one type of status_combo. Please advise. Thanks
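To make the problem concrete, here is a minimal sketch that rebuilds the sample data from the question and shows why the tuple keys differ. The variable name `combos` is only illustrative and is not part of the original code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": list("aabbccdd"),
    "status": ["approved", "approved", "draft", "redraft",
               "redraft", "draft", "approved", "draft"],
})

# A tuple keeps the row order of each group, so ids b and c
# end up with different keys even though they hold the same statuses.
combos = df.groupby("id")["status"].apply(tuple)
print(combos.loc["b"])  # ('draft', 'redraft')
print(combos.loc["c"])  # ('redraft', 'draft')
```

Because the two tuples are not equal, the second groupby counts them as separate combinations.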

You can try sorting the column before making it into a tuple:

df1 = df.groupby('id')['status'].apply(lambda x: tuple(sorted(x))).rename('status_combo')
df2 = df1.groupby(df1).size().reset_index(name='id_count')
print(df2)

           status_combo  id_count
0  (approved, approved)         1
1     (approved, draft)         1
2      (draft, redraft)         2

You can do:

df = df.groupby('id',as_index=False).agg(
    status_approved=('status',lambda x:','.join(sorted(tuple(x))))).groupby(
    'status_approved', as_index=False).agg(id_count=('id', 'count'))

print(df):

     status_approved  id_count
0  approved,approved         1
1     approved,draft         1
2      draft,redraft         2
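The same sort-then-join idea can also be written with value_counts, which may read more directly. This is only a sketch under the assumption that the data matches the question's sample; the names `combo` and `counts` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": list("aabbccdd"),
    "status": ["approved", "approved", "draft", "redraft",
               "redraft", "draft", "approved", "draft"],
})

# Sort each id's statuses so row order no longer matters, then join
# them into one string key such as "draft,redraft".
combo = df.groupby("id")["status"].agg(lambda s: ",".join(sorted(s)))

# Count how many ids share each key and name the columns as in the question.
counts = (
    combo.value_counts()
    .rename_axis("status_combo")
    .reset_index(name="id_count")
)
print(counts)
```

Note that value_counts orders rows by count, so the row order can differ from the groupby-based output above.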

This can be done with a simple one-liner:

df.groupby('id').agg(set).reset_index().status.value_counts()

Result:

{redraft, draft}     2
{approved}           1
{draft, approved}    1
Name: status, dtype: int64
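Two caveats apply to the set-based approach. A set drops duplicates, so id a appears as {approved} rather than approved,approved, which differs from the outcome asked for in the question. Python set objects are also not hashable, so counting them directly may raise a TypeError depending on the pandas version. A hedged sketch that keeps the same "unique statuses" meaning but uses plain string keys (the names `combo` and `counts` are illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": list("aabbccdd"),
    "status": ["approved", "approved", "draft", "redraft",
               "redraft", "draft", "approved", "draft"],
})

# set() removes repeated statuses within an id; sorting and joining
# turns the result into a hashable, order-independent string key.
combo = df.groupby("id")["status"].agg(lambda s: ",".join(sorted(set(s))))

# Count how many ids share each key.
counts = combo.value_counts()
print(counts.to_dict())
```

If repeated statuses within an id should count as a distinct combination, keep the duplicates by sorting without the set() call.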

Or, in your solution, add sort_values('status'):

df1=df.sort_values('status').groupby('id')['status'].apply(tuple).rename('status_combo')
df2=df1.groupby(df1).size().reset_index(name='id_count')
print(df2)

Result:

           status_combo  id_count
0  (approved, approved)         1
1     (approved, draft)         1
2      (draft, redraft)         2
