
Pandas Groupby columns and get a frequency of 0

I have a dataframe. I want to group by Col1, Col2, and Col3 and get the frequency of 0 in the Value column. df =

Col1 Col2 Col3 Value
Val1 Val2  A    0
Val1 Val2  A    1
Val1 Val2  A    2
Val1 Val2  A    0
Val1 Val2  A    1

Val1 Val2  B    0
Val1 Val2  B    0
Val1 Val2  B    0
Val1 Val2  B    0
Val1 Val2  B    1
...

How do I apply groupby to achieve:

Col1 Col2 Col3 Percentage_of_0
Val1 Val2  A       0.4
Val1 Val2  B       0.8
...

Thank you!

A simple lambda function does this. Within each group, build a list of the entries where Value == 0, take the length of that list, and divide it by the number of items in the group. The result is the fraction of zeros.

import pandas as pd

# Ten rows: five in group A, five in group B.
df = pd.DataFrame({
    "Col1": ["Val1"] * 10,
    "Col2": ["Val2"] * 10,
    "Col3": ["A"] * 5 + ["B"] * 5,
    "Value": [0, 1, 2, 0, 1, 0, 0, 0, 0, 1],
})

df.groupby(["Col1","Col2","Col3"]).\
    agg({"Value":lambda x: len([v for v in x if v==0])/len(x)})

Output:

                Value
Col1 Col2 Col3       
Val1 Val2 A       0.4
          B       0.8

Use groupby on the dataframe and then call the size() method on the resulting GroupBy object. For example, say you have created a dataframe named df containing these values:

df = pd.DataFrame({'Col1': ['Val1','Val1','Val1','Val1','Val1','Val1','Val1','Val1'], 
               'Col2': ['Val2','Val2','Val2','Val2','Val2','Val2','Val2','Val2'],
               'Col3': ['A','A','A','A','B','B','B','B'],
               'Value':[0,1,2,0,0,0,0,1]}) 

Then the frequency count of each individual value can be found using:

df.groupby(['Col1','Col2','Col3','Value']).size()
Col1  Col2  Col3  Value
Val1  Val2  A     0        2
                  1        1
                  2        1
            B     0        3
                  1        1
dtype: int64
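
The counts above are not yet the fraction the question asks for. One possible way to get there, offered as a sketch rather than as the answerer's own method, is to spread the counts into one column per Value and divide the 0 column by each row's total. The names `keys`, `table`, and `zero_fraction` are illustrative, and the sketch assumes at least one 0 appears somewhere in Value so that the 0 column exists.

```python
import pandas as pd

# Same eight-row frame as in this answer.
df = pd.DataFrame({
    "Col1": ["Val1"] * 8,
    "Col2": ["Val2"] * 8,
    "Col3": ["A"] * 4 + ["B"] * 4,
    "Value": [0, 1, 2, 0, 0, 0, 0, 1],
})
keys = ["Col1", "Col2", "Col3"]  # illustrative name for the grouping columns

# One row per group, one column per distinct Value; missing combinations become 0.
table = df.groupby(keys + ["Value"]).size().unstack("Value", fill_value=0)

# Count of zeros divided by the total number of rows in each group.
zero_fraction = table[0] / table.sum(axis=1)
print(zero_fraction)
```

With this frame, group A holds 2 zeros out of 4 rows and group B holds 3 out of 4.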

Here's another way without using lambda, which seems more understandable to me:

df['is_zero'] = df['Value'] == 0
df.groupby(['Col1', 'Col2', 'Col3'])['is_zero'].mean()
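
This works because pandas treats True as 1 and False as 0 when averaging, so the mean of the flags in a group is the number of zeros divided by the group size. The sketch below checks that against an explicit count on the ten-row frame from the first answer, without adding a helper column to df. The names `keys`, `by_mean`, and `by_count` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": ["Val1"] * 10,
    "Col2": ["Val2"] * 10,
    "Col3": ["A"] * 5 + ["B"] * 5,
    "Value": [0, 1, 2, 0, 1, 0, 0, 0, 0, 1],
})
keys = ["Col1", "Col2", "Col3"]  # illustrative name for the grouping columns

# Mean of boolean flags per group: True counts as 1, False as 0.
by_mean = df["Value"].eq(0).groupby([df[k] for k in keys]).mean()

# Explicit version: zeros in the group divided by the group size.
by_count = df.groupby(keys)["Value"].agg(lambda x: (x == 0).sum() / len(x))

# Both approaches should agree up to floating-point rounding.
print(np.allclose(by_mean, by_count))
```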

Create a boolean column that flags where Value equals 0, group by the Col columns, and take the mean of that column:

(
    df.assign(Percentage_Of_0=lambda x: x.Value.eq(0))
    .groupby(["Col1", "Col2", "Col3"], as_index=False)
    .Percentage_Of_0.mean()
)

    Col1    Col2    Col3    Percentage_Of_0
0   Val1    Val2    A       0.4
1   Val1    Val2    B       0.8


 