
Sort and select after aggregation (Pandas)

I would like to aggregate a Pandas DataFrame in order to count the number of children (variable child_name) for each father (variable father_name). The dataframe looks like this (it is a toy example of course, I want to grasp the concept):

father_name   child_name
Robert        Julian
Robert        Emily
Robert        Dan
Carl          Jack
Carl          Rose
John          Lucy
Paul          Christopher
Paul          Thomas

Now, I define an aggregation dictionary and use it on the dataframe d:

import pandas as pd
aggregation = {
    'child_name': {
        'n_children': 'count'
    }
}
d.groupby('father_name').agg(aggregation)

I obtain this output:

            child_name
            n_children
father_name           
Carl                 2
John                 1
Paul                 2
Robert               3

and now I would like to:

  • sort the fathers according to their number of children (in decreasing order)
  • show only the fathers that have 2 or more children

How can I do that? Maybe there's also a quicker way to do this, but I would like to learn this method too. Thanks in advance!

You could count the children per father first, then filter and sort:

df_count = df.groupby('father_name').count()
df_count[df_count.child_name > 1].sort_values(by='child_name', ascending=False)

Output:

             child_name
father_name
Robert                3
Carl                  2
Paul                  2
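
As a shorter variant, here is a minimal sketch that uses `value_counts` instead of `groupby(...).count()`. It rebuilds the toy frame from the question; the names `n_children` and `result` are only illustrative. One assumption to keep in mind: `value_counts` counts rows per father, so it would also count rows whose child_name is missing, whereas `count()` skips them.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# value_counts counts the rows per father and sorts in decreasing order by default
n_children = df["father_name"].value_counts()

# Keep only the fathers that have 2 or more children
result = n_children[n_children >= 2]
print(result)
```

The result is a Series indexed by father name rather than a DataFrame, which is usually enough for a quick look.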

If you want to make heavier use of agg, that might look something like the following (which throws a FutureWarning because renaming using dicts is deprecated; pandas 1.0 and later reject it with an error instead):

df.groupby('father_name').agg({'child_name': {'n_children': lambda x: len(x) if len(x) > 1 else None}}).dropna()

then sort the result afterwards.
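
Named aggregation (available since pandas 0.25) avoids the nested-dict renaming altogether. The following is a minimal sketch of the same agg-based idea, not the answerer's original code: it rebuilds the toy frame from the question, keeps the n_children column name the question asked for, and uses the illustrative variable names `counts` and `result`.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# Named aggregation: the output column n_children holds the number of
# non-null child_name values for each father
counts = df.groupby("father_name").agg(n_children=("child_name", "count"))

# Keep the fathers with 2 or more children, then sort in decreasing order
result = counts[counts["n_children"] >= 2].sort_values("n_children", ascending=False)
print(result)
```

Because the output column is named directly in the `agg` call, the result has a single flat column level instead of the two-level child_name / n_children header shown in the question.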

You can meet both conditions like this:

    import pandas as pd

    df = pd.DataFrame({
        "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
        "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
    })

    # Sort the fathers according to their number of children (in decreasing order)
    df = df.groupby(by='father_name').count().sort_values(['child_name'], ascending=False)

    # Show only the fathers that have 2 or more children
    df_greater_2 = df[df['child_name'] >= 2]

    print(df_greater_2)

DEMO: https://repl.it/@SanyAhmed/EarnestTatteredRepo
