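Sort and select after aggregation (Pandas)
I want to aggregate a pandas DataFrame in order to count the number of children (variable child_name) for each father (variable father_name). The dataframe looks like this (it is a toy example, of course; I want to grasp the concept):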
father_name child_name
Robert Julian
Robert Emily
Robert Dan
Carl Jack
Carl Rose
John Lucy
Paul Christopher
Paul Thomas
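For anyone who wants to reproduce the example, the toy table above could be built roughly like this. It is only a sketch: the variable name d is taken from the question, and the rows are copied from the table.

```python
import pandas as pd

# Toy data copied from the table above; the name `d` matches the question.
d = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})
print(d)
```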
Now I define an aggregation dictionary and use it on the dataframe d:
import pandas as pd
aggregation = {
'child_name': {
'n_children': 'count'
}
}
d.groupby('father_name').agg(aggregation)
I get the following output:
child_name
n_children
father_name
Carl 2
John 1
Paul 2
Robert 3
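Now I would like to sort the fathers according to their number of children (in decreasing order) and show only the fathers that have 2 or more children.
How can I do this? Maybe there is also a quicker way to do it, but I would like to learn this approach as well. Thanks in advance!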
You could do:
df_count = df.groupby('father_name').count()
df_count[df_count.child_name > 1].sort_values(by='child_name', ascending=False)
Output:
child_name
father_name
Robert 3
Carl 2
Paul 2
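If you want to make heavy use of agg, it could look something like the following (not recommended: renaming with a nested dict is deprecated and raises a FutureWarning; from pandas 1.0 on it raises a SpecificationError instead):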
df.groupby('father_name').agg({'child_name': {'n_children': lambda x: len(x) if len(x) > 1 else None}}).dropna()
Then sort the result.
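Since the dict-based renaming is deprecated, one possible alternative is named aggregation, which is available in pandas 0.25 and later. The sketch below is not the answerer's code; it rebuilds the toy data under the name df and keeps the column name n_children from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# Named aggregation: output_column=(input_column, aggregation_function)
result = (
    df.groupby("father_name")
      .agg(n_children=("child_name", "count"))
      .query("n_children >= 2")                      # keep fathers with 2 or more children
      .sort_values("n_children", ascending=False)    # largest families first
)
print(result)
```

This keeps the agg style from the question while avoiding the nested dict, and the filter and sort are chained directly on the aggregated frame.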
Let's try it this way, which satisfies both of your conditions:
import pandas as pd
df = pd.DataFrame({"father_name":["Robert","Robert","Robert","Carl","Carl","John","Paul","Paul"],"child_name":["Julian","Emily","Dan","Jack","Rose","Lucy","Christopher","Thomas"]})
#sort the fathers according to their number of children (in decreasing order)
df = df.groupby(by='father_name').count().sort_values(['child_name'],ascending=False)
#show only the fathers that have 2 or more children
df_greater_2 = df[df['child_name'] >= 2]
print(df_greater_2)
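If a Series of counts is enough, a shorter route might be value_counts, which already returns the counts in decreasing order. This is only a sketch on the same toy data. Note that it counts rows per father, so unlike count() on child_name it would also include rows where child_name is missing.

```python
import pandas as pd

df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# value_counts sorts by frequency in descending order by default
counts = df["father_name"].value_counts()

# keep only the fathers that appear in 2 or more rows
at_least_two = counts[counts >= 2]
print(at_least_two)
```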