I would like to aggregate a pandas DataFrame to count the number of children (variable child_name) for each father (variable father_name). The DataFrame looks like this (a toy example, of course; I just want to grasp the concept):
father_name child_name
Robert Julian
Robert Emily
Robert Dan
Carl Jack
Carl Rose
John Lucy
Paul Christopher
Paul Thomas
Now I define an aggregation dictionary and apply it to the DataFrame d:
import pandas as pd
aggregation = {
    'child_name': {
        'n_children': 'count'
    }
}
d.groupby('father_name').agg(aggregation)
I obtain this output:
             child_name
             n_children
father_name
Carl                  2
John                  1
Paul                  2
Robert                3
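Now I would like to sort the fathers by their number of children (in decreasing order) and show only the fathers that have 2 or more children.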
How can I do that? Maybe there's also a quicker way to do this, but I would like to learn this method too. Thanks in advance!
You could count the children per father first, then filter and sort the counts:
df_count = df.groupby('father_name').count()
df_count[df_count.child_name > 1].sort_values(by='child_name', ascending=False)
Output:
             child_name
father_name
Robert                3
Carl                  2
Paul                  2
If you want to make heavier use of agg, that might look something like the following. In older pandas versions it throws a FutureWarning, because renaming with nested dicts is deprecated; pandas 1.0 and later no longer accept the nested-dict form at all:
df.groupby('father_name').agg({'child_name': {'n_children': lambda x: len(x) if len(x) > 1 else None}}).dropna()
Then sort the result afterwards.
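Since the nested-dict renaming above is deprecated, here is a minimal sketch of the same idea using named aggregation, which I believe is available from pandas 0.25 onward. The DataFrame is rebuilt from the question's toy data, and the names counts and result are just illustrative:

```python
import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# Named aggregation: new_column_name=(input_column, aggregation_function).
counts = df.groupby("father_name").agg(n_children=("child_name", "count"))

# Keep only fathers with 2 or more children, then sort in decreasing order.
result = counts[counts["n_children"] >= 2].sort_values("n_children", ascending=False)
print(result)
```

This keeps the n_children column name from the question without the extra child_name level in the column header. Fathers with the same count (Carl and Paul here) may come out in either order.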
Try the following, which meets both of your conditions:
import pandas as pd

df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# Sort the fathers according to their number of children (in decreasing order)
df = df.groupby(by='father_name').count().sort_values(['child_name'], ascending=False)

# Show only the fathers that have 2 or more children
df_greater_2 = df[df['child_name'] >= 2]
print(df_greater_2)
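As for the quicker way the question asks about, one option might be Series.value_counts, which counts the rows per father and returns them sorted in decreasing order by default. This is only a sketch on the same toy data: it counts rows rather than non-null child_name values, so it matches the groupby count only when no child names are missing. The names n_children and at_least_two are illustrative.

```python
import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John", "Paul", "Paul"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy", "Christopher", "Thomas"],
})

# Number of rows per father, sorted by count in decreasing order by default.
n_children = df["father_name"].value_counts()

# Keep only the fathers that have 2 or more children.
at_least_two = n_children[n_children >= 2]
print(at_least_two)
```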