Python pandas dataframe: grouping resulting in two columns
I have a dataframe df:
PID AID Ethnicity
1 A Asian
1 B Asian
1 C Arab
1 D African
2 A Asian
2 D African
2 E Caucasian
2 F African
2 B Asian
I want to generate a frame that tells me, for each PID, how many AIDs it has and how many distinct ethnic groups.
So for the data above, the resulting newdf would be:
PID numAID numEthnicities
1 4 3
2 5 3
I know how to find numAID:
newdf = df[['PID','AID']].groupby('PID',
as_index=False).count().rename(columns={'AID':'numAID'})
I'm not sure how to add the third column to the dataframe.
This will work:
df.groupby('PID').agg({'AID':'count','Ethnicity':pd.Series.nunique}).add_prefix('num')
numAID numEthnicity
PID
1 4 3
2 5 3
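As a self-contained sketch of the same idea, the snippet below rebuilds the example frame from the question and uses named aggregation so the output columns match the requested names exactly (numAID, numEthnicities). It assumes pandas 0.25 or later, where the keyword form of agg is available; the choice of reset_index() to turn PID back into a regular column is an illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": ["A", "B", "C", "D", "A", "D", "E", "F", "B"],
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

# Named aggregation: output_column=(input_column, aggregation).
# "count" counts non-null rows per PID; "nunique" counts distinct values.
newdf = (
    df.groupby("PID")
      .agg(numAID=("AID", "count"),
           numEthnicities=("Ethnicity", "nunique"))
      .reset_index()  # make PID an ordinary column again
)

print(newdf)
```

Note that "count" counts rows, so a PID with a repeated AID would have that AID counted more than once; if distinct AIDs are wanted, "nunique" could be used for that column as well.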
You can add the third column like this:
# nunique counts distinct ethnicities per PID (count would count rows instead)
newdf['numEthnicities'] = newdf['PID'].map(df.groupby('PID')['Ethnicity'].nunique())