
python pandas dataframe on grouping resulting in two columns

I have a dataframe df:

PID    AID    Ethnicity
1      A      Asian
1      B      Asian
1      C      Arab
1      D      African
2      A      Asian
2      D      African
2      E      Caucasian 
2      F      African
2      B      Asian

I want to generate a frame that tells me, for each PID, how many AIDs it has and how many distinct ethnic groups:

So for the data above, the resulting newdf would be:

PID    numAID    numEthnicities
1      4         3
2      5         3

I know how to find numAID:

newdf = df[['PID','AID']].groupby('PID',  
as_index=False).count().rename(columns={'AID':'numAID'})

I'm not sure how to add the third column to the dataframe.

This will work: 这将起作用:

df.groupby('PID').agg({'AID':'count','Ethnicity':pd.Series.nunique}).add_prefix('num')

     numAID  numEthnicity
PID                
1      4          3
2      5          3
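For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the example frame and uses named aggregation, so the output columns get exactly the names asked for in the question (numAID and numEthnicities). It assumes pandas 0.25 or later, where the `output_name=(column, aggfunc)` form is available; the variable names are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": ["A", "B", "C", "D", "A", "D", "E", "F", "B"],
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

# Named aggregation: output_column=(input_column, aggregation).
# "count" counts non-null rows, "nunique" counts distinct values.
newdf = (
    df.groupby("PID")
      .agg(numAID=("AID", "count"), numEthnicities=("Ethnicity", "nunique"))
      .reset_index()  # turn PID back into an ordinary column
)
print(newdf)
```

Calling `reset_index()` at the end gives PID as a regular column, matching the layout of the desired newdf.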

You can add the third column like this:

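# nunique() counts distinct ethnicities per PID (count() would count rows); both groupbys sort by PID, so .values lines up with newdf.
newdf['numEthnicities'] = df.groupby('PID')['Ethnicity'].nunique().values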
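Note that count and nunique answer different questions here: count gives the number of rows per PID, while nunique gives the number of distinct ethnicities. A small sketch on the question's data shows the difference; the variable names are made up for illustration.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": ["A", "B", "C", "D", "A", "D", "E", "F", "B"],
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

ethnicity_by_pid = df.groupby("PID")["Ethnicity"]

# Number of (non-null) Ethnicity rows per PID, duplicates included.
row_counts = ethnicity_by_pid.count()

# Number of distinct Ethnicity values per PID.
distinct_counts = ethnicity_by_pid.nunique()

print(row_counts)
print(distinct_counts)
```

For the expected numEthnicities column (3 for both PIDs), the distinct count is the one that matches.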

Since you already have newdf, you could try using the join function:

df = df.set_index('PID')
newdf = newdf.set_index('PID')
result = df.join(newdf, lsuffix='df', rsuffix='newdf')
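As a variation on the join idea, the per-PID summary itself can be assembled by joining two grouped results, and then attached to the original rows if needed. This is only a sketch under the assumption that a one-row-per-PID frame is the goal; `num_aid`, `num_eth`, `summary`, and `per_row` are hypothetical names, not part of the original answer.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": ["A", "B", "C", "D", "A", "D", "E", "F", "B"],
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

# Two Series indexed by PID, each named after its target column.
num_aid = df.groupby("PID")["AID"].count().rename("numAID")
num_eth = df.groupby("PID")["Ethnicity"].nunique().rename("numEthnicities")

# Join on the shared PID index, then move PID back into a column.
summary = num_aid.to_frame().join(num_eth).reset_index()

# Optional: attach the per-PID counts to every original row.
per_row = df.join(summary.set_index("PID"), on="PID")

print(summary)
print(per_row)
```

The `summary` frame has one row per PID, whereas `per_row` keeps all nine original rows and repeats the counts on each of them.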


 