Python pandas dataframe: producing two columns when grouping

Question

I have a dataframe df:

PID    AID    Ethnicity
1      A      Asian
1      B      Asian
1      C      Arab
1      D      African
2      A      Asian
2      D      African
2      E      Caucasian 
2      F      African
2      B      Asian

I want to generate a frame that tells me, for each PID, how many AIDs it has and how many ethnic groups.

For the data above, the resulting newdf would be:

PID    numAID    numEthnicities
1      4         3
2      5         3

I know how to find numAID:

newdf = df[['PID', 'AID']].groupby('PID', as_index=False).count().rename(columns={'AID': 'numAID'})

I'm not sure how to add the third column to the dataframe.

Answer 1

This will work:

df.groupby('PID').agg({'AID':'count','Ethnicity':pd.Series.nunique}).add_prefix('num')

     numAID  numEthnicity
PID                
1      4          3
2      5          3
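
If the exact column names from the question (numAID and numEthnicities) are wanted, named aggregation is one option. The following is a minimal sketch, assuming pandas 0.25 or later; the sample frame is rebuilt from the data in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": list("ABCDADEFB"),
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

# Named aggregation: output_column=(input_column, aggregation).
newdf = (
    df.groupby("PID")
      .agg(numAID=("AID", "count"), numEthnicities=("Ethnicity", "nunique"))
      .reset_index()  # turn PID back into an ordinary column
)
print(newdf)
```

This keeps PID as a regular column and avoids the add_prefix step, at the cost of requiring a newer pandas version.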

Answer 2

You can add the third column as follows:

# nunique counts distinct ethnicities per PID (count would return the number of rows)
newdf['numEthnicities'] = df.groupby('PID')['Ethnicity'].nunique().values
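
The difference between count and nunique matters for this column: count tallies the non-null rows in each group, while nunique tallies the distinct values. The sketch below illustrates this on the question's data and attaches the result to newdf by mapping on PID, so it does not rely on row order. The names per_pid, row_counts and distinct_counts are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": list("ABCDADEFB"),
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

per_pid = df.groupby("PID")["Ethnicity"]
row_counts = per_pid.count()         # rows per PID: 4 and 5
distinct_counts = per_pid.nunique()  # distinct ethnicities per PID: 3 and 3

# newdf as built in the question.
newdf = (
    df[["PID", "AID"]]
    .groupby("PID", as_index=False)
    .count()
    .rename(columns={"AID": "numAID"})
)

# Look up each PID in the per-PID Series instead of assigning positionally.
newdf["numEthnicities"] = newdf["PID"].map(distinct_counts)
print(newdf)
```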

Answer 3

Since you already have newdf, you could try using the join function:

df = df.set_index('PID')
newdf = newdf.set_index('PID')
result = df.join(newdf, lsuffix='df', rsuffix='newdf')
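
Note that joining newdf onto df keeps one row per original record, with numAID repeated for every row of a PID. If one row per PID is the goal, joining two per-PID aggregates on their shared index is one possible alternative. This is a sketch under that assumption; num_aid and num_eth are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "AID": list("ABCDADEFB"),
    "Ethnicity": ["Asian", "Asian", "Arab", "African",
                  "Asian", "African", "Caucasian", "African", "Asian"],
})

# Two Series indexed by PID, each named after its target column.
num_aid = df.groupby("PID")["AID"].count().rename("numAID")
num_eth = df.groupby("PID")["Ethnicity"].nunique().rename("numEthnicities")

# join aligns on the PID index; reset_index restores PID as a column.
result = num_aid.to_frame().join(num_eth).reset_index()
print(result)
```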

3 answers

Answer 1: score 2, accepted, 2017-07-10 11:41:16

Answer 2: score 0, 2017-07-10 11:38:31

Answer 3: score 0, 2017-07-10 11:54:09