
How to group by a column, keep only the unique rows, then apply sum() in pandas

I have a dataframe:

 Title     Product    No     Type
   A         Mo       3       Sky
   A         Mo       3       Sky
   A         Mo       3       Sky
   A         Md       2       Sky
   A         Md       3       Sky
   A         Md       2       Sky
   A         Md       6       Land
   A         Md       6       Land
   A         Md       6       Land
   A         Mo       5       Land
   A         Mo       5       Land

I want to add a Sum column: group by the Type column, keep only the unique rows within each group, and sum their No values. Below is my expected outcome:

 Title     Product    No     Type      Sum
   A         Mo       3       Sky       8
   A         Mo       3       Sky       8
   A         Mo       3       Sky       8
   A         Md       2       Sky       8
   A         Md       2       Sky       8
   A         Md       2       Sky       8
   A         Md       6       Land      11
   A         Md       6       Land      11
   A         Md       6       Land      11
   A         Mo       5       Land      11
   A         Mo       5       Land      11

I tried to run the code below, but I get the error: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method

Code:

      FamilyVsM['Sum']=(FamilyVsM.groupby(['Type']).groupby(['Title','Product'])['No'].transform(lambda x: np.unique(x).sum()))  

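IIUC, a DataFrameGroupBy object has no groupby method, so the two calls cannot be chained. Group once by Type and sum the unique values of No: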

In [715]: df['Sum'] = df.groupby('Type')['No'].transform(lambda x: x.unique().sum())

In [716]: df
Out[716]:
   Title Product  No  Type  Sum
0      A      Mo   3   Sky    5
1      A      Mo   3   Sky    5
2      A      Mo   3   Sky    5
3      A      Md   2   Sky    5
4      A      Md   2   Sky    5
5      A      Md   2   Sky    5
6      A      Md   6  Land   11
7      A      Md   6  Land   11
8      A      Md   6  Land   11
9      A      Mo   5  Land   11
10     A      Mo   5  Land   11
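
The output above sums the unique values of No within each Type, which gives 5 for Sky, while the question expects 8. One reading that reproduces 8 and 11 from the input table as posted (where the fifth row is Md / 3 / Sky) is to drop duplicate rows first and then sum No per Type. This is a minimal sketch under that assumption, not necessarily what the asker intended; the names by_value and per_type are illustrative.

```python
import pandas as pd

# Input table as posted in the question (the fifth row has No=3 for Md/Sky).
df = pd.DataFrame({
    "Title": ["A"] * 11,
    "Product": ["Mo", "Mo", "Mo", "Md", "Md", "Md", "Md", "Md", "Md", "Mo", "Mo"],
    "No": [3, 3, 3, 2, 3, 2, 6, 6, 6, 5, 5],
    "Type": ["Sky"] * 6 + ["Land"] * 5,
})

# Unique *values* of No per Type, as in the answer above.
by_value = df.groupby("Type")["No"].transform(lambda x: x.unique().sum())

# Unique *rows* per Type (assumption): drop exact duplicate rows,
# sum No per Type, then map each total back onto the original rows.
per_type = df.drop_duplicates().groupby("Type")["No"].sum()
df["Sum"] = df["Type"].map(per_type)

print(df)
```

With this input, by_value is 5 for Sky and 11 for Land, while the Sum column is 8 for Sky and 11 for Land, because (Md, 2) and (Md, 3) count as distinct rows.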

This is not a one-liner, but it may work:

# First, group by 'Type' and sum 'No' (this sums every row, duplicates included)
df_group = df.groupby('Type', as_index=False)['No'].sum().rename(columns={'No': 'Sum'})

# Second, merge the per-Type totals back onto the original rows
df = df.merge(df_group, how='left', on='Type')
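
As written, the merge approach above adds up every row in each Type, duplicates included. A hedged variant, assuming "unique row" means an exact duplicate across all four columns, drops duplicates before grouping. The names all_rows, unique_rows, merged_all and merged_unique are illustrative and not part of the original answer.

```python
import pandas as pd

# Input table as posted in the question.
df = pd.DataFrame({
    "Title": ["A"] * 11,
    "Product": ["Mo", "Mo", "Mo", "Md", "Md", "Md", "Md", "Md", "Md", "Mo", "Mo"],
    "No": [3, 3, 3, 2, 3, 2, 6, 6, 6, 5, 5],
    "Type": ["Sky"] * 6 + ["Land"] * 5,
})

# Per-Type totals over every row, as in the merge answer above.
all_rows = (
    df.groupby("Type", as_index=False)["No"].sum()
      .rename(columns={"No": "Sum"})
)

# Variant (assumption): drop exact duplicate rows before summing.
unique_rows = (
    df.drop_duplicates()
      .groupby("Type", as_index=False)["No"].sum()
      .rename(columns={"No": "Sum"})
)

# A left merge on Type keeps all 11 original rows in their original order.
merged_all = df.merge(all_rows, how="left", on="Type")
merged_unique = df.merge(unique_rows, how="left", on="Type")

print(merged_unique)
```

On this input the plain version yields 16 for Sky and 28 for Land, whereas the deduplicated version yields 8 and 11, matching the expected outcome in the question.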
