简体   繁体   中英

Pandas check if the sum of two columns within the same group equals

I have a purchase record table showing the purchasing history recorded by customers and the store. I would like to get a new column where it shows the sum of the customer records & store records from each customer, as a group, equals. So basically 5 + 3 = 5 + 3 so Yes for 001 but 2.= 3 so No for 002.

Date custID itmID cust rec store rec
0514 001 A 5 5
0514 001 B 3 3
0514 002 A 2 3

expected result:

Date custID itmID cust rec store rec sum match
0514 001 A 5 5 Yes
0514 001 B 3 3 Yes
0514 002 A 2 3 No

my code:

df['sum match'] = (sum(df['store rec'])==sum(df['cust rec'])).groupby(match.custID)

error shows 'bool' object has no attribute 'groupby'

You can groupby and transform sum and then check difference of the columns on axis=1. then get the last result to check if it is 0 and conditionally assign values to the new column:

c = (df.groupby("custID")[['cust rec','store rec']]
      .transform('sum').diff(axis=1).iloc[:,-1].eq(0))

df['sum match'] = np.where(c,"Yes",'No')

output:

print(df)

   Date  custID itmID  cust rec  store rec sum match
0   514       1     A         5          5       Yes
1   514       1     B         3          3       Yes
2   514       2     A         2          3        No

More details of the steps:

Groupby + transform sum returns sum of each group per column:

print(df.groupby("custID")[['cust rec','store rec']].transform('sum'))

   cust rec  store rec
0         8          8
1         8          8
2         2          3

then difference on axis=1:

print(df.groupby("custID")[['cust rec','store rec']].transform('sum').diff(axis=1))
   cust rec  store rec
0       NaN        0.0
1       NaN        0.0
2       NaN        1.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM