Pandas groupby create new column based on a condition

Question

For the data built below, I want to produce a new column, new area, for each group defined by the address fields X, Y and Z (a groupby on X, Y, Z). If the value in the code column is A, that area should be counted only once per group. The areas for all other codes are added as usual.

So for the 222 North St, Seattle, WA group, the new area should be 100 (A) + 200 (B) + 300 (C) = 600. I can't simply take the sum, because A appears twice in that group. I want only one area for code A to be counted in the sum, not all of them.
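A minimal sketch of the rule in plain Python, for illustration only. It assumes each group's rows are given as hypothetical (code, area) pairs and that all A rows in a group share the same area. The question doesn't say which A area to keep when they differ, so this sketch keeps the first one it sees:

```python
def new_area(rows):
    """Sum the areas of one address group, counting code 'A' only once.

    Assumption: repeated 'A' rows share the same area; the first is kept.
    """
    total = 0
    seen_a = False
    for code, area in rows:
        if code == 'A':
            if seen_a:
                continue  # skip every 'A' row after the first
            seen_a = True
        total += area
    return total

# (code, area) pairs taken from the question's sample data
seattle = [('A', 100), ('B', 200), ('A', 100), ('C', 300)]
chicago = [('A', 200), ('A', 200), ('B', 50)]
print(new_area(seattle))  # 600
print(new_area(chicago))  # 250
```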

The data can be created with:

```python
df['X'] = ['222 North St','222 North St','222 North St','222 North St','115 John St','115 John St','115 John St']
df['Y'] = ['Seattle','Seattle','Seattle','Seattle','Chicago','Chicago','Chicago']
df['Z'] = ['WA','WA','WA','WA','IL','IL','IL']
df['code'] = ['A','B','A','C','A','A','B']
df['area'] = [100,200,100,300,200,200,50]
```

Answer 1

This works, but I'm not sure it's the most efficient way. Since you didn't specify which area to take when a code appears more than once in a group, I assumed the duplicate rows hold the same area and simply dropped the duplicates.
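The assumption that repeated codes share the same area can be checked before dropping duplicates. A minimal sketch using `groupby(...).nunique()` on the question's sample data (rebuilt with a dict constructor so the snippet runs on its own):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'X': ['222 North St'] * 4 + ['115 John St'] * 3,
    'Y': ['Seattle'] * 4 + ['Chicago'] * 3,
    'Z': ['WA'] * 4 + ['IL'] * 3,
    'code': ['A', 'B', 'A', 'C', 'A', 'A', 'B'],
    'area': [100, 200, 100, 300, 200, 200, 50],
})

# Number of distinct area values per (address, code). A value above 1 means
# dropping duplicates would silently discard a different area.
distinct = df.groupby(['X', 'Y', 'Z', 'code'])['area'].nunique()
conflicts = distinct[distinct > 1]
print(conflicts.empty)  # True for this sample
```

If `conflicts` is not empty, you would need to decide which area to keep (for example via the `keep` argument of `drop_duplicates`) before summing.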

```python
import pandas as pd

df = pd.DataFrame()
df['X'] = ['222 North St','222 North St','222 North St','222 North St','115 John St','115 John St','115 John St']
df['Y'] = ['Seattle','Seattle','Seattle','Seattle','Chicago','Chicago','Chicago']
df['Z'] = ['WA','WA','WA','WA','IL','IL','IL']
df['code'] = ['A','B','A','C','A','A','B']
df['area'] = [100,200,100,300,200,200,50]

# keep one row per (address, code), then sum the areas per address
df2 = df.drop_duplicates(subset=['X','Y','Z','code']).groupby(['X','Y','Z']).agg({'area':'sum'}).reset_index()
# merge the totals back onto every row; the overlapping 'area' columns become area_x/area_y
df = pd.merge(df,df2,how='left',on=['X','Y','Z']).rename(columns={'area_x':'area','area_y':'area sum'})
```
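The rename step can be avoided by passing `suffixes` to `merge`. A minimal sketch of the same first version (all codes de-duplicated), assuming only that an empty suffix on the left side is acceptable, which pandas allows as long as the two suffixes differ:

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['222 North St'] * 4 + ['115 John St'] * 3,
    'Y': ['Seattle'] * 4 + ['Chicago'] * 3,
    'Z': ['WA'] * 4 + ['IL'] * 3,
    'code': ['A', 'B', 'A', 'C', 'A', 'A', 'B'],
    'area': [100, 200, 100, 300, 200, 200, 50],
})

keys = ['X', 'Y', 'Z']
# One row per (address, code), then the area total per address
df2 = (df.drop_duplicates(subset=keys + ['code'])
         .groupby(keys, as_index=False)['area'].sum())
# suffixes=('', ' sum') keeps 'area' on the left and names the right column 'area sum'
df = df.merge(df2, how='left', on=keys, suffixes=('', ' sum'))
print(df[['X', 'code', 'area', 'area sum']])
```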

Also, if you include the setup part of the code above (the import and the DataFrame creation) in your question, more people are likely to try answering it.

EDIT: to de-duplicate only code A (all other codes are summed as-is):

```python
# start from the original df (not the one that already has 'area sum' from the first version)

# drop duplicates, but only for code == 'A'
df_A = df[df['code']=='A'].drop_duplicates(subset=['X','Y','Z','code'])

# combine with the non-A rows and sum per address; A now appears once per group
df2 = pd.concat([df[df['code']!='A'],df_A]).groupby(['X','Y','Z']).agg({'area':'sum'}).reset_index()

# merge the totals back onto the original dataframe as 'area sum'
df = pd.merge(df,df2,how='left',on=['X','Y','Z']).rename(columns={'area_x':'area','area_y':'area sum'})
```
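A possible merge-free variant of the EDIT, sketched under the same assumption that repeated A rows share one area. It flags every A row after the first in its address group with `duplicated`, zeroes those areas out, and lets `groupby(...).transform('sum')` write each group total back onto its rows. Like the EDIT (and unlike the first version), it only de-duplicates code A, so repeated B or C rows would still be summed:

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['222 North St'] * 4 + ['115 John St'] * 3,
    'Y': ['Seattle'] * 4 + ['Chicago'] * 3,
    'Z': ['WA'] * 4 + ['IL'] * 3,
    'code': ['A', 'B', 'A', 'C', 'A', 'A', 'B'],
    'area': [100, 200, 100, 300, 200, 200, 50],
})

keys = ['X', 'Y', 'Z']
# True for every 'A' row after the first one in its address group
repeat_a = (df['code'] == 'A') & df.duplicated(subset=keys + ['code'])
# Zero out repeated 'A' areas, then broadcast each group's sum back to its rows
df['area sum'] = (df['area'].where(~repeat_a, 0)
                    .groupby([df[k] for k in keys])
                    .transform('sum'))
print(df[['X', 'code', 'area', 'area sum']])
```

Because `transform` returns a result aligned to the original index, there is no need for `reset_index`, a merge, or the `area_x`/`area_y` rename.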

Answer 1: accepted (score 3), posted 2021-03-13 23:52:17