简体   繁体   中英

Pandas groupby create new column based on a condition

In the table below, I want to produce the column new area for the group created by address related fields X,Y,Z (Groupby XYZ). If in the code column, if the value is A, just count that area only once and add the remaining area for other codes.

So for this group, the new area should be 100(A)+200(B)+300(C)= 600. Note that can't take the sum since A is repeated twice. Just want one area for value A to be counted in the sum, not all of them

在此处输入图像描述

To get the above table:

df['X'] = ['222 North St','222 North St','222 North St','222 North St','115 John St','115 John St','115 John St']
df['Y'] = ['Seattle','Seattle','Seattle','Seattle','Chicago','Chicago','Chicago']
df['Z'] = ['WA','WA','WA','WA','IL','IL','IL']
df['code'] = ['A','B','A','C','A','A','B']
df['area'] = [100,200,100,300,200,200,50]```

So this works, but not sure if it's the most efficient way. Since you didn't specify which code you wanted to take when there were multiple, I assumed they would hold the same value for area and so dropped duplicates.

import pandas as pd 

df = pd.DataFrame()
df['X'] = ['222 North St','222 North St','222 North St','222 North St','115 John St','115 John St','115 John St']
df['Y'] = ['Seattle','Seattle','Seattle','Seattle','Chicago','Chicago','Chicago']
df['Z'] = ['WA','WA','WA','WA','IL','IL','IL']
df['code'] = ['A','B','A','C','A','A','B']
df['area'] = [100,200,100,300,200,200,50]

df2 = df.drop_duplicates(subset=['X','Y','Z','code']).groupby(['X','Y','Z']).agg({'area':'sum'}).reset_index()
df = pd.merge(df,df2,how='left',on=['X','Y','Z']).rename(columns={'area_x':'area','area_y':'area sum'})

Also, if you were able to provide the first part of the above code yourself, you'd attract more people to try and answer your question.

EDIT:

# drop duplicates but only for code = A
df_A = df[df['code']=='A'].drop_duplicates(subset=['X','Y','Z','code'])

# groupby and sum now that A only appears once - this creates the 'area sum'
df2 = pd.concat([df[df['code']!='A'],df_A]).groupby(['X','Y','Z']).agg({'area':'sum'}).reset_index()

# merge onto original dataframe
df = pd.merge(df,df2,how='left',on=['X','Y','Z']).rename(columns={'area_x':'area','area_y':'area sum'})

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM